Only applications submitted through the provided link will be taken into consideration.
Your Role at a Glance :
We are hiring a Senior Staff Backend Engineer, Site Reliability, for our client (Code Name : SORIN), a global leader building high-scale observability platforms. In this high-impact leadership role, you'll architect, scale, and optimize the systems that drive how enterprises monitor their distributed applications. You'll collaborate across teams, mentor engineers, and shape the technical direction of mission-critical backend services in a modern cloud-native environment.
This role is ideal for experienced backend engineers who thrive in distributed systems, care deeply about system performance and observability, and
About the Company (Code Name : SORIN)
Our client is a global enterprise software company focused on simplifying IT management through powerful, secure, and scalable platforms. With a strong commitment to innovation and customer-centricity, they help organizations accelerate digital transformation across observability, incident response, and performance monitoring. You'll join a team of passionate engineers working on critical systems trusted by Fortune 500 companies and growing SaaS-first businesses alike.
Role & responsibilities :
- Collaborate with software engineering teams to define infrastructure requirements and drive best practices in reliability, monitoring, incident response, and automation, ensuring seamless integration and optimal performance of applications and systems.
- Lead and mentor a team of SREs, providing technical guidance and support to ensure the ongoing reliability and performance of our systems.
- Play a key role in driving automation, tooling, and observability initiatives, taking ownership of designing and implementing scalable and efficient solutions.
- Lead the response to production incidents, conduct comprehensive learning reviews, drive continuous improvement initiatives, and actively participate in an on-call rotation, fostering a culture of learning, resilience, and ongoing enhancement within our systems.
- Establish and drive operations performance through service level objectives (SLOs).
- Demonstrate proficiency in technical skills, exhibit an expert-level understanding of relevant technologies and tools, and use this knowledge to mentor and support team members, helping them improve their skills and succeed in their roles.
Preferred candidate profile :
- 10+ years of experience designing, building & maintaining SaaS environments
- 8+ years of experience designing, building & maintaining AWS / Azure infrastructure with Terraform
- 5+ years of experience building and running Kubernetes clusters
- Strong experience working with data platform infrastructure
- Experience coding in a high-level programming language such as Python or Go (Golang), plus knowledge of writing shell scripts and SQL queries
- Experience with observability (monitoring, logging, tracing, metrics)
- Experience with SQL / NoSQL database technologies
- Experience with GitOps and CI / CD processes is a plus
- Experience with security operations: security policies, infrastructure, key management, and setup of encryption at rest and in transit
- Experience in mentoring and fostering the professional development of team members