Hiring hybrid Site Reliability Engineers for a fast-growing product company building scalable tech solutions and transforming how businesses run mission-critical operations. Our SaaS platform is designed for high performance, reliability, and automation at scale.

Your Impact

As a Site Reliability Engineer, you’ll play a key role in ensuring reliability, scalability, and automation across our cloud-native SaaS platform. You’ll apply engineering principles to operational challenges, automate manual processes, and improve observability.

Key Responsibilities:

Ensure system stability, performance, and high availability through proactive monitoring and incident response.
Lead and participate in 24x7 on-call rotations, driving effective RCA and blameless post-mortems.
Define and enforce SLOs and error budgets, balancing speed and reliability.
Automate infrastructure and deployments using Terraform, Helm, and Kubernetes (EKS).
Modernize environments from Docker/Docker Swarm to Kubernetes.
Design and maintain CI/CD pipelines for continuous delivery.
Build comprehensive observability strategies across metrics, logs, and traces.
Leverage tools like Datadog, Grafana, and Prometheus to improve monitoring and system insights.
Collaborate with cross-functional teams to ensure seamless customer experience.

Your Experience

We’d love to hear from you if you have:

✅ Strong hands-on experience with AWS Cloud (IAM, networking, cloud-native services).
✅ Proficiency in Kubernetes (EKS), Terraform, and Helm.
✅ Experience with Docker and managing systems on Docker Swarm.
✅ Hands-on expertise with observability platforms: Datadog (preferred), Grafana, Prometheus.
✅ Scripting skills in Bash and/or Python for automation and troubleshooting.
✅ Solid understanding of REST APIs, systems architecture, and databases.
✅ Experience supporting high-availability SaaS products.
✅ Strong communication skills to explain complex issues clearly to both technical and non-technical teams.

Why Join Us?

Work in a fast-paced product environment building scalable cloud-native solutions.
Take ownership of critical infrastructure and reliability initiatives.
Collaborate with talented teams focused on innovation and customer success.
Be part of a culture that values curiosity, ownership, teamwork, and respect.

Site Reliability Engineer • Hyderabad, Telangana, India