About the Role:
We're looking for a highly skilled and experienced Senior DevOps/SRE Engineer to join our team.
In this role, you'll be responsible for building and maintaining the infrastructure that powers our large-scale, high-availability systems.
You'll work on everything from CI/CD pipelines to geo-redundant deployments, ensuring our platform is scalable, reliable, and performant.
This is a critical position for someone who thrives on solving complex, production-grade challenges and has a passion for automation and operational excellence.
Key Responsibilities:
- Design, implement, and maintain CI/CD pipelines for global, multi-region deployments.
- Administer our Kubernetes clusters, including multi-region deployments and scaling strategies for workloads with high queries per second (QPS).
- Develop and manage Infrastructure as Code (IaC) using tools like Terraform or CloudFormation.
- Manage and optimize our cloud infrastructure on platforms like AWS, GCP, or Azure, with a focus on geo-redundant architecture.
- Proactively monitor, troubleshoot, and resolve issues in large-scale distributed systems.
- Collaborate with development teams to improve application performance, scalability, and reliability.
- Mentor junior team members and provide technical leadership on complex projects.
- Ensure that systems are secure and compliant and that best practices are followed.
Required Qualifications:
- 6 to 10 years of professional experience in a DevOps, SRE, or similar role, with a focus on managing large-scale, high-availability systems.
- Proven, hands-on expertise in Kubernetes administration, including scaling for high QPS and managing multi-region deployments.
- Deep experience with IaC tools, specifically Terraform or CloudFormation.
- Strong background in building and maintaining CI/CD pipelines for complex, multi-region environments.
- Proficiency with cloud platforms such as AWS, GCP, or Azure and a solid understanding of geo-redundant architecture.
- Strong knowledge of Linux and expertise in scripting languages like Bash and Python.
- Extensive experience with troubleshooting and debugging production issues in large-scale distributed systems.
- Demonstrated experience leading teams or projects and a strong ability to solve challenging technical problems.
(ref: hirist.tech)