Description
We are seeking an experienced SRE Lead to join our team in India. The ideal candidate will be responsible for leading a team of SREs to ensure the reliability, availability, and performance of our services. This role requires a strong technical background and leadership skills to drive operational excellence.
Responsibilities
- Lead and mentor a team of Site Reliability Engineers (SREs) to ensure high availability and performance of services.
- Design and implement scalable and reliable systems and infrastructure.
- Monitor system performance and troubleshoot issues proactively.
- Develop and maintain automation tools for deployment, monitoring, and incident response.
- Collaborate with development teams to improve application reliability and performance.
- Establish and enforce best practices for SRE processes and methodologies.
- Participate in the on-call rotation and incident management.
Skills and Qualifications
- 6-9 years of experience in Site Reliability Engineering or a related field.
- Strong proficiency in cloud computing platforms such as AWS, Azure, or GCP.
- Experience with containerization and orchestration technologies like Docker and Kubernetes.
- Solid understanding of networking concepts and protocols.
- Proficient in scripting languages such as Python, Bash, or Ruby.
- Experience with monitoring and logging tools like Prometheus, Grafana, ELK stack, or similar.
- Knowledge of CI/CD pipelines and related tools (e.g., Jenkins, GitLab CI).
- Excellent problem-solving skills and the ability to work under pressure.
Skills Required
Site Reliability Engineering, Cloud Infrastructure, Automation Tools, Performance Tuning