We are seeking a Senior Site Reliability Engineering (SRE) Lead to ensure the optimal performance, scalability, and reliability of our applications. This role involves collaborating with various teams, adopting and implementing SRE practices, and mentoring site reliability engineers.
Key Responsibilities :
- Lead SRE Adoption : Guide teams on SRE best practices and improve application reliability.
- Monitor System Performance : Ensure system reliability by tracking key user journeys, setting service level objectives (SLOs), and addressing risks.
- Automation & Process Improvement : Automate tasks and eliminate repetitive work to increase efficiency.
- Incident Management : Lead efforts during outages, conduct root cause analysis, and develop automated responses.
- Collaboration : Work closely with cross-functional teams to improve application performance and mentor them on SRE principles.
Skills & Experience :
- 15+ years of experience in SRE and application support.
- Technical proficiency : Experience with SRE concepts, monitoring tools, and automation (e.g., Python, Bash, Terraform).
- Problem-solving : Ability to analyze application performance and troubleshoot issues.
- Mentorship : Guide teams and foster a culture of reliability.
- Agile experience : Knowledge of agile practices and collaborative teamwork.