Description:
We are seeking a Site Reliability Engineer with expertise in OpenTelemetry to join our team in India. The ideal candidate will be responsible for ensuring the reliability, availability, and performance of our systems while implementing best practices for observability and monitoring.
Responsibilities:
- Design, implement, and maintain reliable systems using OpenTelemetry for observability in cloud environments.
- Monitor and troubleshoot system performance, availability, and security issues.
- Collaborate with development teams to integrate telemetry into applications and services.
- Develop automated solutions for deploying and managing applications in production environments.
- Create and maintain documentation for systems and processes related to Site Reliability Engineering.
- Participate in on-call rotation to respond to incidents and ensure system reliability.
Skills and Qualifications:
- 5-10 years of experience in Site Reliability Engineering or a related field.
- Strong knowledge of OpenTelemetry and its implementation for monitoring and observability.
- Proficiency in cloud platforms such as AWS, Azure, or Google Cloud.
- Experience with containerization technologies such as Docker and orchestration tools such as Kubernetes.
- Familiarity with programming languages such as Python, Go, or Java.
- Understanding of CI/CD pipelines and DevOps practices.
- Excellent problem-solving skills and the ability to work under pressure.
- Strong communication skills and the ability to collaborate with cross-functional teams.
(ref: hirist.tech)