Talent.com
Site Reliability Engineer - Observability Services

TeamWare Solutions, Bangalore
30+ days ago
Job description

Role Summary:

We are seeking a highly skilled Site Reliability Engineer (SRE) with a strong focus on observability. The ideal candidate will have 5-8 years of experience implementing and managing monitoring, logging, and alerting systems. This role requires expertise in the Kubernetes stack, as well as a solid foundation in coding and Infrastructure as Code (IaC), to ensure the reliability and health of our systems.

Key Responsibilities:

  • Observability Implementation: Design and implement comprehensive observability solutions, including monitoring, logging, and alerting.
  • Kubernetes Stack Management: Work extensively with the Kubernetes stack and related tools such as Prometheus, Loki, Grafana, and Alertmanager to ensure system performance and reliability.
  • Coding & Automation: Apply proficiency in Python and Go to solve complex problems, automate tasks, and contribute to the development of tools and systems.
  • Infrastructure & CI/CD: Use Infrastructure as Code and manage CI/CD pipelines to ensure continuous and reliable deployments.
  • Troubleshooting: Apply strong troubleshooting and problem-solving skills to diagnose and resolve issues efficiently and proactively.

Required Skills:

  • Observability: Expertise in all aspects of observability, including monitoring, logging, and alerting.
  • Kubernetes Stack: Deep knowledge of and hands-on experience with Prometheus, Loki, Grafana, and Alertmanager.
  • Programming: Strong coding skills in Python and Go, sufficient to tackle complex technical challenges.
  • DevOps: Experience with CI/CD pipelines and Infrastructure as Code (IaC).
  • Problem-Solving: Strong troubleshooting and problem-solving abilities.
  • Cloud: Experience with AWS is mandatory.

Nice-to-Have Skills:

  • Incident Management: Familiarity with PagerDuty.
  • Integrations: Experience with the Zoom Developer Platform.

Education & Experience:

  • Education: A Bachelor's degree in Computer Science, Information Technology, or a related field is preferred.
  • Experience: 5-8 years of experience in a Site Reliability or DevOps engineering role, with a focus on observability.

(ref: hirist.tech)