Description
We are seeking a skilled Site Reliability Engineer (SRE) to join our team in India. In this role, you will be responsible for ensuring the reliability, performance, and availability of our production systems. You will work closely with development teams to design scalable systems and implement automation to improve operational efficiency.
Responsibilities
- Maintain and improve the performance, reliability, and availability of our production systems.
- Implement and manage monitoring and alerting systems to proactively identify and resolve issues.
- Collaborate with development teams to design and implement scalable and efficient systems.
- Automate repetitive tasks to improve operational efficiency.
- Conduct postmortems and root cause analysis for incidents to improve system reliability.
- Participate in on-call rotations to provide support for production systems.
- Create and maintain documentation for systems and processes.
Skills and Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 2-6 years of experience in Site Reliability Engineering, DevOps, or a related field.
- Strong understanding of cloud platforms such as AWS, Azure, or Google Cloud.
- Proficiency in scripting languages such as Python, Bash, or Ruby.
- Experience with containerization technologies like Docker and orchestration tools like Kubernetes.
- Familiarity with CI/CD pipelines and tools such as Jenkins, GitLab CI, or CircleCI.
- Solid understanding of networking concepts and protocols.
- Experience with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or similar.
- Strong problem-solving skills and the ability to work under pressure.
Skills Required
Kubernetes, Docker, Python, Terraform, Monitoring, Networking, Linux, Scripting, Incident Management, Cloud Services