We are seeking a Site Reliability Engineer (SRE) to join our team in India. The ideal candidate will have 5 to 9 years of experience managing production systems and ensuring their reliability and performance, and will collaborate with cross-functional teams to drive software engineering best practices.
- Design and implement scalable and reliable systems to ensure high availability and performance.
- Monitor system performance and troubleshoot issues to maintain optimal operation.
- Collaborate with development teams to ensure that software is reliable and meets performance requirements.
- Automate operational processes to improve efficiency and reduce manual intervention.
- Participate in on-call rotations to provide support for production incidents.
Skills and Qualifications
- Bachelor's degree in Computer Science, Engineering, or related field.
- Strong experience with cloud platforms such as AWS, Google Cloud, or Azure.
- Proficient in scripting languages such as Python, Bash, or Ruby.
- Experience with containerization technologies like Docker and orchestration tools like Kubernetes.
- Familiarity with CI/CD pipelines and tools like Jenkins or GitLab CI.
- Solid understanding of networking concepts and protocols (TCP/IP, DNS, HTTP, etc.).
- Experience with monitoring and logging tools such as Prometheus, Grafana, ELK stack, or similar.
Skills Required
Kubernetes, Prometheus, Grafana, Terraform, Docker, Python, Linux