Description
We are seeking a skilled Site Reliability Engineer (SRE) to join our dynamic team in India. The ideal candidate will have a strong background in managing and improving production systems, with a focus on reliability, scalability, and performance.
Responsibilities
- Design, implement, and maintain scalable and reliable systems.
- Monitor system performance and troubleshoot issues.
- Automate operational processes to reduce manual intervention.
- Collaborate with development teams to improve system architecture and deployment processes.
- Participate in on-call rotations to provide support for production systems.
- Perform capacity planning and ensure system scalability.
- Develop and maintain documentation for systems and processes.
Skills and Qualifications
- 5-7 years of experience in Site Reliability Engineering or a related field.
- Strong knowledge of Linux/Unix systems and shell scripting.
- Experience with cloud platforms such as AWS, Azure, or Google Cloud.
- Proficiency in programming languages such as Python, Go, or Java.
- Familiarity with containerization technologies like Docker and orchestration tools like Kubernetes.
- Experience with monitoring tools such as Prometheus, Grafana, or Nagios.
- Strong understanding of networking concepts and protocols.
- Knowledge of CI/CD pipelines and tools like Jenkins, GitLab CI, or CircleCI.
Skills Required
Kubernetes, Docker, Terraform, Prometheus, Grafana, Linux Administration, Cloud Services, Networking, Scripting