Description
We are seeking a skilled Site Reliability Engineer (SRE) to join our team in India. The ideal candidate will be responsible for ensuring the reliability, availability, and performance of our production systems. You will work closely with development teams to build and maintain scalable applications while implementing automation tools to enhance operational efficiency.
Responsibilities
- Monitor and maintain the availability and performance of production systems.
- Implement automation tools and frameworks to optimize system reliability.
- Collaborate with development teams to design and deploy scalable applications.
- Troubleshoot and resolve incidents promptly to minimize downtime.
- Develop and maintain documentation for systems and processes.
- Participate in on-call rotations and respond to alerts and incidents as needed.
- Continuously improve system performance and reliability through proactive monitoring and maintenance.
Skills and Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 3-8 years of experience in site reliability engineering, DevOps, or a related field.
- Strong knowledge of Linux/Unix systems and administration.
- Experience with cloud platforms such as AWS, GCP, or Azure.
- Proficient in scripting languages like Python, Bash, or Ruby.
- Familiarity with containerization technologies (Docker, Kubernetes).
- Understanding of CI/CD pipelines and tools (Jenkins, GitLab CI, etc.).
- Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack).
- Strong problem-solving skills and ability to work under pressure.
Education
Master of Computer Applications (M.C.A.), Post Graduate Diploma in Computer Applications (PGDCA), Master of Technology (M.Tech/M.E.), Bachelor of Computer Applications (B.C.A.), Bachelor of Technology (B.Tech/B.E.)
Skills Required
Linux Administration, Python Programming, Kubernetes, Cloud Services, Monitoring Tools, Incident Management, Scripting Languages, Networking Concepts, Configuration Management