Job Role : Senior DevOps / SRE Engineer
Location / Work Model : 12 days per month in the Infosys office; remaining days work from home.
Mode of Interview : In person at the Infosys office.
Job Description :
We are looking for a highly skilled and motivated Senior DevOps / Site Reliability Engineer to join our growing infrastructure and platform engineering team. You will design, implement, and manage scalable, secure, and highly available cloud-native infrastructure on AWS or GCP. The role also covers infrastructure automation, observability, container orchestration, performance optimization, and incident management to keep production systems reliable and performant. You will play a key role in managing production systems and implementing DevOps best practices using modern cloud and container orchestration technologies.
Key Responsibilities :
- Design, implement, and manage scalable, secure, and resilient cloud infrastructure using AWS or GCP.
- Deploy and maintain containerized applications using Docker and Kubernetes.
- Automate infrastructure provisioning and configuration management using tools like Terraform, CloudFormation, or Ansible.
- Develop and manage CI / CD pipelines for continuous integration, delivery, and deployment using tools such as Jenkins, GitLab CI, or similar.
- Monitor system health, application performance, and availability using tools such as Grafana, Prometheus, and Dynatrace.
- Troubleshoot production issues, perform root cause analysis, and implement long-term solutions to improve system reliability.
- Write custom scripts and automation tools using Python, Java, or other relevant languages.
- Collaborate with cross-functional teams including developers, architects, QA, and security to integrate DevOps practices into the SDLC.
- Implement SRE practices, define and monitor SLIs / SLOs, and support incident response and post-mortem processes.
- Ensure cloud environments meet security, compliance, and governance standards.
Required Skills & Qualifications :
- 5+ years of hands-on experience in DevOps or Site Reliability Engineering roles.
- Strong expertise in public cloud platforms (AWS and / or GCP).
- Proven experience with Docker, Kubernetes, and managing containerized microservices at scale.
- Proficient in at least one programming language : Java or Python.
- Solid understanding of Linux systems, networking, and system security.
- Experience with monitoring, logging, and alerting tools such as Grafana, Prometheus, and Dynatrace.
- Hands-on experience with CI / CD pipelines and tools like Jenkins or GitLab CI.
- Familiarity with version control systems (Git) and GitOps.
Skills :
- Experience with incident management and SRE best practices.
- Familiarity with log aggregation and analysis tools such as ELK, Fluentd, or Splunk.
- Exposure to database operations (PostgreSQL, MySQL, or NoSQL systems).
- Experience working in Agile / Scrum teams.
(ref : hirist.tech)