Experience: 5-6 years | Location: Bangalore
We are looking for an experienced Senior DevOps / Site Reliability Engineer with 5-6 years of hands-on experience to ensure the robustness, scalability, and security of our infrastructure and services. You will lead efforts in automating deployment pipelines, optimizing system reliability, and driving best practices across development and operations teams.
Key Responsibilities:
- Architect, build, and maintain scalable and resilient infrastructure solutions on cloud platforms (AWS, GCP, Azure).
- Lead the design and implementation of CI/CD pipelines, ensuring fast, reliable, and secure deployments.
- Automate infrastructure provisioning and management using Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation.
- Monitor system health, establish SLIs/SLOs, and respond swiftly to incidents with root cause analysis and remediation.
- Collaborate closely with software engineering teams to improve service reliability, scalability, and performance.
- Implement and enhance monitoring, logging, and alerting solutions using Prometheus, Grafana, ELK stack, or similar tools.
- Drive security best practices and compliance across infrastructure and deployment workflows.
- Mentor junior engineers and contribute to the continuous improvement of DevOps processes and tooling.
- Participate in and help manage on-call rotations, incident response, and post-mortem analysis.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience.
- 5-6 years of relevant experience in DevOps, Site Reliability Engineering, or Cloud Infrastructure roles.
- Strong expertise with cloud providers (AWS, GCP, Azure) and with containerization and orchestration tools such as Docker and Kubernetes.
- Proven experience with CI/CD tools (Jenkins, GitLab CI, CircleCI) and automation scripting (Python, Bash, Go).
- Hands-on knowledge of Infrastructure as Code (Terraform, Ansible, CloudFormation).
- Proficiency with monitoring and alerting tools (Prometheus, Grafana, ELK stack, Datadog).
- Solid understanding of networking, security principles, and system architecture.
- Excellent analytical and problem-solving skills with a proactive approach.
- Strong communication skills and the ability to collaborate effectively across teams.
(ref: hirist.tech)