Description
About the Role:
We are seeking a proactive and skilled DevOps / SRE Engineer to own the reliability, scalability, and automation of our cloud infrastructure and deployment processes.
You'll be bridging the gap between development and operations, implementing Infrastructure as Code (IaC), and ensuring our services meet stringent availability and performance metrics.
Key Responsibilities
- Design, implement, and manage robust, fully automated CI / CD pipelines (e.g., Jenkins Pipelines, GitLab CI, Azure DevOps) for rapid and reliable software releases.
- Manage and optimize our cloud infrastructure primarily on AWS or GCP using Infrastructure as Code (IaC) tools like Terraform or Ansible.
- Develop, deploy, and manage containerized applications using Docker and orchestrate them efficiently with Kubernetes (EKS / GKE / AKS).
- Implement and maintain comprehensive monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK Stack / Datadog) to proactively identify and resolve production issues.
- Enforce security best practices across the infrastructure, including IAM policies, network security groups, and vulnerability management.
- Participate in an on-call rotation to provide rapid response and resolution to critical production incidents and perform effective Root Cause Analysis (RCA).
Required Technical Skills & Experience (Mandatory)
- Experience: 4-8 years in a DevOps, SRE, or Cloud Engineering role.
- Cloud: Strong hands-on experience with at least one major cloud provider (AWS, Azure, or GCP).
- Containerization & Orchestration: Expert-level knowledge of Docker and production experience with Kubernetes (K8s).
- CI / CD: Proven experience building and maintaining automated deployment pipelines using tools like Jenkins, GitLab CI, or Azure DevOps.
- IaC / Configuration Management: Proficiency with Terraform for infrastructure provisioning and / or Ansible for configuration management.
- Scripting: Solid expertise in scripting languages (Bash, Python, or Go) for automation tasks and system administration.
- Operating Systems: Strong understanding of Linux system administration, networking, and security concepts.
- Monitoring: Experience with setting up and managing monitoring tools such as Prometheus / Grafana or log analysis platforms like the ELK Stack.
Preferred Qualifications (Good To Have)
- Experience with service mesh technologies (e.g., Istio, Linkerd).
- Certifications such as AWS Certified DevOps Engineer or Certified Kubernetes Administrator (CKA).
- Familiarity with database administration tasks (e.g., MySQL, PostgreSQL on RDS / Cloud SQL).
- Knowledge of site reliability engineering principles like SLOs, SLIs, and Error Budgets.
(ref: hirist.tech)
Skills Required
Prometheus, Go, Bash, Grafana, ELK Stack, Jenkins, GCP, Docker, Terraform, Linux, Ansible, Azure, Kubernetes, Python, Azure DevOps, AWS