Site Reliability Engineer - IaC (Terraform)

Hashone Careers • India

30+ days ago

Job description

Job Summary:

The Site Reliability Engineer specializing in Infrastructure as Code (IaC) and Terraform is responsible for designing, building, automating, and maintaining cloud infrastructure using modern DevOps and SRE practices.

The role ensures system reliability, scalability, high availability, and operational excellence across production environments.

The engineer will focus heavily on automation, monitoring, CI/CD, incident response, and performance engineering while working closely with developers and platform teams.

Key Responsibilities:

Design, create, and maintain scalable cloud infrastructure using Terraform.
Develop reusable Terraform modules, pipelines, and automation frameworks.
Implement infrastructure provisioning, updates, and rollback workflows through version-controlled IaC.
Ensure compliance with infrastructure standards, security policies, and cloud governance frameworks.
Build and manage cloud infrastructure on AWS, Azure, or GCP.
Implement scalable architecture patterns (auto-scaling, load balancing, container orchestration).
Optimize resource utilization and cost-efficiency.
Manage VPCs, subnets, security groups, firewalls, IAM, and other cloud services.
Ensure reliability, resiliency, scalability, and performance of production systems.
Implement chaos engineering practices, fault injection, and resiliency tests.
Conduct root cause analysis (RCA) and develop permanent fixes for system failures.
Define and maintain SLOs, SLIs, SLAs, and error budgets.
Build and enhance CI/CD pipelines using GitHub Actions, GitLab CI, Jenkins, Azure DevOps, or similar tools.
Automate testing, security checks, deployments, and environment provisioning.
Implement GitOps workflows with tools such as Argo CD or Flux (optional).
Deploy and manage containerized applications using Docker and Kubernetes.
Manage clusters (EKS, AKS, GKE, or self-hosted Kubernetes).
Implement a service mesh such as Istio or Linkerd (experience is an advantage).
Manage Helm charts, Kustomize configurations, and Kubernetes controllers.
Implement and maintain monitoring solutions (Prometheus, Grafana, Datadog, New Relic, CloudWatch, etc.).
Set up centralized logging using the ELK/EFK stack, Cloud Logging, or Splunk.
Monitor system health, performance metrics, and application behavior.
Build alerting strategies and auto-remediation systems.
Implement security best practices across infrastructure and deployments.
Manage secrets, encryption, access control, and network security.
Use Terraform Cloud/Enterprise, Sentinel policies, and linting tools for compliance enforcement.
Participate in security audits, penetration tests, and cloud hardening initiatives.
Participate in on-call rotations and respond to production incidents.
Troubleshoot and resolve system outages, latency issues, and performance problems.
Develop runbooks, playbooks, and post-incident reports.
Automate repetitive operational tasks.
Work collaboratively with developers, QA, product teams, and other SRE team members.
Assist teams in adopting cloud-native, scalable, and automated practices.
Maintain up-to-date system documentation, diagrams, and operational SOPs.
Provide technical guidance and mentorship to junior engineers.

Required Skills & Competencies:

Technical Skills:

Strong experience in Terraform and IaC best practices.

Hands-on expertise with major cloud providers (AWS, Azure, or GCP).

Solid knowledge of Linux administration, networking, and distributed systems.

Strong scripting skills (Python, Bash, or other shell scripting).

Excellent understanding of Kubernetes, Docker, and container orchestration.

Strong CI/CD experience.

Solid experience with monitoring tools (Grafana, Prometheus, Datadog, ELK).

Knowledge of GitOps, configuration management (Ansible), or cloud-native patterns (preferred).

Understanding of SRE concepts (SLIs, SLOs, error budgets, toil reduction).

(ref: hirist.tech)
