Talent.com
Site Reliability Engineer - IAC Terraform

Hashone Careers • India
30+ days ago
Job description

Job Summary:

The Site Reliability Engineer specializing in Infrastructure as Code (IaC) and Terraform is responsible for designing, building, automating, and maintaining cloud infrastructure using modern DevOps and SRE practices.

The role ensures system reliability, scalability, high availability, and operational excellence across production environments.

The engineer will focus heavily on automation, monitoring, CI/CD, incident response, and performance engineering while working closely with developers and platform teams.

Key Responsibilities:

  • Design, create, and maintain scalable cloud infrastructure using Terraform.
  • Develop reusable Terraform modules, pipelines, and automation frameworks.
  • Implement infrastructure provisioning, updates, and rollback workflows through version-controlled IaC.
  • Ensure compliance with infrastructure standards, security policies, and cloud governance frameworks.
  • Build and manage cloud infrastructure on AWS / Azure / GCP.
  • Implement scalable architecture patterns (auto-scaling, load balancing, container orchestration).
  • Optimize resource utilization and cost-efficiency.
  • Manage VPCs, subnets, security groups, firewalls, IAM, and other cloud services.
  • Ensure reliability, resiliency, scalability, and performance of production systems.
  • Implement chaos engineering practices, fault injection, and resiliency tests.
  • Conduct root cause analysis (RCA) and develop permanent fixes for system failures.
  • Define and maintain SLOs, SLIs, SLAs, and error budgets.
  • Build and enhance CI/CD pipelines using GitHub Actions, GitLab CI, Jenkins, Azure DevOps, or similar.
  • Automate testing, security checks, deployments, and environment provisioning.
  • Implement GitOps workflows with tools like ArgoCD or Flux (optional).
  • Deploy and manage containerized applications using Docker and Kubernetes.
  • Manage clusters (EKS, AKS, GKE, or self-hosted Kubernetes).
  • Experience implementing a service mesh (Istio / Linkerd) is an advantage.
  • Manage Helm charts, Kustomize, and Kubernetes controllers.
  • Implement and maintain monitoring solutions (Prometheus, Grafana, Datadog, New Relic, CloudWatch, etc.).
  • Set up centralized logging using ELK / EFK, Cloud Logging, or Splunk.
  • Monitor system health, performance metrics, and application behavior.
  • Build alerting strategies and auto-remediation systems.
  • Implement security best practices across infrastructure and deployments.
  • Manage secrets, encryption, access control, and network security.
  • Use Terraform Cloud / Enterprise, Sentinel policies, and linting tools for compliance enforcement.
  • Participate in security audits, pen tests, and cloud hardening initiatives.
  • Participate in on-call rotations and respond to production incidents.
  • Troubleshoot and resolve system outages, latency issues, and performance problems.
  • Develop runbooks, playbooks, and post-incident reports.
  • Automate repetitive operational tasks.
  • Work collaboratively with developers, QA, product teams, and other SRE members.
  • Assist teams in adopting cloud-native, scalable, and automated practices.
  • Maintain up-to-date system documentation, diagrams, and operational SOPs.
  • Provide technical guidance and mentorship to junior engineers.

Required Skills & Competencies:

Technical Skills:

  • Strong experience in Terraform and IaC best practices.
  • Hands-on expertise with major cloud providers (AWS / Azure / GCP).
  • Solid knowledge of Linux administration, networking, and distributed systems.
  • Strong scripting skills (Python, Bash, Shell).
  • Excellent understanding of Kubernetes, Docker, and container orchestration.
  • Strong CI/CD experience.
  • Solid experience with monitoring tools (Grafana, Prometheus, Datadog, ELK).
  • Knowledge of GitOps, configuration management (Ansible), or cloud-native patterns (preferred).
  • Understanding of SRE concepts (SLIs, SLOs, error budgets, toil reduction).

(ref: hirist.tech)
