Talent.com
Site Reliability Engineer

Confidential, India
30+ days ago
Job description

We are looking for a highly skilled AWS Engineer with strong Python development and Chaos Engineering expertise to design, build, and validate resilient, scalable, and automated cloud-native environments. The ideal candidate will combine cloud engineering, DevOps, and chaos experimentation to improve reliability, fault tolerance, and operational efficiency of critical systems.

Key Responsibilities

Cloud Engineering (AWS):

  • Architect, implement, and manage secure, scalable, and cost-efficient AWS infrastructure (EC2, Lambda, EKS, S3, RDS, IAM, CloudFront, etc.).
  • Automate infrastructure provisioning and configuration using Terraform or CloudFormation and AWS SDKs.
  • Manage containerized workloads (Docker, Kubernetes, EKS).

Python Development:

  • Build automation scripts, deployment utilities, and infrastructure tooling in Python (Boto3, Flask, FastAPI, etc.).
  • Develop custom monitoring and alerting integrations with APIs, SDKs, and third-party observability platforms.
  • Implement self-healing and resilience-focused automation scripts.

Chaos Engineering & Resiliency:

  • Design and execute chaos experiments (fault injection, latency, outages, resource failures) to validate system resilience.
  • Use tools such as Gremlin, Litmus, Chaos Mesh, or AWS Fault Injection Simulator.
  • Partner with SRE and development teams to define SLIs, SLOs, and error budgets.
  • Document learnings from chaos tests and improve incident response and recovery playbooks.

DevOps & Observability:

  • Build and maintain CI/CD pipelines for automated deployments (Jenkins, GitHub Actions, GitLab CI, AWS CodePipeline).
  • Integrate observability frameworks (Prometheus, Grafana, ELK/EFK, CloudWatch, Datadog) for monitoring and tracing.
  • Ensure proactive alerting and real-time visibility into system health.

Security & Compliance:

  • Apply AWS security best practices for IAM, networking, and data protection.
  • Ensure compliance with internal policies and external regulatory frameworks (SOC 2, ISO, GDPR, etc.).

Required Skills & Qualifications

  • 6–10 years of experience in Cloud, DevOps, or SRE roles.
  • Strong hands-on expertise in AWS Cloud (AWS DevOps Engineer or Solutions Architect certification preferred).
  • Advanced Python development skills for automation and tooling (Boto3 is a must).
  • Experience designing and running chaos experiments (Gremlin, AWS FIS, Litmus, Chaos Mesh, or custom Python-based fault injection).
  • Solid knowledge of IaC (Terraform or CloudFormation).
  • Proficiency in containers and orchestration (Docker, Kubernetes, EKS).
  • Strong background in monitoring, observability, and incident management.
  • Familiarity with the DevOps toolchain (CI/CD, Git, Jenkins, GitLab, CodePipeline).
  • Good understanding of resilient architectures, reliability principles, and disaster recovery.

Preferred Skills

  • Knowledge of Go or shell scripting in addition to Python.
  • Experience with chaos testing in production-like environments.
  • Exposure to multi-cloud or hybrid-cloud environments.
  • Strong problem-solving and debugging skills.

What We Offer

  • Opportunity to lead cloud reliability and chaos engineering initiatives.
  • A culture focused on automation, resilience, and continuous improvement.
  • Growth opportunities through certifications, R&D projects, and leadership roles.

Skills Required

  ELK, CloudFormation, Prometheus, Grafana, Datadog, Jenkins, CloudWatch, Terraform, Docker, Flask, AWS CodePipeline, FastAPI, Kubernetes, Python, AWS


    Jefferies,’’ ‘‘we,’’ ‘‘us’’ or ‘‘our’’) is a U.Our largest subsidiary, Jefferies LLC, a U.Jefferies International Limited, a U. Our strategy focuses on continuing to build out our investment banking...Show moreLast updated: 30+ days ago