Talent.com
Site Reliability Engineer

Sonata Software, Hyderabad, India
4 days ago
Job description

Role: Site Reliability Engineer (SRE) III – Data Engineering

Location: Hyderabad

Employment Type: Full Time

Experience: 7–12 years in site reliability, cloud-based data infrastructure, data pipeline observability, automation, and high-availability engineering within EdTech platforms (2U)

Primary Skills (Must-Have): AWS, CI/CD, Jenkins, Infrastructure as Code (IaC), Terraform, Kubernetes

Secondary Skills (Good-to-Have): AWS systems; Dataiku data, Platform updates and patching

Tools & Platforms

Data Warehousing & Processing: Snowflake, Redshift, Apache Airflow, dbt

CI/CD & Deployment: Jenkins, GitHub Actions, AWS CodePipeline, Terraform

Cloud & Event Processing: AWS Lambda, API Gateway, SNS/SQS, Kafka, Step Functions

Monitoring & Logging: Datadog, AWS CloudWatch, Prometheus, Splunk

Incident Management: PagerDuty, Opsgenie, AWS Health Dashboard

Collaboration & Code Review: GitHub, Jira, Confluence

Key Responsibilities

Data Pipeline Reliability & Observability:

  • Maintain and optimize highly available, fault-tolerant infrastructure for data pipelines, ETL jobs, and real-time data processing
  • Implement end-to-end monitoring of Airflow DAGs, Snowflake queries, and AWS-based data workflows
  • Automate data pipeline health checks, error handling, and auto-remediation strategies

Infrastructure & Cloud Automation:

  • Deploy and manage AWS-based data infrastructure using Terraform and CloudFormation
  • Optimize Kubernetes (EKS) clusters for processing large-scale datasets and real-time analytics
  • Ensure high availability and cost-efficient scaling for Redshift, Snowflake, and data storage solutions

Performance, Monitoring & Incident Response:

  • Implement real-time monitoring, logging, and alerting using Datadog, AWS CloudWatch, and Prometheus
  • Define and track SLOs, SLIs, and error budgets to improve data reliability and uptime
  • Conduct Root Cause Analysis (RCA), security audits, and post-mortems for incidents

Security & Compliance:

  • Ensure GDPR, CCPA, and SOC 2 compliance for data storage, access controls, and retention policies
  • Implement AWS security best practices (IAM, KMS, Shield, WAF) to secure data access and encryption
  • Secure API gateways, authentication mechanisms, and data lake permissions to prevent unauthorized access

Collaboration & Leadership:

  • Work closely with data engineers, analytics teams, and DevOps engineers to enhance data platform reliability
  • Participate in incident response drills, disaster recovery planning, and security compliance reviews
  • Advocate for best practices in automation, cost optimization, and cloud-native data solutions