Talent.com
▷ [09 / 11 / 2025] Site Reliability Engineer

Sonata Software
Hyderabad, Telangana, India
3 hours ago
Job description

Role : Site Reliability Engineer (SRE) III – Data Engineering

Location : Hyderabad

Employment Type : Full Time

Experience : 7–12 years in site reliability, cloud-based data infrastructure, data pipeline observability, automation, and high-availability engineering within EdTech platforms (2U)

Primary Skills (Must-Have) : AWS, CI / CD, Jenkins, Infrastructure as Code (IaC), Terraform, Kubernetes

Secondary Skills (Good-to-Have) : AWS systems; Dataiku data, Platform updates and patching

Tools & Platforms

Data Warehousing & Processing : Snowflake, Redshift, Apache Airflow, dbt

CI / CD & Deployment : Jenkins, GitHub Actions, AWS CodePipeline, Terraform

Cloud & Event Processing : AWS Lambda, API Gateway, SNS / SQS, Kafka, Step Functions

Monitoring & Logging : DataDog, AWS CloudWatch, Prometheus, Splunk

Incident Management : PagerDuty, Opsgenie, AWS Health Dashboard

Collaboration & Code Review : GitHub, Jira, Confluence

Key Responsibilities

Data Pipeline Reliability & Observability :

  • Maintain and optimize highly available, fault-tolerant infrastructure for data pipelines, ETL jobs, and real-time data processing
  • Implement end-to-end monitoring of Airflow DAGs, Snowflake queries, and AWS-based data workflows
  • Automate data pipeline health checks, error handling, and auto-remediation strategies

Infrastructure & Cloud Automation :

  • Deploy and manage AWS-based data infrastructure using Terraform and CloudFormation
  • Optimize Kubernetes (EKS) clusters for processing large-scale datasets and real-time analytics
  • Ensure high availability and cost-efficient scaling for Redshift, Snowflake, and data storage solutions

Performance, Monitoring & Incident Response :

  • Implement real-time monitoring, logging, and alerting using DataDog, AWS CloudWatch, and Prometheus
  • Define and track SLOs, SLIs, and error budgets to improve data reliability and uptime
  • Conduct Root Cause Analysis (RCA), security audits, and post-mortems for incidents

Security & Compliance :

  • Ensure GDPR, CCPA, and SOC 2 compliance for data storage, access controls, and retention policies
  • Implement AWS security best practices (IAM, KMS, Shield, WAF) to secure data access and encryption
  • Secure API gateways, authentication mechanisms, and data lake permissions to prevent unauthorized access

Collaboration & Leadership :

  • Work closely with data engineers, analytics teams, and DevOps engineers to enhance data platform reliability
  • Participate in incident response drills, disaster recovery planning, and security compliance reviews
  • Advocate for best practices in automation, cost optimization, and cloud-native data solutions