Category
Details
Role
Site Reliability Engineer (SRE) III – Data Engineering
Location
Hyderabad
Employment Type
Full Time
Experience
7–12 years in site reliability, cloud-based data infrastructure, data pipeline observability, automation, and high-availability engineering within EdTech platforms (2U)
Primary Skills (Must-Have)
AWS, CI/CD, Jenkins, IaC (Infrastructure as Code), Terraform, Kubernetes
Secondary Skills (Good-to-Have)
AWS systems; Dataiku data platform updates and patching
Tools & Platforms
Data Warehousing & Processing: Snowflake, Redshift, Apache Airflow, dbt
CI/CD & Deployment: Jenkins, GitHub Actions, AWS CodePipeline, Terraform
Cloud & Event Processing: AWS Lambda, API Gateway, SNS/SQS, Kafka, Step Functions
Monitoring & Logging: Datadog, AWS CloudWatch, Prometheus, Splunk
Incident Management: PagerDuty, Opsgenie, AWS Health Dashboard
Collaboration & Code Review: GitHub, Jira, Confluence
Key Responsibilities
Data Pipeline Reliability & Observability:
highly available, fault-tolerant infrastructure for data pipelines, ETL jobs, and real-time data processing
end-to-end monitoring of Airflow DAGs, Snowflake queries, and AWS-based data workflows
data pipeline health checks, error handling, and auto-remediation strategies
Infrastructure & Cloud Automation:
AWS-based data infrastructure using Terraform and CloudFormation
Kubernetes (EKS) clusters for processing large-scale datasets and real-time analytics
high availability and cost-efficient scaling for Redshift, Snowflake, and data storage solutions
Performance, Monitoring & Incident Response:
real-time monitoring, logging, and alerting using Datadog, AWS CloudWatch, and Prometheus
SLOs, SLIs, and error budgets to improve data reliability and uptime
Root Cause Analysis (RCA), security audits, and post-mortems for incidents
Security & Compliance:
GDPR, CCPA, and SOC 2 compliance for data storage, access controls, and retention policies
AWS security best practices (IAM, KMS, Shield, WAF) to secure data access and encryption
API gateways, authentication mechanisms, and data lake permissions to prevent unauthorized access
Collaboration & Leadership:
data engineers, analytics teams, and DevOps engineers to enhance data platform reliability
incident response drills, disaster recovery planning, and security compliance reviews
best practices in automation, cost optimization, and cloud-native data solutions