Category
Details
Role
Site Reliability Engineer (SRE) III – Data Engineering
Location
Hyderabad
Employment Type
Full Time
Experience
7–12 years in site reliability, cloud-based data infrastructure, data pipeline observability, automation, and high-availability engineering within EdTech platforms (2U)
Primary Skills (Must-Have)
AWS, CI/CD, Jenkins, Infrastructure as Code (IaC), Terraform, Kubernetes
Secondary Skills (Good-to-Have)
AWS systems, Dataiku, data platform updates and patching
Tools & Platforms
Data Warehousing & Processing: Snowflake, Redshift, Apache Airflow, dbt
CI/CD & Deployment: Jenkins, GitHub Actions, AWS CodePipeline, Terraform
Cloud & Event Processing: AWS Lambda, API Gateway, SNS/SQS, Kafka, Step Functions
Monitoring & Logging: Datadog, AWS CloudWatch, Prometheus, Splunk
Incident Management: PagerDuty, Opsgenie, AWS Health Dashboard
Collaboration & Code Review: GitHub, Jira, Confluence
Key Responsibilities
Data Pipeline Reliability & Observability:
- Maintain and optimize highly available, fault-tolerant infrastructure for data pipelines, ETL jobs, and real-time data processing
- Implement end-to-end monitoring of Airflow DAGs, Snowflake queries, and AWS-based data workflows
- Automate data pipeline health checks, error handling, and auto-remediation strategies
Infrastructure & Cloud Automation:
- Deploy and manage AWS-based data infrastructure using Terraform and CloudFormation
- Optimize Kubernetes (EKS) clusters for processing large-scale datasets and real-time analytics
- Ensure high availability and cost-efficient scaling for Redshift, Snowflake, and data storage solutions
Performance, Monitoring & Incident Response:
- Implement real-time monitoring, logging, and alerting using Datadog, AWS CloudWatch, and Prometheus
- Define and track SLOs, SLIs, and error budgets to improve data reliability and uptime
- Conduct Root Cause Analysis (RCA), security audits, and post-mortems for incidents
Security & Compliance:
- Ensure GDPR, CCPA, and SOC 2 compliance for data storage, access controls, and retention policies
- Implement AWS security best practices (IAM, KMS, Shield, WAF) to secure data access and encryption
- Secure API gateways, authentication mechanisms, and data lake permissions to prevent unauthorized access
Collaboration & Leadership:
- Work closely with data engineers, analytics teams, and DevOps engineers to enhance data platform reliability
- Participate in incident response drills, disaster recovery planning, and security compliance reviews
- Advocate for best practices in automation, cost optimization, and cloud-native data solutions