Responsibilities:
- 24x7 production support for all Data Engineering Technology application jobs and processes
- Solid understanding of data pipelines, transforms, and step functions in AWS Glue
- Monitoring of all application jobs and processes in Redwood Scheduler, AWS, and Azure
- Lead a group of engineers working 24x7
- Troubleshooting and resolving all failures
- Proactively identify pipeline failures and quickly analyze the root cause
- Batch job / report performance monitoring and optimization
- Incident creation, triaging, and resolution
- Proactive communication of issues and job / process delays
- Drive P1 / P2 incident calls by triaging issues and collaborating with internal and external stakeholders
- Sense of ownership and accountability for tasks
- Provide daily load status reports
- Post-patching application validation
- Manage and maintain knowledge database / repository and runbooks
- Regression testing / job execution in a non-prod environment (e.g., application upgrade testing)
- Post-incident analysis and problem ticket creation and management
- Provide evidence about production processes as required by audit
- Support disaster recovery (DR) and business continuity planning (BCP) activities
- Identify improvement areas and propose solutions
Required Technical Skillset:
Redwood or any similar job scheduling tool
Preferred Skillset:
AWS Glue, AWS CloudWatch, DynamoDB, Azure DART, ETL tools, databases, ability to troubleshoot scripts
Skills Required:
AWS Glue, Redwood, CloudWatch, ETL, SQL, Data Support