Our client, a leading global specialist in energy management and automation, is seeking an experienced Data Quality (DQ) and ETL Engineer to ensure the integrity, reliability, and governance of its data platform. This role is a hybrid of development, operations, and data governance.
Key Responsibilities:
- Design, develop, and deploy scalable ETL / ELT solutions using AWS Glue (PySpark), Python, and Informatica Data Quality (IDQ) to ingest and transform high-volume data.
- Write and optimize complex SQL queries and stored procedures.
- Manage and orchestrate automated data workflows using AWS Step Functions or Apache Airflow.
- Define, implement, and enforce robust Data Quality (DQ) rules (completeness, accuracy, and consistency) across all stages of the data lifecycle.
- Develop and manage automated data validation frameworks and monitoring tools to proactively detect and report anomalies.
- Conduct continuous data profiling to understand data lineage, identify gaps, and ensure compliance with governance standards.
- Perform deep technical troubleshooting and root cause analysis (RCA) on pipeline failures and data quality exceptions.
- Optimize the performance of ETL jobs and queries to improve data freshness and system efficiency.
- Collaborate with data architects and business users to align pipeline design with data modelling best practices (e.g., star schema).
Qualifications:
- 2+ years of dedicated experience working with the AWS data ecosystem.
- AWS Certified Data Analytics – Specialty, AWS Certified Developer, or Informatica DQ certification is highly preferred.
Skills Required
- Data Engineering & ETL / ELT Development: Hands-on experience building scalable pipelines using AWS Glue (PySpark), Python, and SQL.
- Cloud Data Workflow Orchestration: Proficiency with AWS Step Functions or Apache Airflow for automated pipeline scheduling and monitoring.
- Data Quality & Governance: Strong command over IDQ (Informatica Data Quality), rule implementation, data profiling, and automated validation frameworks.
- Troubleshooting & Performance Optimization: Ability to perform RCA for pipeline failures and optimize ETL jobs and queries for efficiency and data freshness.