Description :
We are looking for a highly skilled ETL Tester with strong experience in validating data pipelines, ensuring data integrity, and testing end-to-end ETL workflows. The ideal candidate will have a solid background in ETL processes, Python, SQL, Hadoop, and PySpark, with a keen eye for detail and a passion for delivering high-quality data solutions.
Key Responsibilities :
- Design, develop, and execute ETL test cases, test scripts, and test data for complex data integration projects.
- Validate data extraction, transformation, and loading (ETL) processes across various data sources and targets.
- Perform data validation, data reconciliation, and data integrity testing using SQL and Python scripts.
- Collaborate with data engineers, analysts, and business teams to understand business rules and ensure accurate data delivery.
- Conduct performance and regression testing for ETL workflows.
- Work with large-scale datasets in Hadoop environments using PySpark for data analysis and validation.
- Identify, log, and track defects, and ensure timely resolution through coordination with development teams.
- Document test plans, results, and process improvements to support quality and compliance standards.
Required Skills :
- Strong experience in ETL testing and data validation methodologies.
- Proficiency in SQL (complex queries, joins, aggregations, and data comparisons).
- Hands-on experience with Python for test automation and data validation scripting.
- Good understanding of the Hadoop ecosystem (HDFS, Hive, Spark).
- Practical experience with PySpark for big data validation.
- Strong analytical and problem-solving skills with attention to detail.
- Experience working in Agile/Scrum environments.
Preferred Qualifications :
- Knowledge of data warehousing concepts and ETL tools (e.g., Informatica, Talend, DataStage, or similar).
- Familiarity with cloud data platforms (e.g., AWS, Azure, GCP).
- Experience with CI/CD tools for test automation integration.
Why Join Us :
- Opportunity to work on cutting-edge big data and ETL projects.
- Collaborative and growth-oriented environment.
- Exposure to modern data technologies and cloud ecosystems.
(ref : hirist.tech)