About the Role
We are looking for a detail-oriented Data Validation Engineer to ensure the accuracy, quality, and consistency of data across our platforms. The ideal candidate will be proficient in PySpark and have strong experience working with Databricks to build automated validation frameworks that verify ingestion, transformation, and reporting pipelines.
You will work closely with Data Engineers, Data Scientists, and business stakeholders to design validation rules, troubleshoot data issues, and ensure reliable delivery of high-quality data.
Key Responsibilities:
- Develop and maintain automated data validation pipelines using PySpark on Databricks.
- Validate data ingestion, transformation, and aggregation processes across multiple data sources.
- Identify, document, and resolve data quality issues in collaboration with engineering and analytics teams.
- Create reusable validation frameworks to check data accuracy, completeness, timeliness, and consistency.
- Write and optimize PySpark queries for large-scale validation in Databricks notebooks.
- Perform root cause analysis for data discrepancies and ensure corrective actions are implemented.
- Work with business stakeholders to define validation rules and acceptance criteria.
- Contribute to continuous improvement of data quality, governance, and monitoring.

Experience & Skills:
- 3-6 years of experience in Data Engineering, Data Validation, or Data Quality roles.
- Strong expertise in PySpark, Python, and SQL, with hands-on Databricks experience.
- Experience building validation frameworks on Databricks and Spark-based platforms.
- Familiarity with ETL/ELT processes, data pipelines, and data warehousing concepts.
- Exposure to cloud data ecosystems (Azure, AWS, or GCP) with Databricks integration.
- Knowledge of data governance, monitoring, and quality frameworks.
- Strong analytical and troubleshooting skills with attention to detail.
- Excellent communication and collaboration skills.

We Offer:
- Opportunity to work on large-scale data validation projects using Databricks & PySpark.
- 100% remote opportunity.
- Exposure to modern big data platforms and cloud data ecosystems.
- Collaborative, innovation-driven, and growth-oriented culture.
- Competitive compensation and career advancement opportunities.
(ref: hirist.tech)