Roles & Responsibilities:
- Collaborate with the QA Manager to design and implement end-to-end test strategies for data validation, semantic layer testing, and GraphQL API validation.
- Perform manual validation of data pipelines, including source-to-target data mapping, transformation logic, and business rule verification.
- Develop and maintain automated data validation scripts using Python and PySpark for both real-time and batch pipelines (a hedged PySpark reconciliation sketch follows this list).
- Contribute to the design and enhancement of reusable automation frameworks, with components for schema validation, data reconciliation, and anomaly detection.
- Validate semantic layers (e.g., Looker, dbt models) and GraphQL APIs, ensuring data consistency, compliance with contracts, and alignment with business expectations (see the PyTest sketch after this list).
- Write and manage test plans, test cases, and test data for structured, semi-structured, and unstructured data.
- Track, manage, and report defects using tools like JIRA, ensuring thorough root cause analysis and timely resolution.
- Collaborate with Data Engineers, Product Managers, and DevOps teams to integrate tests into CI/CD pipelines and enable shift-left testing practices.
- Ensure comprehensive test coverage for all aspects of the data lifecycle, including ingestion, transformation, delivery, and consumption.
- Participate in QA ceremonies (standups, planning, retrospectives) and continuously contribute to improving the QA process and culture.
- Experience building or maintaining test data generators.
- Contributions to internal quality dashboards or data observability systems.
- Awareness of metadata-driven testing approaches and lineage-based validations.
- Experience working with Agile testing methodologies such as Scaled Agile.
- Familiarity with automated testing frameworks like Selenium, JUnit, TestNG, or PyTest.
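As a rough illustration of the automated validation scripts described above, the sketch below reconciles a source extract against a target table using PySpark. The paths, table names, columns, and business rule are placeholders for this posting, not an actual framework.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("source_to_target_check").getOrCreate()

# Hypothetical source and target locations; substitute real paths/tables.
source_df = spark.read.parquet("s3://raw-bucket/orders/")   # source extract
target_df = spark.table("analytics.orders")                 # transformed target

# 1. Row-count reconciliation.
src_count, tgt_count = source_df.count(), target_df.count()
assert src_count == tgt_count, f"Row count mismatch: source={src_count}, target={tgt_count}"

# 2. Schema validation: every expected column must exist with the expected type.
expected_schema = {"order_id": "string", "amount": "decimal(18,2)", "order_date": "date"}
actual_schema = dict(target_df.dtypes)
for col, dtype in expected_schema.items():
    assert actual_schema.get(col) == dtype, f"Schema drift on {col}: {actual_schema.get(col)}"

# 3. Business-rule check: amounts must be non-negative.
bad_rows = target_df.filter(F.col("amount") < 0).count()
assert bad_rows == 0, f"{bad_rows} rows violate the non-negative amount rule"

# 4. Key-level reconciliation: source orders missing from the target.
missing = source_df.select("order_id").subtract(target_df.select("order_id")).count()
assert missing == 0, f"{missing} source orders missing from target"
```

In practice, checks like these would be wrapped in a reusable framework and parameterized per pipeline rather than hard-coded as above.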
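Similarly, a minimal PyTest sketch for the GraphQL and data-contract validation responsibilities might look like the following; the endpoint URL, query, and expected fields are hypothetical.

```python
import pytest
import requests

GRAPHQL_URL = "https://example.com/graphql"   # placeholder endpoint

ORDERS_QUERY = """
query {
  orders(limit: 5) {
    orderId
    amount
    status
  }
}
"""

@pytest.fixture(scope="module")
def orders_payload():
    # Execute the query once per module and fail fast on transport or GraphQL errors.
    response = requests.post(GRAPHQL_URL, json={"query": ORDERS_QUERY}, timeout=30)
    response.raise_for_status()
    body = response.json()
    assert "errors" not in body, f"GraphQL errors returned: {body.get('errors')}"
    return body["data"]["orders"]

def test_contract_fields_present(orders_payload):
    # Contract check: every record exposes the agreed fields.
    for order in orders_payload:
        assert {"orderId", "amount", "status"}.issubset(order)

def test_amounts_are_non_negative(orders_payload):
    # Business-rule check mirrored from the warehouse-side validation.
    assert all(order["amount"] >= 0 for order in orders_payload)
```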
Must-Have Skills:
- 6-9 years of experience in QA roles, with at least 3+ years of strong exposure to data pipeline testing and ETL validation.
- Deep hands-on expertise in SQL and Python, and ideally PySpark, comfortable writing complex queries and automated validation scripts.
- Proven experience with manual and automated testing of batch and real-time data pipelines, including source-to-target validation.
- Experience validating GraphQL APIs, semantic layers (Looker, dbt, etc.), and schema/data contract compliance.
- Familiarity with data integration and analytics platforms such as Databricks, Spark, AWS (Glue, S3, Athena, Redshift), and BigQuery.
- Strong understanding of QA methodologies, test planning, test case design, defect tracking, bug lifecycle management, and QA documentation.
- Experience working in Agile/Scrum environments with standard QA processes.
- Knowledge of test case and defect management tools (e.g., JIRA, TestRail, Zephyr).
- Ability to troubleshoot data issues independently and collaborate with engineering on root cause analysis.
- Experience integrating automated tests into CI/CD pipelines (e.g., Jenkins, GitHub Actions).
- Experience validating data in various file formats such as JSON, CSV, Parquet, and Avro (a format-validation sketch follows the Skills Required list below).
Skills Required:
TestRail, JIRA, SQL, Python
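As a hedged illustration of the file-format validation listed above, the short sketch below loads JSON, CSV, Parquet, and Avro extracts with PySpark and applies the same basic checks to each. The paths and the event_id column are placeholders, and the Avro reader assumes the external spark-avro package is on the classpath.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("multi_format_validation").getOrCreate()

# Placeholder paths; "avro" requires the spark-avro package.
datasets = {
    "json":    spark.read.json("s3://raw-bucket/events/json/"),
    "csv":     spark.read.option("header", True).csv("s3://raw-bucket/events/csv/"),
    "parquet": spark.read.parquet("s3://raw-bucket/events/parquet/"),
    "avro":    spark.read.format("avro").load("s3://raw-bucket/events/avro/"),
}

for fmt, df in datasets.items():
    # Basic sanity checks applied uniformly to every format.
    assert df.count() > 0, f"{fmt}: dataset is empty"
    null_ids = df.filter(F.col("event_id").isNull()).count()
    assert null_ids == 0, f"{fmt}: {null_ids} rows have a null event_id"
```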