We are seeking a highly skilled PySpark Developer with 7+ years of experience and expertise in Reltio MDM to join our data engineering team. You will design and implement scalable data processing solutions, integrate enterprise systems with Reltio, and ensure high-quality data governance.
Key Responsibilities:
- Develop and maintain PySpark data pipelines on platforms like AWS EMR or Databricks.
- Integrate and synchronize data between enterprise applications and Reltio MDM.
- Design and implement data transformation, cleansing, and enrichment logic.
- Collaborate with architects and analysts on data modeling.
- Build and manage API-based integrations between Reltio and upstream/downstream systems.
- Optimize PySpark jobs for performance, scalability, and cost-efficiency.
- Ensure data quality, integrity, and governance throughout the pipeline lifecycle.
- Troubleshoot and resolve data-related and performance issues.
Required Skills & Qualifications:
- 7+ years of hands-on experience in PySpark and distributed data processing.
- Strong command of Apache Spark, Spark SQL, and DataFrames.
- Deep expertise in Reltio MDM (entity modeling, survivorship rules, match & merge configuration).
- Proficiency in REST APIs, JSON, and data integration techniques.
- Experience with AWS services (S3, Lambda, Step Functions).
- Solid understanding of ETL workflows, data warehousing, and data modeling.
- Familiarity with CI/CD pipelines and Git.
- Excellent problem-solving, communication, and collaboration skills.
(ref: hirist.tech)