We are building a next-generation Customer Data Platform (CDP) powered by the Databricks Lakehouse architecture and the Lakehouse Engine framework. We're looking for a skilled Data Engineer with 4-9 years of experience to help us build metadata-driven pipelines, enable real-time data processing, and support marketing campaign orchestration at scale.
Responsibilities:
- Configure and extend the Lakehouse Engine framework for batch and streaming pipelines.
- Implement the medallion architecture (Bronze -> Silver -> Gold) using Delta Lake.
- Develop metadata-driven ingestion patterns from various customer data sources.
- Build reusable transformers for PII handling, data standardization, and data quality enforcement.
- Build Spark Structured Streaming pipelines for customer behavior and event tracking.
- Set up Debezium + Kafka for Change Data Capture (CDC) from CRM systems.
- Design and develop identity resolution logic across both streaming and batch datasets.
- Use Unity Catalog for managing RBAC, data lineage, and auditability.
- Integrate Great Expectations or similar tools for continuous data quality monitoring.
- Set up CI/CD pipelines for deploying Databricks notebooks, jobs, and DLT pipelines.
Requirements:
- 4-9 years of hands-on experience in data engineering.
- Expertise in Databricks Lakehouse platform, Delta Lake, and Unity Catalog.
- Advanced PySpark skills, including Structured Streaming.
- Experience implementing Kafka + Debezium CDC pipelines.
- Strong in SQL transformations, data modeling, and analytical querying.
- Familiarity with metadata-driven architecture and parameterized pipelines.
- Understanding of data governance: PII masking, access control, and lineage tracking.
- Proficiency in working with AWS, MongoDB, and PostgreSQL.
- Experience working on Customer 360 or Martech CDP platforms.
- Familiarity with Martech tools like Segment, Braze, or other CDPs.
- Exposure to ML pipelines for segmentation, scoring, or personalization.
- Knowledge of CI/CD for data workflows using GitHub Actions, Terraform, or Databricks CLI.
(ref: hirist.tech)