About the Company
Our client is a pioneering company dedicated to simplifying daily living through an integrated ecosystem for rental, purchasing, and co-living needs. They are committed to delivering exceptional results by harnessing the power of artificial intelligence and machine learning. As a rapidly growing organization, they offer a dynamic work environment with opportunities for professional growth and real-world impact.
Job Title:
Data Engineer - Lakehouse Engine (Martech CDP Implementation)
Experience Required:
5 to 9 years
About the Role :
We are building a next-generation Customer Data Platform (CDP) powered by the Databricks
Lakehouse architecture and Lakehouse Engine framework. We're looking for a skilled Data
Engineer with 5-9 years of experience to help us build metadata-driven pipelines, enable real-time
data processing, and support marketing campaign orchestration capabilities at scale.
Key Responsibilities
Lakehouse Engine Implementation
- Configure and extend the Lakehouse Engine framework for batch and streaming pipelines
- Implement the medallion architecture (Bronze -> Silver -> Gold) using Delta Lake
- Develop metadata-driven ingestion patterns from various customer data sources (see the sketch below this list)
- Build reusable transformers for PII handling, data standardization, and data quality enforcement
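For illustration, a minimal sketch of the kind of metadata-driven Bronze-to-Silver step this role involves; the table names, config shape, and hash-based masking are assumptions for the example, not the client's actual framework:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Pipeline behavior is driven by metadata rather than hard-coded logic.
# All names below are hypothetical.
config = {
    "source_table": "bronze.customer_events",
    "target_table": "silver.customer_events",
    "pii_columns": ["email", "phone"],
    "dedup_keys": ["event_id"],
}

df = spark.read.table(config["source_table"])

# Mask the PII columns declared in the metadata (one simple strategy: hashing)
for col in config["pii_columns"]:
    df = df.withColumn(col, F.sha2(F.col(col).cast("string"), 256))

# Deduplicate on the declared keys before promoting Bronze -> Silver
df = df.dropDuplicates(config["dedup_keys"])

df.write.format("delta").mode("overwrite").saveAsTable(config["target_table"])
```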
Real-Time CDP Enablement
- Build Spark Structured Streaming pipelines for customer behavior and event tracking
- Set up Debezium + Kafka for Change Data Capture (CDC) from CRM systems (see the sketch below this list)
- Design and develop identity resolution logic across both streaming and batch datasets
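Again purely as illustration, a minimal Structured Streaming sketch that lands Debezium CDC events from Kafka into a Bronze Delta table; the broker address, topic, simplified schema, checkpoint path, and table name are all placeholders:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# Simplified Debezium change-event schema; real payloads carry more fields
change_schema = StructType([
    StructField("op", StringType()),     # c / u / d (insert, update, delete)
    StructField("after", StringType()),  # row state after the change, as JSON
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder address
    .option("subscribe", "crm.public.customers")       # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
    .select(F.from_json(F.col("value").cast("string"), change_schema).alias("change"))
    .select("change.op", "change.after")
)

# Land raw CDC events in Bronze; merging into Silver would run as a separate job
(
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "/chk/crm_customers")  # placeholder path
    .toTable("bronze.crm_customers_cdc")                 # hypothetical table
)
```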
DataOps & Governance
- Use Unity Catalog for managing RBAC, data lineage, and auditability (see the sketch below this list)
- Integrate Great Expectations or similar tools for continuous data quality monitoring
- Set up CI/CD pipelines for deploying Databricks notebooks, jobs, and DLT pipelines
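As a final sketch, Unity Catalog access grants issued from a notebook; the catalog, schema, table, and group names are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Grant a marketing analysts group read-only access to the Gold layer.
# All object and principal names below are hypothetical.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `marketing-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.gold TO `marketing-analysts`")
spark.sql("GRANT SELECT ON TABLE main.gold.customer_360 TO `marketing-analysts`")
```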
Technical Requirements
Must Have:
- 5-9 years of hands-on experience in data engineering
- Expertise in the Databricks Lakehouse platform, Delta Lake, and Unity Catalog
- Advanced PySpark skills, including Structured Streaming
- Experience implementing Kafka + Debezium CDC pipelines
- Strong SQL transformation, data modeling, and analytical querying skills
- Familiarity with metadata-driven architecture and parameterized pipelines
- Understanding of data governance: PII masking, access controls, lineage tracking
- Proficiency in working with AWS, MongoDB, and PostgreSQL
Nice to Have:
- Experience working on Customer 360 or Martech CDP platforms
- Familiarity with Martech tools like Segment, Braze, or other CDPs
- Exposure to ML pipelines for segmentation, scoring, or personalization
- Knowledge of CI/CD for data workflows using GitHub Actions, Terraform, or the Databricks CLI