Senior Data Engineer

Celebal Technologies • Mumbai, Maharashtra, India
Job description

Designation: Data Engineer
Experience: 5+ years
Location: Mumbai (onsite)

Job Summary: We are seeking a highly skilled Data Engineer with deep expertise in Apache Kafka integration with Databricks, structured streaming, and large-scale data pipeline design using the Medallion Architecture. The ideal candidate will demonstrate strong hands-on experience in building and optimizing real-time and batch pipelines, and will be expected to solve real coding problems during the interview.

Job Description:

  • Design, develop, and maintain real-time and batch data pipelines in Databricks.
  • Integrate Apache Kafka with Databricks using Structured Streaming (a minimal sketch follows this list).
  • Implement robust data ingestion frameworks using Databricks Autoloader.
  • Build and maintain Medallion Architecture pipelines across Bronze, Silver, and Gold layers.
  • Implement checkpointing, output modes, and appropriate processing modes in structured streaming jobs.
  • Design and implement Change Data Capture (CDC) workflows and Slowly Changing Dimensions (SCD) Type 1 and Type 2 logic.
  • Develop reusable components for merge / upsert operations and window function-based transformations.
  • Handle large volumes of data efficiently through proper partitioning, caching, and cluster tuning techniques.
  • Collaborate with cross-functional teams to ensure data availability, reliability, and consistency.
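
As a reference point for the streaming items above, here is a minimal sketch of Kafka-to-Bronze ingestion with Structured Streaming and a checkpointed Delta write. The broker address, topic name, and storage paths are hypothetical placeholders, not values from this posting.

```python
# Hypothetical sketch: consume a Kafka topic and land it in a Bronze-layer
# Delta table with Spark Structured Streaming.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-bronze-ingest").getOrCreate()

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")  # placeholder broker
    .option("subscribe", "orders")                       # placeholder topic
    .option("startingOffsets", "earliest")
    .load()
)

# Kafka delivers key/value as binary; cast to string for downstream parsing.
bronze = raw.select(
    col("key").cast("string").alias("key"),
    col("value").cast("string").alias("value"),
    col("topic"),
    col("timestamp"),
)

query = (
    bronze.writeStream
    .format("delta")
    .outputMode("append")
    # The checkpoint location is what gives the job fault-tolerant,
    # resumable progress tracking across restarts.
    .option("checkpointLocation", "/mnt/checkpoints/orders_bronze")
    .start("/mnt/bronze/orders")  # placeholder Bronze-layer path
)
```
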
Must Have:

  • Apache Kafka: Integration, topic management, schema registry (Avro/JSON).
  • Databricks & Spark Structured Streaming:
    o Output modes: Append, Update, Complete
    o Sinks: Memory, Console, File, Kafka, Delta
    o Checkpointing and fault tolerance
  • Databricks Autoloader: Schema inference, schema evolution, incremental loads (see the sketch after this list).
  • Medallion Architecture implementation expertise.
  • Performance Optimization:
    o Data partitioning strategies
    o Caching and persistence
    o Adaptive query execution and cluster configuration tuning
  • SQL & Spark SQL: Proficiency in writing efficient queries and transformations.
  • Data Governance: Schema enforcement, data quality checks, and monitoring.
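
For the Autoloader bullet above, a minimal sketch of what cloudFiles ingestion with schema inference and evolution might look like; every path below is a hypothetical placeholder.

```python
# Hypothetical sketch of Databricks Autoloader (cloudFiles) ingestion.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Autoloader persists the inferred schema here and tracks its evolution.
    .option("cloudFiles.schemaLocation", "/mnt/schemas/orders")
    # New columns in incoming files are added to the schema as they appear.
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .load("/mnt/landing/orders")
)

(
    df.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/orders_autoloader")
    .option("mergeSchema", "true")  # let evolved columns flow into the sink
    .trigger(availableNow=True)     # incremental, batch-style processing
    .start("/mnt/bronze/orders_autoloader")
)
```
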
Good to Have:
  • Strong coding skills in Python and PySpark.
  • Experience working in CI/CD environments for data pipelines.
  • Exposure to cloud platforms (AWS/Azure/GCP).
  • Understanding of Delta Lake, time travel, and data versioning.
  • Familiarity with orchestration tools like Airflow or Azure Data Factory.

Mandatory Hands-on Coding Assessment (During Interview)

Candidates will be required to demonstrate hands-on proficiency in the following areas:

1. Window Functions:
   o Implement logic using ROW_NUMBER, RANK, and DENSE_RANK in Spark.
   o Use cases such as deduplication and ranking within groups.
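
A minimal sketch of the deduplication use case, keeping the most recent row per business key; df, customer_id, and updated_at are illustrative names only.

```python
# Hypothetical dedup sketch with a ROW_NUMBER window.
from pyspark.sql import Window
from pyspark.sql.functions import col, row_number

w = Window.partitionBy("customer_id").orderBy(col("updated_at").desc())

# row_number() assigns a unique 1..n per partition (ties broken arbitrarily),
# which suits deduplication; rank()/dense_rank() preserve ties and fit
# "top-N within group" style questions instead.
deduped = (
    df.withColumn("rn", row_number().over(w))
      .filter(col("rn") == 1)
      .drop("rn")
)
```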

2. Merge / Upsert Logic:
   o Write PySpark code to perform MERGE operations in Delta Lake.
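
A minimal upsert sketch via the DeltaTable Python API; the target path, updates_df, and the customer_id key are illustrative names only.

```python
# Hypothetical Delta Lake MERGE (upsert) sketch.
from delta.tables import DeltaTable

target = DeltaTable.forPath(spark, "/mnt/silver/customers")

(
    target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()      # key exists: overwrite the row
    .whenNotMatchedInsertAll()   # new key: insert the row
    .execute()
)
```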

3. SCD Implementation:
   o SCD Type 1: Overwriting existing records.
   o SCD Type 2: Versioning records with effective start/end dates or is_current flags.
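
SCD Type 1 is essentially the MERGE shown above; here is one common Type 2 pattern, sketched under the assumption that changes_df holds only new or changed rows. The table path and every column name are illustrative.

```python
# Hypothetical SCD Type 2 sketch: expire the current version, then append
# the new version with fresh effective dates and an is_current flag.
from delta.tables import DeltaTable
from pyspark.sql.functions import current_timestamp, lit

dim = DeltaTable.forPath(spark, "/mnt/gold/dim_customer")

# Step 1: close out current rows whose tracked attribute changed.
(
    dim.alias("t")
    .merge(changes_df.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.address <> s.address",
        set={"is_current": lit(False), "effective_end": current_timestamp()},
    )
    .execute()
)

# Step 2: append the incoming rows as the new current versions.
new_rows = (
    changes_df
    .withColumn("effective_start", current_timestamp())
    .withColumn("effective_end", lit(None).cast("timestamp"))
    .withColumn("is_current", lit(True))
)
new_rows.write.format("delta").mode("append").save("/mnt/gold/dim_customer")
```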

4. CDC (Change Data Capture):
   o Capture and process changes using techniques such as:
     ▪ Comparison with previous snapshots
     ▪ Audit columns or timestamps
     ▪ Kafka-based event-driven ingestion
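
A minimal sketch of the first technique, snapshot comparison; the audit-column and Kafka-based approaches would replace this diff with a timestamp filter or the streaming read shown earlier. All frame and column names are illustrative.

```python
# Hypothetical snapshot-comparison CDC sketch: diff the current extract
# (curr_df) against the previous one (prev_df) on a business key.
from pyspark.sql.functions import col

key = "customer_id"

inserts = curr_df.join(prev_df, key, "left_anti")   # keys only in the new snapshot
deletes = prev_df.join(curr_df, key, "left_anti")   # keys that disappeared

updates = (
    curr_df.alias("c")
    .join(prev_df.alias("p"), key, "inner")
    .filter(col("c.address") != col("p.address"))   # a tracked column changed
    .select("c.*")
)
```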
