Lead Data Pipeline Engineer (Azure / Fabric)

Accion Labs · Pune, India
Job description

Role Snapshot

  • Title: Senior Azure Data Engineer
  • Experience: 6–8+ years in Data Engineering (minimum 4+ years on Azure, and/or 6 months to 1+ year with Microsoft Fabric)
  • Tech Focus: Microsoft Fabric, Azure Data Factory (ADF), Databricks (Python, PySpark, Spark SQL), Delta Lake, Power BI (DAX), Azure Storage, Lakehouse, Warehouse
  • Engagement: Client-facing, hands-on, design-to-delivery
  • Location: Any Accion Labs office in India - Bangalore, Pune (preferred), Mumbai, Hyderabad, Indore, Noida. This is a HYBRID work model, not remote.
  • Notice period: Immediate joiners preferred, or candidates who can join within 20 days

Core Responsibilities

  • End-to-End Engineering
  • Design, implement, and deliver batch & streaming data pipelines into Fabric Lakehouse / Warehouse using ADF and Databricks with Delta Lake.
  • Data Architecture Understanding
  • Strong grasp of Bronze–Silver–Gold layering, incremental ingestion, watermarking, and best practices for scalable pipelines.
  • Medallion Architecture
  • Apply Bronze / Silver / Gold patterns, enforce schema evolution, handle late / dirty data, and implement SCD (Type 1 / 2) and late-arriving dimensions (see the MERGE sketch below).
  • Fabric Platform & Security
  • Build solutions on Microsoft Fabric (OneLake, Lakehouse, Warehouse, Pipelines, Dataflows Gen2, Notebooks).
  • Implement security layers: workspace & item permissions, RLS / OLS in Warehouse / Lakehouse SQL endpoints, credentialed connections / shortcuts to external storage, and alignment of environments & capacities.
  • ADF Orchestration & Reusability
  • Create parameterized, template-driven pipelines with reusable activities (ForEach, Lookup, Mapping Data Flows).
  • Ensure robust dependency management with retry / alert patterns.
  • Databricks Engineering Excellence
  • Author complex & nested notebooks (via %run / dbutils.notebook.run) in Python, PySpark, and Spark SQL for ETL / ELT.
  • Debug & troubleshoot jobs and clusters; resolve skew, shuffle spills, checkpoint failures, schema drift, and streaming backlogs.
  • Apply performance optimizations: partitioning & clustering, Z-ORDER, OPTIMIZE / VACUUM, file size tuning, AQE, broadcast joins, caching, and checkpoint & trigger strategies for Structured Streaming.
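
For illustration, here is a minimal sketch of the SCD Type 2 pattern referenced above: a Delta Lake MERGE that expires changed rows, followed by an insert of the new versions. All table and column names (silver.dim_customer, attr_hash, and so on) are hypothetical, not taken from any client codebase.

```python
# Illustrative SCD Type 2 upsert with Delta Lake MERGE (hypothetical schema).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already bound as `spark` on Databricks

# Step 1: expire the current row for any key whose attributes changed.
spark.sql("""
    MERGE INTO silver.dim_customer AS tgt
    USING bronze.customer_changes AS src
      ON tgt.customer_id = src.customer_id AND tgt.is_current = true
    WHEN MATCHED AND tgt.attr_hash <> src.attr_hash THEN
      UPDATE SET is_current = false, valid_to = src.change_ts
""")

# Step 2: insert new versions. After step 1, changed keys no longer have a
# current row, so this anti-join picks up both brand-new and changed keys
# while skipping unchanged ones. Column order follows the hypothetical schema.
spark.sql("""
    INSERT INTO silver.dim_customer
    SELECT s.customer_id, s.name, s.segment, s.attr_hash,
           s.change_ts AS valid_from, CAST(NULL AS TIMESTAMP) AS valid_to,
           true AS is_current
    FROM bronze.customer_changes s
    LEFT JOIN silver.dim_customer d
      ON d.customer_id = s.customer_id AND d.is_current = true
    WHERE d.customer_id IS NULL
""")
```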
  • Data Quality, Observability & Reliability
  • Implement data quality checks (validations, expectations), idempotency, exactly-once / at-least-once semantics, and dead-letter flows (see the quality-gate sketch below).
  • Set up monitoring & logging (Azure Monitor / Log Analytics, Databricks system tables, Fabric monitoring), with alerting & dashboards.
  • SQL
  • Strong understanding of MS SQL concepts; hands-on experience writing functions and stored procedures, along with DDL / DML operations.
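
In the same spirit, a small sketch of a data quality gate with a dead-letter (quarantine) path, as mentioned under Data Quality above; the paths, columns, and expectations are hypothetical.

```python
# Illustrative quality gate: failing rows are quarantined, not silently dropped.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.read.format("delta").load("/lake/bronze/orders")  # hypothetical path

# Expectations: non-null business key, positive amount, parseable event date.
checks = (F.col("order_id").isNotNull()
          & (F.col("amount") > 0)
          & F.to_date("event_date", "yyyy-MM-dd").isNotNull())
passes = F.coalesce(checks, F.lit(False))  # NULL comparisons count as failures

raw.filter(passes).write.format("delta").mode("append").save("/lake/silver/orders")

(raw.filter(~passes)
    .withColumn("rejected_at", F.current_timestamp())
    .write.format("delta").mode("append").save("/lake/quarantine/orders"))
```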

  • Design, Documentation & Governance
  • Contribute to data models (star / snowflake), semantic layers, dimensional design, and documentation (solution design docs, runbooks).
  • CI / CD & ADO Versioning Management
  • Implement a branching strategy (Git / ADO), perform PR reviews, manage environment promotion (Dev / Test / Prod), and support the Fabric CI / CD process.
  • Leadership & Client Engagement
  • Mentor junior engineers; enforce reusable & scalable patterns.
  • Run client demos and brainstorming discussions.
  • Be self-driven and innovative in solution delivery.

Must-Have Skills (Strong, Hands-On)

  • Microsoft Fabric (2024+)
  • OneLake, Lakehouse, Warehouse, Pipelines, Dataflows Gen2, Notebooks, capacities, workspace & item security, RLS / OLS.
  • Azure Data Factory (ADF)
  • Reusable, parameterized pipelines; high-level orchestration; robust scheduling, logging, retries, and alerts.

  • Databricks (5+ years on Azure)
  • Python, PySpark, Spark SQL: complex transformations, joins, window functions, UDFs / UDAFs.
  • Complex & nested notebooks; modular code with %run / dbutils.notebook.run.
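
A minimal sketch of that modular-notebook pattern, assuming it runs inside a Databricks notebook where dbutils is already in scope; the child notebook path and its parameters are hypothetical.

```python
# Fan out one ingestion notebook per table via dbutils.notebook.run.
tables = ["customers", "orders", "shipments"]

results = {}
for table in tables:
    # Positional arguments: path, timeout in seconds, parameter map.
    # The child notebook reports back via dbutils.notebook.exit("OK").
    results[table] = dbutils.notebook.run(
        "/Repos/etl/ingest_table",                     # hypothetical child path
        3600,
        {"table_name": table, "load_date": "2024-01-01"},
    )

failed = [t for t, status in results.items() if status != "OK"]
if failed:
    raise RuntimeError(f"Ingestion failed for: {failed}")
```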

  • Structured Streaming: watermarks, triggers, checkpointing, foreachBatch, schema evolution (see the streaming sketch below).
  • Delta Lake: Z-ORDER, OPTIMIZE / VACUUM, MERGE for SCD, Auto Optimize, compaction, time travel.
  • Performance tuning: partitioning, file sizing, broadcast hints, caching, Photon (where available), cluster sizing / pools.
  • Medallion Architecture
  • Bronze / Silver / Gold patterns, SCD (Type 1 / 2), handling late-arriving dimensions.
  • Azure Storage
  • ADLS Gen2 (hierarchical namespace), tiering (Hot / Cool / Archive), lifecycle & cost optimization, shortcuts into OneLake.
  • Data Warehousing
  • Dimensional modeling, fact / aggregate design, query performance tuning in Fabric Warehouse & Lakehouse SQL endpoint.
  • SQL
  • Excellent SQL development; advanced joins, windowing, CTEs, performance tuning / indexing where applicable.
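
To make the streaming expectations concrete, a hedged sketch combining a watermark, a checkpoint, a processing-time trigger, and a foreachBatch MERGE; paths and column names are hypothetical. Keying the MERGE on event_id is what turns at-least-once micro-batch replays into effectively exactly-once results in the target table.

```python
# Illustrative Structured Streaming flow into a Silver Delta table.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

events = (spark.readStream.format("delta")
          .load("/lake/bronze/events")                # hypothetical source
          .withWatermark("event_ts", "15 minutes"))   # bound state for late data

def upsert_batch(batch_df, batch_id):
    # MERGE keyed on event_id makes replayed micro-batches idempotent.
    target = DeltaTable.forPath(spark, "/lake/silver/events")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.event_id = s.event_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(events.writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/lake/_checkpoints/silver_events")
    .trigger(processingTime="1 minute")
    .start())
```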

  • Power BI (DAX)
  • Awareness of Power BI and DAX; RLS alignment with Warehouse / Lakehouse.

  • Security & Compliance
  • RBAC, item-level permissions, credentials for data sources, RLS / OLS, secret management (Key Vault, as sketched below), PII handling.
  • ETL / ELT Methodologies
  • Robust, testable pipelines; idempotency; error handling; data quality gates.
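
And one small illustration of the Key Vault point above: credentials pulled from an Azure Key Vault-backed Databricks secret scope rather than hard-coded; the scope, key, and connection details are hypothetical.

```python
# dbutils and spark are in scope inside a Databricks notebook.
password = dbutils.secrets.get(scope="kv-prod", key="sql-etl-password")

df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=sales")
      .option("user", "etl_user")
      .option("password", password)
      .option("dbtable", "dbo.customers")
      .load())
```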

  • Ways of Working
  • Agile delivery, client-facing communication, crisp demos, documentation, and best-practice advocacy.

If you are interested or know someone who might be, kindly write to me at shruti.saboo@accionlabs.com along with your latest CV.

    Thank You,

    Shruti Saboo
