Lead Data Pipeline Engineer (Azure / Fabric)

Accion Labs · Pune, India
Job description

Role Snapshot

  • Title: Senior Azure Data Engineer
  • Experience: 6–8+ years in Data Engineering (minimum 4+ years on Azure, and/or 6 months to 1+ year with Microsoft Fabric)
  • Tech Focus: Microsoft Fabric, Azure Data Factory (ADF), Databricks (Python, PySpark, Spark SQL), Delta Lake, Power BI (DAX), Azure Storage, Lakehouse, Warehouse
  • Engagement: Client-facing, hands-on, design-to-delivery
  • Location: Any Accion Labs office in India - Bangalore, Pune (preferred), Mumbai, Hyderabad, Indore, Noida. This is a HYBRID work model, not remote.
  • Notice period: Immediate joiners preferred, or candidates who can join within 20 days

Core Responsibilities

  • End-to-End Engineering
  • Design, implement, and deliver batch & streaming data pipelines into Fabric Lakehouse / Warehouse using ADF and Databricks with Delta Lake.
  • Data Architecture Understanding
  • Strong grasp of Bronze–Silver–Gold layering, incremental ingestion, watermarking, and best practices for scalable pipelines.
  • Medallion Architecture
  • Apply Bronze / Silver / Gold patterns, enforce schema evolution, handle late / dirty data, and implement SCD (Type 1 / 2) and late-arriving dimensions (see the MERGE sketch below).
  • Fabric Platform & Security
  • Build solutions on Microsoft Fabric (OneLake, Lakehouse, Warehouse, Pipelines, Dataflows Gen2, Notebooks).
  • Implement security layers: workspace & item permissions, RLS / OLS in Warehouse / Lakehouse SQL endpoints, credentialed connections / shortcuts to external storage, and alignment of environments & capacities.
  • ADF Orchestration & Reusability
  • Create parameterized, template-driven pipelines with reusable activities (ForEach, Lookup, Mapping Data Flows).
  • Ensure robust dependency management with retry / alert patterns.
  • Databricks Engineering Excellence
  • Author complex & nested notebooks (via %run / dbutils.notebook.run) in Python, PySpark, and Spark SQL for ETL / ELT.
  • Debug & troubleshoot jobs and clusters; resolve skew, shuffle spills, checkpoint failures, schema drift, and streaming backlogs.
  • Apply performance optimizations: partitioning & clustering, Z-ORDER, OPTIMIZE / VACUUM, file size tuning, AQE, broadcast joins, caching, and checkpoint & trigger strategies for Structured Streaming.
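
For illustration, here is a minimal sketch of the SCD Type 2 pattern referenced above: a Delta Lake MERGE that expires changed rows, followed by an insert of the new versions. All table and column names (silver.dim_customer, attr_hash, and so on) are hypothetical, not taken from any client codebase.

```python
# Illustrative SCD Type 2 upsert with Delta Lake MERGE (hypothetical schema).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already bound as `spark` on Databricks

# Step 1: expire the current row for any key whose attributes changed.
spark.sql("""
    MERGE INTO silver.dim_customer AS tgt
    USING bronze.customer_changes AS src
      ON tgt.customer_id = src.customer_id AND tgt.is_current = true
    WHEN MATCHED AND tgt.attr_hash <> src.attr_hash THEN
      UPDATE SET is_current = false, valid_to = src.change_ts
""")

# Step 2: insert new versions. After step 1, changed keys no longer have a
# current row, so this anti-join picks up both brand-new and changed keys
# while skipping unchanged ones. Column order follows the hypothetical schema.
spark.sql("""
    INSERT INTO silver.dim_customer
    SELECT s.customer_id, s.name, s.segment, s.attr_hash,
           s.change_ts AS valid_from, CAST(NULL AS TIMESTAMP) AS valid_to,
           true AS is_current
    FROM bronze.customer_changes s
    LEFT JOIN silver.dim_customer d
      ON d.customer_id = s.customer_id AND d.is_current = true
    WHERE d.customer_id IS NULL
""")
```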
  • Data Quality, Observability & Reliability
  • Implement data quality checks (validations, expectations), idempotency, exactly-once / at-least-once semantics, and dead-letter flows (see the quality-gate sketch below).
  • Set up monitoring & logging (Azure Monitor / Log Analytics, Databricks system tables, Fabric monitoring), with alerting & dashboards.
  • SQL
  • Strong understanding of MS SQL concepts; hands-on experience writing functions and stored procedures, along with DDL / DML operations.
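
In the same spirit, a small sketch of a data quality gate with a dead-letter (quarantine) path, as mentioned under Data Quality above; the paths, columns, and expectations are hypothetical.

```python
# Illustrative quality gate: failing rows are quarantined, not silently dropped.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.read.format("delta").load("/lake/bronze/orders")  # hypothetical path

# Expectations: non-null business key, positive amount, parseable event date.
checks = (F.col("order_id").isNotNull()
          & (F.col("amount") > 0)
          & F.to_date("event_date", "yyyy-MM-dd").isNotNull())
passes = F.coalesce(checks, F.lit(False))  # NULL comparisons count as failures

raw.filter(passes).write.format("delta").mode("append").save("/lake/silver/orders")

(raw.filter(~passes)
    .withColumn("rejected_at", F.current_timestamp())
    .write.format("delta").mode("append").save("/lake/quarantine/orders"))
```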

  • Design, Documentation & Governance
  • Contribute to data models (star / snowflake), semantic layers, dimensional design, and documentation (solution design docs, runbooks).
  • CI / CD & ADO Versioning Management
  • Implement a branching strategy (Git / ADO), perform PR reviews, manage environment promotion (Dev / Test / Prod), and support the Fabric CI / CD process.
  • Leadership & Client Engagement
  • Mentor junior engineers; enforce reusable & scalable patterns.
  • Run client demos and brainstorming discussions.
  • Be self-driven and innovative in solution delivery.

Must-Have Skills (Strong, Hands-On)

  • Microsoft Fabric (2024+)
  • OneLake, Lakehouse, Warehouse, Pipelines, Dataflows Gen2, Notebooks, capacities, workspace & item security, RLS / OLS.
  • Azure Data Factory (ADF)
  • Reusable, parameterized pipelines; high-level orchestration; robust scheduling, logging, retries, and alerts.

  • Databricks (5+ years on Azure)
  • Python, PySpark, Spark SQL: complex transformations, joins, window functions, UDFs / UDAFs.
  • Complex & nested notebooks; modular code with %run / dbutils.notebook.run.
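
A minimal sketch of that modular-notebook pattern, assuming it runs inside a Databricks notebook where dbutils is already in scope; the child notebook path and its parameters are hypothetical.

```python
# Fan out one ingestion notebook per table via dbutils.notebook.run.
tables = ["customers", "orders", "shipments"]

results = {}
for table in tables:
    # Positional arguments: path, timeout in seconds, parameter map.
    # The child notebook reports back via dbutils.notebook.exit("OK").
    results[table] = dbutils.notebook.run(
        "/Repos/etl/ingest_table",                     # hypothetical child path
        3600,
        {"table_name": table, "load_date": "2024-01-01"},
    )

failed = [t for t, status in results.items() if status != "OK"]
if failed:
    raise RuntimeError(f"Ingestion failed for: {failed}")
```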

  • Structured Streaming: watermarks, triggers, checkpointing, foreachBatch, schema evolution (see the streaming sketch below).
  • Delta Lake: Z-ORDER, OPTIMIZE / VACUUM, MERGE for SCD, Auto Optimize, compaction, time travel.
  • Performance tuning: partitioning, file sizing, broadcast hints, caching, Photon (where available), cluster sizing / pools.
  • Medallion Architecture
  • Bronze / Silver / Gold patterns, SCD (Type 1 / 2), handling late-arriving dimensions.
  • Azure Storage
  • ADLS Gen2 (hierarchical namespace), tiering (Hot / Cool / Archive), lifecycle & cost optimization, shortcuts into OneLake.
  • Data Warehousing
  • Dimensional modeling, fact / aggregate design, query performance tuning in Fabric Warehouse & Lakehouse SQL endpoint.
  • SQL
  • Excellent SQL development; advanced joins, windowing, CTEs, performance tuning / indexing where applicable.
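
To make the streaming expectations concrete, a hedged sketch combining a watermark, a checkpoint, a processing-time trigger, and a foreachBatch MERGE; paths and column names are hypothetical. Keying the MERGE on event_id is what turns at-least-once micro-batch replays into effectively exactly-once results in the target table.

```python
# Illustrative Structured Streaming flow into a Silver Delta table.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

events = (spark.readStream.format("delta")
          .load("/lake/bronze/events")                # hypothetical source
          .withWatermark("event_ts", "15 minutes"))   # bound state for late data

def upsert_batch(batch_df, batch_id):
    # MERGE keyed on event_id makes replayed micro-batches idempotent.
    target = DeltaTable.forPath(spark, "/lake/silver/events")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.event_id = s.event_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(events.writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/lake/_checkpoints/silver_events")
    .trigger(processingTime="1 minute")
    .start())
```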

  • Power BI (DAX)
  • Awareness of Power BI and DAX; RLS alignment with Warehouse / Lakehouse.

  • Security & Compliance
  • RBAC, item-level permissions, credentials for data sources, RLS / OLS, secret management (Key Vault, as sketched below), PII handling.
  • ETL / ELT Methodologies
  • Robust, testable pipelines; idempotency; error handling; data quality gates.
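
And one small illustration of the Key Vault point above: credentials pulled from an Azure Key Vault-backed Databricks secret scope rather than hard-coded; the scope, key, and connection details are hypothetical.

```python
# dbutils and spark are in scope inside a Databricks notebook.
password = dbutils.secrets.get(scope="kv-prod", key="sql-etl-password")

df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=sales")
      .option("user", "etl_user")
      .option("password", password)
      .option("dbtable", "dbo.customers")
      .load())
```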

  • Ways of Working
  • Agile delivery, client-facing communication, crisp demos, documentation, and best-practice advocacy.

If you are interested or know someone who might be, kindly write to me at shruti.saboo@accionlabs.com along with your latest CV.

    Thank You,

    Shruti Saboo
