Role Overview:
Design, build, and optimize large-scale, production-grade data pipelines and analytics platforms on Azure, leveraging Databricks, Synapse, and the broader Microsoft data ecosystem.
Deliver business-critical data assets for analytics, BI, and AI/ML initiatives.
Key Technical Responsibilities:
- Architect modern data lakes using Azure Data Lake Storage Gen2 for batch and streaming workloads
- Build and maintain scalable ETL/ELT pipelines using Azure Data Factory and Databricks (PySpark, Scala, SQL)
- Orchestrate data workflows across ADF, Databricks, and Synapse Pipelines; implement modular and reusable data pipeline components
- Develop advanced notebooks and production jobs in Azure Databricks (PySpark, SparkSQL, Delta Lake)
- Optimize Spark jobs by tuning partitioning, caching, cluster configuration, and autoscaling for performance and cost (see the tuning sketch after this list)
- Implement Delta Lake for ACID-compliant data lakes and enable time travel/audit features (Delta Lake sketch after this list)
- Engineer real-time data ingestion from Event Hubs, IoT Hub, and Kafka into Databricks and Synapse (streaming sketch after this list)
- Transform and enrich raw data, building robust data models and marts for analytics and AI use cases
- Integrate structured, semi-structured, and unstructured sources, including APIs, logs, and files
- Implement data validation, schema enforcement, and quality checks using Databricks, PySpark, and tools like Great Expectations (validation sketch after this list)
- Manage access controls: Azure AD, Databricks workspace permissions, RBAC, Key Vault integration
- Enable end-to-end lineage and cataloging via Microsoft Purview (or Unity Catalog if multi-cloud)
- Automate deployment of Databricks assets (notebooks, jobs, clusters) using the Databricks CLI/REST API, ARM/Bicep, or Terraform (Jobs API sketch after this list)
- Build and manage CI/CD pipelines in Azure DevOps for data pipelines and infrastructure as code
- Containerize and deploy custom code with Azure Kubernetes Service (AKS) or Databricks Jobs as needed
- Instrument monitoring and alerting with Azure Monitor, Log Analytics, and Databricks native tools
- Diagnose and resolve performance bottlenecks in distributed Spark jobs and pipeline orchestrations
- Collaborate with data scientists, BI engineers, and business stakeholders to deliver data solutions
- Document design decisions, create technical specifications, and enforce engineering standards across the team
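The sketches below ground several of the bullets above in code. First, an illustrative Spark tuning snippet: the config values, table path, and join key are hypothetical placeholders, not recommendations; real settings depend on data volume and cluster size.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` is pre-provided

# Illustrative knobs only; tune to the workload, not these numbers.
spark.conf.set("spark.sql.shuffle.partitions", "200")  # shuffle parallelism
spark.conf.set("spark.sql.adaptive.enabled", "true")   # adaptive query execution (AQE)

df = spark.read.format("delta").load("/mnt/curated/events")  # hypothetical table

# Repartition on the join key before a wide join to spread work evenly,
# then cache because the result feeds several downstream actions.
df = df.repartition(200, "customer_id")
df.cache()
df.count()  # materializes the cache
```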
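Next, a minimal Delta Lake sketch covering ACID writes, partitioning, and time travel; the paths and column names are assumptions for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = spark.read.json("/mnt/raw/events/")  # hypothetical raw landing zone

# Delta writes are ACID; appends with mismatched columns fail schema
# enforcement unless mergeSchema/overwriteSchema is set explicitly.
(events.write
    .format("delta")
    .mode("append")
    .partitionBy("event_date")  # enables partition pruning on date filters
    .save("/mnt/curated/events"))

# Time travel: read an earlier version of the table for audit or debugging.
v0 = (spark.read.format("delta")
      .option("versionAsOf", 0)
      .load("/mnt/curated/events"))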
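For real-time ingestion, a Structured Streaming sketch reading from a Kafka-compatible endpoint (Event Hubs exposes one on port 9093) into a bronze Delta table. The broker, topic, and paths are hypothetical, and the SASL auth options Event Hubs requires are omitted for brevity.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "mynamespace.servicebus.windows.net:9093")
          .option("subscribe", "telemetry")  # hypothetical topic / event hub name
          .load())

# The Kafka source delivers bytes; cast the payload before downstream parsing.
parsed = stream.selectExpr("CAST(value AS STRING) AS body", "timestamp")

query = (parsed.writeStream
         .format("delta")
         .option("checkpointLocation", "/mnt/checkpoints/telemetry")  # required for recovery
         .outputMode("append")
         .start("/mnt/bronze/telemetry"))
```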
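A rule-based validation sketch in plain PySpark follows; the schema, paths, and rules are illustrative, and a fuller suite would typically live in Great Expectations as the bullet above notes.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.getOrCreate()

# Explicit schema so bad records surface instead of being silently inferred.
schema = StructType([
    StructField("order_id", LongType(), nullable=False),
    StructField("customer_id", StringType(), nullable=False),
    StructField("amount", StringType(), nullable=True),
])

orders = spark.read.schema(schema).json("/mnt/raw/orders/")

# Simple not-null rules; failed rows go to a quarantine table for review.
valid = orders.filter(F.col("order_id").isNotNull() & F.col("customer_id").isNotNull())
rejected = orders.subtract(valid)

valid.write.format("delta").mode("append").save("/mnt/silver/orders")
rejected.write.format("delta").mode("append").save("/mnt/quarantine/orders")
```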
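Finally, a deployment-automation sketch against the Databricks Jobs API 2.1. The workspace URL, token handling, notebook path, and cluster settings are placeholder assumptions; in a real pipeline they would come from Key Vault and Azure DevOps variables.

```python
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-123.azuredatabricks.net (placeholder)
token = os.environ["DATABRICKS_TOKEN"]  # would be fetched from Key Vault in practice

# Jobs API 2.1: create a nightly job that runs a notebook on a fresh job cluster.
payload = {
    "name": "nightly-curation",
    "tasks": [{
        "task_key": "curate",
        "notebook_task": {"notebook_path": "/Repos/data/curate_events"},  # hypothetical
        "new_cluster": {
            "spark_version": "13.3.x-scala2.12",  # placeholder runtime
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2,
        },
    }],
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print("created job", resp.json()["job_id"])
```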
Required Skills & Experience:
Hands-on with:
1. Azure Data Lake Storage Gen2, Azure Data Factory, Azure Synapse Analytics, Azure Databricks
2. PySpark, SparkSQL, advanced SQL, Delta Lake
3. Data modeling (star/snowflake schemas), partitioning, and data warehouse concepts (see the star-schema sketch after this list)
- Strong Python programming and experience with workflow orchestration (ADF, Airflow, or Synapse Pipelines)
- Infrastructure automation: ARM/Bicep, Terraform, Databricks CLI/API, Azure DevOps
- Deep understanding of Spark internals, cluster optimization, cost management, and distributed computing
- Data security, RBAC, encryption, and compliance (SOC2, ISO, GDPR/DPDPA)
- Excellent troubleshooting, performance tuning, and documentation skills
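To ground the data-modeling expectation above, a minimal star-schema aggregation; the fact and dimension tables are hypothetical and assumed to exist in the metastore.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Classic fact-to-dimension join on surrogate keys, aggregated for BI.
spark.sql("""
    SELECT d.calendar_month,
           c.segment,
           SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_date d     ON f.date_key = d.date_key
    JOIN dim_customer c ON f.customer_key = c.customer_key
    GROUP BY d.calendar_month, c.segment
""").show()
```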