Description
Responsibilities:
- Develop & Optimize Data Pipelines: Build, test, and maintain ETL/ELT data pipelines using Azure Databricks and Apache Spark (PySpark). Optimize the performance and cost-efficiency of Spark jobs. Ensure data quality through validation, monitoring, and alerting mechanisms. Understand cluster types, configuration, and the use cases for serverless compute. (A minimal pipeline sketch appears after this list.)
- Implement Unity Catalog for Data Governance: Design and enforce access control policies using Unity Catalog. Manage data lineage, auditing, and metadata governance. Enable secure data sharing across teams and external stakeholders. (See the grant sketch after this list.)
- Integrate with Cloud Data Platforms: Work with Azure Data Lake Storage, Azure Blob Storage, and Azure Event Hubs to integrate Databricks with cloud-based data lakes, data warehouses, and event streams. Implement Delta Lake for scalable, ACID-compliant storage.
- Automate & Orchestrate Workflows: Develop CI/CD pipelines for data workflows using Azure Databricks Workflows or Azure Data Factory. Monitor and troubleshoot failures in job execution and cluster performance. (A job-definition sketch follows the list.)
- Collaborate with Stakeholders: Work with Data Analysts, Data Scientists, and Business Teams to understand requirements. Translate business needs into scalable data engineering solutions.
- API Expertise: Ability to pull data from a wide variety of APIs using different strategies and methods. (See the paginated-pull sketch below.)
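The pipeline bullet above typically translates into PySpark code along the following lines: read raw files from cloud storage, validate and clean them, and write to Delta Lake. This is a minimal sketch; the storage account, container paths, and column names are hypothetical placeholders, not details from this posting.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # supplied automatically in Databricks notebooks

# Read raw landing-zone files from ADLS (placeholder account/container).
raw = (
    spark.read.option("header", "true")
    .csv("abfss://landing@examplestorage.dfs.core.windows.net/orders/")
)

# Basic data-quality gates: drop duplicates and reject rows missing a key.
cleaned = (
    raw.dropDuplicates(["order_id"])
    .filter(F.col("order_id").isNotNull())
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("order_date", F.to_date("order_ts"))
)

# Delta Lake provides ACID-compliant writes; partitioning keeps large scans cheaper.
(
    cleaned.write.format("delta")
    .mode("append")
    .partitionBy("order_date")
    .save("abfss://curated@examplestorage.dfs.core.windows.net/orders/")
)
```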
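For the Unity Catalog bullet, access control is expressed as SQL GRANT statements against the catalog > schema > table hierarchy, with lineage and audit logs captured automatically for governed tables. The catalog, schema, table, and group names below are assumptions for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Grant a group read access down the object hierarchy (placeholder names).
spark.sql("GRANT USE CATALOG ON CATALOG sales TO `data-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA sales.curated TO `data-analysts`")
spark.sql("GRANT SELECT ON TABLE sales.curated.orders TO `data-analysts`")
```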
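Workflow orchestration can be scripted rather than configured by hand, which is what makes it CI/CD-friendly. Below is a sketch using the databricks-sdk Python package to create a scheduled job; the job name, notebook path, and cron expression are hypothetical, and omitting a cluster spec assumes serverless jobs compute is enabled in the workspace.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # authenticates from the environment or a CLI profile

job = w.jobs.create(
    name="nightly_orders_etl",  # placeholder job name
    tasks=[
        jobs.Task(
            task_key="ingest_orders",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/etl/ingest_orders"),
            # No cluster spec: assumes serverless jobs compute is available.
        )
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",  # 02:00 daily
        timezone_id="UTC",
    ),
)
print(f"Created job {job.job_id}")
```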
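The API bullet usually means handling authentication, pagination, and error handling in plain Python. A minimal cursor-pagination sketch follows; the endpoint, bearer token, and response shape (the "items" and "next" keys) are assumptions, since real APIs vary widely.

```python
import requests

def fetch_all(base_url: str, token: str) -> list[dict]:
    """Pull every page from a cursor-paginated endpoint (hypothetical schema)."""
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {token}"
    records: list[dict] = []
    url = f"{base_url}/v1/orders"  # placeholder endpoint
    while url:
        resp = session.get(url, timeout=30)
        resp.raise_for_status()  # surface HTTP errors instead of looping silently
        payload = resp.json()
        records.extend(payload["items"])
        url = payload.get("next")  # cursor-style pagination; None ends the loop
    return records
```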
Required Skills & Experience:
- Azure Databricks & Apache Spark (PySpark): Strong experience in building distributed data pipelines.
- Python: Proficiency in writing optimized and maintainable Python code for data engineering.
- Unity Catalog: Hands-on experience implementing data governance, access controls, and lineage tracking.
- SQL: Strong knowledge of SQL for data transformations and optimizations.
- Delta Lake: Understanding of time travel, schema evolution, and performance tuning. (Time travel and schema evolution are sketched after this list.)
- Workflow Orchestration: Experience with Azure Databricks Jobs or Azure Data Factory.
- CI/CD & Infrastructure as Code (IaC): Familiarity with the Databricks CLI, Databricks Asset Bundles (DABs), and DevOps principles.
- Security & Compliance: Knowledge of IAM, role-based access control (RBAC), and encryption.
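Two of the Delta Lake features named above, time travel and schema evolution, are compact enough to show directly. A sketch follows; the table name, version number, and the new `coupon` column are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Time travel: read the table as it existed at an earlier version.
previous = spark.read.option("versionAsOf", 5).table("sales.curated.orders")

# Schema evolution: mergeSchema lets an append introduce the new `coupon` column.
new_batch = spark.createDataFrame([("o-1001", "SPRING10")], "order_id string, coupon string")
(
    new_batch.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .saveAsTable("sales.curated.orders")
)
```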
Preferred Qualifications:
- Experience with MLflow for model tracking and deployment in Databricks.
- Familiarity with streaming technologies (Kafka, Delta Live Tables, Azure Event Hubs, Azure Event Grid).
- Hands-on experience with dbt (Data Build Tool) for modular ETL development.
- Databricks or Azure certification is a plus.
- Experience with Azure Databricks Lakehouse connectors for Salesforce and SQL Server.
- Experience with Azure Synapse Link for Dynamics 365 and Dataverse.
- Familiarity with other data pipeline strategies, such as Azure Functions, Microsoft Fabric, and Azure Data Factory.
Soft Skills:
- Strong problem-solving and debugging skills.
- Ability to work independently and in teams.
- Excellent communication and documentation skills.