Description :
- Design, develop, and maintain scalable data pipelines using Azure Databricks, Apache Spark, and Delta Lake.
- Implement data ingestion, transformation, and integration from multiple structured and unstructured data sources.
- Optimize ETL / ELT processes for performance, cost-efficiency, and scalability in Azure environments.
- Build and manage data models, ensuring data quality, consistency, and lineage across systems.
- Collaborate with cross-functional teams (data scientists, analysts, business users) to understand requirements and deliver data solutions.
- Integrate Databricks with other Azure services such as Azure Data Lake Storage (ADLS), Synapse Analytics, Azure SQL Database, and Azure Data Factory.
- Develop and manage notebooks, jobs, and clusters within Azure Databricks for batch and streaming workloads.
- Implement CI / CD pipelines for data workflows using tools like Azure DevOps or GitHub Actions.
- Monitor, debug, and tune performance of Spark jobs and clusters.
- Ensure compliance, data security, and governance following organizational and regulatory standards.
- Document technical designs, processes, and best practices.
Required Skills and Qualifications :
- Bachelor's or Master's degree in Computer Science, Information Systems, or a related field.
- 4+ years of experience in data engineering or big data development, including at least 2 years with Azure Databricks.
- Strong hands-on experience in Apache Spark (PySpark / Scala / SQL) for data transformation and processing.
- Proficiency in Azure Data Lake Storage (ADLS), Azure Data Factory (ADF), Azure Synapse, and Azure SQL Database.
- Experience building Delta Lake architectures and implementing medallion (bronze-silver-gold) data models.
- Solid understanding of ETL / ELT design, orchestration, and performance optimization.
- Experience with CI / CD pipelines, Git, and DevOps principles.
- Familiarity with data security, compliance, and access control within Azure.
- Strong problem-solving skills and the ability to troubleshoot distributed data workflows.
- Excellent communication skills and the ability to work in agile, cross-functional teams.
Preferred Skills :
- Experience with Power BI or other visualization tools for data consumption.
- Exposure to machine learning pipelines or integration with Azure Machine Learning.
- Knowledge of Databricks REST APIs, Unity Catalog, and MLflow.
- Experience with real-time data streaming tools (Kafka, Event Hubs, or Spark Streaming).
- Familiarity with infrastructure-as-code (IaC) using Terraform or ARM templates.
- Understanding of data governance, metadata management, and lineage tracking.
Key Attributes :
- Strong analytical and data-driven mindset.
- Excellent problem-solving and performance-tuning abilities.
- Detail-oriented with a focus on data integrity and reliability.
- Ability to manage multiple data projects simultaneously.
- Passion for continuous learning and adopting modern data technologies.
(ref : hirist.tech)