We are seeking a dynamic Data Engineer with strong hands-on experience with the Databricks Lakehouse Platform to build and optimize scalable data solutions on AWS or Azure. The ideal candidate will have a solid foundation in data engineering and distributed data processing (Spark / PySpark), along with a deep understanding of Delta Lake, the Medallion Architecture, and Databricks-native tools. You will play a key role in designing modern ELT pipelines, ensuring data quality, performance, and governance, and collaborating across teams to deliver high-impact analytics and AI-ready datasets.
Key Responsibilities
- Design and implement end-to-end data pipelines in Databricks leveraging Spark (PySpark), SQL, and Delta Lake for large-scale data ingestion, transformation, and delivery.
- Apply Medallion Architecture (Bronze / Silver / Gold) to structure and optimize data flows for analytics and ML use cases.
- Develop and operationalize ELT frameworks using Databricks Workflows, Airflow, or Azure Data Factory, ensuring reliability and scalability.
- Optimize Databricks clusters, jobs, and queries for performance and cost efficiency, using tools like Photon, Job Clusters, and Auto-Optimize.
- Collaborate with data analysts, data scientists, and architects to design data models and expose curated datasets via Databricks SQL, dbt, or BI tools.
- Enforce data quality, validation, and lineage through Delta Live Tables (DLT) and Unity Catalog.
- Implement data governance, access control, and audit compliance aligned with enterprise policies.
- Troubleshoot pipeline and performance issues using Databricks monitoring, CloudWatch, or Azure Monitor.
- Contribute to continuous improvement by evaluating new Databricks features, runtime upgrades, and best practices.
Skills Required
Airflow, BI tools, PySpark, DLT (Delta Live Tables), Azure Data Factory, CloudWatch, dbt, Spark, Azure, AWS