Data Pipeline Development : Design, build, and maintain efficient, scalable, and reliable data pipelines using Databricks and Python to ingest, transform, and load data from various sources.
Databricks Expertise : Leverage advanced features of Databricks, including Delta Lake, Spark SQL, and Databricks notebooks, for large-scale data processing and analytics.
ETL / ELT Processes : Develop and optimize complex ETL / ELT processes to ensure data quality, consistency, and timely availability.
Data Modeling : Collaborate with data scientists and analysts to understand data requirements and design appropriate data models for optimal storage and retrieval.
Performance Optimization : Identify and resolve performance bottlenecks in data pipelines and Databricks jobs to improve data processing efficiency.
Cloud Integration : Work with cloud services (e.g., Azure Data Lake Storage, AWS S3, Google Cloud Storage) for data storage and integration with Databricks.
Automation : Implement automation for data pipeline orchestration, monitoring, and alerting.
Collaboration : Work closely with cross-functional teams, including data scientists, analysts, and software engineers, to deliver comprehensive data solutions.
Documentation : Create and maintain technical documentation for data pipelines and data models.
Skills & Qualifications
Experience : 6 to 10 years of hands-on experience as a Data Engineer.
Databricks : Strong, proven experience with the Databricks platform, including Spark SQL and Delta Lake.
Python : Expert-level proficiency in Python for data engineering tasks.
SQL : Excellent command of SQL for data manipulation, querying, and optimization.
Data Warehousing : Solid understanding of data warehousing concepts, data modeling (star schema, snowflake schema), and ETL / ELT principles.
Cloud Platforms : Experience with at least one major cloud provider's data services (e.g., Azure Data Factory, AWS Glue, Google Cloud Dataflow).
Version Control : Proficiency with version control systems, especially Git.
Problem-Solving : Strong analytical and problem-solving skills to tackle complex data challenges.
Communication : Excellent verbal and written communication skills.
Education : Bachelor's degree in Computer Science, Engineering, or a related quantitative field.