Key Responsibilities:
Design, develop, and maintain robust ETL pipelines using Databricks and Python.
Implement and optimize the Medallion Architecture (Bronze, Silver, and Gold layers) within our Data Lakehouse ecosystem (an illustrative sketch follows this list).
Collaborate with data engineers, data scientists, and business stakeholders to translate business requirements into scalable data solutions.
Perform data ingestion, transformation, cleansing, and enrichment from various structured and unstructured data sources.
Optimize Spark jobs for performance and cost-efficiency on Databricks.
Implement best practices for data governance, security, and quality within the data pipelines.
Mentor junior team members and contribute to improving team processes and standards.
Troubleshoot and resolve data pipeline and platform-related issues promptly.
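For context, the Medallion Architecture referenced above moves data from raw (Bronze) through cleansed (Silver) to business-ready (Gold) tables. The following is a minimal, illustrative PySpark/Delta Lake sketch of that flow; the table names, source path, and columns are hypothetical and not part of any specific codebase for this role.

# Minimal sketch of a Bronze -> Silver -> Gold flow on Databricks.
# Table names, the source path, and columns are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land raw source data as-is, with ingestion metadata.
raw = (spark.read.format("json").load("/mnt/raw/orders/")
       .withColumn("_ingested_at", F.current_timestamp()))
raw.write.format("delta").mode("append").saveAsTable("bronze.orders")

# Silver: cleanse and conform -- deduplicate, enforce types, drop bad rows.
silver = (spark.table("bronze.orders")
          .dropDuplicates(["order_id"])
          .withColumn("order_ts", F.to_timestamp("order_ts"))
          .filter(F.col("amount") > 0))
silver.write.format("delta").mode("overwrite").saveAsTable("silver.orders")

# Gold: business-level aggregates ready for reporting.
gold = (spark.table("silver.orders")
        .groupBy(F.to_date("order_ts").alias("order_date"))
        .agg(F.sum("amount").alias("daily_revenue")))
gold.write.format("delta").mode("overwrite").saveAsTable("gold.daily_revenue")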
Required Skills & Qualifications :
Strong proficiency in Python programming and libraries related to data processing (PySpark preferred).
Hands-on experience with Databricks platform and Apache Spark.
Deep understanding of ETL concepts and implementation in large-scale data environments.
Expertise in Medallion Architecture and Data Lakehouse design patterns.
Experience with data storage technologies such as Delta Lake and Parquet, and with cloud data platforms (AWS, Azure, or GCP).
Familiarity with SQL and performance tuning of Spark SQL queries.
Strong problem-solving skills and attention to detail.
Excellent communication and collaboration skills.
Preferred Qualifications:
Experience with containerization (Docker/Kubernetes) and orchestration tools (Airflow, Azure Data Factory).
Knowledge of CI/CD pipelines for data workflows.
Exposure to machine learning pipelines and MLOps.
Key Skills
Apache Hive, S3, Hadoop, Redshift, Spark, AWS, Apache Pig, NoSQL, Big Data, Data Warehouse, Kafka, Scala
Employment Type: Full Time
Experience: years
Vacancy: 1
Data • Bengaluru, Karnataka, India