Responsibilities:
Lead and mentor a team of data engineers in designing, developing, and maintaining scalable data pipelines.
Architect, build, and optimize ETL workflows using Python, PySpark, and SQL.
Collaborate with data scientists, analysts, and business teams to understand data requirements and deliver reliable solutions.
Implement and manage data integration from multiple structured and unstructured sources.
Design and maintain data lake/data warehouse solutions on AWS (S3, Glue, Redshift, EMR, Lambda) or Azure (Data Lake, Synapse, Databricks, Data Factory).
Ensure data quality, security, and compliance in line with industry best practices.
Optimize performance of large-scale data processing systems and pipelines.
Drive automation, CI/CD practices, and infrastructure-as-code for data platforms.
Provide technical leadership in solution design, code reviews, and architecture decisions.
Required Skills & Qualifications:
Strong proficiency in Python, PySpark, and SQL.
Proven experience in ETL design and development.
Hands-on expertise in big data frameworks (Spark, Hadoop ecosystem).
Deep understanding of cloud platforms: AWS (Glue, EMR, Redshift, S3, Lambda) or Azure (Data Factory, Synapse, Databricks, Data Lake).
Experience with data modeling, data warehousing, and performance optimization.
Strong knowledge of version control (Git), CI/CD pipelines, and DevOps practices.
Excellent problem-solving and analytical skills.
Strong communication and leadership skills, with experience leading teams and projects.
Good to Have:
Experience with streaming platforms (Kafka, Kinesis, Event Hub).
Knowledge of containerization & orchestration (Docker, Kubernetes).
Exposure to machine learning pipelines or MLOps.
Familiarity with data governance and security frameworks.