Description :
Primary Job Title : Data Engineering Lead.
About The Opportunity :
We are seeking a highly skilled Lead Data Engineer with strong expertise in Python, Pandas, PySpark, AWS, and SQL to design, build, and manage scalable data solutions.
The ideal candidate will lead a team of data engineers, develop robust ETL pipelines, and collaborate with analytics, data science, and business teams to ensure high data quality and performance across cloud-based environments.
Role & Responsibilities :
- Lead the design and development of data pipelines for ingestion, transformation, and integration from multiple sources into the enterprise data platform.
- Implement data quality frameworks, validation checks, and monitoring solutions using Python and SQL.
- Optimize PySpark jobs for performance and scalability in a distributed computing environment (AWS EMR / Glue).
- Develop reusable ETL frameworks using PySpark and Pandas for data transformation and analytics.
- Manage and maintain cloud-based infrastructure and data storage (AWS S3, Redshift, Lambda, Glue, Athena).
- Collaborate with data scientists, analysts, and stakeholders to provide clean, structured, and accessible datasets.
- Oversee code reviews and performance tuning, and mentor junior data engineers.
- Establish best practices for data governance, version control, and CI / CD integration.
- Troubleshoot production data issues and ensure system reliability, availability, and efficiency.
Must-Have Skills :
- Programming : Python (advanced scripting, Pandas, PySpark).
- Cloud : AWS (S3, Glue, Lambda, Redshift, Athena, EMR).
- Database / Querying : Advanced SQL (Joins, Window Functions, Query Optimization).
- Big Data Tools : PySpark, Spark SQL.
- ETL Development : Data ingestion, transformation, and validation pipelines.
- Version Control : Git / GitHub / Bitbucket.
- Workflow Orchestration : Apache Airflow or equivalent (preferred).
Good-to-Have Skills :
- Experience with Docker or Kubernetes for containerization.
- Familiarity with CI / CD pipelines and DevOps practices.
- Exposure to data modeling, schema design, and partitioning strategies.
- Understanding of data lake and data warehouse architecture.
- Knowledge of monitoring tools (CloudWatch, Datadog, or Prometheus).
Qualifications :
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- 11-16 years of total experience, with at least 3-4 years leading data engineering projects.
- Proven track record in handling large-scale data pipelines and cloud migrations.
(ref : hirist.tech)