Job Summary :
We are seeking a highly skilled Lead Data Engineer with strong expertise in SQL, Python, Big Data technologies, and Google BigQuery. The ideal candidate will design and build scalable data pipelines, lead data integration initiatives, and ensure optimal data quality and performance across large datasets. This role involves close collaboration with data analysts, scientists, and business teams to deliver reliable and high-performing data solutions.
Roles & Responsibilities :
- Design, develop, and maintain scalable and reliable data pipelines for batch and real-time processing.
- Lead and mentor a team of data engineers in best practices, performance tuning, and code quality.
- Implement ETL/ELT workflows using Python, SQL, and modern orchestration tools (Airflow, Dataflow, etc.).
- Optimize and manage large-scale datasets using BigQuery, Data Lake, or Data Warehouse architectures.
- Collaborate with analytics, BI, and business teams to ensure data accuracy, consistency, and availability.
- Work with Big Data tools (e.g., Hadoop, Spark, Kafka) for data transformation and aggregation.
- Ensure proper data governance, lineage, and documentation of data flows.
- Develop and maintain unit tests, monitoring, and data quality validation frameworks.
- Integrate data from multiple sources (APIs, databases, cloud storage) into unified data systems.
- Participate in architecture reviews and contribute to the overall data platform strategy.
Required Skills & Qualifications :
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- 7+ years of experience in data engineering, with at least 2 years in a lead role.
- Strong proficiency in SQL (complex queries, performance tuning, stored procedures).
- Hands-on experience with Python for data processing and automation.
- Expertise in Google BigQuery (partitioning, clustering, optimization, query design).
- Strong experience with Big Data tools such as Apache Spark, Hadoop, Hive, or Kafka.
- Familiarity with ETL orchestration tools (Apache Airflow, Dataflow, Luigi, etc.).
- Working knowledge of cloud platforms (GCP preferred; AWS or Azure is a plus).
- Strong understanding of data modeling, warehousing, and schema design.
- Experience with version control (Git) and CI/CD pipelines for data projects.
Preferred Skills (Good to Have) :
- Experience with DBT or Alembic for data transformations and migrations.
- Exposure to containerization tools (Docker, Kubernetes).
- Knowledge of BI tools (Power BI, Looker, Tableau).
- Familiarity with machine learning data preparation pipelines.
(ref : hirist.tech)