About the Role:
We are looking for an experienced Lead Data Engineer with deep expertise in Big Data technologies, particularly within the Google Cloud Platform (GCP) ecosystem. The ideal candidate has a strong command of PySpark/Spark, SQL, and Python, and a proven track record of building, optimizing, and managing large-scale data pipelines and cloud-native data platforms.
Key Responsibilities:
- Lead the design and implementation of scalable ETL/ELT pipelines using Spark (batch and streaming) and Python
- Architect and optimize BigQuery solutions using advanced SQL, partitioning, clustering, and materialized views (see the table sketch after this list)
- Guide the team on GCP services: Dataproc, GCS, BigQuery, Cloud Composer, Cloud Functions, IAM, and Cloud Logging
- Conduct code reviews and mentor team members on Spark optimization (caching, memory management, broadcast joins, skew handling; see the join sketch below)
- Drive Airflow DAG development, configuration management, and orchestration workflows (a minimal DAG sketch follows this list)
- Solve complex data engineering problems and contribute to architectural decisions for performance, scalability, and cost-efficiency
- Ensure data quality, governance, and security best practices are enforced across all data platforms
- Support team readiness through technical ramp-up and ongoing skill enhancement
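
For the BigQuery item above, a minimal sketch of creating a date-partitioned, clustered table through the google-cloud-bigquery Python client; the project, dataset, table, and column names are hypothetical.

```python
from google.cloud import bigquery

# Hypothetical project id for illustration only.
client = bigquery.Client(project="my-gcp-project")

ddl = """
CREATE TABLE IF NOT EXISTS `my-gcp-project.analytics.events`
(
  event_ts    TIMESTAMP NOT NULL,
  customer_id STRING,
  event_type  STRING,
  payload     STRING
)
-- Partition by day so queries filtered on event date scan fewer bytes.
PARTITION BY DATE(event_ts)
-- Cluster by the most common filter/join columns to prune scans further.
CLUSTER BY customer_id, event_type
"""

# Run the DDL; .result() blocks until the query job completes.
client.query(ddl).result()
```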
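For the Spark optimization item, a short PySpark sketch of a broadcast join, with Spark 3 AQE skew-join settings as a complementary mitigation; the bucket paths, table shapes, and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-demo").getOrCreate()

# Spark 3's adaptive execution can split skewed partitions at join time.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

# Hypothetical inputs: a large fact table and a small lookup table.
# On Dataproc, gs:// paths resolve through the GCS connector.
orders = spark.read.parquet("gs://my-bucket/orders/")        # large
countries = spark.read.parquet("gs://my-bucket/countries/")  # small

# broadcast() ships the small table to every executor, so the large side
# joins locally without a shuffle -- a common fix for skewed join keys.
enriched = orders.join(broadcast(countries), on="country_code", how="left")

enriched.write.mode("overwrite").parquet("gs://my-bucket/orders_enriched/")
```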
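For the Airflow item, a minimal DAG sketch (assuming Airflow 2.4+) showing branching and an XCom push; the DAG id, schedule, and task logic are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import BranchPythonOperator, PythonOperator

def choose_branch(**context):
    # Hypothetical rule: full load on the first of the month, else incremental.
    return "full_load" if context["ds"].endswith("-01") else "incremental_load"

def load(mode, **context):
    # Push a row count downstream via XCom for monitoring/failure handling.
    context["ti"].xcom_push(key="row_count", value=42)

with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    branch = BranchPythonOperator(
        task_id="choose_branch", python_callable=choose_branch
    )
    full = PythonOperator(
        task_id="full_load", python_callable=load, op_kwargs={"mode": "full"}
    )
    incr = PythonOperator(
        task_id="incremental_load",
        python_callable=load,
        op_kwargs={"mode": "incremental"},
    )
    # Only the branch chosen by choose_branch runs; the other is skipped.
    branch >> [full, incr]
```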
Must-Have Skills:
- Hands-on experience with PySpark/Spark core concepts, internals, transformations, and tuning
- Strong knowledge of SQL and BigQuery, including window functions, CTEs, performance tuning, and joins (see the query sketch below)
- Proficiency in Python with strong problem-solving abilities
- Deep experience with GCP components: BigQuery, Dataproc, GCS, and Cloud Composer
- Understanding of Airflow, including XComs, variables, schema-based DAG creation, and branching
- Exposure to Hive, partitioning (static/dynamic), and bucketed tables
- Familiarity with data pipeline orchestration, monitoring, and failure handling
- Solid grasp of data security (column-level, row-level, IAM roles)
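
To ground the SQL/BigQuery expectation, a small sketch combining a CTE with a window function to keep only the latest row per key; the table and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table/columns: keep the most recent row per customer_id.
sql = """
WITH ranked AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY updated_at DESC
    ) AS rn
  FROM `my-gcp-project.analytics.customers_raw`
)
SELECT * EXCEPT (rn)
FROM ranked
WHERE rn = 1
"""

for row in client.query(sql).result():
    print(dict(row))
```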