About the job :
Location : Kochi / Bangalore (initially in Kochi)
Role Summary :
We are seeking a Data Engineering Lead to architect, build, and optimize scalable data pipelines and data platforms that power critical business decisions. The ideal candidate will have deep expertise in PySpark, ETL frameworks, SQL Stored Procedures, and Data Modeling, along with a strong focus on performance tuning across large datasets.
Key Responsibilities :
- Lead the design and development of scalable data pipelines and ETL workflows using PySpark and SQL
- Develop and maintain Stored Procedures to support batch processes and data transformations
- Design and implement robust data models (Star, Snowflake, and normalized models) across data warehouse and lake environments
- Work closely with BI, Analytics, and Product teams to understand data requirements and ensure data quality, lineage, and integrity
- Optimize data workflows for performance, cost, and scalability across distributed systems
- Manage data ingestion from structured and semi-structured sources (e.g., APIs, files, databases, streams)
- Support CI / CD and DevOps practices for data pipeline deployments
- Collaborate with cross-functional teams on data governance, access controls, and compliance
- Guide and mentor junior data engineers; perform code reviews and technical planning
Required Skills & Experience :
- 5+ years of experience in data engineering and data platform development
- Experience with Azure ADLS, Synapse, or Microsoft Fabric
- Expert in PySpark / Spark SQL for large-scale distributed data processing
- Solid understanding of ETL / ELT architectures, data ingestion, and transformation strategies
- Experience with data modeling techniques and tools
- Proven experience in performance tuning of queries, jobs, and pipelines
- Experience with tools like Airflow or equivalent for orchestration and transformations
- Hands-on experience with version control, CI / CD, and data testing frameworks
Nice To Have :
- Exposure to Data Virtualization platforms (Denodo, Dremio, etc.)
- Exposure to Purview or other Data Governance tools
(ref : hirist.tech)