We are helping our client hire a Lead Data Engineer. This is a full-time role based in Pune, and the client is looking for an immediate joiner (maximum 15 days' notice).
Lead Data Engineer | Azure | Python | PySpark
Job Type : Full Time
Job Positions : 2
Location : Pune (Hybrid model 3 days WFO required)
Job Description :
We are looking for a Lead Data Engineer with 7+ years of hands-on experience in architecting and optimizing complex data pipelines, with a strong command of the Azure cloud ecosystem, Python, and PySpark. This hybrid role is based out of Pune and requires deep technical expertise in building scalable, resilient, and secure data platforms that drive business intelligence.
Key Responsibilities :
- Design and develop complex SQL queries and Python scripts for data extraction, transformation, and processing.
- Build and optimize scalable data pipelines and architectures for performance, cost-efficiency, and resiliency.
- Lead on-prem to cloud data migration initiatives, especially into Azure-based environments.
- Develop and manage data models and implement effective ETL frameworks across large datasets.
- Implement batch and real-time data ingestion strategies using tools like Azure Data Factory, Kafka, and Spark (a minimal batch sketch follows this list).
- Utilize Azure Synapse, Azure Data Lake, Azure SQL, and other Azure-native services for data orchestration and storage.
- Ensure data quality, lineage, governance, and enforce security protocols including encryption and access control.
- Automate data workflows to improve delivery speed and minimize manual effort.
- Collaborate with cross-functional teams including Data Scientists, Analysts, and Platform Engineers.
- (Optional but preferred) Participate in managing CI/CD pipelines, infrastructure-as-code, and monitoring data platform health.
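To make the batch side of these responsibilities concrete, below is a minimal PySpark sketch of the ingest-cleanse-persist pattern the role describes. It is illustrative only: the storage account, container paths, and column names (order_id, order_ts, amount) are hypothetical, not this client's actual pipeline.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-batch-ingest").getOrCreate()

# Raw CSV landed in Azure Data Lake Gen2; the abfss URI is illustrative.
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("abfss://raw@examplelake.dfs.core.windows.net/orders/"))

# Basic cleansing and typing before persisting to the curated zone.
curated = (raw
           .dropDuplicates(["order_id"])
           .withColumn("order_ts", F.to_timestamp("order_ts"))
           .withColumn("order_date", F.to_date("order_ts"))
           .filter(F.col("amount") > 0))

# Write as a Delta table partitioned by date for downstream Synapse /
# lakehouse queries (assumes the Delta Lake connector is on the cluster).
(curated.write
 .format("delta")
 .mode("overwrite")
 .partitionBy("order_date")
 .save("abfss://curated@examplelake.dfs.core.windows.net/orders/"))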
Core Skills & Requirements :
- 7+ years of proven experience in data engineering, with a focus on designing scalable data architectures, building automated pipelines, and working across large, complex datasets in enterprise environments.
- Deep expertise in writing, optimizing, and debugging complex, high-performance SQL queries across relational and cloud databases.
- Advanced hands-on experience using Python for data wrangling, automation, and ETL/ELT pipeline orchestration.
- Proficient in distributed data processing using PySpark for big data pipelines in real-time and batch modes.
- Azure Synapse Analytics for scalable query processing and data warehousing.
- Azure Data Factory (ADF) for orchestrating pipelines and data integration.
- Azure Data Lake (Gen2) for storage and structured/unstructured data ingestion.
- Azure SQL, Azure Cosmos DB, and exposure to Azure Fabric.
- Practical experience with Apache Spark for in-memory computation.
- Skilled in end-to-end ETL/ELT pipeline design, development, and optimization.
- Experience in on-premise to cloud migration projects, especially to Azure-based environments.
- Knowledge of data modeling, delta-lake architecture, and lakehouse patterns for scalable analytics solutions.
- Focus on resiliency, cost-efficiency, and performance optimization of data workflows.
- Understanding of CI/CD concepts, with exposure to implementing automated deployments for data solutions.
- Experience with infrastructure-as-code, environment provisioning, and pipeline monitoring tools.
- Hands-on implementation of data security measures, including encryption, RBAC, auditing, and PII protection.
- Familiarity with governance standards, compliance practices, and best practices for enterprise data platforms.
- Strong analytical and problem-solving abilities.
- Effective communication and collaboration in cross-functional agile teams.
- Self-driven and proactive in identifying quality gaps and proposing solutions.
- Willingness to continuously learn and adapt to new technologies and testing methodologies.
Must-Have Skills : Python & PySpark, Advanced SQL, Azure Synapse Analytics, Azure Data Factory (ADF), Azure Data Lake & Azure SQL, Azure Cloud Data Migration, Data Security
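By way of illustration, the "Advanced SQL" requirement above has in mind queries that combine aggregation with window functions. The sketch below is hypothetical (the curated_orders table and its columns are invented) and is run through PySpark to stay on the same stack.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("advanced-sql-example").getOrCreate()

# Assumes a curated_orders table is already registered in the metastore,
# e.g. a Delta table produced by a pipeline like the sketch above.
monthly_top_customers = spark.sql("""
    WITH monthly_spend AS (
        SELECT customer_id,
               date_trunc('month', order_ts) AS order_month,
               SUM(amount)                   AS total_spend
        FROM curated_orders
        GROUP BY customer_id, date_trunc('month', order_ts)
    )
    SELECT customer_id, order_month, total_spend,
           RANK() OVER (PARTITION BY order_month
                        ORDER BY total_spend DESC) AS spend_rank
    FROM monthly_spend
""").where("spend_rank <= 10")

monthly_top_customers.show()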
Academic : Post Graduate / Graduate in Engineering / Technology / MBA
(ref : hirist.tech)