About the Role :
We are seeking an experienced Data Engineer with a strong background in building scalable data platforms, big data engineering, and end-to-end data lifecycle management. The ideal candidate will have deep expertise in data governance, modelling, and architecture, with hands-on experience in modern data platforms, pipelines, and analytics tools. This role involves designing and maintaining robust data systems that enable business stakeholders to make data-driven decisions at scale.
Key Responsibilities :
Data Architecture & Governance :
- Architect and define end-to-end data flows for Big Data / Data Lake use cases.
- Implement best practices in data governance, data quality, master data management, and data security.
- Collaborate with enterprise / domain architects to align data solutions with enterprise roadmaps.
- Participate in Technical Design Authority forums to influence and validate architectural decisions.
Pipeline Development & Data Engineering :
- Design, develop, and optimize scalable ETL / ELT pipelines across diverse data sources (cloud, on-premises, SQL / NoSQL, APIs).
- Automate data ingestion and transformation processes, ensuring performance, scalability, and reliability.
- Implement real-time, batch, and scheduled data ingestion using tools such as Apache Sqoop, Flume, Kinesis, Logstash, and Fluentd.
- Work with Databricks, Spark, Hive, Hadoop, Azure Data Factory, Scala, Python, and R to deliver robust data processing workflows (see the illustrative sketch after this list).
- Optimize pipeline performance by analyzing physical / logical execution plans.
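The following is a minimal sketch of the kind of batch pipeline this role covers, assuming a Spark / Databricks environment; the source and target paths, the hypothetical orders dataset, and its column names are illustrative placeholders, not part of the role description.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object OrdersIngestJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-ingest") // on Databricks an active session already exists
      .getOrCreate()

    // Ingest raw CSV files landed by an upstream process (hypothetical path).
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/mnt/landing/orders/")

    // Basic cleansing and enrichment: drop incomplete rows, normalise types,
    // and derive a partition column for efficient downstream reads.
    val cleaned = raw
      .filter(col("order_id").isNotNull && col("order_ts").isNotNull)
      .withColumn("order_ts", to_timestamp(col("order_ts")))
      .withColumn("order_date", to_date(col("order_ts")))

    // Write to the curated zone of the data lake, partitioned by date.
    cleaned.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("/mnt/curated/orders/")

    // Inspect the logical and physical plans when tuning performance.
    cleaned.explain(true)

    spark.stop()
  }
}
```

In practice a pipeline like this would typically be scheduled through Azure Data Factory or a Databricks job rather than run as a standalone main method.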
Data Management & Analytics Enablement :
- Collaborate with analytics teams to improve data models feeding BI tools (e.g., Power BI, Tableau).
- Build and maintain OLAP cubes to address BI limitations and enable complex business analysis.
- Deliver data cleansing, validation, and enrichment solutions to ensure data accuracy.
- Lead initiatives in data mining, statistical analysis, and advanced data modelling (Star / Snowflake schemas, SCD2); a brief SCD2 sketch follows this list.
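As an illustration of the SCD2 modelling mentioned above, the sketch below expires changed rows in a Delta dimension table and appends the incoming rows as new current versions. The dim_customer table, its path, keys, and columns are hypothetical; it assumes the Delta Lake library is available and that the incoming updates DataFrame has already been reduced to new or changed records.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object Scd2Example {
  // Apply Slowly Changing Dimension Type 2 logic to a hypothetical customer dimension.
  // Assumes `updates` contains only new or changed customer records (deduplicated upstream).
  def applyScd2(spark: SparkSession, updates: DataFrame): Unit = {
    val dim = DeltaTable.forPath(spark, "/mnt/curated/dim_customer")

    // Step 1: expire the current row for every customer whose tracked attributes changed.
    dim.as("d")
      .merge(
        updates.as("u"),
        "d.customer_id = u.customer_id AND d.is_current = true")
      .whenMatched("d.address <> u.address OR d.segment <> u.segment")
      .updateExpr(Map(
        "is_current" -> "false",
        "end_date"   -> "current_date()"))
      .execute()

    // Step 2: append the incoming rows as the new current versions.
    val newVersions = updates
      .withColumn("is_current", lit(true))
      .withColumn("start_date", current_date())
      .withColumn("end_date", lit(null).cast("date"))

    newVersions.write
      .format("delta")
      .mode("append")
      .save("/mnt/curated/dim_customer")
  }
}
```

Filtering the updates down to genuinely new or changed rows upstream is what keeps the append step from creating duplicate current versions.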
Operations & Performance Optimization :
- Estimate and optimize cluster / core sizes for Databricks clusters and Analysis Services.
- Deploy and maintain CI / CD DevOps pipelines across development, staging, and production environments.
- Monitor, troubleshoot, and enhance system performance, ensuring optimal data ingestion and storage.
- Conduct continuous audits of data systems to identify gaps, performance bottlenecks, or security loopholes (a small audit sketch follows this list).
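As a small example of the kind of continuous audit referred to above, the sketch below profiles a curated table for null and duplicate business keys; the table path and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// A minimal data-audit sketch: check a curated table for gaps such as
// null keys and duplicated business keys.
object CuratedOrdersAudit {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("orders-audit").getOrCreate()

    val orders = spark.read.parquet("/mnt/curated/orders/")

    val nullKeys = orders.filter(col("order_id").isNull).count()
    val duplicateKeys = orders.groupBy("order_id").count()
      .filter(col("count") > 1)
      .count()

    // In a production audit these metrics would be written to a monitoring
    // store and alerted on; here they are simply printed.
    println(s"rows with null order_id: $nullKeys")
    println(s"duplicated order_id values: $duplicateKeys")

    spark.stop()
  }
}
```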
Leadership & Collaboration :
- Act as a coach / mentor to junior data engineers, providing technical guidance and enforcing best practices.
- Collaborate cross-functionally with business stakeholders, analytics teams, and engineering squads to deliver business outcomes.
- Allocate and track tasks across the team, reporting progress and deliverables to management.
Essential Qualifications & Skills :
Education : Bachelor's degree in Computer Science, Engineering, or a related field (Master's preferred).
Experience :
- 10+ years in data analytics platforms, ETL / ELT transformations, and SQL programming.
- 5+ years of hands-on experience in Big Data Engineering, Data Lakes, and Distributed Systems.
Technical Expertise :
- Strong proficiency in the Hadoop ecosystem (HDFS, Hive, Sqoop, Oozie, Spark Core / Streaming).
- Programming in Scala, Java, Python, and shell scripting.
- Deep experience with the Azure Data Platform (Azure SQL DB, Data Factory, Cosmos DB).
- Database expertise : Oracle, MySQL, MongoDB, Presto.
- Data ingestion / extraction using REST APIs, OData, JSON, XML, and web services (see the sketch after this list).
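As a brief illustration of the REST / JSON ingestion listed above, the sketch below fetches a JSON payload over HTTP and lets Spark infer a DataFrame schema from it; the endpoint URL and bearer token are placeholders.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import org.apache.spark.sql.SparkSession

object RestIngestExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rest-ingest").getOrCreate()
    import spark.implicits._

    // Call a hypothetical REST endpoint that returns a JSON array of records.
    val client = HttpClient.newHttpClient()
    val request = HttpRequest.newBuilder()
      .uri(URI.create("https://api.example.com/v1/customers")) // placeholder URL
      .header("Authorization", "Bearer <token>")               // placeholder token
      .GET()
      .build()
    val body = client.send(request, HttpResponse.BodyHandlers.ofString()).body()

    // Let Spark infer the schema from the JSON payload.
    val df = spark.read.json(Seq(body).toDS())
    df.printSchema()
    df.show(truncate = false)

    spark.stop()
  }
}
```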
Core Skills :
- Strong foundation in data modelling, warehousing, and architecture principles.
- Hands-on experience with ETL tools and best practices.
- Solid understanding of data security (encryption, tunneling, access control).
- Proven ability in troubleshooting and performance optimization.
(ref : hirist.tech)