Lead Data Engineer
Location : Bangalore / Pune (Hybrid)
Mode : Hybrid
Shift Timing : 2 PM to 11 PM
Experience : 7-12 years, including 3+ years in cloud data platforms and Databricks.
Purpose :
We are seeking a hands-on Technical Lead to drive the ingestion of high-volume mainframe RPC data into Databricks, enabling scalable machine learning workflows. This role is critical to building a robust data foundation for training thousands of AI models that detect anomalous behavior across applications, services, and functions.
Key Responsibilities
- Mainframe Data Ingestion :
- Design and implement scalable pipelines to extract, parse, and ingest RPC logs and technical attributes from mainframe systems into Delta Lake on Databricks.
- ML-Ready Data Engineering :
- Transform and structure data for time-series modelling and anomaly detection across thousands of models.
- ML Workflow Integration :
- Collaborate with ML engineers to ensure data pipelines support SARIMA, ANN, and other model types; enable automated retraining and scoring.
- Performance Optimization :
- Tune Spark jobs, Delta Lake storage, and cluster configurations for billions of records and real-time aggregation.
- FinOps & Cost Control :
- Monitor and optimize Databricks resource usage; implement auto-scaling and cost-aware job scheduling.
- Monitoring & Alerting :
- Integrate Databricks-native alerting for pipeline health, data anomalies, and job failures.
Required Skills
- Strong hands-on experience with Databricks, Apache Spark, Delta Lake, and MLflow
- Proven expertise in mainframe data integration (e.g., SMF, RPC logs, VSAM, DB2)
- Strong Python and PySpark programming skills
Preferred Skills
- Databricks experience with CI/CD tools (GitHub Actions)
- Knowledge of FinOps principles for cloud cost optimization