Lead Data Engineer
Location: Bangalore / Pune
Mode: Hybrid
Shift Timing: 2 PM to 11 PM
Experience: 7-12 years, including 3+ years with cloud data platforms and Databricks
Purpose:
We are seeking a hands-on Technical Lead to drive the ingestion of high-volume mainframe RPC data into Databricks, enabling scalable machine learning workflows. This role is critical to building a robust data foundation for training thousands of AI models that detect anomalous behavior across applications, services, and functions.
Key Responsibilities
- Mainframe Data Ingestion: Design and implement scalable pipelines to extract, parse, and ingest RPC logs and technical attributes from mainframe systems into Delta Lake on Databricks.
- ML-Ready Data Engineering: Transform and structure data for time-series modelling and anomaly detection across thousands of models.
- ML Workflow Integration: Collaborate with ML engineers to ensure data pipelines support SARIMA, ANN, and other model types, and enable automated model retraining and scoring.
- Performance Optimization: Tune Spark jobs, Delta Lake storage, and cluster configurations to process billions of records and support real-time aggregation.
- FinOps & Cost Control: Monitor and optimize Databricks resource usage; implement auto-scaling and cost-aware job scheduling.
- Monitoring & Alerting: Integrate Databricks-native alerting for pipeline health, data anomalies, and job failures.
Required Skills
- Strong hands-on experience with Databricks, Apache Spark, Delta Lake, and MLflow
- Proven expertise in mainframe data integration (e.g., SMF, RPC logs, VSAM, DB2)
- Strong Python and PySpark programming skills
Preferred Skills
- Experience with CI/CD tooling for Databricks, such as GitHub Actions
- Knowledge of FinOps principles for cloud cost optimization