Lead Data Engineer
Location : Bangalore / Pune (Hybrid)
Mode : Hybrid
Shift Timing : 2 PM to 11 PM
Experience : 7-12 years, with 3+ years in cloud data platforms and Databricks.
Purpose :
We are seeking a hands-on Technical Lead to drive the ingestion of high-volume mainframe RPC data into Databricks, enabling scalable machine learning workflows. This role is critical to building a robust data foundation for training thousands of AI models that detect anomalous behavior across applications, services, and functions.
Key Responsibilities
Mainframe Data Ingestion : Design and implement scalable pipelines to extract, parse, and ingest RPC logs and technical attributes from mainframe systems into Delta Lake on Databricks.
ML-Ready Data Engineering : Transform and structure data for time-series modelling and anomaly detection across thousands of models.
ML Workflow Integration : Collaborate with ML engineers to ensure data pipelines support SARIMA, ANN, and other model types; enable automated retraining and scoring.
Performance Optimization : Tune Spark jobs, Delta Lake storage, and cluster configurations for billions of records and real-time aggregation.
FinOps & Cost Control : Monitor and optimize Databricks resource usage; implement auto-scaling and cost-aware job scheduling.
Monitoring & Alerting : Integrate Databricks-native alerting for pipeline health, data anomalies, and job failures.
Required Skills
Strong hands-on experience with Databricks, Apache Spark, Delta Lake, and MLflow
Proven expertise in mainframe data integration (e.g., SMF, RPC logs, VSAM, DB2)
Strong Python and PySpark programming skills
Preferred Skills
Databricks experience with CI/CD tools such as GitHub Actions
Knowledge of FinOps principles for cloud cost optimization