Description :
Position : SME Application Support Engineer - Databricks 24 / 7 Operations
Work Mode : 1 / 6 / 5 rotational support across Morning, Afternoon, General, Weekend, Night support on need basis
Position Count : 3
Education : B.E / B.Tech / MCA
Total IT Experience : 6-10 years
Location : RCP Navi Mumbai
Responsibilities :
- Serve as First Level Escalation for 24 / 7 monitoring of Databricks clusters, jobs, workflows, repos, and data pipelines
- SME Level issue troubleshooting / analysis related to :
a. Cluster failures or auto-scaling issues
b. Job failures (PySpark / Scala / Spark SQL / Delta Live Tables)
c. Workspace availability issues
- Work directly with application Dev owners to remediate pipeline failures
- Participate in resolution of Sev1 / Sev2 Incidents
- Prepare RCA
- Implement Workspace governance, User access control (RBAC), Cluster policies, Data security best practices
- Ensure compliance with Audit requirements
- Build custom dashboards / logging for Job performance, Failure analytics, Cluster utilization
- Maintain SOPs, runbooks, Architecture diagrams provided by Data Engineering and Platform Engineering teams
- Identify recurring issues and report to L3 / Platform Engineering
- Support debugging complex Spark issues, including OOM in driver / executor, Long GC cycles
Skills :
- 6 to 10 years of experience in Big Data / Cloud Data Platform Support
- SME of Databricks platform (clusters, jobs, repos, MLflow, warehouse)
- Expertise in UNIX, SQL, Shell Scripting
- Expertise in Spark UI job debugging
- Strong skill in CI / CD pipelines (Azure DevOps)
- Strong skill in Apache Spark, Azure Cloud
(ref : hirist.tech)