Talent.com
Databricks AI Platform SRE_Director_Infrastructure Production Management & Reliability Engineering

Morgan Stanley • Bangalore, India
2 days ago
Job description

Profile Description

We're seeking someone to join our Enterprise Technology team as a Databricks AI Platform SRE on the Platform SRE team in Enterprise Computing (EC). This role will be critical in designing, building, and optimizing a scalable, secure, and developer-friendly Databricks platform to enable Machine Learning (ML) and Artificial Intelligence (AI) workloads at enterprise scale. You will partner with ML engineers, data scientists, platform teams, and cloud architects to automate infrastructure, enforce best practices, and streamline the end-to-end ML lifecycle using modern cloud-native technologies.

In the Technology division, we leverage innovation to build the connections and capabilities that power our Firm, enabling our clients and colleagues to redefine markets and shape the future of our communities.

This is a Director position responsible for maintaining the stability and reliability of the organization's infrastructure systems, ensuring optimal performance and availability to support business operations.

Since 1935, Morgan Stanley has been known as a global leader in financial services, always evolving and innovating to better serve our clients and our communities in more than 40 countries around the world.

What you'll do in the role:

  • Design and implement secure, scalable, and automated Databricks environments to support AI / ML workloads.
  • Develop infrastructure-as-code (IaC) solutions using Terraform for provisioning Databricks, cloud resources, and network configurations.
  • Build automation and self-service capabilities using Python, Java and APIs for platform onboarding, workspace provisioning, orchestration and monitoring.
  • Collaborate with data science and ML teams to define compute requirements, governance policies, and efficient workflows across dev / qa / prod environments.
  • Integrate the Databricks offering with cloud-native services on Azure / AWS.
  • Champion CI / CD and GitOps for managing ML infrastructure and configurations.
  • Ensure compliance with enterprise security and data governance policies using RBAC, audit controls, encryption, network isolation, and policies.
  • Monitor platform performance, reliability, and usage, and drive improvements to optimize cost and resource utilization.

What you'll bring to the role:

  • At least 4 years' relevant experience would generally be expected to find the skills required for this role.
  • Proven experience with Terraform for building and managing infrastructure.
  • Strong programming skills in Python and Java.
  • Hands-on experience with cloud networking, identity and access management, key vaults, monitoring, and logging in Azure.
  • Hands-on experience with Databricks (workspace management, clusters, jobs, MLflow, Delta Lake, Unity Catalog, Mosaic AI).
  • Deep understanding of Azure or AWS infrastructure (e.g. IAM, VNets / VPC, storage, networks, compute, key management, monitoring).
  • Strong experience in distributed system design, development and deployment using agile / DevOps practices.
  • Experience with CI / CD pipelines (GitHub Actions or similar).
  • Experience implementing monitoring and observability using Prometheus, Grafana or Databricks-native solutions.
  • Good communication skills, excellent teamwork experience, ability to mentor and develop more junior developers, including participating in constructive code reviews.
  • Experience in multi-cloud environments (AWS / GCP) is a bonus.
  • Experience in working in highly regulated environments (finance, healthcare, etc.) is desirable.
  • Experience with Databricks REST APIs and SDKs.
  • Knowledge of MLflow, Mosaic AI, and MLOps tooling.
  • Working with teams using Scrum, Kanban or other agile practices.
  • Proficiency with standard Linux command line and debugging tools.
  • Azure or AWS certifications.

What you can expect from Morgan Stanley:

We are committed to maintaining the first-class service and high standard of excellence that have defined Morgan Stanley for over 89 years. Our values - putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back - aren't just beliefs, they guide the decisions we make every day to do what's best for our clients, communities and more than 80,000 employees in 1,200 offices across 42 countries. At Morgan Stanley, you'll find an opportunity to work alongside the best and the brightest, in an environment where you are supported and empowered. Our teams are relentless collaborators and creative thinkers, fueled by their diverse backgrounds and experiences. We are proud to support our employees and their families at every point along their work-life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry. There's also ample opportunity to move about the business for those who show passion and grit in their work.

To learn more about our offices across the globe, please copy and paste https://www.morganstanley.com/about-us/global-offices into your browser.

Morgan Stanley is an equal opportunities employer. We work to provide a supportive and inclusive environment where all individuals can maximize their full potential. Our skilled and creative workforce is comprised of individuals drawn from a broad cross section of the global communities in which we operate and who reflect a variety of backgrounds, talents, perspectives, and experiences. Our strong commitment to a culture of inclusion is evident through our constant focus on recruiting, developing, and advancing individuals based on their skills and talents.
