Director of Site Reliability Engineering

Confidential
Bengaluru/Bangalore, India
10 days ago
Job description

Join us in bringing joy to customer experience. Five9 is a leading provider of cloud contact center software, bringing the power of cloud innovation to customers worldwide.

Living our values every day results in our team-first culture and enables us to innovate, grow, and thrive while enjoying the journey together. We celebrate diversity and foster an inclusive environment, empowering our employees to be their authentic selves.

Director of Site Reliability Engineering

The Director of Site Reliability Engineering is responsible for leading the strategic vision, operational excellence, and organizational capability of our SRE function. This role combines technical leadership with people management to build and scale a world-class SRE organization that enables rapid innovation while maintaining exceptional reliability standards.

As the senior leader of the SRE discipline, you will establish the technical strategy, culture, and practices that ensure our systems can scale reliably to meet business demands. You will build and lead a team of SRE professionals, partner with engineering leadership across the organization, and drive the adoption of SRE principles and practices.

This is a hands-on leadership role requiring deep technical expertise, proven ability to scale engineering organizations, and a track record of building reliable systems at scale. The ideal candidate will balance reliability with tactical execution, driving both immediate operational excellence and long-term architectural improvements where necessary.

Key Responsibilities

Strategic Leadership & Vision

  • Define and execute the long-term SRE strategy aligned with business objectives and technical roadmap
  • Establish reliability standards, SLI/SLO frameworks, and error budget policies across services
  • Drive architectural decisions that improve system reliability, scalability, and operational efficiency
  • Partner with engineering leadership to influence platform and application design for reliability
  • Represent SRE perspective in executive technical discussions and strategic planning

Team Leadership & Development

  • Build, lead, and scale a high-performing SRE organization
  • Recruit, hire, and onboard top-tier SRE talent across multiple experience levels
  • Develop career progression frameworks and growth paths for SRE professionals
  • Foster a culture of continuous learning, blameless post-mortems, and operational excellence
  • Provide technical mentorship and leadership development for senior SRE staff

Operational Excellence & Incident Management

  • Manage and oversee enterprise-wide incident response processes and on-call practices
  • Drive root cause analysis programs and ensure systematic elimination of failure modes
  • Implement sustainable on-call practices that maintain work-life balance while ensuring coverage
  • Oversee capacity planning and resource optimization strategies across all services
  • Establish metrics and reporting frameworks for reliability, performance, and operational health

Cross-Functional Partnership

  • Collaborate with VP/Director-level peers in Engineering, Product, and Infrastructure
  • Work with Security leadership to integrate reliability and security practices
  • Partner with Finance on cost optimization initiatives and capacity planning budgets
  • Engage with Customer Success and Support teams on reliability-impacting issues

Platform & Tooling Strategy

  • Drive the simplification and reduction of observability, monitoring, and alerting platforms
  • Establish automation standards and drive toil reduction initiatives
  • Help improve CI/CD pipeline architecture and deployment practices
  • Influence infrastructure-as-code and configuration management strategies

Organizational & Process Innovation

  • Implement SRE best practices including error budgets, toil tracking, and reliability reviews
  • Establish metrics-driven decision making and continuous improvement processes
  • Drive adoption of chaos engineering and proactive reliability testing
  • Create and maintain SRE documentation, runbooks, and knowledge sharing systems
  • Develop and execute disaster recovery and business continuity plans

Required Skills

Leadership & Management Experience

  • Bachelor's or Master's degree in Computer Science, Engineering, or equivalent experience
  • 8+ years in engineering leadership roles, with 4+ years managing managers
  • Proven track record of building and scaling engineering teams
  • Experience with performance management, career development, and succession planning
  • Strong executive presence and ability to influence without authority
  • Experience driving organizational change and cultural transformation

Technical Expertise

  • Experience with multiple cloud platforms (AWS, GCP, Azure) and hybrid environments
  • Deep understanding of distributed systems, microservices architecture, and cloud platforms
  • Hands-on experience with modern observability tools (Prometheus, Grafana, Datadog, etc.)
  • Strong background in infrastructure automation, CI/CD, and infrastructure-as-code
  • Expertise in capacity planning, performance optimization, and cost management

SRE & Operations Mastery

  • Deep understanding of SRE principles, practices, and implementation at scale
  • Experience establishing SLI/SLO frameworks and error budget management
  • Proven track record of improving system reliability and reducing operational toil
  • Experience with incident management, post-mortem processes, and reliability engineering
  • Background in 24/7 operations and on-call management best practices

Business & Strategic Acumen

  • Understanding of budget management, resource allocation, and ROI analysis
  • Ability to communicate technical concepts to non-technical stakeholders and executives
  • Experience with vendor management and technology partnership decisions
  • Knowledge of compliance frameworks and regulatory requirements

Desired Skills

Advanced Technical Background

  • Background in container orchestration (Kubernetes) and service mesh technologies
  • Knowledge of database administration and data platform reliability
  • Experience with security engineering and DevSecOps practices

Success Metrics

Reliability & Performance

  • Achieve and maintain service availability targets (typically 99.9%+ uptime)
  • Reduce mean time to detection (MTTD) and mean time to recovery (MTTR)
  • Improve capacity planning accuracy and reduce over-provisioning costs
  • Increase deployment frequency while maintaining reliability standards

Team & Organizational Development

  • Build and retain a high-performing SRE organization with low attrition
  • Establish clear career progression and achieve high employee satisfaction scores
  • Develop internal talent and promote from within the SRE organization
  • Create sustainable on-call practices with reasonable operational load

Operational Excellence

  • Drive measurable reduction in operational toil and manual interventions
  • Establish comprehensive observability and proactive alerting across all services
  • Implement effective incident response with blameless post-mortem culture
  • Achieve cost optimization targets while maintaining reliability standards

Five9 embraces diversity and is committed to building a team that represents a variety of backgrounds, perspectives, and skills. The more inclusive we are, the better we are. Five9 is an equal opportunity employer.

View our privacy policy, including our privacy notice to California residents, here: https://www.five9.com/pt-pt/legal.

Note: Five9 will never request that an applicant send money as a prerequisite for commencing employment with Five9.

Skills Required

Performance Optimization, Distributed Systems, Cost Management, Prometheus, Grafana, Datadog, Infrastructure Automation, Capacity Planning, GCP, Incident Management, Azure, Kubernetes, AWS
