Infrastructure and Reliability Manager

o9 Solutions, Inc., Bengaluru, India
5 days ago
Job description

About o9 Solutions :

o9 Solutions is a leading enterprise AI software platform provider transforming planning and decision-making capabilities for the world’s most forward-thinking companies. Our platform enables organizations to optimize operations, drive efficiency, and innovate faster.

Role : Engineering Manager - DevOps, R&D

Location : Bangalore, Hybrid

Work hours : Regular (Mon - Fri)

About the Role :

We are seeking an experienced Manager to lead complex, cross-functional initiatives across our DevOps organization, in collaboration with platform engineering. This role will be instrumental in aligning operational priorities with engineering and business goals, driving initiatives related to infrastructure scalability, system reliability, incident response, automation, and cloud operations. You will be responsible for managing program delivery, establishing repeatable processes, and ensuring high visibility and accountability for all infrastructure and reliability programs.

This role involves designing and building tools from scratch while leading independent, end-to-end projects to successful delivery. It’s a core technical, hands-on position requiring strong analytical, networking, and security skills.

We’re looking for someone who has transitioned from a Principal or Senior Architect role to a Technical DevOps or Engineering Manager position, with some team management experience. Hands-on technical expertise is essential, as the role focuses on driving technical excellence within the project team for DevOps development.

What you will do in this role :

  • Lead and oversee end-to-end program delivery across multiple complex initiatives within DevOps in collaboration with platform engineering.
  • Drive planning, execution, and delivery of strategic programs supporting DevOps and platform functions, including reliability, observability & automation.
  • Manage and guide an automation team of more than 8 professionals in DevOps practices.
  • Act as a strategic partner to engineering and product leadership to define program goals, scope, timelines, and success metrics.
  • Coordinate efforts across engineering, product, security, compliance, and operations teams.
  • Track and manage program risks, issues, and dependencies; ensure timely mitigation and escalation where necessary.
  • Ensure alignment of engineering execution with product roadmaps and business priorities.
  • Translate technical complexities and constraints into actionable program plans and communicate them effectively to stakeholders.
  • Drive transparency through clear reporting, dashboards, and regular program reviews.
  • Foster a culture of continuous improvement, agility, and cross-team collaboration.
  • Establish and evolve program management best practices, tools, templates, and processes to support efficient delivery and communication.
  • Manage programs involving CI / CD infrastructure, platform migrations, infrastructure-as-code, monitoring / logging platforms, and disaster recovery.
  • Develop and manage roadmaps for technical operations initiatives with clear milestones and KPIs.
  • Champion DevOps / SRE best practices and help drive cultural change around service ownership, operational excellence, and continuous improvement.

What you’ll have...

Qualifications

Must Have :

  • Bachelor's or Master's degree in Computer Science, Engineering, or related technical field.
  • Total experience : 12-18 years.
  • 5+ years managing infrastructure, SRE, or DevOps-related programs.
  • Strong understanding of SRE principles (SLIs, SLOs, error budgets) and DevOps practices (CI / CD, automation, infrastructure as code).
  • Experience with cloud platforms (AWS, GCP, or Azure), Kubernetes, Terraform, monitoring / observability tools (Prometheus, Grafana, ELK, Datadog, etc.).
  • Strong experience with Agile / Scrum or hybrid delivery methodologies.
  • Proven ability to lead complex programs with multiple cross-functional stakeholders.
  • Familiarity with incident management, operational playbooks, runbooks, and on-call practices.

Nice to Have :

  • Hands-on engineering or DevOps / SRE experience earlier in career.
  • Program or project management experience.
  • Certification(s) : PMP, AWS / GCP certifications, or SRE-focused training.
  • Experience working in high-availability, high-scale production environments (e.g., SaaS, FinTech, or eCommerce).

Key Competencies :

  • Strategic mindset with deep operational awareness.
  • Excellent communication and stakeholder management skills.
  • Ability to simplify complex technical concepts for executive reporting.
  • Strong leadership, people development, and cross-functional influencing skills.
  • Bias for action and a relentless focus on continuous improvement.

What we’ll do for you :

  • Flat organization : With a very strong entrepreneurial culture (and no corporate politics).
  • Great people and unlimited fun at work.
  • Possibility to really make a difference in a scale-up environment.
  • Support network : Work with a team you can learn from every day.
  • Diversity : We pride ourselves on our international working environment.
  • Work-Life Balance : https://youtu.be/IHSZeUPATBA?feature=shared
  • Feel part of A team : https://youtu.be/QbjtgaCyhes?feature=shared

How the process works...

  • Respond with your interest to us.
  • We’ll contact you by video call or phone call, whichever you prefer, with further scheduling details.
  • During the interview phase, you will meet with the technical panel for 60 minutes. We will contact you after the interview to let you know if we’d like to progress your application.
  • The process consists of an intro call and a technical discussion, followed by a techno-managerial round and a managerial round.
  • We will let you know if you’re the successful candidate.

Good luck!

More about us…

With the latest increase in our valuation from $2.7B to $3.7B despite challenging global macroeconomic conditions, o9 Solutions is one of the fastest-growing technology companies in the world today. Our mission is to digitally transform planning and decision-making for the enterprise and the planet. Our culture is high-energy and drives us to aim 10x in everything we do.

Our platform, the o9 Digital Brain, is the premier AI-powered, cloud-native platform driving the digital transformations of major global enterprises including Google, Walmart, ABInBev, Starbucks and many others.

Our headquarters are located in Dallas, with offices in Amsterdam, Paris, London, Barcelona, Madrid, São Paulo, Bengaluru, Tokyo, Seoul, Milan, Stockholm, Sydney, Shanghai, Singapore and Munich.

o9 is an equal opportunity employer and seeks applicants of diverse backgrounds and hires without regard to race, colour, gender, religion, national origin, citizenship, age, sexual orientation or any other characteristic protected by law.
