Manager- Site Reliability Engineering (SRE)
Confidential • Mumbai, India
5 days ago
Job description

About Us

Zycus, recognized by leading analyst firms in procurement technology, empowers teams to unlock deep value through its comprehensive Source-to-Pay (S2P) solutions. At the heart of our S2P solution is the Merlin Agentic Platform, which orchestrates intelligent AI agents to deliver simplified, efficient, and compliant processes.

The Merlin Intake Agent offers business users unparalleled ease of use, increasing adoption rates and significantly reducing non-compliant spending. For procurement teams, the Merlin Autonomous Negotiation Agent handles tail spend autonomously, securing additional savings; the Merlin Contract Agent helps draft compliant contracts and reduces risks by actively monitoring them; and the Merlin AP Agent further enhances efficiency by automating invoice processing with exceptional speed and accuracy.

We Are An Equal Opportunity Employer :

Zycus is committed to providing equal opportunities in employment and creating an inclusive work environment. We do not discriminate against applicants on the basis of race, color, religion, gender, sexual orientation, national origin, age, disability, or any other legally protected characteristic. All hiring decisions will be based solely on qualifications, skills, and experience relevant to the job requirements.

Job Description

Zycus is looking for a Site Reliability Engineer (SRE) with deep expertise in Kubernetes, automation, and Linux systems. The ideal candidate will have hands-on experience in deploying, administering, and optimizing large-scale production systems, with a strong focus on microservices architecture, ensuring automation, performance, and reliability across our SaaS platform.

Roles And Responsibilities :

  • System Reliability & Uptime : Ensure high availability, performance, and reliability of applications and infrastructure.
  • Kubernetes & Cluster Management : Deploy, administer, and maintain Kubernetes clusters, managing scaling, upgrades, and troubleshooting.
  • Microservices Management : Handle the deployment, monitoring, and scaling of microservices in distributed environments.
  • Incident Management : Respond to production incidents, perform root cause analysis, and implement long-term fixes to prevent recurrence.
  • Automation & Infrastructure as Code (IaC) : Automate repetitive tasks, infrastructure provisioning, and deployment workflows using tools like Ansible and Terraform.
  • Monitoring & Observability : Implement and maintain monitoring tools (e.g., Prometheus, Grafana, Datadog) to track system health and application performance.
  • Performance Optimization : Analyze system performance, identify bottlenecks, and optimize resources for better efficiency.
  • Disaster Recovery & Backup : Design and implement backup and disaster recovery (DR) strategies for business continuity.
  • Capacity Planning : Forecast infrastructure needs based on performance trends and business growth to ensure scalability.
  • Security & Compliance : Ensure infrastructure and applications meet security standards and compliance requirements.
  • Collaboration with Dev & Ops Teams : Work closely with development and operations teams to improve deployment pipelines, release processes, and system reliability.
  • Documentation : Maintain clear and detailed documentation of systems, processes, and incident reports for knowledge sharing and compliance.
  • Continuous Improvement : Identify opportunities for improving system architecture, deployment strategies, and automation workflows.
  • Cloud Infrastructure Management : Manage cloud services (AWS, GCP, Azure) for resource optimization, cost management, and automation.
  • On-Call Support : Participate in on-call rotations to handle urgent production issues and ensure rapid recovery.

Job Requirement

  • Experience : 5 to 12 years
  • Technical skills as mentioned below :

Must Have :

  • Kubernetes Expertise :
    • Hands-on experience with installing and provisioning Kubernetes clusters.
    • Deep understanding of core Kubernetes components such as CRI, CNS, etcd, CoreDNS, and kube-proxy.
    • Strong knowledge of Kubernetes internal networking, service discovery, and ingress management.
  • Kubernetes Distributions :
    • Hands-on experience with different Kubernetes provisioners and distributions.
  • Kubernetes Cluster Administration :
    • Experience in administering production Kubernetes clusters, including backup and disaster recovery (DR) strategies.
    • Familiarity with cluster health monitoring and troubleshooting issues.
  • Monitoring tools : Exposure to monitoring tools such as Prometheus, Grafana, Datadog, or AppDynamics.
  • Automation & Scripting :
    • Strong programming skills in Python, Shell, or similar languages.
    • Hands-on experience with Infrastructure-as-Code (IaC) tools such as Terraform or Ansible.
    • Cloud automation experience, ideally with AWS or other major cloud platforms.
  • Operating Systems : Hands-on experience with Linux system administration.
  • Microservices : Experience with microservices architecture and managing more than 50 microservices simultaneously.

Good To Have Skills :

  • Experience with OpenShift virtualization in production environments.
  • Knowledge of AWS EKS, Rancher, or other Kubernetes distributions.
  • CKA (Certified Kubernetes Administrator) certification or equivalent.
  • Experience in fine-tuning RHEL, CentOS, and Ubuntu.
  • Familiarity with DevSecOps practices, container security, and compliance frameworks.

Five Reasons Why You Should Join Zycus

  • Industry Recognized Leader : Zycus is recognized by Gartner (world's leading market research analyst) as a Leader in Procurement Software Suites. Zycus is also recognized as a Customer First Organization by Gartner. Zycus's Procure-to-Pay Suite is rated 4.5 out of 5 in Gartner Peer Insights for Procure-to-Pay Suites.
  • Pioneer in Cognitive Procurement : Zycus is a pioneer in Cognitive Procurement software and has been a trusted partner of choice for large global enterprises.
  • Fast Growing : Growing Region at the rate of 30% Y-o-Y
  • Global Enterprise Customers : Work with Large Enterprise Customers globally to drive Complex Global Implementation on the value framework of Zycus
  • AI Product Suite : Steer next gen cognitive product suite offering

About Us

Zycus is a pioneer in Cognitive Procurement software and has been a trusted partner of choice for large global enterprises for two decades. Zycus has been consistently recognized by Gartner, Forrester, and other analysts for its Source to Pay integrated suite. Zycus powers its S2P software with the revolutionary Merlin AI Suite. Merlin AI takes over the tactical tasks and empowers procurement and AP officers to focus on strategic projects; offers data-driven actionable insights for quicker and smarter decisions; and its conversational AI offers a B2C-type user experience to the end users.

Zycus helps enterprises drive real savings, reduce risks, and boost compliance, and its seamless, intuitive, and easy-to-use user interface ensures high adoption and value across the organization.

Start your #CognitiveProcurement journey with us, as you are #MeantforMore

Skills Required

Shell, Terraform, Ansible, Python, Kubernetes
