Zycus - Site Reliability Engineering Manager

Zycus Infotech Pvt Ltd, Mumbai
3 days ago

Job Description :

Zycus is looking for a Site Reliability Engineer (SRE) with deep expertise in Kubernetes, automation, and Linux systems.

The ideal candidate will have hands-on experience in deploying, administering, and optimizing large-scale production systems, with a strong focus on microservices architecture and on ensuring automation, performance, and reliability across our SaaS platform.

Roles And Responsibilities :

  • System Reliability & Uptime : Ensure high availability, performance, and reliability of applications and infrastructure.
  • Kubernetes & Cluster Management : Deploy, administer, and maintain Kubernetes clusters, managing scaling, upgrades, and troubleshooting.
  • Microservices Management : Handle the deployment, monitoring, and scaling of microservices in distributed environments.
  • Incident Management : Respond to production incidents, perform root cause analysis, and implement long-term fixes to prevent recurrence.
  • Automation & Infrastructure as Code (IaC) : Automate repetitive tasks, infrastructure provisioning, and deployment workflows using tools like Ansible and Terraform.
  • Monitoring & Observability : Implement and maintain monitoring tools (e.g., Prometheus, Grafana, Datadog) to track system health and application performance.
  • Performance Optimization : Analyze system performance, identify bottlenecks, and optimize resources for better efficiency.
  • Disaster Recovery & Backup : Design and implement backup and disaster recovery (DR) strategies for business continuity.
  • Capacity Planning : Forecast infrastructure needs based on performance trends and business growth to ensure scalability.
  • Security & Compliance : Ensure infrastructure and applications meet security standards and compliance requirements.
  • Collaboration with Dev & Ops Teams : Work closely with development and operations teams to improve deployment pipelines, release processes, and system reliability.
  • Documentation : Maintain clear and detailed documentation of systems, processes, and incident reports for knowledge sharing and compliance.
  • Continuous Improvement : Identify opportunities for improving system architecture, deployment strategies, and automation workflows.
  • Cloud Infrastructure Management : Manage cloud services (AWS, GCP, Azure) for resource optimization, cost management, and automation.
  • On-Call Support : Participate in on-call rotations to handle urgent production issues and ensure rapid recovery.

Job Requirement :

Experience : 5 to 12 years.

Technical skills as mentioned below :

Must Have :

Kubernetes Expertise :

  • Hands-on experience with installing and provisioning Kubernetes clusters.
  • Deep understanding of core Kubernetes components such as CRI, CNS, etcd, CoreDNS, and kube-proxy.
  • Strong knowledge of Kubernetes internal networking, service discovery, and ingress management.

Kubernetes Distributions :

  • Hands-on experience with different Kubernetes provisioners and distributions.

Kubernetes Cluster Administration :

  • Experience in administering production Kubernetes clusters, including backup and disaster recovery (DR) strategies.
  • Familiarity with cluster health monitoring and troubleshooting issues.
  • Monitoring tools : Exposure to monitoring tools such as Prometheus, Grafana, Datadog, or AppDynamics.

Automation & Scripting :

  • Strong programming skills in Python, Shell, or similar languages.
  • Hands-on experience with Infrastructure-as-Code (IaC) tools such as Terraform or Ansible.
  • Cloud automation experience, ideally with AWS or other major cloud platforms.
  • Operating Systems : Hands-on experience with Linux systems.
  • Experience with microservices architecture and managing more than 50 microservices simultaneously.

Good To Have Skills :

  • Experience with OpenShift virtualization in production environments.
  • Knowledge of AWS EKS, Rancher, or other Kubernetes distributions.
  • CKA (Certified Kubernetes Administrator) certification or equivalent.
  • Experience in fine-tuning RHEL, CentOS, and Ubuntu.
  • Familiarity with DevSecOps practices, container security, and compliance frameworks.

(ref : hirist.tech)
