Zycus - Site Reliability Engineering Manager

Zycus Infotech Pvt Ltd, Mumbai

3 days ago

Job Description :

Zycus is looking for a Site Reliability Engineer (SRE) with deep expertise in Kubernetes, automation, and Linux systems.

The ideal candidate will have hands-on experience deploying, administering, and optimizing large-scale production systems built on a microservices architecture, with a strong focus on automation, performance, and reliability across our SaaS platform.

Roles And Responsibilities :

System Reliability & Uptime : Ensure high availability, performance, and reliability of applications and infrastructure.
Kubernetes & Cluster Management : Deploy, administer, and maintain Kubernetes clusters, managing scaling, upgrades, and troubleshooting.
Microservices Management : Handle the deployment, monitoring, and scaling of microservices in distributed environments.
Incident Management : Respond to production incidents, perform root cause analysis, and implement long-term fixes to prevent recurrence.
Automation & Infrastructure as Code (IaC) : Automate repetitive tasks, infrastructure provisioning, and deployment workflows using tools like Ansible and Terraform.
Monitoring & Observability : Implement and maintain monitoring tools (e.g., Prometheus, Grafana, Datadog) to track system health and application performance.

Performance Optimization : Analyze system performance, identify bottlenecks, and optimize resources for better efficiency.

Disaster Recovery & Backup : Design and implement backup and disaster recovery (DR) strategies for business continuity.

Capacity Planning : Forecast infrastructure needs based on performance trends and business growth to ensure scalability.

Security & Compliance : Ensure infrastructure and applications meet security standards and compliance requirements.

Collaboration with Dev & Ops Teams : Work closely with development and operations teams to improve deployment pipelines, release processes, and system reliability.

Documentation : Maintain clear and detailed documentation of systems, processes, and incident reports for knowledge sharing and compliance.

Continuous Improvement : Identify opportunities for improving system architecture, deployment strategies, and automation workflows.

Cloud Infrastructure Management : Manage cloud services (AWS, GCP, Azure) for resource optimization, cost management, and automation.

On-Call Support : Participate in on-call rotations to handle urgent production issues and ensure rapid recovery.

Job Requirements :

Experience : 5 to 12 years.

Technical skills : As listed below.

Must Have :

Kubernetes Expertise :

Hands-on experience with installing and provisioning Kubernetes clusters.

Deep understanding of core Kubernetes components such as CRI, CNI, etcd, CoreDNS, and kube-proxy.

Strong knowledge of Kubernetes internal networking, service discovery, and ingress management.

Kubernetes Distributions :

Hands-on experience with different Kubernetes provisioners and distributions.

Kubernetes Cluster Administration :

Experience in administering production Kubernetes clusters, including backup and disaster recovery (DR) strategies.

Familiarity with cluster health monitoring and troubleshooting issues.

Monitoring tools : Exposure to monitoring tools such as Prometheus, Grafana, Datadog or AppDynamics.

Automation & Scripting :

Strong programming skills in Python, Shell, or similar languages.

Hands-on experience with Infrastructure-as-Code (IaC) tools such as Terraform or Ansible.

Cloud automation experience, ideally with AWS or other major cloud platforms.

Operating Systems : Hands-on experience with Linux systems.

Experience with microservices architecture and managing more than 50 microservices simultaneously.

Good To Have Skills :

Experience with OpenShift virtualization in production environments.

Knowledge of AWS EKS, Rancher, or other Kubernetes distributions.

CKA (Certified Kubernetes Administrator) certification or equivalent.

Experience in fine-tuning RHEL, CentOS, and Ubuntu.

Familiarity with DevSecOps practices, container security, and compliance frameworks.

(ref : hirist.tech)

