Talent.com
Site Reliability Engineer
WhiteLotus Talent Partners • Pune, Maharashtra, India
1 day ago
Job description

We are looking for L0 and L1 Site Reliability Engineer (SRE) Support engineers to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure, which is powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system availability, reliability, and performance. You will be responsible for identifying and addressing simple issues and for escalating more complex problems to senior SREs when needed.

The ideal candidate should have a basic understanding of cloud infrastructure (especially OpenStack and Kubernetes), containerized environments, and system monitoring. This position offers an excellent opportunity for someone looking to grow into a more advanced SRE or DevOps role.

Key Responsibilities:

For L0 Support (Level 0):

Incident Monitoring & Triage:

Respond to system alerts and monitor infrastructure health for both OpenStack and Kubernetes using tools such as Prometheus, Grafana, and other observability tooling.

Identify low-level issues and follow runbooks or predefined scripts to perform first-level triage.

Document and escalate unresolved incidents to L1 or L2 based on established escalation protocols.

System Health Checks:

Perform daily health checks for Kubernetes pods, nodes, and OpenStack instances.

Verify basic functionality of VMs, containers, and network services within the environment.

Basic Troubleshooting:

Resolve simple issues such as VM reboots, pod failures, and network connectivity issues within OpenStack or Kubernetes environments.

Follow the predefined steps for basic troubleshooting tasks like restarting services or clearing logs.

Ticket Management:

Log incidents and issues into a ticketing system (e.g., JIRA, ServiceNow) for tracking and escalation.

Update incident tickets and provide relevant information for ongoing resolution efforts.

=========================================================================================================

For L1 Support (Level 1):

Incident Resolution:

Investigate and resolve issues more complex than those handled at L0, such as Kubernetes pod crashes, network misconfigurations in OpenStack, and minor service disruptions.

Use tools such as kubectl to troubleshoot Kubernetes pods and nodes, and the OpenStack CLI to diagnose problems with VMs, storage, and networks.

Automation & Scripting:

Automate routine tasks, such as VM provisioning, pod deployments, or status checks, using basic scripting languages (Python, Bash).

Improve automation workflows based on feedback and frequently encountered issues.

Log Aggregation & Monitoring:

Review logs and metrics collected from the ELK Stack, Prometheus, Grafana, or other logging tools to detect trends and potential issues.

Analyze logs and metrics from OpenStack and Kubernetes clusters to pinpoint underlying problems (e.g., high CPU usage, memory leaks).

Basic Network & Storage Management:

Investigate networking issues related to Neutron (for OpenStack) and CNI configurations (for Kubernetes).

Manage storage resources within OpenStack and Kubernetes (e.g., creating persistent volumes, debugging storage access issues).

Collaboration & Escalation:

Work closely with L2 and L3 engineers for complex troubleshooting or advanced system issues that require in-depth knowledge.

Share knowledge with the team and assist in creating new documentation or updating existing troubleshooting guides.

User and Permissions Management:

Perform basic user management tasks within OpenStack (e.g., creating and managing tenants, security groups).

Review and modify Kubernetes RBAC (Role-Based Access Control) settings based on user access needs.

Skills & Qualifications:

Required Skills:

Basic Cloud & Kubernetes Knowledge:

Familiarity with OpenStack architecture (e.g., Nova, Neutron, Cinder).

Basic understanding of Kubernetes components, including pods, services, deployments, and namespaces.

Systems & Networking:

Knowledge of Linux/Unix-based operating systems (e.g., Ubuntu, CentOS, Red Hat).

Understanding of networking concepts like DNS, IP routing, and VLANs in cloud environments.

Monitoring & Alerting Tools:

Familiarity with monitoring tools like Prometheus, Grafana, Zabbix, or CloudWatch for alert management and system health monitoring.

Troubleshooting & Incident Response:

Experience in using log aggregation tools (ELK Stack, Splunk) and interpreting logs for incident detection.

Ability to perform basic troubleshooting steps (e.g., restarting services, running basic shell commands) to resolve issues.

Communication Skills:

Strong communication skills to collaborate effectively with senior SREs, developers, and other teams.

Ability to document incidents, solutions, and troubleshooting steps clearly.

Preferred Skills:

Basic Scripting & Automation:

Exposure to scripting languages such as Bash, Python, or Go to automate basic administrative tasks.

Cloud Platform Experience:

Familiarity with other cloud technologies such as AWS, Azure, or Google Cloud Platform.

Certifications:

Basic certifications such as CompTIA Linux+, AWS Certified Solutions Architect, Certified Kubernetes Administrator (CKA), or OpenStack COA are a plus.
