Site Reliability Engineer
WhiteLotus Talent Partners • Panchkula, Haryana, India
17 hours ago • No longer accepting applications
Job description

We are looking for L0 and L1 Site Reliability Engineers (SRE Support) to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure, which is powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system availability, reliability, and performance. You will be responsible for identifying and resolving simple issues and for escalating more complex problems to senior SREs when needed.

The ideal candidate should have a basic understanding of cloud infrastructure (especially OpenStack and Kubernetes), containerized environments, and system monitoring. This position offers an excellent opportunity for someone looking to grow into a more advanced SRE or DevOps role.

Key Responsibilities:

For L0 Support (Level 0):

Incident Monitoring & Triage:

Respond to system alerts and monitor the health of both OpenStack and Kubernetes infrastructure using tools such as Prometheus, Grafana, and other observability platforms.

Identify low-level issues and follow runbooks or predefined scripts to perform first-level triage.

Document and escalate unresolved incidents to L1 or L2 based on established escalation protocols.

System Health Checks:

Perform daily health checks for Kubernetes pods, nodes, and OpenStack instances.

Verify basic functionality of VMs, containers, and network services within the environment.

Basic Troubleshooting:

Resolve simple issues such as VM reboots, pod failures, and network connectivity problems within OpenStack or Kubernetes environments.

Follow predefined steps for basic troubleshooting tasks such as restarting services or clearing logs.

Ticket Management:

Log incidents and issues in a ticketing system (e.g., JIRA, ServiceNow) for tracking and escalation.

Update incident tickets and provide relevant information for ongoing resolution efforts.

=========================================================================================================

For L1 Support (Level 1):

Incident Resolution:

Investigate and resolve issues more complex than those handled at L0, such as Kubernetes pod crashes, network misconfigurations in OpenStack, and minor service disruptions.

Use kubectl to troubleshoot Kubernetes pods and nodes, and the OpenStack CLI to diagnose problems with VMs, storage, and networks.

Automation & Scripting:

Automate routine tasks such as VM provisioning, pod deployments, or status checks using scripting languages such as Python or Bash.

Improve automation workflows based on feedback and frequently encountered issues.

Log Aggregation & Monitoring:

Review logs and metrics collected by the ELK Stack, Prometheus, Grafana, or other logging and monitoring tools to detect trends and potential issues.

Analyze logs and metrics from OpenStack and Kubernetes clusters to pinpoint underlying problems (e.g., high CPU usage, memory leaks).

Basic Network & Storage Management:

Investigate networking issues related to Neutron (for OpenStack) and CNI configurations (for Kubernetes).

Manage storage resources within OpenStack and Kubernetes (e.g., creating persistent volumes, debugging storage access issues).

Collaboration & Escalation:

Work closely with L2 and L3 engineers for complex troubleshooting or advanced system issues that require in-depth knowledge.

Share knowledge with the team and assist in creating new documentation or updating existing troubleshooting guides.

User and Permissions Management:

Perform basic user management tasks within OpenStack (e.g., creating and managing tenants and security groups).

Review and modify Kubernetes RBAC (Role-Based Access Control) settings based on user access needs.

Skills & Qualifications:

Required Skills:

Basic Cloud & Kubernetes Knowledge:

Familiarity with OpenStack architecture and its core services (e.g., Nova, Neutron, Cinder).

Basic understanding of Kubernetes objects such as pods, services, deployments, and namespaces.

Systems & Networking:

Knowledge of Linux/Unix-based operating systems (e.g., Ubuntu, CentOS, Red Hat).

Understanding of networking concepts such as DNS, IP routing, and VLANs in cloud environments.

Monitoring & Alerting Tools:

Familiarity with monitoring tools such as Prometheus, Grafana, Zabbix, or CloudWatch for alert management and system health monitoring.

Troubleshooting & Incident Response:

Experience using log aggregation tools (e.g., the ELK Stack, Splunk) and interpreting logs to detect incidents.

Ability to perform basic troubleshooting steps (e.g., restarting services, running basic shell commands) to resolve issues.

Communication Skills:

Strong communication skills to collaborate effectively with senior SREs, developers, and other teams.

Ability to document incidents, solutions, and troubleshooting steps clearly.

Preferred Skills:

Basic Scripting & Automation:

Exposure to Bash, Python, or Go for automating basic administrative tasks.

Cloud Platform Experience:

Familiarity with other cloud platforms such as AWS, Azure, or Google Cloud Platform.

Certifications:

Relevant certifications such as CompTIA Linux+, AWS Certified Solutions Architect, Certified Kubernetes Administrator (CKA), or Certified OpenStack Administrator (COA) are a plus.
