Talent.com
Site Reliability Engineer
No longer accepting applications
WhiteLotus Talent Partners • ludhiana, India
2 days ago
Job description

We are looking for L0 and L1 Site Reliability Engineering (SRE) support staff to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure, which is powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system availability, reliability, and performance. You will identify and address simple issues and escalate more complex problems to senior SREs when needed.

The ideal candidate should have a basic understanding of cloud infrastructure (especially OpenStack and Kubernetes), containerized environments, and system monitoring. This position offers an excellent opportunity for someone looking to grow into a more advanced SRE or DevOps role.

Key Responsibilities:

For L0 Support (Level 0):

  • Incident Monitoring & Triage:
      • Respond to system alerts and monitor infrastructure health for both OpenStack and Kubernetes using tools like Prometheus, Grafana, and other observability tooling.
      • Identify low-level issues and follow runbooks or predefined scripts to perform first-level triage.
      • Document unresolved incidents and escalate them to L1 or L2 according to established escalation protocols.
  • System Health Checks:
      • Perform daily health checks for Kubernetes pods, nodes, and OpenStack instances.
      • Verify basic functionality of VMs, containers, and network services within the environment.
  • Basic Troubleshooting:
      • Resolve simple issues such as VM reboots, pod failures, and network connectivity issues within OpenStack or Kubernetes environments.
      • Follow the predefined steps for basic troubleshooting tasks like restarting services or clearing logs.
  • Ticket Management:
      • Log incidents and issues in a ticketing system (e.g., JIRA, ServiceNow) for tracking and escalation.
      • Update incident tickets and provide relevant information for ongoing resolution efforts.

=========================================================================================================

For L1 Support (Level 1):

  • Incident Resolution:
      • Investigate and resolve issues more complex than those handled at L0, such as Kubernetes pod crashes, network misconfigurations in OpenStack, and minor service disruptions.
      • Work with tools like kubectl to troubleshoot Kubernetes pods and nodes, and the OpenStack CLI to diagnose problems with VMs, storage, and networks.
  • Automation & Scripting:
      • Automate routine tasks, such as VM provisioning, pod deployments, or status checks, using basic scripting languages (Python, Bash).
      • Improve automation workflows based on feedback and frequently encountered issues.
  • Log Aggregation & Monitoring:
      • Review logs and metrics collected from ELK Stack, Prometheus, Grafana, or other logging tools to detect trends and potential issues.
      • Analyze logs and metrics from OpenStack and Kubernetes clusters to pinpoint underlying problems (e.g., high CPU usage, memory leaks).
  • Basic Network & Storage Management:
      • Investigate networking issues related to Neutron (for OpenStack) and CNI configurations (for Kubernetes).
      • Manage storage resources within OpenStack and Kubernetes (e.g., creating persistent volumes, debugging storage access issues).
  • Collaboration & Escalation:
      • Work closely with L2 and L3 engineers on complex troubleshooting or advanced system issues that require in-depth knowledge.
      • Share knowledge with the team and assist in creating new documentation or updating existing troubleshooting guides.
  • User and Permissions Management:
      • Perform basic user management tasks within OpenStack (e.g., creating and managing tenants, security groups).
      • Review and modify Kubernetes RBAC (Role-Based Access Control) settings based on user access needs.

Skills & Qualifications:

    Required Skills:

  • Basic Cloud & Kubernetes Knowledge:
      • Familiarity with OpenStack architecture (e.g., Nova, Neutron, Cinder).
      • Basic understanding of Kubernetes components, including pods, services, deployments, and namespaces.
  • Systems & Networking:
      • Knowledge of Linux/Unix-based operating systems (e.g., Ubuntu, CentOS, Red Hat).
      • Understanding of networking concepts like DNS, IP routing, and VLANs in cloud environments.
  • Monitoring & Alerting Tools:
      • Familiarity with monitoring tools like Prometheus, Grafana, Zabbix, or CloudWatch for alert management and system health monitoring.
  • Troubleshooting & Incident Response:
      • Experience using log aggregation tools (ELK Stack, Splunk) and interpreting logs for incident detection.
      • Ability to perform basic troubleshooting steps (e.g., restarting services, running basic shell commands) to resolve issues.
  • Communication Skills:
      • Strong communication skills to collaborate effectively with senior SREs, developers, and other teams.
      • Ability to document incidents, solutions, and troubleshooting steps clearly.

    Preferred Skills:

  • Basic Scripting & Automation:
      • Exposure to scripting languages such as Bash, Python, or Go to automate basic administrative tasks.
  • Cloud Platform Experience:
      • Familiarity with other cloud technologies such as AWS, Azure, or Google Cloud Platform.
  • Certifications:
      • Certifications such as CompTIA Linux+, AWS Certified Solutions Architect, Certified Kubernetes Administrator (CKA), or Certified OpenStack Administrator (COA) are a plus.