Talent.com
Site Reliability Engineer
No longer accepting applications
WhiteLotus Talent Partners • Ballari, India
2 days ago
Job description

We are looking for L0 and L1 Site Reliability Engineer (SRE) Support engineers to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure, which is powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system availability, reliability, and performance. You will identify and address simple issues and escalate more complex problems to senior SREs when needed.

The ideal candidate has a basic understanding of cloud infrastructure (especially OpenStack and Kubernetes), containerized environments, and system monitoring. This position offers an excellent opportunity for someone looking to grow into a more advanced SRE or DevOps role.

Key Responsibilities:

For L0 Support (Level 0):

  • Incident Monitoring & Triage:
      • Respond to system alerts and monitor infrastructure health for both OpenStack and Kubernetes using tools like Prometheus, Grafana, and other observability tools.
      • Identify low-level issues and follow runbooks or predefined scripts to perform first-level triage.
      • Document unresolved incidents and escalate them to L1 or L2 according to established escalation protocols.
  • System Health Checks:
      • Perform daily health checks for Kubernetes pods, nodes, and OpenStack instances.
      • Verify basic functionality of VMs, containers, and network services within the environment.
  • Basic Troubleshooting:
      • Resolve simple issues such as VM reboots, pod failures, and network connectivity problems within OpenStack or Kubernetes environments.
      • Follow predefined steps for basic troubleshooting tasks such as restarting services or clearing logs.
  • Ticket Management:
      • Log incidents and issues in a ticketing system (e.g., JIRA, ServiceNow) for tracking and escalation.
      • Update incident tickets and provide relevant information for ongoing resolution efforts.

=========================================================================================================

For L1 Support (Level 1):

  • Incident Resolution:
      • Investigate and resolve issues more complex than those handled at L0, such as Kubernetes pod crashes, network misconfigurations in OpenStack, and minor service disruptions.
      • Use kubectl to troubleshoot Kubernetes pods and nodes, and the OpenStack CLI to diagnose problems with VMs, storage, and networks.
  • Automation & Scripting:
      • Automate routine tasks, such as VM provisioning, pod deployments, or status checks, using scripting languages such as Python and Bash.
      • Improve automation workflows based on feedback and frequently encountered issues.
  • Log Aggregation & Monitoring:
      • Review logs and metrics collected from the ELK Stack, Prometheus, Grafana, or other logging tools to detect trends and potential issues.
      • Analyze logs and metrics from OpenStack and Kubernetes clusters to pinpoint underlying problems (e.g., high CPU usage, memory leaks).
  • Basic Network & Storage Management:
      • Investigate networking issues related to Neutron (for OpenStack) and CNI configurations (for Kubernetes).
      • Manage storage resources within OpenStack and Kubernetes (e.g., creating persistent volumes, debugging storage access issues).
  • Collaboration & Escalation:
      • Work closely with L2 and L3 engineers on complex troubleshooting or advanced system issues that require in-depth knowledge.
      • Share knowledge with the team and help create new documentation or update existing troubleshooting guides.
  • User and Permissions Management:
      • Perform basic user management tasks within OpenStack (e.g., creating and managing tenants, security groups).
      • Review and modify Kubernetes RBAC (Role-Based Access Control) settings based on user access needs.

Skills & Qualifications:

Required Skills:

  • Basic Cloud & Kubernetes Knowledge:
      • Familiarity with OpenStack architecture (e.g., Nova, Neutron, Cinder).
      • Basic understanding of Kubernetes components, including pods, services, deployments, and namespaces.
  • Systems & Networking:
      • Knowledge of Linux/Unix-based operating systems (e.g., Ubuntu, CentOS, Red Hat).
      • Understanding of networking concepts like DNS, IP routing, and VLANs in cloud environments.
  • Monitoring & Alerting Tools:
      • Familiarity with monitoring tools like Prometheus, Grafana, Zabbix, or CloudWatch for alert management and system health monitoring.
  • Troubleshooting & Incident Response:
      • Experience using log aggregation tools (ELK Stack, Splunk) and interpreting logs for incident detection.
      • Ability to perform basic troubleshooting steps (e.g., restarting services, running basic shell commands) to resolve issues.
  • Communication Skills:
      • Strong communication skills to collaborate effectively with senior SREs, developers, and other teams.
      • Ability to document incidents, solutions, and troubleshooting steps clearly.

Preferred Skills:

  • Basic Scripting & Automation:
      • Exposure to scripting languages such as Bash, Python, or Go to automate basic administrative tasks.
  • Cloud Platform Experience:
      • Familiarity with other cloud technologies such as AWS, Azure, or Google Cloud Platform.
  • Certifications:
      • Certifications such as CompTIA Linux+, AWS Certified Solutions Architect, Certified Kubernetes Administrator (CKA), or Certified OpenStack Administrator (COA) are a plus.