▷ (3 Days Left) Site Reliability Engineer

Talentiser, India
1 day ago
Job description

YOUR IMPACT: Reliability, Automation, and Observability

As a hybrid Site Reliability Engineer / DevOps Engineer, you'll be a key driver in ensuring the stability, performance, and scalability of our mission-critical SaaS platform. You'll apply engineering principles to operational challenges, constantly striving to eliminate toil through automation.

Operational Excellence & Reliability

  • Provide day-to-day management of system alerts, check system health, and escalate issues as necessary to maintain high availability.
  • Actively participate in a 24x7 on-call rotation for critical SaaS platform incidents, and be available in case of emergencies.
  • Lead the incident response process, ensuring fast and effective mitigation and resolution of production issues.
  • Perform thorough Root Cause Analysis (RCA) and lead blameless post-mortems to identify systemic weaknesses and create a corrective action plan to prevent recurrence.
  • Collaborate with engineering teams to set and enforce error budgets derived from Service Level Objectives (SLOs), ensuring a healthy balance between development speed and system stability.
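
For candidates less familiar with the term, an error budget is the amount of unreliability an SLO permits over a given window. The following is a minimal sketch of that arithmetic, assuming an availability SLO measured over a 30-day window. The function names and example figures are illustrative only and are not taken from the employer's tooling.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window.

    slo_target is a fraction, e.g. 0.999 for "three nines" (assumed input format).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return 1.0 - downtime_minutes / budget


if __name__ == "__main__":
    # Hypothetical figures: a 99.9% target with 21.6 minutes of downtime so far.
    print(error_budget_minutes(0.999))
    print(budget_remaining(0.999, downtime_minutes=21.6))
```

Under these assumptions, a 99.9% target over 30 days allows roughly 43.2 minutes of downtime. A team that has already spent most of that budget might slow releases, while one with budget to spare can ship faster.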

Platform Automation & Infrastructure Development

  • Automate routine operational tasks to reduce manual effort and "toil" and increase overall team efficiency.
  • Design, deploy, and maintain cloud infrastructure using Infrastructure as Code (IaC), specifically leveraging Terraform and Helm for deployment to EKS / K8s clusters.
  • Improve existing infrastructure health by developing and implementing checks and scripts to proactively correct known issues and self-heal the platform.
  • Maintain, develop, and evolve our Continuous Integration / Continuous Delivery (CI/CD) deployment code and pipelines.
  • Learn and maintain existing infrastructure running under Docker and Docker Swarm while driving migration strategies toward EKS / K8s.
  • Implement and integrate new technologies and services into our Cloud Infrastructure to enhance platform capabilities and resilience.

Monitoring & Observability

  • Design and implement comprehensive Observability strategies across all three pillars: Metrics, Logs, and Traces.
  • Proactively create and refine robust monitoring and alerting configurations within the EKS / K8s ecosystem.
  • Utilize and maintain our Observability platform, Datadog, to gather performance data, create complex synthetic tests, and visualize system health via dashboards.
  • Leverage existing monitoring solutions such as Grafana and Prometheus while planning and executing the migration or integration of data into a unified platform.
  • Document all issues, remediation steps, system architecture, and runbooks to facilitate knowledge transfer and rapid incident response.
  • Collaborate closely with Support, Customer Success, Migration, and Professional Services teams to provide the highest level of SaaS service and minimize customer impact during changes.
  • Apply a real customer focus when planning deployments / updates, always considering the impact on the end-user before making changes.

YOUR EXPERIENCE: Essential Skills and Qualifications

  • Hands-on AWS Cloud Engineer experience, with expert working knowledge of the AWS Cloud ecosystem, including a good understanding of AWS IAM roles and policies.
  • Proficiency with container orchestration technologies: EKS / Kubernetes (K8s).
  • Demonstrable experience with Infrastructure as Code (IaC) tools, specifically Terraform and Helm.
  • Working experience with Docker and maintaining systems using Docker Swarm.
  • Expertise in setting up and managing logging and monitoring solutions. Direct experience with Datadog is highly preferred, with experience in setting up APM, infrastructure monitoring, and custom dashboards.
  • Experience with existing monitoring solutions such as Grafana and Prometheus is required.
  • Proficiency in a Linux environment, with strong skills in Bash and/or Python scripting for automation and troubleshooting.
  • A strong understanding of web technologies, including REST APIs, Systems Architecture, Design, and Databases.
  • Experience in Product / Application Support for high-availability SaaS-based products.
  • Experience in designing, implementing, and operating in a DevSecOps environment.
  • Excellent oral and written communication skills, with the ability to clearly explain complex technical issues and RCAs to both technical and customer-facing audiences.