Site Reliability Engineer

Elgebra, Chennai

30+ days ago

Job description

Role Overview:

We are seeking a highly experienced and technically proficient Site Reliability Engineer (SRE) to join our team in support of our client, Qincline. The ideal candidate will have 7 or more years of dedicated experience in Site Reliability Engineering or a closely related discipline. This pivotal role requires a strong focus on ensuring the reliability, scalability, performance, and operational efficiency of large-scale, complex production systems. You'll be instrumental in bridging the gap between development and operations by applying engineering principles to operational challenges.

Key Responsibilities:

Reliability & Performance Engineering:

System Reliability: Design, build, and maintain robust, fault-tolerant production systems and infrastructure to meet stringent Service Level Objectives (SLOs).
Performance Tuning: Proactively identify and resolve performance bottlenecks across the entire application stack, from infrastructure to application code.
Automation: Develop and implement automation for operational tasks, infrastructure provisioning, deployment, and monitoring to eliminate manual toil.
Capacity Planning: Collaborate with development teams on capacity planning, forecasting demand, and ensuring the infrastructure can scale efficiently to meet future business needs.

Operations & Incident Management:

Monitoring & Alerting: Establish and maintain comprehensive monitoring, logging, and alerting systems to gain deep visibility into system health and performance (e.g., Prometheus, Grafana, or the ELK Stack).

Incident Response: Serve as a key responder during critical incidents, performing rapid triage, mitigation, and recovery.

Post-Mortems & RCA: Lead detailed Post-Mortem and Root Cause Analysis (RCA) processes for all significant incidents, ensuring that permanent fixes and preventive measures are implemented.

On-Call: Participate in a periodic on-call rotation to provide 24/7 support for critical production systems.

Tooling & Infrastructure:

CI/CD & DevOps: Enhance and manage CI/CD pipelines to facilitate fast, reliable, and automated software releases.

Containerization & Orchestration: Manage and optimize containerized environments using Docker and Kubernetes.

Infrastructure as Code (IaC): Utilize IaC tools (e.g., Terraform, Ansible) to provision and manage infrastructure in a repeatable and documented manner.

Required Skills & Experience:

Core Experience (7+ Years):

Minimum 7 years of hands-on experience in a Site Reliability Engineer, DevOps Engineer, or Production Engineer role supporting high-availability, mission-critical production environments.

Deep expertise in establishing and improving system monitoring, logging, alerting, and telemetry practices.

Demonstrated experience with formal Incident Management processes and leading thorough Root Cause Analysis (RCA).

Technical Expertise:

Cloud Platforms: Extensive, hands-on experience with at least one major cloud provider (e.g., AWS, Azure, or GCP). This includes managing compute, networking, storage, and managed services.

Scripting & Programming: Strong proficiency in scripting and programming languages, with mandatory expertise in Python and Shell scripting for automation and tooling.

DevOps Tooling: Proven experience with CI/CD pipeline tools (e.g., Jenkins, GitLab CI, Azure DevOps), Git, and artifact repositories.

Containerization: Expert-level knowledge of Docker and robust experience with orchestrating large-scale deployments using Kubernetes.

Operating Systems: Strong command of Linux/Unix operating systems and networking fundamentals (TCP/IP, DNS, Load Balancing).

Desired Qualifications (Good to Have):

Experience with configuration management tools (e.g., Ansible, Chef, Puppet).

Familiarity with service mesh technologies (e.g., Istio, Linkerd).

Knowledge of database administration and performance tuning (SQL/NoSQL).

Certifications related to SRE, Cloud (e.g., AWS Certified DevOps Engineer), or Kubernetes (CKA, CKAD).

(ref: hirist.tech)
