Talent.com
Principal Systems Reliability Engineer
Elios Talent • Hyderabad, Telangana, India
1 day ago
Job description

Senior Site Reliability Engineer

Key Highlights

Build, scale, and optimize cloud-native infrastructure powering global, high-availability platforms

Drive automation-first engineering across AWS, Terraform, CI/CD, observability, and resilient systems

Own reliability, uptime, system health, costs, and performance across mission-critical environments

Strengthen DevSecOps practices—improving security, delivery velocity, and operational excellence

Lead major incident response, troubleshoot complex issues, and uphold production stability at scale

Position Overview

We are seeking a Senior Site Reliability Engineer to drive reliability, automation, and performance for large-scale, cloud-based platforms. This role blends deep technical engineering, systems thinking, DevOps collaboration, and operational leadership.

You will design and implement scalable infrastructure, improve observability, enhance resiliency, manage incident operations, and champion modern DevSecOps practices. This role plays a critical part in supporting tens of thousands of daily users while ensuring platforms remain secure, fast, and highly available.

Key Responsibilities

Cloud Engineering

Architect, deploy, and optimize AWS environments using automation and Infrastructure-as-Code

Build tooling that increases predictability, stability, and delivery speed

Optimize systems for scale, reliability, cost, and performance

Maintain repeatable, traceable, and transparent infrastructure through Terraform and automation

Monitor cloud spend and usage, ensuring alignment with service-level objectives

Observability & Reliability

Own uptime, reliability, system security, performance metrics, and golden signals

Lead incident management and triage bridges during major events

Enhance telemetry systems (New Relic, CloudWatch, Datadog) for deep operational visibility

Use data-driven analysis to improve system stability and customer experience

Ensure architecture and deployment patterns meet SLAs and reliability goals

DevSecOps & Automation

Strengthen CI/CD pipelines, code-review practices, and engineering standards

Partner with Cybersecurity to address vulnerabilities through automation

Support secure, consistent, and scalable delivery workflows across engineering teams

Resiliency Engineering

Identify failure points, blast-radius risks, and architectural gaps

Run failure-injection and chaos testing to validate resiliency

Forecast traffic, plan for seasonal peaks, and scale systems for 2x+ load scenarios

Drive improvements to infrastructure and software to meet resiliency targets

Leadership & Collaboration

Mentor engineers across levels, promoting high-quality engineering practices

Collaborate daily with product, engineering, and security teams in a DevOps model

Document, uplift, and share knowledge through cross-team forums and best practices

Qualifications

Experience as a software engineer with strong debugging and deployment skills

Hands-on expertise with AWS and Terraform (required)

Experience with ECS; Kubernetes/EKS experience strongly preferred

Strong proficiency in Python, Golang, Bash, and automation frameworks

CI/CD experience with Jenkins, GitHub Enterprise, CircleCI, or similar

Ability to troubleshoot across web servers, app servers, OS, networks, storage, and databases

Experience running large-scale, high-availability production systems

Strong communication, root-cause analysis, and incident leadership skills

BS in Computer Science or equivalent industry experience

About Us

We build scalable, secure, and high-performing digital platforms that power global user experiences. By combining cloud engineering, automation, observability, and resilient systems design, we help organizations operate more reliably, innovate faster, and support long-term platform stability and growth.

Why Join Us

Join a forward-thinking engineering organization where reliability, automation, and performance are core values. You’ll work with a modern cloud stack, collaborate with exceptional engineers, and own meaningful technical impact across large-scale applications. This is an opportunity to shape infrastructure strategy, elevate engineering practices, and build systems that support millions with consistency and excellence.
