Talent.com
Principal Engineer, Site Reliability

TMUS Global Solutions, Hyderabad, India
30+ days ago
Job description

About the Role

The Principal Engineer, Site Reliability (SRE) will play a critical role in ensuring the stability, scalability, and operational excellence of Accounting and Finance platforms. The role leads the operational health of these platforms and the delivery of highly reliable financial applications and data services that meet the requirements for accuracy, compliance, and availability needed to support business operations.

As a Principal SRE, you will build automation, implement monitoring, improve incident response, and champion DevOps practices that enable Finance and Accounting systems to operate with consistency and trustworthiness, while also coaching and mentoring junior SREs to ensure overall operational excellence.

What You'll Do

Operational Oversight: Own day-to-day operations for Accounting and Finance applications and data platforms, ensuring they run smoothly and meet business expectations.

Reliability & Availability: Ensure Accounting and Finance platforms meet defined SLAs, SLOs, and SLIs for performance, reliability, and uptime.

Automation & Efficiency: Build automation for deployments, monitoring, scaling, and self-healing capabilities to reduce manual effort and operational risk.

Observability & Monitoring: Implement and maintain comprehensive monitoring, alerting, and logging for accounting applications and data pipelines (e.g., Snowflake, dbt workflows, ERP integrations).

Incident Response: Lead and participate in on-call rotations, perform root cause analysis, and drive improvements to prevent recurrence of production issues.

Operational Excellence: Establish and enforce best practices for capacity planning, performance tuning, disaster recovery, and compliance controls in financial systems.

Collaboration with Engineering & Finance: Partner with software engineers, data engineers, and Finance/Accounting teams to ensure operational needs are met from development through production.

Team Coordination: Manage workload, priorities, and escalations for operations staff and partner teams, ensuring alignment with SLAs and compliance requirements.

Security & Compliance: Ensure financial applications and data pipelines meet audit, compliance, and security requirements.

Continuous Improvement: Drive post-incident reviews, implement lessons learned, and proactively identify opportunities to improve system resilience.

Audit & Compliance Support: Ensure operational practices meet internal controls, audit requirements, and financial compliance standards.

What You'll Bring

Bachelor's degree in Computer Science, Engineering, Information Technology, or a related field (or equivalent experience).

7-12 years of experience in Site Reliability Engineering, DevOps, or Production Engineering, ideally supporting financial or mission-critical applications.

Strong experience with monitoring/observability tools (Datadog, Prometheus, Grafana, Splunk, or equivalent).

Hands-on expertise with CI/CD pipelines, automation frameworks, and IaC tools (Terraform, Ansible, GitHub Actions, Azure DevOps, etc.).

Familiarity with Snowflake, dbt, and financial system integrations from an operational support perspective.

Strong scripting/programming experience (Python, Bash, Go, or similar) for automation and tooling.

Proven ability to manage incident response and conduct blameless postmortems.

Experience ensuring compliance, security, and audit-readiness in enterprise applications.

Must Have Skills

SRE

SQL

Snowflake OR Databricks

DevOps OR CI/CD OR GitHub Actions

Monitoring/observability tools (Datadog, Prometheus, Grafana, Splunk, or equivalent)

Automation

Nice To Have

Experience supporting financial applications (ERP, revenue recognition systems, accounting platforms).

Exposure to FinOps practices for optimizing cloud spend in finance-related platforms.

Familiarity with containers and orchestration (Docker, Kubernetes).

Experience building resilience into data pipelines and ensuring auditability for accounting data.

Strong communication skills to articulate operational issues and risks to both technical and non-technical stakeholders.
