Warner Bros. Discovery - Azure Site Reliability Engineer

Warner Bros. Discovery, Hyderabad

1 day ago

Job description

Key Responsibilities :

Primarily accountable for managing Azure environments.
Design, implement and maintain highly available, scalable, and resilient infrastructure.
Identify and eliminate performance bottlenecks, and proactively remediate security concerns, through monitoring, profiling, and tuning.
Establish and improve SLOs, SLIs, and error budgets to drive system reliability.
Collaborate with stakeholders, including application developers, to improve application observability and optimize performance.
Lead and mentor a team of engineers working to reduce toil across the team's overall workload and to implement security features, roles, user access, and privileges according to best practices.
Proactively identify, design, and implement process and architectural improvements.
Stay informed on the latest features and best practices across the Azure Public Cloud and the WBD Azure environment.
Work with a peer group of public cloud leads (AWS / GCP) to keep WBD's management of resources consistent across clouds wherever possible.

Methodology :

Automate deployment, monitoring, and self-healing capabilities to improve operational efficiency.

Develop and manage infrastructure using Terraform and other IaC tools.

Drive incident response efforts, conduct root cause analyses (RCA), and implement preventative measures to minimize downtime.

Build and enhance monitoring, alerting, and observability systems to proactively resolve incidents before they impact users.

Evangelize telemetry and metrics-driven application development.

Improve on-call processes and reduce toil by automating repetitive tasks.

Contribute to the software development of cloud management tooling and support applications.

Develop detailed technical documentation, including runbooks, troubleshooting guides, and system diagrams.

Continuous Improvement :

Work with stakeholders to ensure systems meet security baselines, best practices, compliance requirements and resiliency standards.

Implement effective backup strategies and conduct regular disaster recovery testing.

Implement robust access controls, secrets management, and security monitoring solutions.

Collaborate with security teams to manage vulnerabilities and respond to threats.

Engage with our FinOps / CostOps team to optimize cloud costs by implementing efficient resource utilization and right-sizing strategies.

Work closely with development, infrastructure, and security teams to drive best practices and improvements.

Mentor junior engineers and contribute to a culture of continuous learning and improvement.

Participate in architectural discussions and provide guidance on reliability and scalability.

Experience :

8+ years of prior experience in Site Reliability Engineering, DevOps, Cloud Infrastructure, or a related field.

Expert in Microsoft Azure cloud.

5+ years of hands-on experience architecting, building, and managing Azure tenants, management groups, and the overall Azure control plane and its contents.

Demonstrable experience in Linux / Unix and Windows Server administration, networking, and distributed systems.

Fluency in two or more programming languages (PowerShell, Python, Golang, JavaScript, etc.).

Extensive hands-on experience in container and orchestration technologies, such as AKS, Kubernetes, and Docker.

Deep knowledge of monitoring, logging and observability tools (Prometheus, Grafana, ELK, Splunk, etc.)

Hands-on experience with Infrastructure-as-Code (IaC) using Terraform and ARM templates.

Strong background in CI / CD pipelines, GitOps, and infrastructure automation (Terraform, Helm, Ansible or Chef).

Soft Skills :

Strong problem-solving, troubleshooting, and debugging skills.

Excellent written and verbal communication and collaboration abilities.

English language fluency required.

Ability to handle multiple assignments concurrently.

Passion for automation, reliability, and continuous improvement.

Move quickly and intelligently - seeing technical debt as your nemesis.

Ability to solve problems independently while knowing when to request assistance.

Preferred but not required experience :

Experience with other cloud providers such as AWS, Google Cloud Platform (GCP), Oracle, etc.

Knowledge of and passion for media, entertainment, and technology industries (including key players, growth trends and drivers, new media models, industry structure, etc.)

Familiarity with streaming and similar products / services.

Experience working in a national or global company.

Comfortable working in a highly iterative and somewhat unstructured environment.

(ref : hirist.tech)
