Talent.com
Warner Bros. Discovery - Azure Site Reliability Engineer

Warner Bros. Discovery, Hyderabad
1 day ago
Job description

Key Responsibilities :

  • Primarily accountable for managing Azure environments.
  • Design, implement and maintain highly available, scalable, and resilient infrastructure.
  • Identify and eliminate performance bottlenecks, and proactively remediate security concerns, through monitoring, profiling, and tuning.
  • Establish and improve SLOs, SLIs, and error budgets to drive system reliability.
  • Collaborate with stakeholders, including application developers, to improve application observability and optimize performance.
  • Lead and mentor a team of engineers working to reduce toil across the total team load, and to implement security features, roles, user access and privileges according to best practices.
  • Proactively identify, design, and implement process and architectural improvements.
  • Stay informed on the latest features and best practices across the Azure Public Cloud and the WBD Azure environment.
  • Work with the peer group of complementary public cloud leads (AWS / GCP) to keep WBD's management of resources consistent wherever possible.

Methodology :

  • Automate deployment, monitoring, and self-healing capabilities to improve operational efficiency.
  • Develop and manage infrastructure using Terraform and other IaC tools.
  • Drive incident response efforts, conduct root cause analyses (RCA), and implement preventative measures to minimize downtime.
  • Build and enhance monitoring, alerting, and observability systems to proactively resolve incidents before they impact users.
  • Evangelize telemetry and metrics-driven application development.
  • Improve on-call processes and reduce toil by automating repetitive tasks.
  • Contribute to the software development of cloud management tooling and support applications.
  • Develop detailed technical documentation, including runbooks, troubleshooting guides, and system diagrams.

Continuous Improvement :

  • Work with stakeholders to ensure systems meet security baselines, best practices, compliance requirements and resiliency standards.
  • Implement effective backup strategies and conduct regular disaster recovery testing.
  • Implement robust access controls, secrets management, and security monitoring solutions.
  • Collaborate with security teams to manage vulnerabilities and respond to threats.
  • Engage with our FinOps / CostOps team to optimize cloud costs by implementing efficient resource utilization and right-sizing strategies.
  • Work closely with development, infrastructure, and security teams to drive best practices and improvements.
  • Mentor junior engineers and contribute to a culture of continuous learning and improvement.
  • Participate in architectural discussions and provide guidance on reliability and scalability.

Experiences :

  • 8+ years of experience in Site Reliability Engineering, DevOps, Cloud Infrastructure, or related fields.
  • Expert in Microsoft Azure cloud.
  • 5+ years of hands-on experience architecting, building, and managing Azure tenants, management groups, and the overall Azure control plane and its contents.
  • Demonstrable experience in Linux / Unix and Windows Server administration, networking, and distributed systems.
  • Fluency in two or more programming languages (PowerShell, Python, Golang, JavaScript, etc.).
  • Extensive hands-on experience in container orchestration technologies, such as AKS, Kubernetes, Docker.
  • Deep knowledge of monitoring, logging, and observability tools (Prometheus, Grafana, ELK, Splunk, etc.).
  • Hands-on experience with Infrastructure-as-Code (IaC) using Terraform and ARM templates.
  • Strong background in CI / CD pipelines, GitOps, and infrastructure automation (Terraform, Helm, Ansible or Chef).

Soft Skills :

  • Strong problem-solving, troubleshooting, and debugging skills.
  • Excellent written and verbal communication and collaboration abilities.
  • English language fluency required.
  • Ability to handle multiple assignments concurrently.
  • Passion for automation, reliability, and continuous improvement.
  • Move quickly and intelligently, treating technical debt as your nemesis.
  • Ability to solve problems independently while knowing when to request assistance.

Not Required but Preferred Experience :

  • Experience with other cloud providers such as AWS, Google Cloud Platform (GCP), or Oracle.
  • Knowledge of and passion for media, entertainment, and technology industries (including key players, growth trends and drivers, new media models, industry structure, etc.)
  • Familiarity with streaming and similar products / services.
  • Experience working in a national or global company.
  • Comfortable working in a highly iterative and somewhat unstructured environment.

(ref : hirist.tech)
