Talent.com
Infrastructure Reliability Engineer

Inspire • Hyderabad, Republic Of India, IN
5 days ago
Job description

About Inspire Brands :

Inspire Brands is disrupting the restaurant industry through digital transformation and operational efficiencies. The company’s technology hub, Inspire Brands Hyderabad Support Center, India, will lead technology innovation and product development for the organization and its portfolio of distinct brands. The Inspire Brands Hyderabad Support Center will focus on developing new capabilities in data science, data analytics, eCommerce, automation, cloud computing, and information security to accelerate the company’s business strategy. Inspire Brands Hyderabad Support Center will also host an innovation lab and collaborate with start-ups to develop solutions for productivity optimization, workforce management, loyalty management, payments systems, and more.

Job Description :

Job Title : Site Reliability Engineer

Position Summary :

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems that enable online ordering for thousands of restaurants across multiple brands. SRE ensures that Inspire Digital Platform (IDP) services have reliability and uptime appropriate to users' needs, along with a fast rate of improvement. SREs also keep a watchful eye on our systems' capacity and performance and perform regular capacity planning exercises. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating toil through automation.

Essential Job Responsibilities :

Note : These statements are not intended to be an exhaustive list of all responsibilities and duties.

Technical :

  • Review current workload patterns, understand the business case and prioritize areas of weakness within the platform through log and metric investigation as well as application profiling.
  • Work with senior engineering and testing team members to build tools and recommend testing strategies for problem prevention and detection.
  • Employ deep troubleshooting skills to improve availability, performance, and security, ensuring services are designed for 24 / 7 availability, operational readiness, and rigor.
  • Perform in-depth postmortems on production incidents to assess business impact and to help Engineering learn from them.
  • Create dashboards and alerts for monitoring the IDP platform, define key metrics and service level indicators, and ensure relevant metric data is collected to create actionable alerts for SRE and the Network Operations Center.
  • Participate in the 24 / 7 on-call rotation.
  • Automate toil by building software and automation for seamless application deployment and third-party tool integration.
  • Ensure the platform holds a high degree of reliability, at least three nines (99.9% availability).
  • Define non-functional requirements as part of the product lifecycle to influence new designs, standards, and methods for scalable, highly available distributed systems.
  • Own technically intricate issues that cross DevOps, databases, networking, code, infrastructure, and people; drive them to satisfactory completion.

  • Provide recommendations and feedback in design reviews and review sessions.

Knowledge, Skills and Abilities :


Education :

  • 4-year degree in computer science, Information Technology, or related field

Experience :

  • Minimum 5 years of experience as a Software Engineer, Platform Engineer, SRE, or DevOps Engineer supporting large-scale SaaS production B2C or B2B cloud platforms.
  • Hands-on problem-solving and troubleshooting

Knowledge and skills (general and technical) :

  • Development skills in Java, TypeScript, and Python; OOP expertise is a must.
  • Hands-on Azure cloud experience, particularly with AKS, API Management, Azure Cache for Redis, Azure Blob Storage, Cosmos DB, Service Bus, and Azure Functions.
  • Proficiency in monitoring, APM, and profiling tools such as New Relic, Splunk, Prometheus, and Grafana.
  • Working experience with containers, Kubernetes and Helm.
  • Functional knowledge of Cloud Network, Firewalls, Ingress and Egress controllers, Service Mesh and
  • Experience with Auth0 secret management and Cloudflare CDN, Load Balancer, Cache, Firewall, and Workers features.
  • Experience with ArgoCD, GitLab, CI / CD, Terraform, and Infrastructure as Code.
  • Strong communication skills and ability to explain technical concepts clearly
  • A willingness to dive into understanding, debugging, and improving any layer of the stack

Technical Skills :

  • Competency level of 3 on a scale of 5 for the skills listed below.
  • Cloud Provider : Azure
  • Core Services : Elastic Pool, SQL, Application Gateway, API Management (APIM), Key Vaults, AKS (Azure Kubernetes Service), VMSS (Virtual Machine Scale Sets), VM
  • Networking : NSG (Network Security Groups), Private Endpoints, Private Link Service, VNet, Subnets, WAF (Web Application Firewall), Geo-Replication
  • Storage : Storage Accounts
  • Messaging and Events : Event Hubs, Event Grid, Azure Service Bus (Namespaces, Queues, Topics)
  • Identity and Security : Managed Identities / Workload Identities, Private DNS, Auth0
  • Containerization and Orchestration : Kubernetes (K8s) for container orchestration, Helm for Kubernetes package management, Docker for containerization
  • Monitoring and Observability : New Relic / Splunk
  • Automation and Scripting : PowerShell, Python

Other requirements (licenses, certifications, specialized training) :

Good-to-have certifications :

  • Certified Kubernetes Administrator / Developer
  • AZ-104 (Microsoft Certified : Azure Administrator Associate)
  • AZ-305 : Designing Microsoft Azure Infrastructure Solutions