Site Reliability Engineer

Inspire Brands Hyderabad Support Center • Hyderabad, India

12 hours ago

Job description

About Inspire Brands :

Inspire Brands is disrupting the restaurant industry through digital transformation and operational efficiencies. The company's technology hub, Inspire Brands Hyderabad Support Center, India, will lead technology innovation and product development for the organization and its portfolio of distinct brands. The Inspire Brands Hyderabad Support Center will focus on developing new capabilities in data science, data analytics, eCommerce, automation, cloud computing, and information security to accelerate the company's business strategy. Inspire Brands Hyderabad Support Center will also host an innovation lab and collaborate with start-ups to develop solutions for productivity optimization, workforce management, loyalty management, payment systems, and more.

Job Description :

Job Title : Site Reliability Engineer

Position Summary :

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems that enable online ordering for thousands of restaurants across multiple brands. SRE ensures that Inspire Digital Platform (IDP) services have reliability and uptime appropriate to users' needs, along with a fast rate of improvement. SREs also keep an ever-watchful eye on system capacity and performance and perform regular capacity planning exercises. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating toil through automation.

Essential Job Responsibilities :

Note : These statements are not intended to be an exhaustive list of all responsibilities and duties.

Technical :

Review current workload patterns, understand the business case and prioritize areas of weakness within the platform through log and metric investigation as well as application profiling.
Work with senior engineering and testing team members to build tools and recommend testing strategies for problem prevention and detection.
Employ deep troubleshooting skills to improve availability, performance, and security, ensuring services are designed for 24 / 7 availability, operational readiness, and rigor.
Perform in-depth postmortems on production incidents to assess the actual business impact and help Engineering learn from them.
Create dashboards and alerts for monitoring the IDP platform, define key metrics and service level indicators, and ensure relevant metric data is collected to create actionable alerts for SRE and the Network Operations Center.
Participate in the 24 / 7 on-call rotation.
Automate toil by building software and automation for seamless application deployment and third-party tool integration.
Ensure the platform holds a high degree of reliability, at least three 9s (99.9% availability).
Define non-functional requirements as part of the product lifecycle to influence new designs, standards, and methods for scalable, highly available distributed systems.
Own technically intricate issues that span DevOps, databases, networking, code, infrastructure, and people; drive them to satisfactory completion.
Provide recommendations and feedback in design reviews and review sessions.

Knowledge, Skills and Abilities :

Education

4-year degree in Computer Science, Information Technology, or a related field

Experience

Minimum 5 years of experience as a Software Engineer, Platform, SRE, or DevOps engineer supporting large-scale SaaS production B2C or B2B cloud platforms.

Hands-on problem-solving and troubleshooting

Knowledge and skills (general and technical)

Development skills in Java, TypeScript, and Python; OOP expertise is a must.

Hands-on Azure Cloud experience, particularly with AKS, API Management, Azure Cache for Redis, Azure Blob Storage, Cosmos DB, Service Bus, and Azure Functions.

Proficiency in monitoring, APM, and profiling tools such as New Relic, Splunk, Prometheus, and Grafana.

Working experience with containers, Kubernetes and Helm.

Functional knowledge of Cloud Network, Firewalls, Ingress and Egress controllers, Service Mesh and

Experience with Auth0, secret management, and Cloudflare (CDN, Load Balancer, Cache, Firewall, and Workers features).

Experience with ArgoCD, GitLab, CI/CD, Terraform, and Infrastructure as Code.

Strong communication skills and the ability to explain technical concepts clearly.

A willingness to dive into understanding, debugging, and improving any layer of the stack.

Technical Skills :

Competency level of 3 on a scale of 5 for the skills listed below.

Cloud Provider : Azure

Core Services : Elastic Pool, SQL, Application Gateway, API Management (APIM), Key Vaults, AKS (Azure Kubernetes Service), VMSS (Virtual Machine Scale Sets), VM

Networking : NSG (Network Security Groups), Private Endpoints, Private Link Service, VNet, Subnets, WAF (Web Application Firewall), Geo-Replication

Storage : Storage Accounts

Messaging and Events : Event Hubs, Event Grid, Azure Service Bus (Namespaces, Queues, Topics)

Identity and Security : Managed Identities / Workload Identities, Private DNS, Auth0

Containerization and Orchestration :

Kubernetes (K8s) : For container orchestration

Helm : For Kubernetes package management

Docker : For containerization

Monitoring and Observability :

New Relic / Splunk

Automation and Scripting :

PowerShell

Python

Other requirements (licenses, certifications, specialized training)

Good to have certifications

Certified Kubernetes Administrator / Developer

AZ-104 (Microsoft Certified : Azure Administrator Associate)

AZ-305 : Designing Microsoft Azure Infrastructure Solutions
