
Site Reliability Engineer - Azure / Cloud Services

Leapwork India Private Limited, Gurugram

16 days ago

Job description

At Leapwork, our vision is to break down the barriers between humans and computers through the world's most accessible automation platform. We are the leading global AI-powered visual test automation solution, enabling some of the world's largest enterprises to adopt, scale, and maintain automation - in under 30 days.

In today's environment, where efficiency, automation, and cost optimization are essential to enterprise growth, we are uniquely positioned to deliver impact.

In 2023, Microsoft, the world's largest and most recognizable software company, recognised Leapwork as a truly innovative and disruptive product, leading to a strategic partnership that continues to be a major growth catalyst.

If you're contemplating the next step in your career and seek a fast-paced company where you can impact the build and growth of something truly special, look no further!

We are headquartered in Copenhagen, Denmark and have local offices across Europe, the US and Asia.

Job Description:

We are looking for an experienced and forward-thinking Senior Site Reliability Engineer (SRE) with deep expertise in Microsoft Azure Cloud. In this role, you will ensure the reliability, availability, scalability, and performance of our Azure-based platforms and applications.

You will partner with cross-functional teams to design, implement, and maintain resilient infrastructure while driving automation, monitoring, and optimization initiatives across our cloud environment.

Role Responsibilities:

Service Reliability & SLOs: Define and maintain Service Level Objectives (SLOs) for the systems you own.

Continuously measure and improve availability, latency, and overall system health.

Automation & Scalability: Develop automation to scale systems sustainably, prevent service issues, and enable rapid recovery when incidents occur.

Collaboration & Architecture Influence: Partner with development teams to improve reliability, observability, and release velocity. Influence architectural decisions to embed high availability and operability into applications.

Incident Management: Participate in on-call rotations, lead incident response, conduct postmortems, and drive root cause resolution with a focus on prevention.

Monitoring & Observability: Implement and refine monitoring, alerting, and observability solutions (Azure Monitor, Datadog, Grafana, Prometheus, Loki, Tempo) to ensure proactive detection of issues.

Disaster Recovery & BCP: Design, test, and maintain disaster recovery and business continuity strategies to safeguard system availability and data integrity.

Cost Optimization: Monitor and optimize Azure resource usage for performance and cost efficiency.

Engineering Best Practices: Be a vocal advocate for strong engineering practices, enabling scalable, reliable, and performant systems.

Cloud Migration Enablement: Support cloud migration initiatives in partnership with foundation and migration teams, from architectural reviews to operational acceptance testing, as well as configuring Grafana dashboards and Azure Log Analytics metrics.

AI & Intelligent Automation: Leverage AI/ML-driven tools to improve system observability, incident prediction, and automated remediation, ensuring faster recovery and reduced downtime.

SRE Agents: Work with or build SRE Agents to automate routine operational tasks such as log analysis, anomaly detection, incident triage, and performance tuning.

Data-Driven Reliability: Analyse monitoring data using AI/ML to identify hidden trends, optimize system health, and drive continuous improvement in reliability practices.

Documentation & Knowledge Sharing: Maintain detailed documentation of systems, processes, and architecture to ensure alignment and smooth onboarding of team members.

Continuous Learning: Actively participate in and foster a culture of continuous learning and development within the team.

Mentorship: Guide and mentor junior engineers, promoting collaboration and technical growth.

Technical Qualifications / Role Requirements (Must-Have Skills):

Bachelor's degree in Computer Science, Engineering, or a related technical field. Master's degree is a plus.

Proven experience (7+ years) working as an SRE with a specific focus on Microsoft Azure Cloud services.

Deep understanding of Azure services, including Azure Kubernetes Service (AKS), Azure App Service, Azure Functions, Azure Monitor, and Azure Resource Manager.

Proficiency in scripting and programming languages (e.g., PowerShell, Python) for automation, infrastructure management, and tool development.

Hands-on experience with containerization and orchestration technologies, such as Docker and Kubernetes, in an Azure context.

Strong incident management skills, with a data-driven and analytical approach to diagnosing complex issues.

Familiarity with Infrastructure as Code (IaC) tools (e.g., Terraform, ARM templates) and configuration management tools (e.g., Ansible, Chef, Puppet).

Familiarity with AI-powered monitoring, anomaly detection, and auto-remediation tools.

Experience working with SRE Agents or similar intelligent automation frameworks for operational efficiency.

Ability to integrate AI-driven insights into incident response, root cause analysis, and reliability engineering.

Excellent problem-solving skills, attention to detail, and a proactive attitude towards addressing operational challenges.

Effective communication and collaboration skills, with the ability to work across teams and influence technical decisions.

Experience with CI/CD pipelines and version control systems (e.g., Git).

Relevant Azure certifications (e.g., Microsoft Certified: Azure Solutions Architect Expert, Microsoft Certified: Azure DevOps Engineer Expert) are highly advantageous.

In-depth knowledge of monitoring and alerting tools like Grafana, Prometheus, Loki, and Tempo.

Ability to analyze monitoring data to identify trends and root causes of incidents, leading to continuous improvement of system health.

A strong understanding of DevOps principles and automation practices.

Why Leapwork?

We are on an exciting journey of global growth, and this is your chance to get on board. You will have the opportunity to lead and shape digital transformation initiatives in a forward-thinking company, working with and learning from a talented and passionate team committed to innovation and excellence.

By joining our team, you'll become part of a fast-paced international environment where you can grow, challenge yourself, and do what inspires you. We work hard, but have fun while doing it - and we believe that collaboration, social activities and celebration are keys to success.

Our Leapwork principles:

Our five key principles capture the essence of what it means to be a part of our world-class team! They are integral to how we approach our work and one another, and they serve as a roadmap to our continued growth, development, achievements, and success.

Customer first: We listen to our customers, understand their pain points and focus on what matters to them.

Lead from the front: Leading means guiding others towards the solutions to our challenges.

Get it done: We make commitments, follow through and deliver work we're proud of.

Build excellence: We do our best work every day, holding ourselves and others to the highest standards.

Respectfully different: We treat each other with respect, always. We're different, not indifferent.

(ref: hirist.tech)
