Senior Site Reliability Engineer - AWS / Google Cloud Platform

N-iX LTD, Delhi, IN

10 days ago

Job type

Remote

Job description

About the Company :

Our Client is defining the future of cybersecurity through an XDR platform that automatically prevents, detects, and responds to threats in real time. Singularity XDR ingests data and leverages the Client's patented AI models to deliver autonomous protection. With the Client, organizations gain full transparency into everything happening across the network at machine speed to defeat every attack at every stage of the threat lifecycle.

We are a values-driven team where names are known, results are rewarded, and friendships are formed. Trust, accountability, relentlessness, ingenuity, and our client-centric approach define the pillars of our collaborative and unified global culture. We're looking for people who will drive team success and collaboration across SentinelOne. If you're enthusiastic about innovative approaches to problem-solving, we would love to speak with you about joining our team!

What Are We Looking For?

We are seeking a Site Reliability Engineer (SRE) with extensive operational experience managing large-scale SaaS infrastructures. You will be responsible for designing and maintaining data infrastructure that emphasizes automation, self-service, and scalability. This role is vital to ensuring that we meet and exceed our Service Level Objectives (SLOs) and uptime commitments to customers.

You will partner closely with engineering teams to help them deliver software faster, safer, and with higher quality, while driving initiatives that enhance the reliability, stability, and cost efficiency of our production environments. You'll join a world-class team of like-minded SREs who manage complex, high-traffic systems that operate at global scale.

What Will You Do?

As a Site Reliability Engineer, you will play a critical role in ensuring the availability, scalability, and performance of SentinelOne's large-scale distributed systems. Working at the intersection of software development and operations, you'll focus on making our infrastructure more reliable, automated, and efficient, while empowering development teams to deliver at speed and with confidence.

In this role, you will :

Drive Continuous Deployment & Delivery Excellence :

Design, implement, and optimize CI/CD pipelines for efficient, secure, and reliable software releases.
Automate build, test, and deployment processes to enhance release velocity and reduce manual intervention.

Manage and Command Production Incidents :

Lead the response to production incidents, ensuring timely mitigation and root cause identification.

Conduct post-incident reviews, define corrective actions, and drive continuous improvements to prevent recurrence.

Partner with Product Engineering Teams :

Collaborate with product, platform, and infrastructure teams to embed reliability and scalability into design and architecture.

Provide technical guidance to improve system performance, fault tolerance, and observability.

Automate Operations and Streamline Processes :

Build automation tools and frameworks that eliminate repetitive tasks, standardize operational procedures, and support a self-service infrastructure model for development teams.

Monitor, Measure, and Optimize Reliability :

Establish metrics for system performance and reliability (availability, latency, throughput).

Proactively identify and resolve potential issues using data-driven insights and continuous monitoring.

Eliminate Infrastructure Bottlenecks :

Analyze production systems to identify performance and scalability limitations.

Implement architectural improvements to enhance throughput, reliability, and cost efficiency across AWS and GCP environments.

Enhance Observability & Incident Readiness :

Develop and maintain observability stacks with advanced monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, Datadog).

Conduct chaos engineering experiments to validate system resilience and ensure operational preparedness.

Ensure Security, Compliance & Resilience :

Work with security and compliance teams to enforce secure configurations, data integrity, and regulatory adherence.

Participate in disaster recovery planning and capacity forecasting for high availability.

Mentor and Collaborate Across Teams :

Share best practices through documentation, technical discussions, and internal workshops.

Foster a reliability-driven culture and promote continuous improvement across engineering functions.

What Skills and Experience Will You Need?

5+ years of experience managing large-scale SaaS operations or distributed systems

Strong expertise in orchestration systems like Kubernetes, Nomad, or Mesos

Proficiency in Python (preferred), Golang, or Java for automation and tooling

Hands-on experience running and deploying Java and JavaScript applications

Proven experience in AWS and GCP environments

Practical knowledge of Infrastructure as Code (Terraform, CloudFormation, etc.)

Experience with CI/CD tools such as Jenkins, GitHub Actions, or ArgoCD, and deployment strategies like blue-green, rolling, or canary deploys

Familiarity with SRE principles: SLOs, SLIs, and error budgets

Strong problem-solving, communication, and collaboration skills within distributed teams

Self-starter attitude with a passion for automation, reliability, and continuous learning

Prior product development or software engineering experience is a strong plus

What We Offer :

Flexible working format: remote, office-based, or hybrid

Competitive salary and comprehensive compensation package

Personalized career growth opportunities and mentorship programs

Professional development tools : tech talks, training sessions, and centers of excellence

Active technical communities with regular knowledge-sharing

Education reimbursement for continued learning and certifications

Memorable milestone celebrations and company-sponsored events

Corporate gatherings and team-building initiatives

(ref : hirist.tech)
