Staff Site Reliability Engineer

Confidential | Mumbai, Kolkata, Delhi

30+ days ago

Job description

Roles and Responsibilities:

What we are looking for

The SRE organization's mission at SentinelOne (S1) is to keep our uptime promise to our customers by ensuring we meet our SLOs/SLAs, to help our engineering teams ship software to customers quickly and with quality, and to ensure our customers are successful.

As a Staff SRE, you will join the Core SRE team at S1 and have an amazing opportunity to drive outcomes that improve the reliability, stability and cost efficiency of S1's Singularity Platform, our largest customer-facing service, with more than 12,000 B2B/B2G customers deployed across more than six regions and two cloud service providers.

Upcoming projects you could work on include Monitoring and Observability Uplift, Logging Pipeline Modernization, Toil Automation and more!

What you will do

We are looking to add a Staff SRE with extensive prior operations experience on a SaaS product, who can drive a re-architecture of our deployments with a focus on self-service and automation. We want someone who has delivered SaaS products in multi-cloud, on-prem and air-gapped environments, driven continuous delivery of software, run incident post-mortems, provided feedback on engineering architecture decisions and automated repetitive operational tasks.
You will join a like-minded team of SREs who keep our operations running smoothly at scale by building the platform on which S1's services run. If the thought of running a large-scale cybersecurity platform across multiple cloud providers and air-gapped environments excites you, you've found the right place!
As a team we value strong written communication, data-driven decisions and a keen eye for continuous improvement. You'll help simplify, bring a passion for new ideas and know how to execute iteratively toward the final goal. We value candor and collaboration.

What skills and knowledge you should bring

Several years of experience running site reliability for SaaS products and operations at large scale, with proven experience leading the design and architecture of infrastructure (combined cloud and on-prem)

Multi-cloud experience, with deep expertise in at least one of AWS, GCP or Azure

Production experience with orchestration systems like Kubernetes, Nomad or Mesos (We are a Kubernetes shop)

Experience with Rancher, Platform9 or other managed Kubernetes (k8s) providers is desirable

Familiarity with air-gapped deployments on top of Kubernetes

Familiarity with Kafka and Redis

Familiarity with Infrastructure as Code (IaC) and tools such as Terraform or Pulumi

Familiarity with CI and practical delivery using any of the major tools; familiarity with deployment strategies such as blue-green, rolling and canary deployments, and with best practices for deployment automation (using tools like Shipit or Spinnaker), is desirable

Demonstrated proficiency in at least one mainstream language (Python, Go, Ruby, etc.)

Familiarity with SecOps and compliance processes and their touchpoints with SRE is desirable

Polyglot experience with other SRE tools: we integrate with more tools every day

Keeping a pulse on the latest SRE trends and open source

Prior product building experience

Apart from the technical skills above, the following soft skills are required:

Curiosity, fast learning, a drive for continuous improvement and great communication

Ability to work in a diverse and distributed team

A self-starter who is passionate about and motivated by new technologies, and who has empathy for legacy systems

A quick learner who can navigate unfamiliar programming languages, systems and processes

Skills Required

Site Reliability Engineer, SaaS, Kafka, Python
