Talent.com
Staff Site Reliability Engineer

Confidential • Mumbai, Kolkata, Delhi
30+ days ago
Job description

Roles and Responsibilities:

What are we looking for

The SRE organization's mission at SentinelOne (S1) is to keep our uptime promise to our customers by ensuring we meet our SLOs / SLAs, to help our engineering teams ship software to our customers quickly and with quality, and to ensure our customers are successful.

In this job as Staff SRE, you will join the Core SRE team at S1 and have an amazing opportunity to drive outcomes that improve the reliability, stability and cost efficiency of S1's Singularity Platform - our largest customer-facing service, which has over 12,000 B2B / B2G customers deployed across over 6 regions and 2 cloud service providers.

Upcoming big projects that you could work on include Monitoring and Observability Uplift, Logging Pipeline modernization, toil automation and more!

What will you do

  • We are looking to add a Staff SRE with extensive prior operations experience for a SaaS product, who can drive deployment re-architecture with a focus on self-service and automation: someone who has delivered SaaS products on multi-cloud, on-prem and air-gapped environments, driven continuous delivery of software, run incident post-mortems, provided feedback on engineering architecture decisions and automated repetitive operational tasks.
  • You will join a like-minded team of SREs who help run our operations smoothly at scale by building a platform on which S1's services can run. If the thought of running a large-scale cybersecurity platform on various cloud providers and air-gapped environments excites you, you've found the right place!
  • As a team we value good written communication skills, data-driven decisions and a keen eye for continuous improvement. You'll help simplify, have a passion for new ideas and know how to execute iteratively towards the final goal. We value candor and collaboration.

What skills and knowledge should you bring

  • Several years of experience in running site reliability for SaaS products, running operations at a large scale, and proven experience in leading the design and architecture of infrastructure (cloud and on-prem combined)
  • Multi-cloud experience, with deep expertise in at least one of the AWS / GCP / Azure platforms
  • Production experience with orchestration systems like Kubernetes, Nomad or Mesos (We are a Kubernetes shop)
  • Any experience with Rancher, Platform9 or other managed k8s providers is desired
  • Familiarity with air-gapped deployments on top of k8s
  • Familiarity with Kafka and Redis
  • Familiarity with IaC (infrastructure as code) tools (Terraform or Pulumi)
  • Familiarity with CI and practical delivery using any of the major tools; familiarity with deployment strategies like blue-green, rolling and canary deploys, and with best practices around deployment automation (with tools like Shipit or Spinnaker), is desired
  • Demonstrated proficiency in at least one mainstream language (Python / Go / Ruby / etc.)
  • Familiarity with SecOps & Compliance processes and their touch points with SRE is desired
  • Polyglot experience with other SRE tools - we integrate with more tools every day
  • Keeping a pulse on the latest SRE trends and open source
  • Prior product building experience

Apart from the above technical skills, the following soft skills are required:

  • Curiosity, fast learning, a drive for improvement and great communication
  • Ability to work in a diverse and distributed team
  • A self-starter who is passionate about and motivated by new technologies and has empathy for legacy systems
  • A quick learner who can navigate unfamiliar programming languages, systems and processes

Skills Required

Site Reliability Engineer, SaaS, Kafka, Python