Staff Site Reliability Engineer

Saviynt · Bengaluru, Karnātaka, India, 560023
13 days ago
Job description

Saviynt's AI-powered identity platform manages and governs human and non-human access to all of an organization's applications, data, and business processes. Customers trust Saviynt to safeguard their digital assets, drive operational efficiency, and reduce compliance costs. Built for the AI age, Saviynt is today helping organizations safely accelerate their deployment and usage of AI. Saviynt is recognized as the leader in identity security, with solutions that protect and empower the world's leading brands, Fortune 500 companies, and government institutions. For more information, please visit www.saviynt.com.

Our Monitoring and Alerting team, part of SaaS Operations, combines operational excellence with the development experience to deliver services at high scale and high availability, with resilience built in through automation and Infrastructure as Code. We build reliability into our ecosystem by applying best practices in Resiliency Engineering, Automation, Observability & Chaos Testing.

The team comes from diverse technical backgrounds, and the responsibilities offer a wide variety of challenges. Ideal candidates will have a background in either software engineering or systems engineering, with a desire to learn the other, or previous experience building and managing monitoring and alerting systems. We are looking for a systems-thinking Principal Engineer who has helped teams scale through production insights, operational automation, building an observability program, developer guidance, real-time metrics, and automation, automation, automation!

WHAT YOU WILL BE DOING

  • Implement monitoring and alerting systems to guarantee high availability and performance, with a dedicated focus on SLA and availability metrics.
  • Collaborate with engineering and operations teams to identify critical components and systems requiring enhanced availability measures.
  • Design and implement strategies, tooling, and processes to enhance system uptime and reliability.
  • Continuously evaluate and recommend improvements to platform infrastructure and processes, enhancing efficiency and reliability.
  • Align the platform with customer needs and business goals by working closely with cross-functional teams.
  • Run the production environment by monitoring availability and taking a holistic view of system health.
  • Build software and systems to monitor platform infrastructure and applications.
  • Monitor and improve the reliability, quality, and time-to-market of our suite of software solutions.
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
  • Provide primary operational support and engineering for multiple large-scale distributed software applications.
  • Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.

WHAT YOU BRING

  • Bachelor's degree or higher in a technology-related field (e.g. Engineering, Computer Science) required; Master's degree a plus
  • 6+ years of professional experience in monitoring and alerting roles on major cloud platforms (AWS, Azure), preferably including project leadership roles.
  • 4+ years of experience in cloud development (AWS, Azure) and observability; experience building and operating highly resilient platforms in AWS cloud environments.
  • 3+ years of experience in software development with Python, NodeJS, or Java with a focus on SDLC and automation
  • Hands-on experience with container orchestration, preferably with Kubernetes
  • Hands-on experience with building observability, monitoring and alerting on large scale distributed systems.
  • Leadership / design of application and / or infrastructure migration projects from on-prem to cloud
  • Cloud architecture design and implementation to solve key business needs and meet team goals.
  • Familiarity with current AWS solutions; Azure experience also considered.
  • Containerized workloads (preferred: Helm; related: AKS and EKS, other K8s distributions, Docker, JFrog)
  • Logging and monitoring tools (preferred: Prometheus, Grafana, Datadog, AWS CloudWatch; related: Azure Monitor, Log Analytics, Fluentd)
  • Network security (e.g. AWS policies, Azure Policy, VPN, Active Directory / RBAC, ACLs, NSG rules, private endpoints)
  • Proven experience in implementing advanced observability practices and techniques at scale.
  • Hands-on experience with one or more observability tools (Prometheus, Grafana, ELK / OpenSearch, OpenTelemetry, Datadog, etc.)
  • Experience with instrumentation, plus the systems skills to build and operate monitoring, logging, and alerting services for distributed systems at scale.
  • Demonstrated ability to use modern monitoring tools (Datadog, Prometheus, etc.)
  • Ability to build a monitoring ecosystem with high-fidelity alerting.
  • Ability to automate resolution of alerts.
  • Ability to automate with various scripting languages (Python, Golang, shell scripting, etc.)
  • Knowledge of managing systems using infrastructure-as-code tools (IAM, ARM, Terraform, Chef)
  • Solid understanding of Cloud Computing and DevOps concepts.
  • Hands-on Kubernetes skills and knowledge.
  • Proven experience in maintaining the scalability and resiliency of complex environments.
  • Ability to triage, perform root cause analysis, and be decisive under pressure
  • Experience managing and interpreting large datasets using query languages and visualization tools
  • Strong communication skills, with the ability to reach both technical and non-technical audiences
  • Ability to learn new software, methods, and practices and bring them to our developers
  • Ability to work with a variety of individuals and groups, both in person and virtually, in a constructive and collaborative manner, and to build and maintain effective relationships

