Site Reliability Engineer - OpenShift

Confidential • Bengaluru / Bangalore

4 days ago

Job description

What will you do:

Applies software engineering principles to the operations domain.
Contributes to a service's codebase, writes automation that aids in the management of a service, and performs operational engineering work to support a service's Service Level Objectives (SLOs).
Ensures service reliability meets users' needs, including for internally critical and externally visible services.
Uses software and systems engineering to design, build, and run large-scale, distributed, fault-tolerant systems.
Focuses on iterative improvement through toil reduction and error-budget enforcement.
Interfaces with both cloud IaaS and SaaS providers and internal stakeholders, including Support, IT, and Product Engineering, to achieve desired outcomes.
Participates in an on-call rotation within a geographically distributed team to provide 24x7x365 production support, with responsibility to respond to urgent customer issues.
Practices sustainable incident response and blameless postmortems.
Works within a small agile team to develop and improve SRE methodologies, support peers, plan, and self-improve.
Provides feedback on bugs and feature improvements to the various Red Hat Product Engineering teams.

What will you bring

Bachelor's degree in computer science or a related technical field involving software or systems engineering, or practical experience demonstrating interest in SRE

2+ years of experience using cloud providers and technologies (Google, Azure, Amazon, OpenStack, etc.)

1+ years of experience administering a Kubernetes-based production environment

2+ years of experience programming with at least one object-oriented language; Golang or Python is a big plus

Ability to collaboratively troubleshoot and solve problems in a team setting

Basic understanding of UNIX or Linux operating systems

The following will be considered a plus

Demonstrated comfort with collaboration, open communication, and reaching across functional boundaries

Passion for understanding users' needs and delivering outstanding user experiences

Additional Skills:

Demonstrated ability to quickly and accurately troubleshoot system issues

Solid understanding of standard TCP / IP networking and common protocols like DNS and HTTP

2+ years of experience managing Linux servers running Red Hat Enterprise Linux (RHEL), CentOS, or Fedora hosted at a cloud provider such as Amazon Web Services (AWS), Google Compute Engine (GCE), or Microsoft Azure

1+ years of experience with enterprise systems monitoring

2+ years of experience with enterprise configuration management software like Red Hat Ansible Automation Platform (AAP)

Experience with static code analysis tools

Some experience with code deployment across cloud-based environments

Some experience with continuous integration and continuous deployment approaches

Some experience working with complex distributed systems

Demonstrated ability to debug and optimize code and to automate routine tasks

Problem-solving skills and the ability to work with minimal supervision and as part of a global team

Experience working with agile development methodologies

Skills Required

AWS, SaaS, SRE
