Site Reliability Engineer - OpenShift

Confidential • Bengaluru / Bangalore
4 days ago
Job description

What will you do:

  • Applies software engineering principles to the operations domain.
  • Contributes to a service's codebase, writes automation that aids in the management of a service, and performs operational engineering work to support a service's Service Level Objectives (SLO).
  • Ensures service reliability meets users' needs, including for internally critical and externally visible services.
  • Uses software and systems engineering to design, build, and run large-scale, distributed, fault-tolerant systems.
  • Focuses on iterative improvement through toil reduction and error-budget enforcement.
  • Interfaces with both cloud IaaS and SaaS providers and internal stakeholders, including Support, IT, and Product Engineering, to achieve desired outcomes.
  • Participates in an on-call rotation within a geographically distributed team to provide 24x7x365 production support, with responsibility for responding to urgent customer issues.
  • Practices sustainable incident response and blameless postmortems.
  • Works within a small agile team to develop and improve SRE methodologies, support peers, plan, and self-improve.
  • Provides feedback on bugs and feature improvements to the various Red Hat Product Engineering teams.

What will you bring:

Bachelor's degree in computer science or a related technical field involving software or systems engineering, or practical experience demonstrating interest in SRE

2+ years of experience using cloud providers and technologies (Google, Azure, Amazon, OpenStack, etc.)

1+ years of experience administering a Kubernetes-based production environment

2+ years of experience programming in at least one object-oriented language; Go or Python is a big plus

Ability to collaboratively troubleshoot and solve problems in a team setting

Basic understanding of UNIX or Linux operating systems

The following will be considered a plus:

Demonstrated comfort with collaboration, open communication, and reaching across functional boundaries

Passion for understanding users' needs and delivering outstanding user experiences

Additional Skills:

  • Demonstrated ability to quickly and accurately troubleshoot system issues
  • Solid understanding of standard TCP/IP networking and common protocols like DNS and HTTP
  • 2+ years of experience managing Linux servers running Red Hat Enterprise Linux (RHEL), CentOS, or Fedora hosted at a cloud provider such as Amazon Web Services (AWS), Google Compute Engine (GCE), or Microsoft Azure
  • 1+ years of experience with enterprise systems monitoring
  • 2+ years of experience with enterprise configuration management software like Red Hat Ansible Automation Platform (AAP)
  • Experience with static code analysis tools
  • Some experience with code deployment across cloud-based environments
  • Some experience with continuous integration and continuous deployment approaches
  • Some experience working with complex distributed systems
  • Demonstrated ability to debug and optimize code and to automate routine tasks
  • Strong problem-solving skills and the ability to work with minimal supervision as part of a global team
  • Experience working with agile development methodologies

Skills Required:

AWS, SaaS, SRE
