Principal Site Reliability Engineer

Rakuten India, Bengaluru, Karnataka, India
7 days ago
Job description

Responsibilities :

  • Design and develop SLAs, SLOs, and SLIs for services within the Business Unit.
  • Take part in the full development and production operations lifecycle, including system maintenance, monitoring, automation, backend operations, high availability, regular application releases, troubleshooting, and middleware performance tuning, and collaborate with functional and technical team members to deliver high-quality services.
  • Automate routine manual production and non-production operations using technologies such as Ansible and Chef. Serve as the key person proposing and implementing automation to increase productivity and quality.
  • Continuously improve system performance and reliability.
  • Maintain a service-ownership mindset and react proactively to production issues.
  • Propose new technologies and tools to improve the whole process of development, testing, and production operations. Strong self-learning ability and motivation to work on new technologies are expected.
  • Work closely with developers, product managers, project managers, team leads, security, and QA team members in different locations (Singapore, Japan, India, etc.).

Experience : 8 to 14 years

Qualifications :
Must-have :
  • Over 8 years of SRE experience, including independently handling high-traffic production systems, troubleshooting (middleware, infrastructure), automation, and regular operations.
  • Experience implementing Site Reliability Engineering principles for performance, reliability, monitoring, and alerting in production environments.
  • Experience managing large-scale services.
  • Experience designing and building on public cloud platforms (e.g., GCP, Azure), preferably GCP.
  • Good knowledge of CI/CD/CT pipelines using tools such as Jenkins or Bamboo, and of version control systems such as Git or SVN.
  • Strong knowledge of Linux-based system operations and extensive command-line skills.
  • Hands-on experience with Unix/Linux shell and Python scripting.
  • Experience with automation and configuration management tools, e.g., Terraform, Puppet, Chef, Ansible.
  • Experience developing and operating one or more of the following systems: Kubernetes, Nginx, ELK stack, Hadoop, etc.
  • Ability to identify process gaps and recommend best practices based on industry standards.
  • Ability to provide technical expertise on complex automation and functional issues.
  • Flexibility to provide emergency support and to adapt working hours to business needs.
  • Experience with Big Data technologies such as Hadoop and NoSQL stores (Couchbase, Cassandra).