Talent.com
Systems Reliability Engineer

PhonePe • Pune, Republic Of India, IN
1 day ago
Job description

Roles and Responsibilities

  • Troubleshoot issues across the entire stack - hardware, software, application, and network
  • Work to improve the reliability and performance of the next generation of distributed systems and containerized deployments
  • Diagnose and troubleshoot complex distributed systems handling millions of queries per second
  • Day-to-day work is heavily command-line driven, which requires a strong understanding of Linux.
  • Participate in the on-call rotation.
  • Design, build, and maintain core infrastructure that enables PhonePe to scale to hundreds of thousands of concurrent users.
  • Actively take part in analysis and system improvement planning.
  • Drive performance testing, capacity planning and high availability practices.
  • Own implementations of new technologies while ensuring proper testing and documentation.
  • Proactively monitor, identify, and resolve issues that could affect our infrastructure.
  • Be a natural team player with a resourceful attitude.
  • Buddy new team members and get them production-ready.

Skills Required

  • 7 to 13 years of strong hands-on experience in Linux/Unix system administration, including TCP/IP, DNS, and load balancers.
  • Expertise in managing and scaling proxy infrastructure, including configuring and optimizing proxies (e.g. Nginx, HAProxy).
  • Knowledge of database technologies, specifically MySQL and NoSQL. Exposure to Aerospike is good to have.
  • In-depth knowledge of Python to automate tasks with minimal intervention.
  • Knowledge of Linux cloud services using KVM, QEMU, and LVM.