Roles and Responsibilities
- Troubleshoot issues across the entire stack - hardware, software, application, and network
- Work to improve the reliability and performance of the next generation of distributed systems and containerized deployments
- Diagnose and troubleshoot complex distributed systems handling millions of queries per second
- Day-to-day work is heavily command-line driven, which requires a strong understanding of Linux.
- Participate in the on-call rotation
- Design, build, and maintain core infrastructure that enables PhonePe to scale to support hundreds of thousands of concurrent users
- Actively take part in analysis and system improvement planning.
- Drive performance testing, capacity planning and high availability practices.
- Own implementations of new technologies while ensuring proper testing and documentation.
- Proactively monitor, identify, and resolve issues that could impact our infrastructure.
- Be a natural team player with a resourceful attitude.
- Buddy new team members and get them production-ready.
Skills Required
- 7-13 years of strong hands-on experience in Linux / Unix system administration, including TCP/IP, DNS, and load balancers.
- Expertise in managing and scaling proxy infrastructure, including configuring and optimizing proxies (e.g. Nginx, HAProxy).
- Knowledge of database technologies, specifically MySQL / NoSQL. Exposure to Aerospike NoSQL is good to have.
- In-depth knowledge of Python to automate tasks with minimal intervention.
- Knowledge of Linux cloud services using KVM / QEMU / LVM.
Nginx, KVM, LVM, QEMU, TCP/IP, DNS, NoSQL, MySQL, Load Balancers, HAProxy, Python