Key Job Responsibilities and Duties:
The core premise of SRE is to treat operational issues as software problems.
We code our way out of operational problems, addressing availability,
scalability, latency, and efficiency challenges across our vast infrastructure.
- You will impact millions of people all over the globe with your creative solutions
- You will work at one of the biggest e-commerce companies in the world
- You will solve exciting problems at scale by writing and deploying code across tens of thousands of servers
- You will have the opportunity to collaborate with many of the world’s leading SREs
- You will be free to launch your own ideas and solutions within our sophisticated production environment
- Some of the tools and technologies we use to achieve this: Python, Go, Puppet, Kubernetes, Elasticsearch, Prometheus, HAProxy, Cassandra, Kafka, etc.
What you’ll be doing:
- Design, develop and implement systems software that improves the stability, scalability, availability and latency of our products
- Take ownership of one or more services and have the freedom to do what is best for our business and customers
- Solve problems occurring in our highly available production systems, and build solutions and automation to prevent them from happening again
- Build effective monitoring for the health of your systems, and jump in to handle outages
- Build and run capacity tests to handle the growth of your systems
- Plan for reliability by designing systems to work across our multinational data centers
- Develop tools that help the product development teams successfully deploy thousands of change sets every day
- Share the on-call rotation and be an escalation contact for incidents (depending on level of role)
What you’ll bring:
- Solid experience in at least one programming language
- Experience building, operating and maintaining scalable distributed systems, and with operations automation
- Experience with Infrastructure as Code technologies
- Knowledge of cloud computing fundamentals
- Solid foundation in Linux administration and troubleshooting
- Understanding of service level agreements and objectives
- Additional experience in OpenStack, Kubernetes, networking, security or storage is desirable
- Experience with monitoring / observability technologies such as Prometheus, Graphite, Grafana, Kibana and Elasticsearch is a plus
- Good interpersonal skills
- Proficient command of the English language, both written and spoken