Talent.com
[3 Days Left] Site Reliability Engineer II

RecRoots • Bengaluru, Karnataka, India
16 hours ago
Job description

Key Job Responsibilities and Duties :

The core premise of SRE is to treat operational issues as a software problem. We code our way out of operational problems, addressing availability, scalability, latency, and efficiency challenges within our vast infrastructure.

  • You will impact millions of people all over the globe with your creative solutions
  • You will work in one of the biggest e-commerce companies in the world
  • You will solve exciting problems at scale by writing and deploying code across tens of thousands of servers
  • You will have the opportunity to collaborate with many of the world’s leading SREs
  • You will be free to launch your own ideas and solutions within our sophisticated production environment
  • Some of the tools and technologies we use to achieve this : Python, Go, Puppet, Kubernetes, Elasticsearch, Prometheus, HAProxy, Cassandra, Kafka, etc.

What you’ll be Doing :

  • Design, develop and implement systems software that improves the stability, scalability, availability and latency of the products;
  • Take ownership of one or more services and have the freedom to do what is best for our business and customers;
  • Solve problems occurring with our highly available production systems and build solutions and automation to prevent them from happening again;
  • Build effective monitoring for the health of your systems, and jump in to handle outages;
  • Build and run capacity tests to handle the growth of your systems;
  • Plan for reliability by designing systems to work across our multinational data centers;
  • Develop tools that help the product development teams successfully deploy thousands of change sets every day;
  • Share the on-call rotation and be an escalation contact for incidents (depending on level of role).

What you’ll bring :

  • Solid experience in at least one programming language.
  • Experience with building, operating and maintaining scalable distributed systems, and with operations automation;
  • Experience with Infrastructure as Code technologies;
  • Knowledge of cloud computing fundamentals;
  • Solid foundation in Linux administration and troubleshooting;
  • Understanding of service level agreements (SLAs) and service level objectives (SLOs);
  • Additional experience in OpenStack, Kubernetes, Networking, Security or Storage is desirable;
  • Experience with monitoring / observability technologies like Prometheus, Graphite, Grafana, Kibana, Elasticsearch is a plus;
  • Good interpersonal skills
  • Proficient command of the English language, both written and spoken