Talent.com
Senior Site Reliability Engineer (20/10/2025)

RecRootsIndia
Job description

The core premise of SRE is treating operational issues as software problems. We code our way out of operational problems, addressing availability, scalability, latency, and efficiency challenges across our vast infrastructure.

Responsibilities:

  • Design, develop, and implement software that improves the stability, scalability, availability, and latency of the products.
  • Take ownership of one or more services and have the freedom to do what is best for our business and customers.
  • Solve problems occurring with our highly available production systems and build solutions and automation to prevent them from happening again.
  • Build effective monitoring to supervise the health of your system, and jump in to handle outages.
  • Build and run capacity tests to manage the growth of your systems.
  • Plan for reliability by designing systems to work across our multinational data centers.
  • Develop tools that help the product development teams successfully deploy thousands of change sets every day.
  • Be an advocate for standard engineering processes.
  • Share the on-call rotation and be an escalation contact for incidents.
  • Contribute to growth through interviewing, onboarding, or other tasks.

Requirements:

  • 8 years of experience building, operating, and maintaining sophisticated, scalable systems, including operations automation.
  • Solid experience in at least one programming language. We use Java, Python, Go, Ruby, and Perl.
  • Experience with Infrastructure as Code technologies.
  • Knowledge of cloud computing fundamentals.
  • Solid foundation in Linux administration and troubleshooting.
  • Understanding of service-level agreements and objectives.
  • Additional experience in OpenStack, Kubernetes, Networking, Security, or Storage is desirable.
  • Experience with monitoring and observability technologies such as Prometheus, Graphite, Grafana, Kibana, and Elasticsearch is a plus.
  • Good interpersonal skills.
  • Proficient command of the English language, both written and spoken.
  • Some of the tools and technologies we use: Python, Go, Puppet, Kubernetes, Elasticsearch, Prometheus, HAProxy, Cassandra, Kafka, etc.