Talent.com
Production Systems Reliability Engineer

RecRoots • Republic Of India, IN
30+ days ago
Job description

Key Job Responsibilities and Duties :

The core premise of SRE lies in treating operational issues as a software problem. We code our way out of operational problems, addressing availability, scalability, latency, and efficiency challenges within our vast infrastructure.

  • You will impact millions of people all over the globe with your creative solutions
  • You will work at one of the biggest e-commerce companies in the world
  • You will solve exciting problems at scale by writing and deploying code across tens of thousands of servers
  • You will have the opportunity to collaborate with many of the world’s leading SREs
  • You will be free to launch your own ideas and solutions within our sophisticated production environment
  • Here are some of the tools and technologies we use to achieve this : Python, Go, Puppet, Kubernetes, Elasticsearch, Prometheus, HAProxy, Cassandra, Kafka, etc.

What you’ll be doing :

  • Design, develop and implement systems software that improves the stability, scalability, availability and latency of the products;
  • Take ownership of one or more services and have the freedom to do what is best for our business and customers;
  • Solve problems occurring with our highly available production systems and build solutions and automation to prevent them from happening again;
  • Build effective monitoring to track the health of your system, and jump in to handle outages;
  • Build and run capacity tests to handle the growth of your systems;
  • Plan for reliability by designing systems to work across our multinational data centers;
  • Develop tools to assist the product development teams with successfully deploying 1000s of change sets every day;
  • Share the on-call rotation and be an escalation contact for incidents (depending on level of role)

What you’ll bring :

  • Solid experience in at least one programming language.
  • Experience with building, operating and maintaining scalable distributed systems, and with operations automation;
  • Experience with Infrastructure as Code technologies;
  • Knowledge of cloud computing fundamentals;
  • Solid foundation in Linux administration and troubleshooting;
  • Understanding of service level agreements (SLAs) and service level objectives (SLOs);
  • Additional experience in OpenStack, Kubernetes, Networking, Security or Storage is desirable;
  • Monitoring / observability technologies like Prometheus, Graphite, Grafana, Kibana, Elasticsearch are a plus;
  • Good interpersonal skills
  • Proficient command of the English language, both written and spoken