Talent.com
Senior Site Reliability Engineer - ELK Expert

iVedha Inc. | India
30+ days ago
Job description

Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice

Location: India (Remote). Must be available to work in the EST (US/Canada) time zone.

Role Summary

Are you a Senior Site Reliability Engineer (SRE) with deep ELK expertise, ready to take ownership of large-scale observability infrastructure?

We're looking for an SRE with 7+ years of experience, including 4+ years specializing in the ELK stack (Elasticsearch, Logstash, Kibana), to join our Platform Engineering Practice. In this role, you'll design, manage, and scale ELK clusters ingesting 2–3+ TB/day, enhance reliability across distributed systems, and drive automation within Azure cloud environments. This is a high-impact engineering opportunity focused on performance, observability, and operational excellence at scale.

Why Join Us

  • Career Growth: Work alongside industry experts on cutting-edge cloud technologies
  • Competitive Compensation and Benefits: We recognize and reward top talent
  • Exciting, Impactful Work: Design and build scalable, resilient cloud environments
  • Strategic Platform Role: Contribute to the foundation of next-gen observability and reliability infrastructure

What You Will Do

  • Design and Optimize Cloud Infrastructure: Architect scalable, fault-tolerant systems on Microsoft Azure
  • Automate Everything: Use Terraform, Ansible, and GitHub Actions to streamline deployment and configuration
  • Ensure Reliability and Performance: Proactively monitor, troubleshoot, and resolve production issues using Prometheus, Grafana, and Azure Monitor
  • Enhance Security and Compliance: Implement security best practices across DevOps workflows
  • Collaborate and Innovate: Work closely with engineering, security, and operations teams to drive automation and efficiency
  • Manage and Scale ELK Clusters: Operate large clusters handling 2–3+ TB/day of log volume, ensuring high availability and performance
  • Optimize ELK Architecture: Implement efficient index lifecycle policies, shard strategies, and hot-warm-cold tiered storage
  • Build and Tune Log Pipelines: Scale Logstash and Beats pipelines across distributed environments
  • Support Kibana Observability Layers: Create dashboards, visualizations, and custom alerting frameworks (e.g., Watcher, ElastAlert)

What You Bring

  • 7+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering
  • 4+ years of dedicated, hands-on experience with ELK (Elasticsearch, Logstash, Kibana)
  • Strong experience managing large-scale ELK clusters in production with heavy ingestion (multi-TB/day)
  • Deep knowledge of index tuning, shard allocation, ILM policies, and scaling ELK components
  • Expertise in GitHub Actions, Terraform, Ansible, and Infrastructure as Code (IaC)
  • Proficiency in Python, Go, or Bash for automation and scripting
  • Deep understanding of Kubernetes, Docker, and cloud-native architectures
  • Experience with observability tools such as Prometheus, Grafana, Azure Monitor
  • Ability to work in a fast-paced, collaborative environment and solve complex operational issues

Education

  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field

Certifications (Nice to Have)

  • Microsoft Azure certifications: AZ-104, AZ-400