Talent.com
This job offer is not available in your country.
Senior Site Reliability Engineer- ELK Expert

Senior Site Reliability Engineer- ELK Expert

iVedha Inc.Hyderabad, IN
30+ days ago
Job description

Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice

Location : India (Remote) - Must be available to work in the EST (US / Canada) Time Zone.

Role Summary :

Are you a Senior Site Reliability Engineer (SRE) with deep ELK expertise, ready to take ownership of large-scale observability infrastructure?

We're looking for an SRE with 7+ years of experience , including 4+ years specializing in the ELK stack (Elasticsearch, Logstash, Kibana) , to join our Platform Engineering Practice . In this role, you’ll design, manage, and scale ELK clusters ingesting 2–3+ TB / day , enhance reliability across distributed systems, and drive automation within Azure cloud environments. This is a high-impact engineering opportunity focused on performance, observability, and operational excellence at scale.

Why Join Us

  • Career Growth : Work alongside industry experts on cutting-edge cloud technologies
  • Competitive Compensation and Benefits : We recognize and reward top talent
  • Exciting, Impactful Work : Design and build scalable, resilient cloud environments
  • Strategic Platform Role : Contribute to the foundation of next-gen observability and reliability infrastructure

What You Will Do

  • Design and Optimize Cloud Infrastructure : Architect scalable, fault-tolerant systems on Microsoft Azure
  • Automate Everything : Use Terraform, Ansible, and GitHub Actions to streamline deployment and configuration
  • Ensure Reliability and Performance : Proactively monitor, troubleshoot, and resolve production issues using Prometheus, Grafana, and Azure Monitor
  • Enhance Security and Compliance : Implement security best practices across DevOps workflows
  • Collaborate and Innovate : Work closely with engineering, security, and operations teams to drive automation and efficiency
  • Manage and scale large ELK clusters handling 2–3+ TB / day log volumes, ensuring high availability and performance
  • Optimize ELK architecture : Implement efficient index lifecycle policies, shard strategies, and hot-warm-cold tiered storage
  • Build and tune log pipelines : Scale Logstash and Beats pipelines across distributed environments
  • Support Kibana observability layers : Create dashboards, visualizations, and custom alerting frameworks (e.g., Watcher, ElastAlert)
  • What You Bring

  • 7+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering
  • 4+ years of dedicated, hands-on experience with ELK (Elasticsearch, Logstash, Kibana)
  • Strong experience managing large-scale ELK clusters in production with heavy ingestion (multi-TB / day)
  • Deep knowledge of index tuning, shard allocation, ILM policies , and scaling ELK components
  • Expertise in GitHub Actions, Terraform, Ansible, and Infrastructure as Code (IaC)
  • Proficiency in Python, Go, or Bash for automation and scripting
  • Deep understanding of Kubernetes, Docker , and cloud-native architectures
  • Experience with observability tools such as Prometheus, Grafana, Azure Monitor
  • Ability to work in a fast-paced, collaborative environment and solve complex operational issues
  • Education

  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field
  • Certifications (Nice to Have)

  • Microsoft Azure certifications : AZ-104 , AZ-400
  • Create a job alert for this search

    Senior Site Reliability Engineer • Hyderabad, IN

    Related jobs
    • Promoted
    Senior Site Reliability Engineer

    Senior Site Reliability Engineer

    AutoRABIThyderabad, telangana, in
    AutoRABIT is the leader in DevSecOps for SaaS platforms such as Salesforce.Its unique metadata-aware capability makes Release Management, Version Control, and Backup & Recovery complete, reliable, ...Show moreLast updated: 16 days ago
    • Promoted
    Site Reliability Engineer

    Site Reliability Engineer

    GSPANN Technologies, Inchyderabad, telangana, in
    GSPANN is a global IT services and consultancy provider headquartered in Milpitas, California (U.With five global delivery centers across the globe, GSPANN provides digital solutions that support t...Show moreLast updated: 7 days ago
    • Promoted
    Engineer, Site Reliability [T500-20502]

    Engineer, Site Reliability [T500-20502]

    ANSRHyderabad, Telangana, India
    ANSR is hiring for one of its clients.NASDAQ : TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flags...Show moreLast updated: 7 days ago
    • Promoted
    Engineer, Site Reliability [T500-20517]

    Engineer, Site Reliability [T500-20517]

    ANSRHyderabad, Telangana, India
    ANSR is hiring for one of its clients.NASDAQ : TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flags...Show moreLast updated: 7 days ago
    • Promoted
    Engineer, Site Reliability [T500-20266]

    Engineer, Site Reliability [T500-20266]

    ANSRHyderabad, Telangana, India
    ANSR is hiring for one of its clients.NASDAQ : TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flags...Show moreLast updated: 15 days ago
    • Promoted
    Sr Engineer, Site Reliability [T500-20279]

    Sr Engineer, Site Reliability [T500-20279]

    ANSRHyderabad, Telangana, India
    ANSR is hiring for one of its clients.NASDAQ : TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flags...Show moreLast updated: 7 days ago
    • Promoted
    Sr Engineer, Site Reliability Engineer [T500-20464]

    Sr Engineer, Site Reliability Engineer [T500-20464]

    ANSRhyderabad, telangana, in
    ANSR is hiring for one of its clients.NASDAQ : TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flags...Show moreLast updated: 7 days ago
    • Promoted
    Engineer, Site Reliability [T500-20504]

    Engineer, Site Reliability [T500-20504]

    ANSRhyderabad, telangana, in
    ANSR is hiring for one of its clients.NASDAQ : TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flags...Show moreLast updated: 7 days ago
    • Promoted
    Engineer, Site Reliability [T500-20518]

    Engineer, Site Reliability [T500-20518]

    ANSRhyderabad, telangana, in
    ANSR is hiring for one of its clients.NASDAQ : TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flags...Show moreLast updated: 7 days ago
    • Promoted
    Senior Site Reliability Engineer

    Senior Site Reliability Engineer

    JA Solutions India Private LimitedHyderabad, India
    Hiring : Senior Site Reliability Engineer – SaaS Real Estate Platform.We are hiring on behalf of our reputed SaaS product-based client based in Hyderabad. They are a global leader in real estate ...Show moreLast updated: 5 days ago
    • Promoted
    Senior Site Reliability Engineer

    Senior Site Reliability Engineer

    WSO2hyderabad, telangana, in
    Founded in 2005, WSO2 is the largest independent software vendor providing open-source API management, integration, and identity and access management (IAM) to thousands of enterprises in over 90 c...Show moreLast updated: 7 days ago
    • Promoted
    Engineer, Site Reliability [T500-20519]

    Engineer, Site Reliability [T500-20519]

    ANSRhyderabad, telangana, in
    ANSR is hiring for one of its clients.NASDAQ : TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flags...Show moreLast updated: 7 days ago
    • Promoted
    Sr Engineer, Site Reliability [T500-20437]

    Sr Engineer, Site Reliability [T500-20437]

    ANSRHyderabad, Telangana, India
    ANSR is hiring for one of its clients.NASDAQ : TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flags...Show moreLast updated: 6 days ago
    • Promoted
    • New!
    Site Reliability Engineer

    Site Reliability Engineer

    ExasoftHyderabad, IN
    Responsibilities and Requirements : .Experience must be at least 10+ years in SRE.Multi Cloud, Hybrid Cloud – on Data center sites. Experience with multiple operating systems (.Operating Systems, Kern...Show moreLast updated: less than 1 hour ago
    • Promoted
    Engineer, Site Reliability [T500-20521]

    Engineer, Site Reliability [T500-20521]

    ANSRHyderabad, Telangana, India
    ANSR is hiring for one of its clients.NASDAQ : TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flags...Show moreLast updated: 7 days ago
    • Promoted
    Engineer, Site Reliability [T500-20503]

    Engineer, Site Reliability [T500-20503]

    ANSRHyderabad, Telangana, India
    ANSR is hiring for one of its clients.NASDAQ : TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flags...Show moreLast updated: 7 days ago
    • Promoted
    Engineer, Site Reliability [T500-20515]

    Engineer, Site Reliability [T500-20515]

    ANSRHyderabad, Telangana, India
    ANSR is hiring for one of its clients.NASDAQ : TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flags...Show moreLast updated: 6 days ago
    • Promoted
    Lead - Site Reliability Engineer

    Lead - Site Reliability Engineer

    VXI Global SolutionsHyderabad, Telangana, India
    We are looking for a Lead - Site Reliability Engineer with 8+ years for Experience into design, implement, and manage robust observability solutions across our cloud infrastructure and applications...Show moreLast updated: 25 days ago