Senior Site Reliability Engineer - ELK Expert

iVedha Inc. | Thiruvananthapuram, Republic of India, IN
17 days ago
Job description

Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice

Location: India (Remote). Must be available to work in the EST (US / Canada) time zone.

Role Summary:

Are you a Senior Site Reliability Engineer (SRE) with deep ELK expertise, ready to take ownership of large-scale observability infrastructure?

We're looking for an SRE with 7+ years of experience, including 4+ years specializing in the ELK stack (Elasticsearch, Logstash, Kibana), to join our Platform Engineering Practice. In this role, you’ll design, manage, and scale ELK clusters ingesting 2–3+ TB/day, enhance reliability across distributed systems, and drive automation within Azure cloud environments. This is a high-impact engineering opportunity focused on performance, observability, and operational excellence at scale.

Why Join Us

  • Career Growth: Work alongside industry experts on cutting-edge cloud technologies
  • Competitive Compensation and Benefits: We recognize and reward top talent
  • Exciting, Impactful Work: Design and build scalable, resilient cloud environments
  • Strategic Platform Role: Contribute to the foundation of next-gen observability and reliability infrastructure

What You Will Do

  • Design and Optimize Cloud Infrastructure: Architect scalable, fault-tolerant systems on Microsoft Azure
  • Automate Everything: Use Terraform, Ansible, and GitHub Actions to streamline deployment and configuration
  • Ensure Reliability and Performance: Proactively monitor, troubleshoot, and resolve production issues using Prometheus, Grafana, and Azure Monitor
  • Enhance Security and Compliance: Implement security best practices across DevOps workflows
  • Collaborate and Innovate: Work closely with engineering, security, and operations teams to drive automation and efficiency
  • Manage and scale large ELK clusters handling 2–3+ TB/day log volumes, ensuring high availability and performance
  • Optimize ELK architecture: Implement efficient index lifecycle policies, shard strategies, and hot-warm-cold tiered storage
  • Build and tune log pipelines: Scale Logstash and Beats pipelines across distributed environments
  • Support Kibana observability layers: Create dashboards, visualizations, and custom alerting frameworks (e.g., Watcher, ElastAlert)

What You Bring

  • 7+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering
  • 4+ years of dedicated, hands-on experience with ELK (Elasticsearch, Logstash, Kibana)
  • Strong experience managing large-scale ELK clusters in production with heavy ingestion (multi-TB/day)
  • Deep knowledge of index tuning, shard allocation, ILM policies, and scaling ELK components
  • Expertise in GitHub Actions, Terraform, Ansible, and Infrastructure as Code (IaC)
  • Proficiency in Python, Go, or Bash for automation and scripting
  • Deep understanding of Kubernetes, Docker, and cloud-native architectures
  • Experience with observability tools such as Prometheus, Grafana, and Azure Monitor
  • Ability to work in a fast-paced, collaborative environment and solve complex operational issues

Education

  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field

Certifications (Nice to Have)

  • Microsoft Azure certifications: AZ-104, AZ-400