Senior Site Reliability Engineer - ELK Expert

iVedha Inc., Navi Mumbai, Maharashtra, India

8 days ago

Job description

Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice

Location: India (Remote)

Must be available to work in the EST (US / Canada) Time Zone.

Role Summary:

Are you a Senior Site Reliability Engineer (SRE) with deep ELK expertise, ready to take ownership of large-scale observability infrastructure?

We're looking for an SRE with 7+ years of experience, including 4+ years specializing in the ELK stack (Elasticsearch, Logstash, Kibana), to join our Platform Engineering Practice. In this role, you’ll design, manage, and scale ELK clusters ingesting 2–3+ TB/day, enhance reliability across distributed systems, and drive automation within Azure cloud environments. This is a high-impact engineering opportunity focused on performance, observability, and operational excellence at scale.

Why Join Us

Career Growth: Work alongside industry experts on cutting-edge cloud technologies

Competitive Compensation and Benefits: We recognize and reward top talent

Exciting, Impactful Work: Design and build scalable, resilient cloud environments

Strategic Platform Role: Contribute to the foundation of next-gen observability and reliability infrastructure

What You Will Do

Design and Optimize Cloud Infrastructure: Architect scalable, fault-tolerant systems on Microsoft Azure

Automate Everything: Use Terraform, Ansible, and GitHub Actions to streamline deployment and configuration

Ensure Reliability and Performance: Proactively monitor, troubleshoot, and resolve production issues using Prometheus, Grafana, and Azure Monitor

Enhance Security and Compliance: Implement security best practices across DevOps workflows

Collaborate and Innovate: Work closely with engineering, security, and operations teams to drive automation and efficiency

Manage and scale large ELK clusters handling 2–3+ TB/day log volumes, ensuring high availability and performance

Optimize ELK architecture: Implement efficient index lifecycle policies, shard strategies, and hot-warm-cold tiered storage

Build and tune log pipelines: Scale Logstash and Beats pipelines across distributed environments

Support Kibana observability layers: Create dashboards, visualizations, and custom alerting frameworks (e.g., Watcher, ElastAlert)

What You Bring

7+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering

4+ years of dedicated, hands-on experience with ELK (Elasticsearch, Logstash, Kibana)

Strong experience managing large-scale ELK clusters in production with heavy ingestion (multi-TB/day)

Deep knowledge of index tuning, shard allocation, ILM policies, and scaling ELK components

Expertise in GitHub Actions, Terraform, Ansible, and Infrastructure as Code (IaC)

Proficiency in Python, Go, or Bash for automation and scripting

Deep understanding of Kubernetes, Docker, and cloud-native architectures

Experience with observability tools such as Prometheus, Grafana, and Azure Monitor

Ability to work in a fast-paced, collaborative environment and solve complex operational issues

Education

Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field

Certifications (Nice to Have)

Microsoft Azure certifications: AZ-104, AZ-400
