Site Reliability Engineer

Landmark Group • Delhi, India

6 days ago

Job description

What You’ll Do:

Ensure reliability and high availability of Java and microservices-based applications through proactive monitoring and automation.

Define and track SLIs/SLOs to maintain service performance and stability.

Troubleshoot and resolve production issues, performing detailed root cause analysis to prevent recurrence.

Build and enhance observability using Prometheus, Grafana, Loki, or New Relic.

Automate operational tasks: deployments, scaling, rollbacks, diagnostics, and alerting.

Collaborate with engineering and DevOps teams to integrate reliability practices into the CI/CD pipeline.

Drive AIOps initiatives for intelligent alert correlation and predictive incident management.

Mentor teams on best practices in monitoring, performance optimization, and operational efficiency.

What We’re Looking For:

3–6 years of experience in Site Reliability Engineering, Application Operations, or DevOps.

Strong hands-on experience with Java, Spring Boot, and microservices architecture.

Proficiency in monitoring tools (Prometheus, Grafana, Loki, New Relic, or similar).

Experience with Kubernetes, containers, and cloud platforms (AWS, Azure, or GCP).

Strong scripting skills in Bash, Python, or Go for automation and diagnostics.

Familiarity with incident management, RCA, and performance debugging.

Exposure to AIOps tools or AI/LLM-based observability platforms is a plus.

Excellent problem-solving and communication skills.
