What You’ll Do:
Java and microservices-based applications through proactive monitoring and automation.
SLIs/SLOs to maintain service performance and stability.
production issues, performing detailed root cause analysis to prevent recurrence.
Prometheus, Grafana, Loki, or New Relic.
deployments, scaling, rollbacks, diagnostics, and alerting.
AIOps initiatives for intelligent alert correlation and predictive incident management.
What We’re Looking For:
3–6 years of experience in Site Reliability Engineering, Application Operations, or DevOps.
Java, Spring Boot, and microservices architecture.
monitoring tools (Prometheus, Grafana, Loki, New Relic, or similar).
Kubernetes, containers, and cloud platforms (AWS, Azure, or GCP).
Bash, Python, or Go for automation and diagnostics.
incident management, RCA, and performance debugging.
AIOps tools or AI/LLM-based observability platforms is a plus.
Site Reliability Engineer • Delhi, India