Senior Site Reliability Engineer (SRE)

Voya India, Amravati, IN

Job description

About the position

We are seeking a strategic and technically adept leader to drive the scalability, resilience, and operational excellence of our enterprise systems. This role will set the vision for site reliability engineering (SRE) practices, observability frameworks, and performance optimization, ensuring our digital platforms are robust, measurable, and aligned to business priorities. You will collaborate across product, engineering, and infrastructure teams to deliver highly available, high-performing systems that meet the demands of a modern digital enterprise.

Responsibilities

Set strategy and lead delivery of scalable, resilient systems across cloud and on-premises environments.
Define and govern reliability standards (SLAs, SLOs, error budgets) and embed them into development practices.
Implement observability at scale (logs, metrics, traces) to drive real-time visibility and actionable insights.
Lead performance engineering initiatives including capacity planning, load testing, and tuning of critical applications.
Drive incident management practices: proactive detection, streamlined response, and a culture of learning through postmortems.
Champion automation in monitoring, alerting, CI/CD pipelines, and infrastructure provisioning.
Partner across functions (product, engineering, DevOps, security, architecture) to align reliability goals with business priorities.
Influence enterprise architecture decisions with a reliability-first perspective, including platform modernization efforts.
Mentor and develop engineers, fostering a culture of technical excellence, accountability, and continuous improvement.
Represent reliability in executive forums, providing clear insights into system health, risks, and roadmap implications.

Qualifications

10+ years of experience in systems engineering, site reliability engineering, or infrastructure architecture.

Expertise in distributed systems and cloud platforms (AWS, Azure, GCP).

Deep knowledge of observability tooling (Datadog, Prometheus, Grafana, OpenTelemetry, etc.).

Strong programming background (e.g., Java, Python, Node.js, or similar).

Proven leadership of cross-functional technical initiatives at scale.

Experience with CI/CD, infrastructure-as-code (Terraform, Ansible, etc.), and automation frameworks.

Strong communicator with the ability to translate technical reliability goals into business outcomes.
