Site Reliability Engineer III

Confidential

Hyderabad / Secunderabad, Telangana, India

30+ days ago

Job description

As a Site Reliability Engineer III at JPMorgan Chase within the Chief Technology Office, you will collaborate with engineering, support, and operations teams to maintain and improve the reliability of mission-critical applications. You'll participate in incident management, troubleshooting, and continuous improvement, and help implement automation and monitoring solutions. On-call rotation is part of the role, requiring effective action during production incidents and a commitment to operational excellence. You'll share knowledge, follow best practices, and contribute to a culture of learning and innovation. We value team players who communicate clearly, solve problems proactively, and focus on customer needs.

Job Responsibilities

Design, develop, and operate solutions for application reliability, monitoring, and automation.
Execute incident response, troubleshooting, and root cause analysis to resolve production issues and improve system stability.
Build and maintain CI/CD pipelines using Jenkins (including global libraries), and implement infrastructure as code with Terraform.
Develop and support containerized applications using Docker and Kubernetes, ensuring robust deployments and scalability.
Implement and maintain observability solutions using tools such as Grafana, Prometheus, Splunk, and OpenTelemetry.
Collaborate with engineering and support teams to drive continuous improvement and operational excellence.
Participate in on-call rotation, responding to production incidents and ensuring timely resolution.

Required Qualifications, Capabilities, and Skills

Formal training or certification in Site Reliability Engineering concepts and 3+ years of applied experience.

Experience in SRE, DevOps, or application support roles, with knowledge of SLIs/SLOs, incident response, and troubleshooting.

Familiarity with monitoring and observability tools (e.g., Grafana, Prometheus, Splunk, OpenTelemetry).

Hands-on experience with CI/CD pipelines (Jenkins, including global libraries), infrastructure as code (Terraform), version control (Git), containerization (Docker), and orchestration (Kubernetes).

Exposure to cloud platforms (AWS, GCP, or Azure) and to automating infrastructure and deployments.

Willingness to participate in on-call rotation and respond to production incidents.

Ability to break down issues, document solutions, and communicate effectively with team members and customers.

Preferred Qualifications, Capabilities, and Skills

Familiarity with banking, fintech, or regulated environments.

Participation in game days or chaos engineering.

Interest in sharing knowledge and best practices with peers.

Skills Required

Orchestration, Version Control, Prometheus, Grafana, Incident Response, Jenkins, Git, GCP, Docker, Terraform, Containerization, Troubleshooting, Splunk, Azure, Kubernetes, AWS
