Site Reliability Engineer II - Guidewire Cloud Platform (Application)

Confidential • Bengaluru / Bangalore

12 days ago

Job description

What You'll Do

Work with development teams to troubleshoot and resolve issues, minimizing customer impact.
Develop and maintain automated runbooks to manage issues proactively.
Apply engineering principles and automation to enhance our operating environments.
Monitor and improve the reliability and performance of applications on the Guidewire Cloud Platform.
Use your software engineering expertise to optimize systems and reduce manual toil.
Document incidents and develop processes to prevent future occurrences.
Stay current with industry trends, tools, and best practices in site reliability engineering.
Foster a culture of innovation, learning, and continuous improvement.
Participate in on-call rotations to ensure the availability and reliability of our services.

What You'll Bring

Experience as an SRE or similar role, with a focus on improving system reliability.

Strong problem-solving skills and the ability to analyze complex systems and devise effective solutions.

Effective collaboration and communication skills to work cross-functionally and document processes clearly.

Experience with automation, monitoring, and performance optimization tools and techniques.

Commitment to maximizing uptime, scalability, and delivering an exceptional end-user experience.

Passion for technology and a desire to continuously learn and grow your skills.

Alignment with Guidewire's mission to leverage technology to help protect and support others.

Required Skills:

Experience with designing and implementing SLIs, SLOs, and Error Budgets

Familiarity with application performance monitoring (APM) and telemetry tools to maintain expected service levels for applications

Proficiency with Linux system administration and the ability to program / script using Python, Go, Java, shell, or equivalent

Experience troubleshooting and debugging distributed systems on cloud infrastructure

Experience with CI/CD pipelines within Kubernetes (K8s) and legacy ecosystems

Experience creating monitors, dashboards, and synthetic transactions in monitoring tools like Datadog

Experience deploying and managing scalable infrastructure within AWS and Kubernetes ecosystems using Terraform and other cloud-native approaches

Skills Required

GitHub, SAML, PostgreSQL, Python, AWS
