Site Reliability Engineer II - Guidewire Cloud Platform (Application)

Confidential • Bengaluru / Bangalore
12 days ago
Job description

What You'll Do

  • Work with development teams to troubleshoot and resolve issues, minimizing customer impact.
  • Develop and maintain automated runbooks to manage issues proactively.
  • Apply engineering principles and automation to enhance our operating environments.
  • Monitor and improve the reliability and performance of applications on the Guidewire Cloud Platform.
  • Use your software engineering expertise to optimize systems and reduce manual toil.
  • Document incidents and develop processes to prevent future occurrences.
  • Stay current with industry trends, tools, and best practices in site reliability engineering.
  • Foster a culture of innovation, learning, and continuous improvement.
  • Participate in on-call rotations to ensure the availability and reliability of our services.

What You'll Bring

  • Experience as an SRE or similar role, with a focus on improving system reliability.
  • Strong problem-solving skills and the ability to analyze complex systems and devise effective solutions.
  • Effective collaboration and communication skills to work cross-functionally and document processes clearly.
  • Experience with automation, monitoring, and performance optimization tools and techniques.
  • Commitment to maximizing uptime, scalability, and delivering an exceptional end-user experience.
  • Passion for technology and a desire to continuously learn and grow your skills.
  • Alignment with Guidewire's mission to leverage technology to help protect and support others.

Required Skills

  • Experience designing and implementing SLIs, SLOs, and error budgets
  • Familiarity with application performance monitoring (APM) and telemetry tools to maintain expected service levels for applications
  • Proficiency with Linux system administration and the ability to program / script using Python, Go, Java, shell, or equivalent
  • Experience troubleshooting and debugging distributed systems on cloud infrastructure
  • Experience with CI/CD pipelines within Kubernetes and legacy ecosystems
  • Experience creating monitors, dashboards, and synthetic transactions in monitoring tools like Datadog
  • Experience deploying and managing scalable infrastructure within AWS and Kubernetes ecosystems using Terraform and other cloud-native approaches

Skills Required

GitHub, SAML, PostgreSQL, Python, AWS
