Job Description
What You'll Do
- Collaborate with engineering teams to provide feedback and contribute code where needed, enhancing product functionality and resilience.
- Participate in on-call rotations to ensure 24x7 availability of services.
- Design and develop tools to support 24x7 follow-the-sun operations for critical production systems.
- Automate deployment tasks for core products and infrastructure, maintaining a robust automation framework.
- Monitor and optimize the performance of applications on the Guidewire Cloud Platform, ensuring reliability and efficiency.
- Develop and maintain observability tools, metrics, and dashboards, including self-healing mechanisms for increased reliability.
- Foster a culture of reliability by promoting blameless postmortems, SLO tracking, and continuous learning from incidents.
- Proactively identify and address infrastructure issues to minimize business impact.
- Develop system documentation and training materials to empower and educate team members.
Who You Are
- Skilled in programming with Python or Go for building internal tools, CLIs, and APIs; familiarity with Java and Spring Boot is a plus.
- Exceptional troubleshooting skills, with a proactive, critical approach to solving complex issues.
- Proficient in containerization technologies, with hands-on expertise in Docker, Helm, Kubernetes (EKS), CNI, and Ingress networking.
- Strong knowledge of Kubernetes concepts (pods, deployments, services, StatefulSets, ingress, etc.) and the Operator pattern.
- Experienced with Terraform, including developing and testing complex modules.
- Advanced experience with AWS, including custom tool development using the AWS SDK.
- Solid understanding of Single Sign-On (SSO), SAML, and OAuth protocols; experience with Okta is a bonus.
- Skilled in using observability tools such as Prometheus, OpenTelemetry, or Datadog for proactive monitoring.
- Background supporting production systems at scale in a heavily microservice-based environment.
- Familiar with agile methodologies, including Scrum and Kanban, to enhance software development processes.
- Excellent communication skills, with the ability to explain complex technical concepts to diverse audiences.
Other Requirements
- Bachelor's degree in Computer Science or a related field.
- Ability to read, write, and speak English.
- We provide 24x7 support to our customers, so you will be expected to take turns with your teammates being on call for weekend production emergencies or providing rotating weekend operational support.
- Travel: expect occasional travel (less than 5%) to other Guidewire offices for training and team meetings.
Bonus Points
- Kubernetes or AWS certifications.
- Contributions to open source projects.
- Familiarity with KubeVela (OAM) or Crossplane for Kubernetes-native infrastructure management.
- Experience managing large-scale Aurora PostgreSQL clusters and Aurora Serverless.
- Experience with TeamCity CI or GitHub Actions.
Skills Required
GitHub, PostgreSQL, AWS, SAML, Python