We are seeking a Site Reliability Engineer with expertise in Google Cloud Platform to join our cross-functional SRE team.
You will focus on enhancing operational automation and monitoring to improve efficiency within our cloud environments. Your role involves identifying repetitive tasks and implementing automated solutions to minimize manual effort. If you have a strong background in cloud engineering and are driven to optimize system reliability, we invite you to apply and contribute to our team.
Responsibilities
- Act as the subject matter expert for operational automation and monitoring on Google Cloud Platform
- Identify toil in existing systems and processes and recommend solutions to reduce manual tasks
- Design and implement automated workflows to improve team efficiency
- Define and create customer user journeys, service level objectives, service level indicators, and error budgeting based on non-functional requirements
- Develop and maintain infrastructure as code using Terraform and GitHub
- Write and maintain scripts using Bash, PowerShell, Python, and Ansible to support automation
- Manage containerized environments using Kubernetes
- Collaborate with team members to reduce toil in the software development life cycle and IT operations
- Utilize source control management tools including Git, GitHub, and SonarQube
- Apply understanding of IT service management processes to support operational excellence
- Monitor and analyze system performance metrics using Prometheus and Grafana
- Provide proactive and analytical insights to improve system reliability
Requirements
- Experience of 5 to 10 years in site reliability engineering or related cloud engineering roles
- Strong knowledge of Google Cloud Platform and cloud engineering practices
- Expertise in defining and implementing customer user journeys, service level objectives, service level indicators, and error budgeting
- Proficiency in infrastructure as code tools such as Terraform and GitHub
- Skills in scripting languages, including Bash, PowerShell, Python, and Ansible
- Competency in container orchestration with Kubernetes
- Experience designing and implementing automated workflows to reduce manual effort
- Familiarity with source control management tools like Git, GitHub, and SonarQube
- Understanding of IT service management processes
- Analytical and proactive mindset to identify and solve operational challenges
Skills Required
GitHub, Google Cloud Platform, PowerShell, Prometheus, Bash, Grafana, Git, Terraform, Ansible, SonarQube, Python, Kubernetes