We are looking for a skilled Site Reliability Engineer (SRE) with a strong DevOps background and deep expertise in Google Cloud Platform (GCP).
The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of production systems while implementing modern DevOps practices.
Responsibilities:
- Design, build, and maintain scalable, reliable infrastructure on GCP.
- Develop and maintain CI/CD pipelines using tools such as Jenkins and GitLab CI/CD.
- Implement observability, including monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK, Dynatrace).
- Automate infrastructure using Infrastructure as Code (IaC) tools such as Terraform and Ansible.
- Manage containerized applications using Docker and orchestration tools like Kubernetes.
- Write scripts and automation in languages such as Python, Go, and Bash.
- Collaborate with engineering teams to define SLOs, SLIs, and error budgets.
- Participate in incident response, root cause analysis, and system optimization.
Requirements:
- Strong knowledge of Linux/Unix fundamentals.
- Proficiency in at least one programming or scripting language (Python, Go, Bash, Java, or JavaScript).
- Experience with version control systems (e.g., Git).
- Hands-on experience with CI/CD pipelines.
- Deep understanding of cloud environments, specifically GCP.
- Proficiency with IaC tools (Terraform, Ansible, Chef, Puppet).
- Knowledge of containerization and orchestration (Docker, Kubernetes).
- Familiarity with monitoring and logging tools: Dynatrace, ELK, OpenSearch, Log Explorer, Prometheus, and Grafana.

(ref: hirist.tech)