- A deep understanding of observability, preferably with Dynatrace (or other tools, if well versed in them).
- Provisioning and setting up metrics, alerts, and silences in any observability tool: Dynatrace, Prometheus, Thanos, or Grafana.
- Development work (actual development, not just support and running scripts) done on Chef (basic syntax, recipes, cookbooks), Ansible (basic syntax, tasks, playbooks), or Terraform (basic syntax), and GitLab CI/CD (configuration, pipelines, jobs).
- Proficiency in scripting: Python, PowerShell, Bash, etc. This is the enabler for automation.
- Proposes ideas and solutions within the Infrastructure Department to reduce the workload through automation.
- Cloud resource provisioning and configuration through CLI / API, especially on Azure and GCP. AWS experience is also acceptable.
- Troubleshooting with an SRE approach and an SRE mindset.
- Provides emergency response, either by being on-call or by reacting to symptoms according to monitoring, and escalates when needed.
- Improves documentation all around, whether in application documentation or in runbooks, explaining the why and not stopping at the what.
- Root cause analysis and corrective actions.
- Strong concepts around scale and redundancy for design, troubleshooting, and implementation.
Mid Term
- Kubernetes: basic understanding, CLI, service re-provisioning
- Operating system (Linux): configuration, package management, startup, and troubleshooting
- System Architecture & Design: plan, design, and execute solutions to reach specific goals agreed within the team.
Long Term
- Block and object storage configuration
- Networking: VPCs, proxies, and CDNs
Skills Required
Azure, GCP, AWS, Dynatrace, Prometheus