Responsibilities
- Assist in the design, implementation, and maintenance of observability solutions for Azure-based applications.
- Monitor system health, performance, and availability using Azure Monitor, Application Insights, and Log Analytics.
- Implement Azure Alerts and configure Azure Log Analytics workspaces.
- Learn and adopt other monitoring technologies as needed, such as SolarWinds, Elastic, and AWS CloudWatch.
- Support SRE practices by automating infrastructure tasks, incident response, and root cause analysis.
- Develop and maintain dashboards, alerts, and reports to provide insights into system performance.
- Troubleshoot and resolve issues related to Azure infrastructure and application performance.
- Collaborate with DevOps, development, and operations teams to improve system reliability and efficiency.
- Implement and manage logging, tracing, and metrics collection for microservices-based architectures.
- Assist in developing runbooks, playbooks, and documentation for incident management and resolution.
- Participate in on-call rotations and proactively address potential system failures.
Qualifications
- Bachelor's degree in Computer Science, IT, or a related field (or equivalent experience).
- A minimum of 7 years of experience.
- Basic knowledge of Azure services, including Virtual Machines, Kubernetes (AKS), Storage, and Networking.
- Familiarity with observability tools such as Azure Monitor, Application Insights, Grafana, or Prometheus.
- Understanding of logging and tracing concepts using tools such as Log Analytics, Elastic Stack, or OpenTelemetry.
- Exposure to scripting and automation using PowerShell, Python, or Terraform.
- Knowledge of CI/CD pipelines and Infrastructure as Code (IaC) principles.
- Strong problem-solving skills and the ability to work in a fast-paced environment.
- Good communication and collaboration skills.
Preferred Qualifications
- Hands-on experience with Azure DevOps or GitHub Actions.
- Basic understanding of SRE principles, error budgets, and SLIs/SLOs.
- Exposure to containerization technologies such as Docker and Kubernetes.
- Experience with ITIL practices and incident management processes.
- Certification in Azure Fundamentals (AZ-900) or Azure Administrator (AZ-104) is a plus.
Skills Required
Azure DevOps, GitHub, Docker, Incident Management, ITIL, Kubernetes, Azure Administration