Key Responsibilities:
- Design, develop, and maintain monitoring and alerting systems using Grafana, Prometheus, and AWS CloudWatch.
- Create and optimize dashboards to provide actionable insights into system and application performance.
- Collaborate with development and operations teams to ensure high availability and reliability of services.
- Proactively identify performance bottlenecks and drive improvements.
- Continuously explore and adopt new monitoring/observability tools and best practices.

Required Skills & Qualifications:
- Minimum 2 years of experience in SRE, DevOps, or related roles.
- Hands-on expertise in Grafana, Prometheus, and AWS CloudWatch.
- Proven experience in dashboard creation, visualization, and alerting setup.
- Strong understanding of system monitoring, logging, and metrics collection.
- Excellent problem-solving and troubleshooting skills.
- Quick learner with a proactive attitude and adaptability to new technologies.

Good to Have (Optional):
- Experience with AWS services beyond CloudWatch.
- Familiarity with containerization (Docker, Kubernetes) and CI/CD pipelines.
- Scripting knowledge (Python, Bash, or similar).

(ref: hirist.tech)