We're looking for an experienced Observability Specialist to drive the success of our Grafana platform.
This is a full-time remote role that requires a strong background in setting up and maintaining open-source tools.
- Expertise in platform ownership and maintenance
- Experience in setting up platforms using open-source tools
- Hands-on experience with customer onboarding processes
- Knowledge of OpenTelemetry and collector technologies
- Proficiency with Loki, Tempo, and Mimir for log, trace, and metric analysis
- Strong skills in dashboard management, creation, and customization
- Familiarity with connecting platforms to data sources such as Prometheus, Elasticsearch, PostgreSQL, and CloudWatch
- Ability to transform data into meaningful visualizations
- Skills in configuring alert rules and notification channels
- Experience in managing alert thresholds and reducing alert fatigue
- Knowledge of setting up Alertmanager and dashboards for incident detection
- Understanding of recovery plans and post-incident analysis
- Familiarity with Python, Azure, Terraform, Docker, Kubernetes, and GitHub