Company Description
ThreatXIntel is a startup cybersecurity company focused on delivering advanced and tailored solutions to protect businesses and organizations from cyber threats. Our expertise spans cloud security, web and mobile security testing, DevSecOps, and cloud security assessment. We are committed to providing affordable and custom solutions that cater specifically to the needs of businesses of all sizes, ensuring high-quality protection for digital assets. With a proactive approach to security, ThreatXIntel helps clients identify and address vulnerabilities before they can be exploited, allowing businesses to operate with confidence and peace of mind.
Role Description
We are looking for a skilled Freelance Observability & Monitoring Engineer to enhance our monitoring stack, improve system visibility, and automate alerting workflows across our cloud environments. The consultant will work with modern observability tools, including Grafana, PromQL, GCP Metrics Explorer, Loki, and Tempo, and will integrate automated alerting through ServiceNow.
This role requires deep technical expertise in dashboarding, metrics, logs, tracing, and automated incident workflows.
Responsibilities
Build and optimize Grafana dashboards, including advanced panel configuration, business KPI visualization, and multi-data-source insights.
Write performance and reliability queries using PromQL, including metric aggregation, rate calculations, SLO/SLI definitions, and alert rule logic.
Configure monitoring and alerting in GCP Metrics Explorer, ensuring proper escalation rules and actionable alert policies.
Manage log collection and analysis using Loki, enabling structured logging, log enrichment, log-to-metric correlation, and debugging support.
Implement distributed tracing solutions using Tempo, integrating OpenTelemetry, and helping identify system bottlenecks.
Develop automated incident workflows with ServiceNow (SNOW), including alert ingestion, ticket automation, routing, and workflow enhancements.
Troubleshoot production issues related to metrics, logs, traces, and alert pipelines.
Collaborate with engineering teams to standardize observability best practices across services.
Document dashboards, queries, alert logic, runbooks, and operational procedures.
Required Technical Skills
Grafana: dashboard creation, visualization design, panel configuration
PromQL: query building, aggregations, SLO/SLA calculations, alert conditions
GCP Metrics Explorer: monitoring setup, alert policies, escalation rules
Loki: structured logging, log pipelines, correlation, troubleshooting
Tempo: distributed tracing, OpenTelemetry usage, trace-based performance analysis
ServiceNow automation: incident creation, alert routing, workflow automation
Observability Engineer • Vapi, Gujarat, India