Position Overview
We are looking for a Datadog Observability Engineer with strong experience implementing Datadog at the application layer. The role focuses on instrumenting business applications, enabling distributed tracing, improving application performance visibility, correlating logs and metrics, and providing real-time insights into user journeys, errors, latency, and reliability.
The ideal candidate will collaborate closely with backend and frontend developers, QA, product, and DevOps, helping teams build high-quality observability into their services and customer experience.
Primary Responsibilities
- Implement and manage Datadog APM, Real User Monitoring (RUM), Distributed Tracing, Service Monitoring, and Application Logs across multiple applications.
- Instrument application code with Datadog libraries, OpenTelemetry, or native integrations to capture business KPIs and performance metrics.
- Configure synthetic tests, error tracking, and frontend performance dashboards to monitor user experience and critical paths.
- Create meaningful dashboards for:
  - latency and throughput
  - endpoint and API performance
  - error rates and exceptions
  - RUM user behavior and UX performance
  - SLA/SLO trends at the application level
- Lead the creation of alerting strategies based on real application behavior, including anomaly detection and alerts on latency spikes and error bursts.
- Correlate logs, metrics, and trace data to perform root-cause analysis of application failures and performance degradation.
- Work with development teams to:
  - define observability requirements early in development
  - integrate monitoring into CI/CD and test environments
  - improve tagging, business context, and trace spans
- Conduct application performance reviews to identify:
  - opportunities to improve response times
  - database or API bottlenecks
  - opportunities for code-level optimization
- Train developers and QA on how to use Datadog tools for debugging, troubleshooting, and performance testing.
- Recommend improvements to observability maturity and documentation.
Required Skills
- Hands-on experience with:
  - Datadog APM
  - Datadog Logs
  - RUM (Real User Monitoring)
  - Service Maps
  - Distributed Tracing
  - Synthetic Monitoring
- Strong application debugging and performance analysis experience using trace/span data.
- Proficiency instrumenting apps in at least one modern programming language: Node.js, Java, Python, Go, Ruby, .NET, etc.
- Solid understanding of:
  - HTTP APIs
  - microservices
  - queues and event-driven flows
  - frontend performance basics
- Comfortable working with developers and QA to embed observability.
Preferred Skills
- Familiarity with OpenTelemetry and custom instrumentation practices.
- Experience with databases, caching, and async messaging, and with how to measure them via tracing.
- Ability to derive business KPIs from monitoring data (conversion impact, latency cost, UX issues).
- Exposure to CI/CD integration and automated observability testing.