Location: Pune / Hyderabad / Bangalore
Experience: 8-10 years
Primary skills: Observability Subject Matter Expert (SME), Datadog SME
Please find the JD below:
We are seeking an Observability Subject Matter Expert (SME) to drive the implementation and maintenance of enterprise-grade observability solutions. The ideal candidate will have deep hands-on experience with Datadog and similar platforms, and will ensure system reliability, performance, and proactive monitoring across complex environments.
- Demonstrable experience managing complex APM projects.
- Ability to think strategically, as well as tactically, and to exercise sound judgment in problem-solving and priority / goal setting.
- Knowledge of APM, infrastructure metrics, distributed tracing, and log aggregation using Datadog.
- Configure and manage Datadog for infrastructure and application monitoring, APM, and log management.
- Implement dashboards, alerts, and monitors to provide comprehensive visibility into system performance, availability, and reliability.
- Integrate Datadog with cloud platforms (AWS, Azure, GCP) and on-premises infrastructure.
- Define and deploy monitoring strategies using Datadog’s metrics, traces, logs, and events.
- Set up automated alerts to notify the relevant teams of anomalies and potential issues.
- Work with teams to optimize monitoring strategies and reduce false positives.
- Identify and troubleshoot performance bottlenecks using Datadog APM and Real User Monitoring (RUM).
- Proactively monitor and improve the efficiency, performance, and reliability of services.
- Integrate observability solutions with CI/CD pipelines, automation, and ITSM frameworks.
- Collaborate with DevOps, SRE, and infrastructure teams to embed observability into workflows.
- Conduct training for customer stakeholders, and prepare and maintain the relevant training materials.
- Automate monitoring tasks and deployments using Datadog API, scripts, and other automation tools.
- Knowledge of event management, AIOps, and observability configuration.
- Knowledge of ITIL Foundation concepts.
- Strong communication skills, with the ability to interact with our business, product, and development teams and to articulate the business benefits of a technology solution.
- Good knowledge of infrastructure management and operations.
- Strong experience developing integration scripts and programs using SOAP/REST web services.
- Experience in monitoring and observability for microservices and Kubernetes-based architectures.
- Outstanding problem-solving skills; a fast learner who is open to trying different tools, technologies, and concepts.
- Self-motivated individual, able to work independently and in coordination with a team.
- Define observability standards, guidelines, and best practices for enterprise environments.
- Datadog certifications are an added advantage.