This role is for one of Weekday’s clients
Min Experience: 5 years
Location: Bengaluru, Karnataka
Job Type: Full-time
Requirements
We are seeking an experienced Application Performance Monitoring (APM) Specialist with strong expertise in OpenTelemetry to design, implement, and optimize observability solutions across our technology stack. The ideal candidate will have deep knowledge of distributed tracing, metrics, logging, and instrumentation, and will work closely with engineering, DevOps, and SRE teams to ensure system reliability, performance, and visibility.
Key Responsibilities
Design & Implementation
- Define and implement observability standards leveraging OpenTelemetry for metrics, traces, and logs.
- Instrument services, APIs, and applications for performance monitoring and distributed tracing.
- Integrate OpenTelemetry with APM platforms (e.g., Datadog, New Relic, Dynatrace, Grafana, Prometheus, Elastic).
Monitoring & Optimization
- Build and maintain dashboards, alerts, and reporting systems to track system health and performance.
- Conduct performance analysis, identify bottlenecks, and provide optimization recommendations.
- Establish SLOs, SLIs, and SLAs for key business applications.
Collaboration & Enablement
- Partner with development and platform teams to ensure instrumentation best practices.
- Train teams on OpenTelemetry usage, observability frameworks, and monitoring standards.
- Provide root-cause analysis and incident support using telemetry data.
Continuous Improvement
- Evaluate and implement emerging observability tools and practices.
- Automate observability pipelines for scalability and reliability.
- Drive adoption of OpenTelemetry standards across the enterprise.
Required Skills & Experience
- Strong hands-on experience with OpenTelemetry (OTel) SDKs, collectors, and exporters.
- Proven expertise in APM tools such as Datadog, New Relic, Dynatrace, AppDynamics, or equivalent.
- Solid understanding of distributed systems, microservices, and cloud-native architectures (Kubernetes, Docker, service mesh).
- Proficiency in monitoring backends (Prometheus, Grafana, Jaeger, Tempo, Zipkin, Elastic APM).
- Strong programming/scripting experience (Java, .NET, Go, Python, or Node.js preferred).
- Knowledge of cloud platforms (AWS, Azure, GCP) and observability in hybrid/multi-cloud environments.
- Experience with CI/CD pipelines and automation frameworks for monitoring deployments.
- Excellent problem-solving, communication, and cross-team collaboration skills.
Preferred Qualifications
- Experience with SRE/DevOps practices (incident management, capacity planning, resilience testing).
- Familiarity with service mesh observability (Istio, Linkerd, Envoy).
- Contributions to OpenTelemetry or other open-source observability projects.
- Certification in cloud or observability platforms (AWS CloudWatch, GCP Operations Suite, etc.).