Job Description : Observability Engineer - Elasticsearch & Azure
We are seeking an experienced Observability Engineer with deep expertise in Elasticsearch and Azure enterprise environments to deliver critical insights across our cloud infrastructure and build comprehensive monitoring capabilities for our observability practice.
Key Responsibilities :
- Enterprise Observability Strategy : Drive end-to-end observability implementation across Azure components including Azure Data Factory (ADF), Load Balancers, Landing Zones, and Virtual Machines, enabling the team to gain actionable insights into complex infrastructure behavior and performance patterns
- Elastic Cloud & APM Expertise : Lead Application Performance Monitoring (APM) deployment through Elastic Agents, establishing robust observability frameworks that capture application and infrastructure telemetry, with a focus on agent policy management and configuration strategies for development and production environments
- Centralized Logging & Data Streams : Design and implement centralized log aggregation architecture using Elastic data streams, ensuring standardized data ingestion from distributed sources with proper tagging, naming conventions, and routing strategies to enable seamless search and analysis
- Ingest Pipeline Development : Build sophisticated ingest pipelines and processors that enrich and clean observability data, covering runtime context enrichment, ECS field normalization, embedded payload parsing, sensitive data masking, noise reduction, and deduplication, to maximize data quality and insights
- Infrastructure as Code for Observability : Leverage Terraform to codify and automate Elasticsearch configurations including APM setup, data stream management, ingest processors, and agent policies, ensuring reproducible and scalable observability infrastructure across environments
- Insights Enablement & Knowledge Transfer : Configure advanced observability features including APM log insights and custom dashboards, while mentoring the observability team on best practices for querying, analyzing, and deriving actionable intelligence from the Elastic ecosystem