Impetus Description:
Key Responsibilities:
- Own the observability product roadmap with a focus on enabling visibility for data pipelines, distributed compute frameworks (e.g., Spark, Flink), and cloud-native workloads.
- Define and deliver features for metrics ingestion, distributed tracing, log processing pipelines, alerting, dashboards, and SLO/SLA tooling.
- Drive integration with cloud platforms (AWS, GCP, Azure), container orchestration systems (Kubernetes), and data infrastructure components (Kafka, Airflow, Snowflake, etc.).
- Define APIs, data models, and storage strategies for telemetry data at scale.
- Collaborate with platform, SRE, and data engineering teams to understand pain points, gather requirements, and validate solutions.
- Contribute to the definition and tracking of service health indicators (SLIs/SLOs), incident response tooling, and automated root cause analysis.
- Stay current on emerging trends in observability (e.g., eBPF, AI/ML for anomaly detection, continuous profiling), cloud infrastructure, and big data ecosystems.
- Work with engineering to build scalable systems for telemetry collection, processing, retention, and visualization.
- Develop product specifications with clear technical detail for engineering execution.
Preferred Experience & Skills:
- 8+ years in technical product management, ideally with products related to observability, infrastructure, or data platforms.
- Hands-on experience with observability tools like OpenTelemetry, Prometheus, Grafana, Jaeger, the ELK stack, Datadog, New Relic, or similar.
- Strong understanding of cloud-native architecture patterns, microservices, containers, and orchestration (especially Kubernetes).
- Experience with distributed systems and data platforms (e.g., Apache Kafka, Apache Spark, Flink, Airflow, Presto, Snowflake).
- Familiarity with infrastructure-as-code (e.g., Terraform, Helm) and CI/CD systems.
- Working knowledge of telemetry data storage and processing at scale (TSDBs, log indexing, event pipelines).
- Ability to read and communicate technical designs with engineers and stakeholders (e.g., API specs, sequence diagrams, data flows).
- Experience working with SREs, platform teams, or DevOps roles in production environments.
- Strong analytical skills; ability to define and monitor KPIs for performance, reliability, and user adoption.
Nice to Have:
- Background in data engineering or site reliability engineering (SRE).
- Experience with cost optimization and resource utilization tracking in cloud environments.
- Exposure to AI/ML-based anomaly detection and predictive analytics in observability.
- Experience contributing to or working with open-source observability communities.
NOTE: We are hiring for Indore and Bangalore.
(ref: iimjobs.com)