Role : Lead Platform Engineer
Experience : 7 - 10 Years
Education : B.Tech / Masters
Location : Bengaluru (Work from Office)
We are seeking a Lead for the Observability Solutions Platform to spearhead the development of packaged observability solutions for a broad range of standard infrastructure and application components (e.g., Oracle DB, Tomcat, Kubernetes, Apache Solr). This role is responsible for defining and driving the strategy, architecture, and delivery of end-to-end observability packages, including data collection, golden signals, dashboards, alerts, reports, and RCA workflows. You will work closely with product managers, engineers, SREs, and customers to build scalable, maintainable, and impactful solutions.
Roles & Responsibilities :
- Solution Packaging : Lead the end-to-end development of observability packages for 100+ standard technologies across infrastructure, databases, middleware, and application platforms.
- Data Collection Strategy : Define and implement data collection strategies including agent instrumentation, API integrations, log and metrics collection pipelines, and auto-discovery mechanisms.
- Golden Signals & Data Modeling : Define golden signals, KPIs, SLIs / SLOs, and data schemas for different component types to support health monitoring, performance optimization, and anomaly detection.
- Dashboards, Alerts, Reports : Design and standardize visualizations, alerting rules, reporting templates, and RCA workflows for fast detection and resolution of issues.
- Platform Enablement : Guide enhancements to agents, collectors, and platform components to support new integrations and data formats.
- Team Leadership : Lead a team of engineers and specialists focused on observability solutions development. Establish best practices, design standards, and agile delivery pipelines.
- Collaboration & Stakeholder Management : Work closely with product management, DevOps, SRE, and customer success teams to align on priorities, gather requirements, and validate delivered packages.
- Quality, Scale & Reusability : Ensure all developed solutions are scalable, reusable, and version-controlled, with automated testing.
Skills :
- At least 6 years of experience in observability, monitoring, SRE, or platform engineering roles.
- Strong hands-on experience with observability tools such as Prometheus, Grafana, OpenTelemetry, ELK / EFK, Datadog, Splunk, or similar.
- In-depth understanding of logs, metrics, traces, profiling, events, and the corresponding instrumentation / collection mechanisms.
- Proven experience in developing observability solutions for platforms like Kubernetes, databases (Oracle, PostgreSQL), middleware (Tomcat, WebLogic), and distributed systems.
- Experience with scripting, APIs, and automation frameworks (Python, Shell, Terraform, etc.).
- Familiarity with RCA techniques, anomaly detection, and alert fatigue reduction strategies.
- Ability to define and enforce design patterns, standards, and governance models.
- Strong leadership, project management, and cross-functional collaboration skills.
- Excellent verbal and written communication skills.
Good to Have Skills :
- Experience building or managing a packaged observability marketplace or platform.
- Contributions to open-source observability projects.
- Certifications in Kubernetes, observability tools, or cloud platforms (AWS, Azure, GCP).
- Background in ITSM, CMDBs, or workflow automation is a plus.
(ref : hirist.tech)