Talent.com
Lead Platform Engineer - Site Reliability

Neemtree, Bangalore
30+ days ago
Job description

Responsibilities :

  • Solution Packaging : Lead the end-to-end development of observability packages for 100+ standard technologies across infrastructure, databases, middleware, and application platforms.
  • Data Collection Strategy : Define and implement data collection strategies including agent instrumentation, API integrations, log and metrics collection pipelines, and auto-discovery mechanisms.
  • Golden Signals and Data Modeling : Define golden signals, KPIs, SLIs / SLOs, and data schemas for different component types to support health monitoring, performance optimization, and anomaly detection.
  • Dashboards, Alerts, Reports : Design and standardize visualizations, alerting rules, reporting templates, and RCA workflows for fast detection and resolution of issues.
  • Platform Enablement : Guide enhancements to agents, collectors, and platform components to support new integrations and data formats.
  • Team Leadership : Lead a team of engineers and specialists focused on observability solutions development. Establish best practices, design standards, and agile delivery pipelines.
  • Collaboration and Stakeholder Management : Work closely with product management, DevOps, SRE, and customer success teams to align on priorities, gather requirements, and validate delivered packages.
  • Quality, Scale, and Reusability : Ensure all developed solutions are scalable, reusable, and version-controlled, with automated testing and documentation.

Requirements :

  • At least 6 years of experience in observability, monitoring, SRE, or platform engineering roles.
  • Strong hands-on experience with observability tools such as Prometheus, Grafana, OpenTelemetry, ELK / EFK, Datadog, Splunk, or similar.
  • In-depth understanding of logs, metrics, traces, profiling, events, and the corresponding instrumentation / collection mechanisms.
  • Proven experience in developing observability solutions for platforms like Kubernetes, databases (Oracle, PostgreSQL), middleware (Tomcat, WebLogic), and distributed systems.
  • Experience with scripting, APIs, and automation frameworks (Python, Shell, Terraform, etc.).
  • Familiarity with RCA techniques, anomaly detection, and alert fatigue reduction strategies.
  • Ability to define and enforce design patterns, standards, and governance models.
  • Strong leadership, project management, and cross-functional collaboration skills.
  • Excellent verbal and written communication skills.

Good to Have Skills :

  • Experience building or managing a packaged observability marketplace or platform.
  • Contributions to open-source observability projects.
  • Certifications in Kubernetes, Observability tools, or cloud platforms (AWS, Azure, GCP).
  • Background in ITSM, CMDBs, or workflow automation is a plus.

(ref : hirist.tech)

