Principal / Chief Site Reliability Engineer - Observability Services

Collabera • Bangalore
30+ days ago
Job Description :

As a Principal / Chief Site Reliability Engineer, you will play a critical role in designing, developing, and maintaining scalable and highly reliable systems.

You'll work closely with development teams to improve system reliability, monitor critical applications, and design fail-proof systems :

  • Design and implement scalable, highly available infrastructure and automation solutions.
  • Drive adoption of SRE principles, SLAs, SLOs, and error budgets across teams.
  • Proactively identify, debug, and resolve complex system reliability issues.
  • Build tooling for observability, alerting, and performance monitoring.
  • Collaborate with developers and architects on cloud-native design and service resilience.
  • Conduct failure analysis, system audits, and root cause investigations.
  • Contribute to strategic infrastructure decisions and reliability roadmaps.
  • Lead by influence through mentorship and technical direction across teams.
  • Work across multiple platforms and large-scale distributed systems.

Key Requirements :

  • Experience : 15+ years in technology, with at least 5 years in Site Reliability Engineering.
  • Development Background : Strong hands-on experience in C / C++, Java, Go, or Python.
  • Proven experience as a hands-on Individual Contributor (not a managerial role).
  • Proficiency in scripting, system programming, and multi-platform architecture.
  • Deep knowledge of :
    a. Linux / Unix OS fundamentals.
    b. Networking (DNS, TCP / IP, etc.).
    c. Cloud platforms (preferably AWS).
    d. Observability and monitoring tools.
    e. CI / CD and Infrastructure as Code.

  • Strong exposure to SRE concepts : reliability, automation, on-call best practices, etc.
  • System design, performance tuning, and troubleshooting large-scale systems.

(ref : hirist.tech)
