Principal / Chief Site Reliability Engineer - Observability Services

Collabera, Bangalore

30+ days ago

Job description

As a Principal / Chief Site Reliability Engineer, you will play a critical role in designing, developing, and maintaining scalable and highly reliable systems.

You'll work closely with development teams to improve system reliability, monitor critical applications, and design fail-proof systems.

Key Responsibilities :

Design and implement scalable, highly available infrastructure and automation solutions.
Drive adoption of SRE principles, SLAs, SLOs, and error budgets across teams.
Proactively identify, debug, and resolve complex system reliability issues.
Build tooling for observability, alerting, and performance monitoring.
Collaborate with developers and architects on cloud-native design and service resilience.
Conduct failure analysis, system audits, and root cause investigations.
Contribute to strategic infrastructure decisions and reliability roadmaps.
Lead by influence, providing mentorship and technical direction across teams.
Work across multiple platforms and large-scale distributed systems.

Key Requirements :

Experience : 15+ years in technology, with at least 5 years in Site Reliability Engineering.

Development Background : Strong hands-on experience in C / C++, Java, Go, or Python.

Proven experience as a hands-on Individual Contributor (not a managerial role).

Proficiency in scripting, system programming, and multi-platform architecture.

Deep knowledge of :

a. Linux / Unix OS fundamentals.

b. Networking (DNS, TCP / IP, etc.).

c. Cloud platforms (preferably AWS).

d. Observability and Monitoring Tools.

e. CI / CD and Infrastructure as Code.

Strong exposure to SRE concepts : reliability, automation, on-call best practices, etc.

System design, performance tuning, and troubleshooting large-scale systems.

(ref : hirist.tech)
