Minfy Technologies - Head - Site Reliability Engineering

Minfy Technologies Private Limited, Bangalore, India
30+ days ago
Job description

Job Summary

We are seeking a strategic and technically proficient Head of Site Reliability Engineering (SRE) to lead the design, implementation, and scaling of our reliability, observability, and operational practices.

As the Head of SRE, you will play a critical role in ensuring our systems are highly available, scalable, and performant while maintaining a strong engineering culture of reliability and resilience.

Key Responsibilities

  • Lead and mentor a team of SREs responsible for production systems, ensuring operational excellence and system reliability.
  • Define and drive the vision, strategy, and execution of SRE initiatives aligned with company goals.
  • Own the uptime, latency, performance, and monitoring of all infrastructure and services.
  • Partner with development, QA, and product teams to embed reliability practices early in the software development lifecycle (shift-left).
  • Build and enforce SLAs, SLOs, and SLIs across all services, ensuring continuous improvement.
  • Lead incident management processes and postmortems with a focus on blameless culture and systemic improvement.
  • Design and maintain CI/CD pipelines and infrastructure as code (IaC) practices.
  • Identify and eliminate toil by promoting automation, self-healing systems, and tooling.
  • Drive capacity planning, cost optimization, and service scalability in cloud or hybrid environments.
  • Ensure compliance with applicable security, privacy, and regulatory standards.

Requirements

  • 8+ years of experience in software engineering or infrastructure roles, including 4+ years in an SRE leadership or equivalent role.
  • Deep knowledge of cloud platforms (AWS, GCP, or Azure), container orchestration (Kubernetes), and modern DevOps practices.
  • Proficient in monitoring tools (e.g., Prometheus, Grafana, Datadog), incident response systems, and observability platforms.
  • Strong programming and scripting skills (e.g., Python, Go, Bash, or similar).
  • Demonstrated success building high-availability systems at scale.
  • Exceptional leadership, communication, and stakeholder management skills.

Preferred Qualifications

  • Experience managing SRE teams across multiple time zones.
  • Certifications in cloud architecture or site reliability engineering (e.g., Google SRE, AWS DevOps).
  • Exposure to zero-trust security, FinOps, or regulated environments (e.g., healthcare, finance).

(ref: iimjobs.com)