Site Reliability Engineering Manager

Landmark Group • Bangalore Urban, Karnataka, India

3 days ago

Job description

About Styli

Established in 2019 by Landmark Group, Styli is the GCC e-commerce gateway for trendy fashion. Scaling fast and offering over 40,000 styles for men, women, and kids, sourced from across the globe, Styli aims to bring the latest in fashion to our customers!

Job Overview

We are hiring a seasoned Site Reliability Engineer with strong experience in building and operating scalable systems on Google Cloud Platform (GCP). You will be responsible for ensuring system availability, performance, and security in a complex microservices ecosystem, while collaborating cross-functionally to improve infrastructure reliability and developer velocity.

Key Responsibilities

Design and maintain highly available, fault-tolerant systems on GCP using SRE best practices.
Implement SLIs/SLOs, monitor error budgets, and lead post-incident reviews with RCA documentation.
Automate infrastructure provisioning (Terraform/Deployment Manager) and CI/CD workflows.
Operate and optimize Kubernetes (GKE) clusters including autoscaling, resource tuning, and HPA policies.
Integrate observability across microservices using Prometheus, Grafana, Stackdriver, and OpenTelemetry.
Manage and fine-tune databases (MySQL/Postgres/BigQuery/Firestore) for performance and cost.
Improve API reliability and performance through Apigee (proxy tuning, quota/policy handling, caching).
Drive container best practices including image optimization, vulnerability scanning, and registry hygiene.
Participate in on-call rotations, capacity planning, and infrastructure cost reviews.

Experience: 8+ Years

Must-Have Skills

Minimum 8 years of total experience, with at least 3 years in SRE, DevOps, or Platform Engineering roles.

Strong expertise in GCP services (GKE, IAM, Cloud Run, Cloud Functions, Pub/Sub, VPC, Monitoring).

Advanced Kubernetes knowledge: pod orchestration, secrets management, liveness/readiness probes.

Experience in writing automation tools/scripts in Python, Bash, or Golang.

Solid understanding of incident response frameworks and runbook development.

CI/CD expertise with GitHub Actions, Cloud Build, or similar tools.

Good to Have

Apigee hands-on experience: API proxy lifecycle, policies, debugging, and analytics.

Database optimization: index tuning, slow query analysis, horizontal/vertical sharding.

Distributed monitoring and tracing: familiarity with Jaeger, Zipkin, or GCP Trace.

Service Mesh (Istio/Linkerd) and secure workload identity configurations.

Exposure to BCP/DR planning, infrastructure threat modeling, and compliance (ISO/SOC2).
