Talent.com
Site Reliability Engineer (SRE) – Infrastructure & Automation

InstaService • Nashik, Maharashtra, India
14 hours ago
Job description

About InstaService

InstaService is revolutionizing the home services industry through AI-driven technology, connecting customers with trusted professionals instantly. We’re growing fast across 23+ states and expanding nationwide, backed by strong traction, rapid adoption, and a mission to simplify how people get work done at home.

We’re looking for a Senior Site Reliability Engineer (SRE) to join our core engineering team and scale our infrastructure to serve millions of users reliably.

What You’ll Do

Lead incident response, conduct root cause analysis, and put permanent preventive measures in place.

Design and optimize CI/CD pipelines, automate deployments, and enforce release stability.

Build and manage scalable infrastructure on AWS, GCP, or Azure using Terraform, Ansible, and Kubernetes.

Continuously monitor system health with Prometheus, Grafana, ELK, and CloudWatch.

Conduct load and performance testing (k6, JMeter, Locust) and optimize systems for high-traffic events.

Improve observability, reduce alert noise, and enhance signal clarity for faster debugging.

Collaborate with developers and architects to ensure systems meet SLOs, SLIs, and SLAs.

Develop automation scripts and tools in Python, Go, Node.js, or Shell to streamline operations.

Manage distributed systems and message queues like Kafka or RabbitMQ.

Drive a culture of reliability, automation, and scalability across teams.

What We’re Looking For

4–7 years of experience in SRE or DevOps roles (preferably in high-scale or e-commerce environments).

Strong hands-on experience with Kubernetes, Docker, Terraform, Ansible, and CI/CD pipelines.

Deep understanding of Linux systems, networking, and distributed architecture.

Solid programming skills in Python, Go, or Node.js.

Experience managing cloud platforms (AWS, GCP, or Azure).

Proven track record of maintaining production uptime and optimizing system performance.

Nice to Have

Experience with observability stacks, distributed tracing, and incident automation.

Familiarity with microservices and event-driven systems.

Exposure to cost optimization and capacity planning in multi-cloud environments.

Why Join InstaService?

Fast-growing startup reshaping a massive industry

Work on high-scale systems and impactful technology

Collaborative and innovation-driven team

Competitive compensation and growth opportunities
