We're Hiring: Senior Site Reliability Engineer (SRE) – Backend Systems (Remote-Friendly)
Are you the go-to person when things break in production? Do you love solving deep infrastructure issues, debugging Kafka lag, and reviewing backend code that keeps services resilient and fast?
We’re looking for a Senior Site Reliability Engineer (SRE) to join our backend team and help scale our real-time, event-driven platform. This isn’t a traditional DevOps or infrastructure-only role — this is for engineers who can debug complex systems, write high-quality code, and design for reliability at scale.
Location: India – Remote-Friendly
Experience: 8+ years in SRE, backend, or platform engineering roles
Interview Process Includes:
- High-level system design discussions
- A coding round focused on problem solving & debugging skills
What You’ll Do
- Investigate and resolve reliability issues like Kafka lag, queue bottlenecks, timeouts, memory spikes, and more
- Debug and review production code (Python, Go, Node.js, etc.) for performance and reliability
- Design scalable, distributed backend systems that are fault-tolerant and observable
- Build tools and automation to detect, fix, and prevent incidents
- Own the monitoring, alerting, and SLOs for critical systems
- Collaborate closely with backend developers and infrastructure teams
What We’re Looking For
- Strong debugging skills: Kafka lag, distributed system failures, log tracing, profiling
- Proven experience with observability, monitoring, and alerting tools (Prometheus, Grafana, ELK, etc.)
- Deep understanding of message brokers and data pipelines: Kafka, RabbitMQ, Redis
- Strong backend coding ability in any modern language (Python, Go, Rust, Node.js, etc.)
- Familiarity with production-grade system design patterns: retries, backpressure, eventual consistency
- Experience with microservices, distributed systems, and containerized deployments
Nice to Have
- Exposure to streaming platforms (Apache Pulsar, Flink)
- Familiarity with agentic architectures and LLMs
- Familiarity with DevSecOps practices and GitOps workflows
- Knowledge of resilience engineering, chaos testing, or load testing
- Experience working in agile, product-centric teams
Why Join Us?
- You’ll be a core part of building resilient, high-scale systems from day one
- Modern architecture with no legacy baggage
- Remote flexibility and a team that values deep technical work
- Direct impact on platform reliability, uptime, and performance
Interested? Let’s Talk.
Send your resume to [email protected] or apply directly here on LinkedIn.