About the Role
We are looking for a passionate and detail-oriented Site Reliability Engineer (SRE) to join our engineering team. As an SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and services. You’ll work closely with development and QA teams to build, maintain, and scale production systems while implementing best practices for monitoring, automation, and incident management.
This position is ideal for engineers who thrive in complex distributed environments, are strong in databases and Kubernetes, and enjoy improving system reliability through automation and modern tooling.
Key Responsibilities
Infrastructure Reliability & Performance
Maintain, monitor, and improve uptime and performance of production systems.
Design and implement scalable, reliable, and secure infrastructure on cloud platforms (AWS / GCP).
Kubernetes & Containerization
Deploy, manage, and optimize containerized workloads using Kubernetes and Helm.
Troubleshoot Kubernetes clusters, pods, and networking issues.
Manage CI/CD pipelines integrated with Kubernetes-based deployments.
Database Administration
Manage and optimize databases (PostgreSQL, MongoDB, or other DBs).
Perform database tuning, backups, restores, and replication management.
Automate database monitoring and implement high availability (HA) strategies.
Monitoring & Incident Response
Participate in on-call rotations for production support and incident response.
Conduct post-incident reviews and drive preventive improvements.
Security & Compliance
Implement and enforce security best practices in infrastructure and application deployments.
Manage access controls, secrets, and network policies in production environments.
Collaboration & Continuous Improvement
Work with development teams to design systems with reliability and scalability in mind.
Drive automation and self-healing capabilities for common operational tasks.
Contribute to SRE playbooks, runbooks, and documentation.
Required Skills & Qualifications
Education: Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
Experience: 2–5 years of experience as an SRE, DevOps engineer, or DBA.
Core Skills:
Strong experience with Kubernetes, Docker, and container orchestration.
Hands-on experience with databases (MySQL, PostgreSQL, MongoDB, or similar).
Proficiency in Linux system administration and shell scripting.
Good knowledge of cloud platforms (AWS / GCP / Azure) and related services.
Basic understanding of networking concepts (DNS, load balancing, firewalls, etc.).
Programming experience in Python, Go, or Bash for automation.
Site Reliability Engineer • Delhi, India