Tekonika - Service Reliability Engineer - Production Systems

Confidential | Bengaluru / Bangalore, India

7 days ago

Job description

Job Title: Service Reliability Engineer

Location: Bangalore (Hybrid)

Experience: 9-12 years

Mode of Working: Hybrid (Office-Based)

About The Role

We are looking for a highly skilled and experienced Lead Service Reliability Engineer (SRE) to join our growing team. In this role, you will be responsible for ensuring the reliability, performance, and scalability of our production systems. You'll play a key part in incident response, infrastructure automation, and operational excellence across the organization.

Key Responsibilities

Handle production incidents and lead the response with calm and clarity.
Communicate effectively with internal teams and clients during outages.
Draft detailed Root Cause Analysis (RCA) documents post-incident.
Monitor and improve the performance, stability, and health of production systems.
Proactively identify and resolve system issues by analyzing metrics and logs.
Scale infrastructure to meet business objectives while adhering to SLA/SLO targets.
Perform upgrades and maintenance on EKS clusters.
Administer Kubernetes clusters and ensure optimal configuration and performance.
Automate infrastructure as code (IaC) using Terraform and Terragrunt.
Integrate observability and security checks into CI/CD pipelines.

Required Skills & Qualifications

Proven experience in managing production environments and incident handling.

Hands-on experience with incident management tools (e.g., PagerDuty, ServiceNow).

Strong expertise in observability tools (e.g., Datadog).

Proficient in scripting and programming with Python or similar languages.

Solid understanding of Kubernetes and hands-on experience administering it.

Expertise in Infrastructure as Code (IaC) using Terraform and Terragrunt.

In-depth experience with AWS, including:

IAM (with cross-account role experience preferred)

EC2, VPC, S3

Networking (VPC, Transit Gateway, NACLs, Security Groups)

Experience with EKS for cluster management and upgrades.

Familiarity with CI/CD pipelines and DevOps best practices.

Preferred/Bonus Skills

Exposure to infrastructure security best practices:

IAM least privilege, encryption, secrets management, etc.

Experience working in Agile/Scrum environments.

What We Offer

Opportunity to work on high-impact, production-critical systems.

Collaborative and inclusive work culture.

Competitive compensation and benefits.

Learning and growth opportunities in cloud-native technologies and DevOps practices.

Join us and lead the charge in building scalable, reliable, and secure systems that power our mission.

(ref: hirist.tech)

Skills Required

ServiceNow, Terraform, Datadog, Kubernetes, Python, AWS
