Senior Site Reliability Engineer – AWS & Kubernetes
Athenahealth Technology Private Limited • Bengaluru / Bangalore
No longer accepting applications
30+ days ago
Job description

Position Summary: We are looking for a Senior Site Reliability Engineer (SMTS) to join our Cloud Infrastructure Engineering division in Bangalore. Cloud Infrastructure Engineering ensures the continuous availability of the technologies and systems that are the foundation of athenahealth's services. We are directly responsible for thousands of servers, petabytes of storage, and thousands of web requests per second, all while sustaining rapid growth. We enable an operating system for the medical office that abstracts away administrative complexity, leaving doctors free to practice medicine.

The Team: We are a group of Site Reliability Engineers who are passionate about reliability, automation, and scalability. We use an agile-based framework to execute our work, ensuring we are always focused on the most important and impactful needs of the business. We support systems in both private and public cloud and make data-driven decisions about which one best suits the needs of the business. We are relentless in automating away manual, repetitive work so we can focus on projects that help move the business forward.

Job Responsibilities:

Reliability and Availability:

  • Define, measure, and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for cloud services and infrastructure components.
  • Lead efforts to continuously improve system availability, fault tolerance, and disaster recovery capabilities.
  • Ensure proactive incident detection, efficient root cause analysis, and timely resolution of production incidents.
  • Participate in a 24x7 on-call rotation.

Automation and Infrastructure as Code (IaC):

  • Drive automation efforts to reduce manual intervention and streamline cloud infrastructure management.
  • Implement Infrastructure as Code (IaC) using tools like Terraform, AWS CloudFormation, and Ansible to provision, manage, and scale cloud resources.
  • Automate deployment, scaling, and monitoring processes to improve efficiency and reduce operational complexity.

Monitoring, Observability, and Performance Tuning:

  • Design and implement monitoring, logging, and alerting solutions to track cloud infrastructure health, performance, and security.
  • Use observability tools (e.g., Prometheus, Grafana, CloudWatch) to ensure continuous visibility into cloud infrastructure performance and capacity.
  • Identify bottlenecks and performance issues, proposing and implementing improvements to ensure optimal resource usage.

Security and Compliance:

  • Ensure that cloud infrastructure is built with security best practices in mind and meets all relevant compliance and regulatory requirements.
  • Collaborate with security teams to implement security controls and risk mitigation strategies across cloud environments.
  • Regularly audit and review cloud infrastructure for security vulnerabilities and compliance gaps.

Collaboration and Cross-Functional Leadership:

  • Work closely with development, DevOps, and operations teams to ensure cloud infrastructure aligns with application and business requirements.
  • Lead and mentor a team of Site Reliability Engineers, promoting best practices and fostering a culture of operational excellence.
  • Act as a key technical point of contact for cloud-related infrastructure and operations issues.

Incident Management and Post-Mortem:

  • Lead the incident response efforts for cloud infrastructure-related issues, ensuring that all incidents are managed effectively.
  • Conduct post-incident reviews (PIRs) to identify root causes and implement preventive measures.
  • Continuously refine incident management processes to reduce downtime and enhance recovery times.

Qualifications

  • 5-9 years of hands-on experience with cloud automation and configuration management tools (e.g., Terraform, AWS CloudFormation, Ansible) in a hybrid cloud setup.
  • 5+ years of experience in a Site Reliability Engineering (SRE), Infrastructure Engineering, or DevOps role, with at least 3 years in a technical leadership capacity.
  • Deep knowledge of cloud services and technologies (e.g., EC2, S3, Lambda, Kubernetes).
  • Proficiency in scripting or programming languages (Python, Go, Bash, etc.).
  • Experience with monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, ELK stack).
  • Familiarity with Continuous Integration/Continuous Deployment (CI/CD) pipelines and cloud-native development practices.
  • Strong expertise in managing cloud infrastructure (AWS, Google Cloud, Azure) in production environments.
  • Experience with cloud-native architectures, microservices, and containerized environments (Kubernetes, Docker).
  • Proven experience in building and managing highly available, scalable, and fault-tolerant systems in the cloud.
  • Strong understanding of cloud networking, storage, and compute services, on-premises infrastructure, and security best practices.

Skills Required
Docker, Ansible, Kubernetes, Terraform, GCP, CloudFormation, Azure, AWS