
Site Reliability Engineer - IaC Terraform

Success Pact Consulting Pvt Ltd, Bangalore

22 days ago

Job description

Position: Site Reliability Engineer

Experience: 5-9 years

Location: Bangalore, India

Job Summary:

We are seeking a Site Reliability Engineer (SRE) with 5-9 years of experience to join our Platform Engineering team. This role is crucial for ensuring the high availability, performance, and scalability of our AI-powered code review platform. As a key member of the team, you will work at the intersection of software engineering and systems operations, building the foundational platforms and automation that enable our engineering teams to deploy, monitor, and scale services reliably.

You will be instrumental in enhancing the reliability of critical services that process millions of code reviews, building sophisticated automation platforms, and owning the infrastructure that powers our AI-driven analysis engine. This role involves working with cutting-edge technologies, including large language models, real-time processing systems, and distributed architectures.

Key Responsibilities:

Infrastructure and Platform Ownership:

Design, implement, and maintain scalable infrastructure on Google Cloud Platform (GCP).

You will own and operate critical platform services and build and maintain Infrastructure as Code (IaC) using Terraform to ensure consistent and reproducible deployments.

Reliability and Performance Engineering:

Implement and maintain SLI/SLO frameworks to meet reliability commitments.

You will deploy comprehensive monitoring, alerting, and observability solutions using Datadog and custom instrumentation.

Your duties will also include handling incident response and conducting thorough root cause analyses and post-mortems to continuously improve system reliability.

You will also optimize application and infrastructure performance, and design and implement chaos engineering practices that proactively identify system weaknesses.

Automation and Developer Experience:

Develop self-service platforms and tooling that empower engineering teams to deploy, monitor, and troubleshoot their services independently.

You will automate operational tasks such as scaling, backup/recovery, and security patching.

A key part of your role will be to create and maintain infrastructure APIs and abstractions that simplify complex operations for development teams.

Security and Compliance:

You will be tasked with integrating security best practices into all infrastructure and platform services. This includes implementing security monitoring, vulnerability scanning, and compliance reporting.

You will also design secure network architectures and establish disaster recovery and business continuity plans.

Required Skills & Qualifications:

Experience:

5+ years of hands-on experience in Site Reliability Engineering, Platform Engineering, or DevOps roles.

A proven track record of managing production systems at scale in high-growth technology companies.

Technical Proficiency:

Programming Languages: Proficiency in Node.js and TypeScript for building automation tools.

Infrastructure as Code: Advanced experience with Terraform.

Monitoring & Observability: Hands-on experience with Datadog or similar platforms such as Prometheus, Grafana, or the ELK stack.

Cloud Platforms: Comprehensive experience with GCP services, including Compute Engine, GKE, Cloud Run, Cloud SQL, and Cloud Storage.

Strong Linux/Unix systems skills.

Experience with Kubernetes and Docker.

Understanding of microservices architecture and distributed systems principles.

Preferred Skills:

Experience with AI/ML infrastructure and tools.

Background in managing high-traffic web applications and API services.

Experience with disaster recovery planning and execution.

Knowledge of FinOps practices and cost optimization.

Experience with performance testing and capacity planning methodologies.

Contributions to open-source SRE or infrastructure tooling projects.

(ref: hirist.tech)
