Description :
Role : Site Reliability Engineer (SRE).
Location : Hyderabad.
Experience : 10 to 15 Years.
Job Summary :
The Site Reliability Engineer (SRE) will play a critical role in ensuring the reliability, scalability, and performance of Citizens Bank's enterprise systems and cloud environments.
The ideal candidate brings deep technical expertise across multi-cloud platforms, automation, observability, and incident management, driving reliability engineering practices and operational excellence in a complex financial services environment.
Key Responsibilities :
- Manage and support cloud-based solutions across AWS, Azure, GCP, and other IaaS / PaaS / SaaS / CDN environments.
- Design, implement, and maintain reliable, scalable, and secure infrastructure, ensuring high availability and performance.
- Collaborate with DevOps and security teams to implement DevSecOps workflows using Git, Jenkins, Docker, and Kubernetes (EKS / AKS).
- Automate infrastructure and configuration management using Terraform, Ansible, and scripting languages like Python, Bash, or PowerShell.
- Analyze traffic flows, system logs, and application events to troubleshoot issues and identify interdependencies across systems.
- Utilize monitoring and observability tools such as DataDog, Splunk, and CloudWatch for proactive system health management.
- Implement on-call support processes, develop and maintain runbook documentation, and work toward fully automating repetitive tasks.
- Collaborate with other SREs to build resilient systems and promote Site Reliability Engineering best practices across the enterprise.
- Handle critical application outages, perform root cause analysis, and drive incident resolution and preventive measures.
- Work within an Agile environment, partnering with cross-functional teams to continuously improve performance and reliability.
Technical Skills Required :
- Cloud Platforms : AWS, Azure, GCP.
- DevOps / DevSecOps Tools : Jenkins, Git, Docker, Kubernetes (EKS, AKS).
- Infrastructure as Code (IaC) : Terraform, Ansible.
- Monitoring & Logging : DataDog, Splunk, CloudWatch.
- Scripting : Python, Bash, PowerShell.
- Networking : TCP / IP, DNS, HTTP, Load Balancing, Routing.
- OS Environments : Linux, Windows Server.
- Familiarity with AMI builds, patching, and rehydration processes.
Core Competencies :
- Strong analytical and troubleshooting skills.
- Proven ability to drive incident response and post-incident reviews.
- Excellent communication and stakeholder management.
- Ability to collaborate in global, distributed teams.
- Focus on automation, resilience, and continuous improvement.
(ref : hirist.tech)