Talent.com
IT Infrastructure Support Site Reliability Engineer
Astreya Consultancy India Private Ltd • Hyderabad, India
30+ days ago
Job description

About the Job


We are seeking an experienced Site Reliability Engineer to join our IT Infrastructure Support team,
responsible for ensuring the reliability, scalability, and performance of critical physical security
infrastructure and supporting systems. In this role, you will combine software engineering expertise with
operations knowledge to build and maintain automation tools, monitoring systems, and processes that
support enterprise-grade server, network, and security device management. You will work closely with
cross-functional teams to define and enforce service level objectives, reduce operational toil through
automation, and drive continuous improvement in system resilience. This position requires 24x5
availability with on-call rotation to ensure uninterrupted support for mission-critical infrastructure.


Key Responsibilities


• Partner with leadership to establish, monitor, and enforce Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for infrastructure tooling, including configuration compliance rates, patch success rates, and deployment latency metrics.
• Provide Level 3 expertise for tooling-specific incidents, focusing on automating incident remediation workflows and reducing Mean Time To Repair (MTTR) through intelligent automation and runbook development.
• Identify and automate repetitive manual tasks across managed infrastructure, targeting measurable reductions in operational overhead (e.g., 50% reduction in manual server build time) through scripting and workflow automation.
• Conduct thorough root cause analysis and lead blameless postmortems for all major service-impacting incidents, driving systemic improvements in tooling reliability and infrastructure resilience.
• Engineer and maintain automated processes and scripts to populate, update, and synchronize asset management platforms (e.g., NetBox), configuration management databases, and monitoring systems for internal and external stakeholders.
• Design, develop, and deploy full-stack applications, custom plugins, and automation scripts to extend functionality of management and monitoring systems, enabling direct device interaction for configuration management.
• Develop and maintain fully automated Infrastructure-as-Code configurations for Windows and Linux server roles using tools such as Ansible, Terraform, or Puppet, including drift detection and auto-remediation capabilities.
• Build end-to-end automation pipelines for vulnerability patching, security baseline enforcement (CIS benchmarks), and continuous compliance auditing against internal and regulatory standards for physical security devices.
• Develop API-driven tools for network configuration management, automated firmware updates, pre/post-change validation, and real-time network health monitoring across the device fleet.
• Deploy and standardize monitoring agents, centralized log collection systems, and custom dashboards with alerts based on critical SLIs (latency, error rate, saturation, traffic) for servers and edge devices.
• Build automation scripts for intelligent ticket handling, problem validation, and escalation workflows within enterprise ticketing systems, ensuring 2-hour initial response SLAs are consistently met.
• Participate in 24x5 on-call rotation to provide timely support for infrastructure systems, security devices, and related tooling, ensuring service continuity and rapid incident response.


Required Skills


• 6+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
• Strong proficiency in Python, Bash, and PowerShell for automation scripting, with experience in Go for building high-performance backend services and APIs.
• Hands-on experience with Infrastructure-as-Code tools (Terraform, Ansible, Chef, or Puppet) and configuration management practices, including drift detection, version control, and automated remediation.
• Advanced knowledge of Linux and Windows server environments, including Tier 3 troubleshooting capabilities, system hardening, and enterprise-scale server management.
• Solid understanding of enterprise networking concepts, Cisco device administration, network automation protocols (NETCONF/RESTCONF), and experience with network monitoring and flow analysis tools.
• Experience implementing and managing monitoring solutions (Prometheus, Grafana, Datadog) and centralized logging platforms (ELK Stack), with ability to create custom dashboards and alerting rules.
• Proficiency in implementing CI/CD pipelines, automated testing frameworks, and deployment strategies using modern DevOps tooling, with strong emphasis on code quality, security, and maintainability.
