Site Reliability Engineer (Windows / Cloud / Automation)
Job Summary:
We are seeking an experienced Site Reliability Engineer with a strong background in managing Windows infrastructure and cloud environments. The ideal candidate will be responsible for designing, implementing, automating, and maintaining scalable infrastructure solutions across AWS, Azure, and VMC environments, ensuring high performance, reliability, and efficiency.
Key Responsibilities:
- Manage and support Windows-based infrastructure, including Windows Server, Active Directory (AD), LDAP, DNS, and Network Storage.
- Gather and analyze system and application metrics to assist in performance tuning, capacity planning, and fault resolution.
- Collaborate with development and DevOps teams to improve services through testing, automation, and streamlined release procedures.
- Participate in system design consulting, platform management, and capacity planning initiatives.
- Develop and maintain automation processes for Infrastructure as a Service (IaaS) and Infrastructure as Code (IaC) using tools such as Terraform, Ansible, PowerShell, Python, and Bash.
- Perform day-to-day management of cloud infrastructure across VMC, AWS, and Azure platforms.
- Oversee backup and patch management to ensure system stability and compliance.
- Implement and manage container orchestration solutions, with hands-on experience in AWS container services.
- Create sustainable, reliable systems and balance feature development speed with service-level objectives (SLOs).
- Monitor infrastructure performance, proactively identify issues, and implement improvements.
- Follow ITSM processes including Incident, Problem, and Change Management (preferably using ServiceNow).
Required Skills & Qualifications:
- Bachelor’s degree (or equivalent) in Computer Science, Information Technology, or related discipline.
- Minimum 7 years of experience in IT Infrastructure management.
- Strong hands-on experience in:
  - Windows Server, AD, LDAP, DNS, and Network Storage
  - Cloud platforms: AWS, Azure, VMC
  - Infrastructure automation and scripting: PowerShell, Python, Ansible, Terraform, Bash
  - Container orchestration and management on AWS
- In-depth understanding of system performance tuning, troubleshooting, and capacity planning.
- Familiarity with ITSM processes (ServiceNow preferred).
- Excellent analytical, problem-solving, and communication skills.
- Proactive, self-driven, and capable of working independently as well as in a team environment.
Preferred Attributes:
- Demonstrated ability to build automated, scalable infrastructure solutions.
- Strong ownership mindset with a focus on reliability, efficiency, and continuous improvement.
- Ability to communicate technical concepts effectively to both technical and non-technical stakeholders.