Position : Site Reliability Engineer (SRE)
Role Overview :
We are seeking an experienced Site Reliability Engineer (SRE) with a strong background in Windows infrastructure to manage and optimize our cloud and on-premises environments. The ideal candidate will partner with development teams to improve service reliability, implement automation, and ensure high-performance systems across VMC, AWS, and Azure platforms.
Key Responsibilities :
- Manage day-to-day operations of VMC, AWS, and Azure infrastructure.
- Gather and analyze metrics from operating systems and applications to support performance tuning and fault finding.
- Collaborate with development teams to improve services through rigorous testing and release procedures.
- Participate in system design consulting, platform management, and capacity planning.
- Create sustainable systems and services through automation and Infrastructure as Code (IaC).
- Balance feature development speed and reliability by defining and maintaining service-level objectives (SLOs).
- Build and document automation processes for Infrastructure as a Service (IaaS).
- Oversee backup, patch management, and system maintenance.
- Configure, deploy, maintain, troubleshoot, and monitor container orchestration on AWS.
- Communicate complex technical ideas effectively to both technical and non-technical stakeholders.
Qualifications :
- Bachelor’s degree (or equivalent) in Computer Science or related discipline.
- 7+ years of experience in IT operations, Windows infrastructure, or SRE roles.
- Hands-on experience with Windows Server, Active Directory (AD), LDAP, DNS, network storage, and Azure compute services.
- Strong scripting and programming skills using PowerShell, Python, Ansible, Terraform, and Bash.
- Solid understanding of container orchestration (Kubernetes, Docker, etc.).
- Familiarity with ITSM processes (Incident, Problem, Change Management) using tools such as ServiceNow (preferable).
- Proactive approach to identifying performance bottlenecks and areas for improvement.
- Strong analytical, problem-solving, and communication skills.
- Ability to work both independently and collaboratively with a sense of ownership.
Skills & Attributes :
- Strong interpersonal and teamwork skills.
- Self-driven with a proactive mindset.
- Ability to balance reliability and speed while maintaining high-quality services.