Position : Site Reliability Engineer (SRE)
Experience : 3+ Years
Job Type : Full-time
Job Summary :
We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) to join our global engineering team focused on the monitoring and management of Kaseya's globally distributed Cloud SaaS production environments.
The ideal candidate will have 3+ years of experience in Windows and Linux administration, hands-on experience with hybrid cloud infrastructure (VMware, AWS, Azure), and strong expertise in fault analysis and preventative maintenance to ensure infrastructure and application stability in a 24 / 7 environment.
Responsibilities :
- SaaS Infrastructure Maintenance : Provide direct support and maintenance for Kaseya's globally distributed SaaS production infrastructure running across various Virtual and Hybrid Data Centres.
- Hybrid Virtualization Management : Manage and monitor Virtual Machines (VMs) running across a hybrid virtualization environment, including Hyper-V, VMware, Amazon EC2, and Microsoft Azure.
- System Administration : Execute core Windows Administration tasks (3+ years of experience), including Active Directory, IIS, Disk management, Windows Patching, and Performance tuning. Provide troubleshooting and administrative support for Linux-based production systems (Ubuntu, RHEL).
- Monitoring & Tooling : Enhance and support our global infrastructure and Cloud software monitoring solutions to ensure proactive detection of issues.
- Incident & Problem Resolution : Quickly resolve hardware, operational, infrastructure, performance, and application incidents. Perform preventative maintenance and troubleshooting, and lead problem resolution efforts.
- Capacity Planning & Provisioning : Anticipate production server capacity requirements and provision new servers as required to meet them.
- Collaboration & Support : Effectively communicate and collaborate with R&D, Customer Success, Support, and Operations teams globally. Participate in weekly maintenance and mandatory on-call duties, which require working in rotational shifts in a 24 / 7 / 365 environment and providing after-hours support.
Requirements :
- Experience : 3+ years of overall experience in Windows and Linux system administration in a production environment.
- Operating Systems : Strong expertise in Windows Administration (Active Directory, IIS, Disk management, Patching) and administering Linux-based production systems (Ubuntu, RHEL).
- Cloud Infrastructure : Hands-on experience with Cloud Infrastructure platforms, specifically Amazon Cloud (AWS) or MS Azure.
- Problem Solving : Strong fault analysis / determination and problem-solving skills.
- Security : Basic computer and network security skills.
- Education : Bachelor's degree in Computer Science, Information Technology, or equivalent work experience.
- Skills : Relevant Microsoft / Linux certifications. Strong organizational skills, ability to multitask, and strong interpersonal skills for working in a distributed team environment.
(ref : hirist.tech)