Job Title : Windows System Engineer (Disaster Recovery)
Location : Bangalore, Chennai, Pune, Mumbai, Hyderabad
Experience Level : 5 to 8 years
Job Summary :
We are seeking an experienced Windows System Engineer (DR) to manage, support, and optimize Windows-based enterprise infrastructure with a focus on Disaster Recovery (DR) and Business Continuity Planning (BCP).
The ideal candidate will have strong expertise in failover recovery, networking, database health checks, load balancing, and monitoring / log analytics using Splunk.
The role requires a proactive engineer who ensures systems are resilient, secure, and compliant with defined RTO / RPO objectives.
Key Responsibilities :
Disaster Recovery (DR) Management :
- Plan, implement, and maintain DR environments for Windows-based systems.
- Conduct failover and failback exercises to validate DR readiness and minimize downtime.
- Define and manage RTO (Recovery Time Objective) and RPO (Recovery Point Objective) metrics for critical systems.
- Perform DR drills and document lessons learned for continuous improvement.
System and Network Administration :
- Manage Windows Server environments (2016 / 2019 / 2022) including patching, performance tuning, and troubleshooting.
- Configure and monitor network components (DNS, DHCP, IP routing, firewall rules, VLANs) to ensure connectivity during DR operations.
- Collaborate with network teams to validate load balancing and failover mechanisms.
Database and Application Checks :
- Perform DB sanity checks and ensure data consistency post-DR switchovers.
- Coordinate with DBAs and application owners to validate system availability post-recovery.
Load Balancer and Failover Operations :
- Configure, test, and monitor load balancers (F5, Citrix, HAProxy, or similar) to ensure high availability.
- Validate load balancing rules, session persistence, and failover logic during DR scenarios.
Monitoring and Incident Management :
- Use Splunk to monitor system health, event logs, and performance metrics.
- Develop dashboards and alerts for proactive issue detection and resolution.
- Perform root cause analysis (RCA) for system outages and DR failures.
Documentation and Compliance :
- Maintain DR documentation, runbooks, and standard operating procedures (SOPs).
- Support audit and compliance activities by providing recovery metrics and validation reports.
- Collaborate with IT Security and Compliance teams to ensure DR adherence to organizational standards.
Required Skills & Qualifications :
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 5 to 8 years of hands-on experience in Windows System Administration and Disaster Recovery Planning.
- Strong knowledge of Windows Server (2016 / 2019 / 2022) and Active Directory Services.
- Hands-on experience with failover clustering, replication, and backup technologies (e.g., Veeam, Commvault, Azure Backup).
- Solid understanding of RTO / RPO concepts and disaster recovery frameworks.
- Experience with networking fundamentals: TCP/IP, DNS, DHCP, VLAN, routing, and firewalls.
- Practical knowledge of load balancer configuration and failover testing.
- Experience using Splunk for log management, monitoring, and alerting.
- Familiarity with PowerShell scripting for automation and system checks.
- Excellent troubleshooting, analytical, and documentation skills.
(ref : hirist.tech)