Expertise in bare-metal/physical servers and datacenter architecture, along with Azure concepts
Windows Server operating systems on physical servers and environments
Basic networking knowledge and troubleshooting experience
Knowledge of Linux/Ubuntu
PowerShell experience in automation
Working experience in IT operations, such as creating support cases, troubleshooting, tracking, and mitigating issues, and working with product support engineers
Intermediate Power BI and Kusto skills (preferred)
Excellent oral and written English communication
Job Summary
Responsible for building and managing Azure high-performance computing clusters, automating cluster buildout workflows/tasks/reports, and producing innovative solutions for cluster buildout and maintenance activities
Job Responsibilities
- InfiniBand networking layer configuration, deployment, and troubleshooting, including cabling validation with respect to network topology
- IB/UFM switch software/FW upgrades during buildout/production
- Health monitoring of IB/UFM/nodes and troubleshooting using existing tools designated by Microsoft, configuring them as necessary per the provided procedures
- Define and follow recommended practices and business processes within the Project
- Helping/supporting the automation of the buildout process and reporting
- Cluster maintenance activities as required, including node diagnostics, resolution, routing, and recovery
- Monitoring buildout progress of target clusters and working with component/vendor teams to unblock/drive the deployments to resolution
- Moving faulty nodes/devices to the RMA queue, working with the respective teams/vendors until resolution, and moving them back into production to fill the cluster capacity
- Incident queue mitigation and automation
- Creating TSG/SOP documents
- Team collaboration
- Willing to work 24/7 shifts
Skills Required
Linux, System Management, Linux/Unix