Key Responsibilities:
- Administer Grid Schedulers in large semiconductor design environments
- Perform LSF and License Scheduler configuration changes in large, complex clusters
- Troubleshoot and resolve distributed resource management (DRM) issues related to service availability, performance, and SLA compliance
- Collaborate with frontline support teams and other Engineering Compute teams globally to maintain DRM service availability and optimize computing environments
- Drive capability development and architectural improvements for DRM services and provide input to Grid Architects
- Manage incident, problem, change, and configuration management processes for Grid Systems Engineering from an ITIL framework perspective
- Create and review incident reports, drive major incident resolutions, and track closure of elevated tickets
- Perform root cause analysis (RCA) for repeated issues and coordinate with vendors and peer organizations to resolve bugs or ecosystem problems
- Create release plans for deploying enhancements or internal projects to upgrade tools and processes
- Maintain and audit configuration management database (CMDB) and perform configuration management activities
- Provide periodic capacity and availability reports to management, influencing architectural decisions focused on scalability and system stability
Required Skills and Qualifications:
- Bachelor of Science in Computer Science or related field, with 7+ years IT-related experience OR 9+ years without a degree
- Hands-on experience with Spectrum LSF, Altair Flow Tracer, License Scheduler, and EDA tools
- Deep knowledge of Grid best practices, configuration options, limitations, troubleshooting, and performance tuning
- Practical understanding of high-performance computing environments and distributed resource management, particularly in the EDA and semiconductor design space
- Experience with Perl and shell programming/debugging in UNIX/Linux environments
- Strong customer service skills and ability to communicate clearly and concisely
- Proven ability to work within geographically dispersed teams and lead self-directed teams of technical contributors
- Excellent verbal, written, and presentation communication skills
- Strong interpersonal and cross-cultural collaboration skills
- Self-motivated and able to work well in teams
Physical Requirements:
- Frequently transport and install equipment up to 20 lbs
Preferred:
- Experience with formal change management processes
- Experience in incident, problem, change, and configuration management under ITIL framework
- Experience creating capacity reports and availability improvement plans
Skills Required:
Incident Management, EDA Tools, Performance Tuning