HPC Software Manager
KLA is looking for an HPC Software Manager to build and lead a team responsible for architecting and developing the distributed software infrastructure for image computing clusters within the LS division. This pivotal role involves enabling scalable, high-performance platforms that support advanced image processing and AI workloads.
Key Responsibilities
- Strategic Leadership: Define and drive the long-term vision and roadmap for distributed HPC software infrastructure supporting image computing clusters.
- Team Development: Build, mentor, and grow a high-performing team of software engineers and technical leaders.
- Cross-functional Collaboration: Partner with product, hardware, and algorithm teams to align infrastructure capabilities with evolving image processing and AI requirements.
- Platform Architecture: Oversee the design and implementation of scalable, fault-tolerant distributed systems optimized for hybrid CPU/GPU workloads.
- Lifecycle Management: Lead the end-to-end development of image computing platforms, from requirements gathering through deployment and maintenance, using best-in-class project management practices.
- Developer Enablement: Deliver robust software platforms and tools that empower engineers to develop, test, and deploy new image processing and deep learning algorithms efficiently.
- Innovation in Hybrid Computing: Spearhead the integration of traditional image processing and AI/DL techniques into a unified hybrid computing architecture, leveraging modern HPC technologies.
Skills
- Deep understanding of distributed computing frameworks and Linux systems programming.
- Proficiency in C++, Python, and/or other systems programming languages.
- Familiarity with GPU computing and hybrid CPU/GPU architectures.
- Strong grasp of software development best practices, CI/CD, and DevOps principles.
- Demonstrated ability to lead and drive functional teams.
- Excellent communication and stakeholder management skills.
- Passion for mentoring and developing engineering talent.
- Proven track record in building and scaling distributed systems, preferably in HPC or cloud-native environments.
- Experience with image processing, computer vision, or AI/ML infrastructure is highly desirable.
Qualifications
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field.
Skills Required
HPC, C++, Python, GPU computing, DevOps