We are seeking an exceptional and highly motivated HPC System R&D Engineer to join our team.
In this role, you will be instrumental in developing and demonstrating next-generation HPC technologies, specifically focusing on their deployment and scalability across both on-premise and cloud infrastructures.
You will identify limitations in current CPU and GPU-based solutions for large-scale AI deployments and architect innovative distributed frameworks and system-level solutions to overcome these challenges.
Your work will directly impact the development of KLA's future tools, enabling breakthroughs in process control.
Critically analyze existing HPC solutions based on CPU and GPU clusters to pinpoint bottlenecks and limitations in deploying AI-based solutions at scale on both on-premise and cloud infrastructures.
Design, develop, and implement distributed frameworks and system-level solutions that enable seamless scaling of image processing and AI workloads from single GPUs to multi-node clusters with numerous GPUs.
Focus on the challenges and opportunities of deploying HPC workloads in cloud environments, including resource management, scalability, and cost optimization.
Install, benchmark, and rigorously evaluate pre-release hardware (CPUs, GPUs, interconnects, etc.) to assess their suitability for next-generation KLA tools.
This includes identifying or developing relevant workloads for early-stage evaluation and prototyping.
Conduct in-depth performance analysis of hardware and software stacks, identifying areas for optimization and improvement.
Build functional prototypes and demonstrations of developed technologies on on-premise testbed clusters, paving the way for their integration into future KLA tools.
Master's or PhD in Computer Science, Electrical Engineering, or a closely related field.
Exceptional Bachelor's degree holders with significant relevant experience and an extraordinary track record will also be considered.
Deep understanding of operating systems (Linux internals preferred), computer networks (high-speed interconnects like InfiniBand), and high-performance computing applications.
Strong mental model of the architecture of modern distributed systems, including a comprehensive understanding of CPUs, GPUs, and various hardware accelerators.
Proven experience with the deployment and scaling of deep-learning frameworks such as TensorFlow and PyTorch on large-scale on-premise or cloud infrastructures.
Strong background in modern and advanced C++ concepts (including parallel programming paradigms).
Excellent scripting skills in Bash, Python, or similar languages for automation, system administration, and data analysis.
Good verbal and written communication skills to effectively collaborate with a diverse team and present technical findings.
Things to Make us go Wow!
Experience with the development and training of deep learning models using frameworks such as TensorFlow and PyTorch.
Experience with building or significantly contributing to open-source operating systems and the software stack on pre-release hardware.
Solid understanding of container infrastructure such as Docker or Singularity, and container orchestration platforms like Kubernetes for managing HPC workloads in the cloud.
Active participation in C++ standards bodies or similar technical communities.
Deep understanding and practical experience with HPC services offered by major cloud providers (AWS, Azure, GCP).
Demonstrated expertise in performance profiling, tuning, and optimization of HPC applications.
In-depth knowledge of high-speed interconnect technologies like InfiniBand and their impact on distributed application performance.
Location:
Noida, Uttar Pradesh, India (Based on the current location)
Benefits:
Opportunity to be at the forefront of HPC and AI innovation for a world-leading technology company.
Work with a team of extraordinary engineers and researchers.
Access to state-of-the-art on-premise and cloud HPC infrastructure.
Make a significant impact on the future of semiconductor manufacturing.
(ref : hirist.tech)