DGX Cloud Performance Engineer

Confidential, Bengaluru / Bangalore, India
22 days ago
Job description

NVIDIA DGX™ Cloud is an end-to-end, scalable AI platform for developers, offering scalable capacity built on the latest NVIDIA architecture and co-engineered with the world's leading cloud service providers (CSPs). We are seeking highly skilled Parallel and Distributed Systems engineers to drive the performance analysis, optimization, and modeling that define the architecture and design of NVIDIA's DGX Cloud clusters.

The ideal candidate will have a deep understanding of the methodology for conducting end-to-end performance analysis of critical AI applications running on large-scale parallel and distributed systems. Candidates will work closely with cross-functional teams to define the DGX Cloud cluster architecture for different CSPs, optimize workloads running on these systems, and develop the methodology that will drive the HW-SW codesign cycle to build world-class AI infrastructure at scale and make it more easily consumable by users (via improved scalability, reliability, cleaner abstractions, etc.).

What you will be doing:

Develop benchmarks and end-to-end customer applications running at scale, instrumented for performance measurement, tracking, and sampling, to measure and optimize the performance of important applications and services

Construct carefully designed experiments to analyze, study, and develop critical insights into performance bottlenecks and dependencies from an end-to-end perspective

Develop ideas on how to improve end-to-end system performance and usability by driving changes in the HW or SW (or both)

Collaborate with AI researchers, developers, and application service providers to understand internal developer and external customer pain points and requirements, project future needs, and share best practices

Develop the necessary modeling framework and the TCO (total cost of ownership) analysis to enable efficient exploration and sweep of the architecture and design space

Develop the methodology needed to drive the engineering analysis that informs the architecture, design, and roadmap of DGX Cloud

What we need to see:

Expertise in working with large-scale parallel and distributed accelerator-based systems

Expertise in optimizing performance and AI workloads on large-scale systems

Experience with performance modeling and benchmarking at scale

Strong background in Computer Architecture, Networking, Storage systems, Accelerators

Familiarity with popular AI frameworks (PyTorch, TensorFlow, JAX, Megatron-LM, TensorRT-LLM, vLLM, among others)

Experience with AI / ML models and workloads, in particular LLMs, as well as an understanding of DNNs and their use in emerging AI / ML applications and services

Bachelor's / Master's in Engineering or equivalent experience (preferably Electrical Engineering, Computer Engineering, or Computer Science)

10 years of experience in the above areas

Proficiency in Python, C / C++

Expertise with at least one public CSP infrastructure (GCP, AWS, Azure, OCI, …)

Ways to stand out from the crowd:

PhD in the relevant areas

Very high intellectual curiosity; confidence to dig in as needed; not afraid of confronting complexity; able to pick up new areas quickly

Proficiency in CUDA, XLA

Excellent interpersonal skills

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative, autonomous and love a challenge, we want to hear from you.

Skills Required

Networking, CUDA, C, Python, Computer Architecture, Performance Analysis, Storage Systems
