
Principal Systems Performance Engineer

Confidential, Hyderabad / Secunderabad, Telangana, India

9 days ago

Job description

Our vision is to transform how the world uses information to enrich life for all.

Micron Technology is a world leader in innovating memory and storage solutions that accelerate the transformation of information into intelligence, inspiring the world to learn, communicate and advance faster than ever.

Principal / Senior Systems Performance Engineer

Micron Data Center and Client Workload Engineering in Hyderabad, India, is seeking a senior/principal engineer to join our dynamic team.

The successful candidate will primarily contribute to ML development, ML DevOps, and the HBM program in the data center by analyzing how AI/ML workloads perform on the latest MU-HBM, Micron main memory, expansion memory, and near-memory (HBM/LP) solutions. The role also involves conducting competitive analysis, showcasing the benefits workloads gain from MU-HBM's capacity, bandwidth, and thermals, contributing to marketing collateral, and extracting AI/ML workload traces to help optimize future HBM designs.

Job Responsibilities:

Responsibilities include, but are not limited to, the following:

Design, implement, and maintain scalable and reliable ML infrastructure and pipelines.
Collaborate with data scientists and ML engineers to deploy machine learning models into production environments.
Automate and optimize ML workflows, including data preprocessing, model training, evaluation, and deployment.
Monitor and manage the performance, reliability, and scalability of ML systems.
Troubleshoot and resolve issues related to ML infrastructure and deployments.
Implement and manage distributed training and inference solutions to enhance model performance and scalability.
Use DeepSpeed, TensorRT, and vLLM to optimize and accelerate AI inference and training.
Understand the key considerations for ML models, such as transformer architectures, precision, quantization, distillation, attention span and KV cache, and mixture of experts (MoE).
Build workload memory access traces from AI models.
Study DRAM-to-HBM system balance ratios in terms of capacity and bandwidth to understand and model total cost of ownership (TCO).
Study data movement between the CPU, the GPU, and their associated memory subsystems (DDR, HBM) in heterogeneous system architectures, across interconnects such as PCIe, NVLink, and Infinity Fabric, to identify data-movement bottlenecks for different workloads.
Develop an automated testing framework through scripting.
Engage with customers, present findings at conferences, and develop whitepapers.

Requirements:

Strong programming skills in Python and familiarity with ML frameworks such as TensorFlow, PyTorch, or scikit-learn.

Experience in data preparation: cleaning, splitting, and transforming data for training, validation, and testing.

Proficiency in model training and development: creating and training machine learning models.

Expertise in model evaluation: testing models to assess their performance.

Skills in model deployment: launching inference servers and running live and batched inference.

Experience with AI inference and distributed training techniques.

Strong foundation in GPU and CPU processor architecture

Knowledge of server system memory (DRAM)

Strong experience with benchmarking and performance analysis

Strong software development skills in leading scripting and programming languages (Python, CUDA, C, C++)

Familiarity with PCIe and NVLink connectivity

Preferred Qualifications:

Experience in quickly building AI workflows: building pipelines and model workflows to design, deploy, and manage consistent model delivery.

Ability to easily deploy models anywhere: using managed endpoints to deploy models and workflows across accessible CPU and GPU machines.

Understanding of MLOps: the overarching concept covering the core tools, processes, and best practices for end-to-end machine learning system development and operations in production.

Knowledge of GenAIOps: extending MLOps to develop and operationalize generative AI solutions, including managing and interacting with foundation models.

Familiarity with LLMOps: developing and productionizing LLM-based solutions specifically.

Experience with RAGOps: the delivery and operation of retrieval-augmented generation (RAG) systems, widely considered a reference architecture for generative AI and LLMs.

Data management: collect, ingest, store, process, and label data for training and evaluation; configure role-based access control; support dataset search, browsing, and exploration; and implement data provenance tracking, data logging, dataset versioning, metadata indexing, data quality validation, dataset cards, and dashboards for data visualization.

Workflow and pipeline management: using cloud resources or a local workstation, connect data preparation, model training, model evaluation, model optimization, and model deployment into an end-to-end, automated, and scalable workflow that combines data and compute.

Model management: train, evaluate, and optimize models for production; store and version models, along with their model cards, in a centralized model registry; assess model risks; and ensure compliance with standards.

Experiment management and observability: track and compare machine learning experiments, including changes in training data, models, and hyperparameters; automatically search the space of possible model architectures, and of hyperparameters for a given architecture; analyze model performance during inference; and monitor model inputs and outputs for concept drift.

Synthetic data management: extend data management with a new native generative AI capability. Generate synthetic training data through domain randomization to improve transfer learning. Declaratively define and generate edge cases to evaluate, validate, and certify model accuracy and robustness.

Embedding management: represent data samples of any modality as dense, multi-dimensional embedding vectors; generate, store, and version embeddings in a vector database; visualize embeddings for ad hoc exploration; and find relevant contextual information through vector similarity search for RAG.

Education:

Bachelor's degree or higher in Computer Science or a related field, with 12+ years of experience.

About Micron Technology, Inc.

We are an industry leader in innovative memory and storage solutions transforming how the world uses information to enrich life for all. With a relentless focus on our customers, technology leadership, and manufacturing and operational excellence, Micron delivers a rich portfolio of high-performance DRAM, NAND, and NOR memory and storage products through our Micron® and Crucial® brands. Every day, the innovations that our people create fuel the data economy, enabling advances in artificial intelligence and 5G applications that unleash opportunities, from the data center to the intelligent edge and across the client and mobile user experience.

To learn more, please visit micron.com/careers

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.

To request assistance with the application process and/or for reasonable accommodations, please contact

Micron prohibits the use of child labor and complies with all applicable laws, rules, regulations, and other international and industry labor standards.

Micron does not charge candidates any recruitment fees or unlawfully collect any other payment from candidates as consideration for their employment with Micron.

AI alert: Candidates are encouraged to use AI tools to enhance their resume and/or application materials. However, all information provided must be accurate and reflect the candidate's true skills and experiences. Misuse of AI to fabricate or misrepresent qualifications will result in immediate disqualification.

Fraud alert: Micron advises job seekers to be cautious of unsolicited job offers and to verify the authenticity of any communication claiming to be from Micron by checking the official Micron careers website listed in the About Micron Technology, Inc. section.

Skills Required

C, PyTorch, CUDA, TensorFlow, Python, PCIe
