Principal Systems Performance Engineer

Confidential • Hyderabad / Secunderabad, Telangana, India
9 days ago
Job description

Our vision is to transform how the world uses information to enrich life for all.

Micron Technology is a world leader in innovating memory and storage solutions that accelerate the transformation of information into intelligence, inspiring the world to learn, communicate and advance faster than ever.

Principal / Senior Systems Performance Engineer

Micron Data Center and Client Workload Engineering in Hyderabad, India, is seeking a senior / principal engineer to join our dynamic team.

The successful candidate will primarily contribute to ML development, ML DevOps, and the HBM program in the data center by analyzing how AI / ML workloads perform on the latest MU-HBM, Micron main memory, expansion memory and near memory (HBM / LP) solutions; conducting competitive analysis; showcasing the benefits that workloads see with MU-HBM's capacity / bandwidth / thermals; contributing to marketing collateral; and extracting AI / ML workload traces to help optimize future HBM designs.

Job Responsibilities :

The Job Responsibilities include but are not limited to the following :

  • Design, implement, and maintain scalable & reliable ML infrastructure and pipelines.
  • Collaborate with data scientists and ML engineers to deploy machine learning models into production environments.
  • Automate and optimize ML workflows, including data preprocessing, model training, evaluation, and deployment.
  • Monitor and manage the performance, reliability, and scalability of ML systems.
  • Troubleshoot and resolve issues related to ML infrastructure and deployments.
  • Implement and manage distributed training and inference solutions to enhance model performance and scalability.
  • Utilize DeepSpeed, TensorRT, vLLM for optimizing and accelerating AI inference and training processes.
  • Understand the key considerations for ML models, such as transformer architectures, precision, quantization, distillation, attention span & KV cache, MoE, etc.
  • Build workload memory access traces from AI models.
  • Study system balance ratios for DRAM to HBM in terms of capacity and bandwidth to understand and model TCO.
  • Study data movement between CPU, GPU and the associated memory subsystems (DDR, HBM) in heterogeneous system architectures via connectivity such as PCIe / NVLINK / Infinity Fabric to understand the bottlenecks in data movement for different workloads.
  • Develop an automated testing framework through scripting.
  • Engage with customers and present at conferences to showcase findings, and develop whitepapers.

Requirements :

  • Strong programming skills in Python and familiarity with ML frameworks such as TensorFlow, PyTorch, or scikit-learn.
  • Experience in data preparation : cleaning, splitting, and transforming data for training, validation, and testing.
  • Proficiency in model training and development : creating and training machine learning models.
  • Expertise in model evaluation : testing models to assess their performance.
  • Skills in model deployment : launching a server, live inference, and batched inference.
  • Experience with AI inference and distributed training techniques.
  • Strong foundation in GPU and CPU processor architecture.
  • Familiarity with and knowledge of server system memory (DRAM).
  • Strong experience with benchmarking and performance analysis.
  • Strong software development skills using leading scripting and programming languages and technologies (Python, CUDA, C, C++).
  • Familiarity with PCIe and NVLINK connectivity.

Preferred Qualifications :

  • Experience in quickly building AI workflows : building pipelines and model workflows to design, deploy, and manage consistent model delivery.
  • Ability to easily deploy models anywhere : using managed endpoints to deploy models and workflows across accessible CPU and GPU machines.
  • Understanding of MLOps : the overarching concept covering the core tools, processes, and best practices for end-to-end machine learning system development and operations in production.
  • Knowledge of GenAIOps : extending MLOps to develop and operationalize generative AI solutions, including the management of and interaction with a foundation model.
  • Familiarity with LLMOps : focused specifically on developing and productionizing LLM-based solutions.
  • Experience with RAGOps : focusing on the delivery and operation of RAGs, considered the ultimate reference architecture for generative AI and LLMs.
  • Data management : collect, ingest, store, process, and label data for training and evaluation. Configure role-based access control; dataset search, browsing, and exploration; data provenance tracking, data logging, dataset versioning, metadata indexing, data quality validation, dataset cards, and dashboards for data visualization.
  • Workflow and pipeline management : work with cloud resources or a local workstation; connect data preparation, model training, model evaluation, model optimization, and model deployment steps into an end-to-end automated and scalable workflow combining data and compute.
  • Model management : train, evaluate, and optimize models for production; store and version models along with their model cards in a centralized model registry; assess model risks, and ensure compliance with standards.
  • Experiment management and observability : track and compare different machine learning model experiments, including changes in training data, models, and hyperparameters. Automatically search the space of possible model architectures and hyperparameters for a given model architecture; analyze model performance during inference, and monitor model inputs and outputs for concept drift.
  • Synthetic data management : extend data management with a new native generative AI capability. Generate synthetic training data through domain randomization to increase transfer learning capabilities. Declaratively define and generate edge cases to evaluate, validate, and certify model accuracy and robustness.
  • Embedding management : represent data samples of any modality as dense multi-dimensional embedding vectors; generate, store, and version embeddings in a vector database. Visualize embeddings for improvised exploration. Find relevant contextual information through vector similarity search for RAGs.

Education :

  • Bachelor's or higher (with 12+ years of experience) in Computer Science or related field.

About Micron Technology, Inc.

    We are an industry leader in innovative memory and storage solutions transforming how the world uses information to enrich life for all. With a relentless focus on our customers, technology leadership, and manufacturing and operational excellence, Micron delivers a rich portfolio of high-performance DRAM, NAND, and NOR memory and storage products through our Micron® and Crucial® brands. Every day, the innovations that our people create fuel the data economy, enabling advances in artificial intelligence and 5G applications that unleash opportunities - from the data center to the intelligent edge and across the client and mobile user experience.

    To learn more, please visit micron.com / careers

    All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.

    To request assistance with the application process and / or for reasonable accommodations, please contact

    Micron prohibits the use of child labor and complies with all applicable laws, rules, regulations, and other international and industry labor standards.

    Micron does not charge candidates any recruitment fees or unlawfully collect any other payment from candidates as consideration for their employment with Micron.

    AI alert : Candidates are encouraged to use AI tools to enhance their resume and / or application materials. However, all information provided must be accurate and reflect the candidate's true skills and experiences. Misuse of AI to fabricate or misrepresent qualifications will result in immediate disqualification.

    Fraud alert : Micron advises job seekers to be cautious of unsolicited job offers and to verify the authenticity of any communication claiming to be from Micron by checking the official Micron careers website.

    Skills Required

    C, PyTorch, CUDA, TensorFlow, Python, PCIe
