Talent.com
AI Software System Engineer - HPC Infrastructure Engineering

Confidential
Hyderabad / Secunderabad, Telangana, India
6 days ago
Job description

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

Job Title : AI Systems Engineer

Job Overview : We are seeking an AI Systems Engineer to join our AMD IT compute platforms engineering team. The AI Systems Engineer is responsible for the design, development, and administration of High-Performance Computing (HPC) infrastructure, GPU clusters, and AI workload schedulers.

ABOUT YOU :

You have a passion for learning. You are passionate about large-scale distributed computing for AI and HPC workloads. You take responsibility for the end-to-end outcomes of your efforts. You want to build scalable, high-performance HPC / AI / data services with AMD hardware, software, people, and processes. You are curious to learn about and improve scalable HPC systems. You have significant experience working across a globally distributed organization.

Responsibilities :

  • Develop, implement, and maintain GPU-based clusters, ensuring optimal performance.
  • Administer ML / AI platforms (distributed ML services, LLMs, and AI inferencing) by managing deployments, resource allocation, monitoring, and security.
  • Automate system provisioning and cluster management end to end.
  • Collaborate with cross-functional teams to address AI infrastructure requirements, support AI-related projects, and provide technical expertise.
  • Monitor and evaluate the performance of AI systems and clusters, ensuring that they adhere to industry best practices and meet company standards.
  • Use AI / ML to continuously improve the internal processes and tools this team uses for end-to-end delivery of its services.

Experience and Qualifications :

  • 5+ years of experience in developing Python-based AI apps and UI
  • 5+ years of experience in HPC infrastructure engineering for the AI / HPC domain
  • 5+ years of experience in SLURM and Kubernetes management
  • 2+ years of experience managing GPU clusters and optimizing GPU-based services / tools / software
  • Experience in creating web services with an HPC backend (such as AI)
  • Proficiency in RoCEv2, K8s, KVM, Ubuntu, Python, Shell, GPU drivers, and cluster interconnect with 400G networking.
  • Demonstrated experience with AI workload schedulers and allocation optimization.
  • Automation / monitoring tools - Ansible / SaltStack, Terraform, Prometheus, Grafana
  • Strong organizational, problem-solving, and troubleshooting skills, with the ability to manage multiple projects simultaneously.
  • Excellent verbal and written communication skills, with the ability to collaborate effectively with team members and stakeholders at all levels of the organization.
  • Location : Hyderabad

    AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and / or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

    Skills Required

    Grafana, Ubuntu, K8s, KVM, Shell, Ansible, Prometheus, Python, Kubernetes, SaltStack, Terraform
