Job Title : AI Systems Engineer - GPU / ROCm / CUDA | ML Frameworks Optimization
Experience : 3-6 years [Mid-Senior]
Job Description :
We are looking for a passionate and experienced AI Systems Engineer to join our team to work on next-generation Machine Learning technologies and optimize performance across AMD GPU accelerators. This role involves low-level GPU programming, custom ML kernel development, and working with state-of-the-art inference engines.
Key Responsibilities :
- Develop and optimize custom Deep Learning GPU kernels using ROCm / CUDA or shader languages
- Support and enhance ML model deployment on Linux platforms
- Optimize performance of ROCm drivers and inference engines for AI / ML workloads
- Collaborate closely with internal hardware / software teams to support next-gen GPU accelerators
- Profile, debug, and improve performance of GPU kernels and AI model pipelines
- Contribute to designing and implementing new AI technologies and workflows
Required Skills & Qualifications :
- BS / MS in Computer Science, Electrical Engineering, or equivalent
- Strong programming skills in C / C++ and Python
- Solid experience working with the Linux CLI, bash scripting, or PowerShell
- Hands-on experience with Python ML libraries such as PyTorch and Transformers
- Knowledge of writing high-performance ML kernels using Triton, JAX, or similar
- Experience with debugging tools like gdb and valgrind, and profiling tools such as nsys and rocprof
- Familiarity with AI inference runtimes such as vllm, ollama, llama.cpp, or sglang
- Understanding of GPU and PC architecture and x86 / x64 instruction sets
- Experience developing with ROCm, CUDA, or shader programming
Nice to Have :
- Knowledge of x86 Assembly
- Contributions to open-source ML / DL performance libraries
- Exposure to compiler optimization techniques for GPU code
What We Offer :
- Work on cutting-edge GPU technologies and ML systems
- Exposure to performance-critical AI workloads
- Collaborative and research-oriented environment
- Competitive compensation and career growth opportunities
(ref : hirist.tech)