Job Title : Senior Engineer, HPC / CAE
Location : Chennai Onsite - Hybrid
Notice Period : Joining within 30 Days only
We are looking for a skilled Senior Engineer HPC / CAE to join our team. In this role, you will be the backbone of our supercomputing platform, ensuring our engineers have the reliable, high-performance tools they need to design the next generation of vehicles.
You will work across the entire HPC stack, from bare-metal servers to the end-user applications, making a direct impact on our engineering and research efforts.
What You'll Do :
- Deploy, optimize, and support a wide range of HPC administration and Computer-Aided Engineering (CAE) applications and workloads in a complex HPC environment featuring advanced CPU, GPU, and interconnect technologies.
- Maintain and enhance the user-facing tools (CLI and APIs) that streamline our customers' access to HPC infrastructure.
- Serve as a key technical resource for troubleshooting and resolving complex issues related to Linux systems, networking, storage, and mission-critical CAE applications.
- Collaborate closely with software developers and product engineers to ensure the seamless integration and scaling of their applications on our HPC platform.
- Develop and maintain clear documentation for software, procedures, and best practices to empower users and streamline operations.
- Stay current with the latest advancements in HPC and AI / ML technologies to recommend and implement continuous improvements.
What You'll Need :
- Minimum 5 years of hands-on experience in HPC administration, CAE support, systems engineering, or a related software engineering role.
- Strong proficiency in Linux system administration and troubleshooting.
- Demonstrated experience supporting CAE applications in a technical or scientific computing environment.
- Proficiency in a scripting language (e.g., Python, Bash) for automation and tooling.
Preferred (Bonus Points) :
- Experience with containerization technologies like Docker or Kubernetes.
- Familiarity with HPC job schedulers (e.g., Slurm, LSF) and parallel file systems (e.g., Lustre, GPFS).
- Hands-on experience with monitoring and alerting tools such as Prometheus, Grafana, Nagios, or Zabbix.
- Experience with GPU computing and relevant toolchains (e.g., CUDA).
(ref : hirist.tech)