High-Performance Computing (HPC) infrastructures provide users with dedicated compute resources to run computation-intensive workloads such as weather simulations, artificial intelligence (AI), and machine learning (ML). Each job submitted by a user may consist of multiple tasks that run concurrently on different nodes, often requiring shared access to intermediate or final data. To facilitate this, HPC systems typically use a Parallel File System (PFS) that allows data to be accessed across nodes. However, this same PFS is commonly shared among all users, so multiple jobs access the storage system simultaneously. This shared usage can lead to I/O interference, where one user's job slows down because of competing I/O demands from other users, increasing overall job execution time.
To address this challenge, we are developing software that allows HPC infrastructure providers to provision isolated PFS instances for each user or job, which reduces interference by isolating I/O traffic. We are also designing the software to support dynamic performance scaling of PFS instances, integrate erasure-coded fault tolerance, and enable data tiering to object storage systems. If you are interested in contributing to this effort or would like to discuss it further, please reach out.
Key Responsibilities
Design, develop, and maintain high-performance software in Golang/C for system-level components.
Utilize advanced data structures and algorithms to solve complex system problems.
Analyze and debug system-level issues, ensuring efficient problem resolution.
Collaborate with cross-functional teams to architect scalable and robust software solutions.
Perform code reviews, mentor junior engineers, and contribute to continuous process improvement.
Required Skills and Qualifications
Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
3-6 years of professional experience in system software development.
Proficiency in Golang/C programming, with a strong understanding of object-oriented and low-level programming concepts.
Expertise in Linux operating system internals, including process management, memory management, and I/O subsystems.
Solid understanding of data structures, algorithms, and their application in system-level programming.
Good debugging skills, with experience using tools like GDB, strace, perf, and system logs.
Strong problem-solving and analytical thinking abilities.
Good communication and collaboration skills.
Why Join Us?
Work on innovative, high-impact projects in system software engineering.
Collaborate with a team of passionate and highly skilled professionals.
Enjoy a culture that values creativity, innovation, and personal growth.
Receive a competitive salary and comprehensive benefits package.
Software Engineer • Pune, Maharashtra, India