What you’ll do
You will take over the management of our existing quant compute / storage infrastructure and be responsible for the design, automation, and management of future infrastructure solutions that support our portfolio managers as they leverage our large-scale research environment. Specifically, you will:
- Design and manage a variety of quant research platforms
- Research and experiment with new centralized multi-tenant clustered platforms
- Troubleshoot complex storage, OS, and networking issues
- Build tools to improve the performance and monitoring of HPC clusters
- Maintain a riveted focus on efficiency by automating provisioning and housekeeping
What’s required
- 5+ years of experience engineering, deploying, and managing large-scale (250+ servers) Linux infrastructure at a financial services firm
- 3+ years of experience in highly parallelized environments with storage deployments greater than 1PB
- Knowledge of distributed file systems (Lustre, GPFS) used in large multi-tenant clustered compute environments
- Experience with a wide variety of filesystems such as ZFS, XFS, and ext4
- Extensive experience with Linux (RHEL / CentOS / Ubuntu) configuration, troubleshooting, and performance tuning
- Modern x86 / AMD-based computing platforms, expert knowledge of use cases for scaling for transaction speed versus distributed parallel processing
- 10GE / 25GE client network cards, drivers, firmware, and kernel bypass capabilities
- Knowledge of Linux software RAID
- Good understanding of hardware RAID, erasure coding, etc.
- Experience in automation (Terraform, Ansible, Puppet, Chef)
- Python / Shell scripting
- Protocol knowledge of NFS, CIFS, SMB
- Understanding of complex ACLs and nuances between protocols
- Choosing the right filesystem for the right use case
- Deploying and leading hybrid and / or cloud-native high performance compute environments in AWS
- Experience with AWS storage including S3, Storage Gateway, EBS
- Knowledge of HPC (AI / ML) offerings in public cloud providers (AWS / GCP preferred)
- Experience with HPC job scheduling platforms such as Airflow, Condor, Slurm, SGE, etc.
- Experience with NFS v4.1
- Various HPE / Dell / Supermicro
- Experience with InfiniBand, NFS over RDMA, RoCE
- Commitment to the highest ethical standards
About Point72
Point72 is a leading global alternative investment firm led by Steven A. Cohen. Building on more than 30 years of investing experience, Point72 seeks to deliver superior returns for its investors through fundamental and systematic investing strategies across asset classes and geographies. We aim to attract and retain the industry’s brightest talent by cultivating an investor-led culture and committing to our people’s long-term growth.