Nix for HPC: the case of cudaPackages

Sergei K

NixCon 2023

Nixpkgs and NixOS go a long way in stabilising a program's otherwise non-deterministic build and runtime behaviour. Meanwhile, scientific computing software, and state-of-the-art "AI" research code in particular, relies heavily on dynamic and "impure" deployment techniques. These include, for example, dynamic linkage via dlopen, unenforced assumptions about filesystem paths, and the distribution of pre-built black-box binaries, largely facilitated by the use of dependency solvers. Sometimes these "impurities" are necessary, as when using driver-aware libraries like OpenGL or CUDA. This seemingly makes it cheaper to relax the reproducibility requirements set by Nix and embrace tools such as pip, conda, Singularity (Apptainer) and Docker.

We briefly review (or recap) what effort it takes today to provision, via Nix, a typical stack that includes PyTorch and CUDA; we consider when Nix can still be the better choice, and when Nix may seem to hinder progress. We argue that the obstacles we observe can be eliminated at the nixpkgs level.
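As a point of reference (not part of the talk description itself), a minimal sketch of the kind of stack the abstract refers to might look like the shell below. It assumes current nixpkgs conventions: the cudaSupport and allowUnfree config flags and the python3Packages.torch attribute; exact attribute names can differ between channels.

```nix
# shell.nix — a hedged sketch of provisioning PyTorch with CUDA via nixpkgs.
let
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true;  # the CUDA toolkit and libraries are unfree
      cudaSupport = true;  # ask nixpkgs to build CUDA-enabled variants where supported
    };
  };
in
pkgs.mkShell {
  packages = [
    # A Python interpreter with a CUDA-enabled torch in its site-packages.
    (pkgs.python3.withPackages (ps: [ ps.torch ]))
  ];
}
```

Entering the shell with `nix-shell` typically triggers a substantial build or substitution of CUDA-enabled packages; at runtime the NVIDIA driver libraries still have to be located outside the Nix store, which is one of the "impurities" the talk discusses.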

P.S. This description is preliminary
