
Feature Request - implement nmap style model loading #1120

Open
michieal opened this issue Oct 20, 2023 · 1 comment

Comments

michieal commented Oct 20, 2023

I didn't see a template for this...

I saw that llama.cpp (the GitHub project) had a PR (at the time I saw it) that used mmap, the system call that lets model files be mapped into the process's address space (like a dedicated swap file). I was wondering whether TorchSharp has something like that, or could work with something written in C# that does.

Mapping the model files' memory directly to the files on the drive lets them load near-instantaneously. It also reduces the required memory by gigabytes.

I'm not 100% sure about all of the finer details, but I do know that the difference between running the standard llama.cpp and the mmap llama.cpp with the same model files was like night and day. It made getting the program up and running take less than a minute, and it didn't slow down the model in a noticeable way.

So, I was wondering if something like this could be implemented, as that would be awesome (and would work cross-platform, too).
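For reference, the memory-mapping behavior described above can be sketched with Python's standard-library `mmap` module; the file name and contents here are illustrative, not part of any real model format. (In .NET, `System.IO.MemoryMappedFiles.MemoryMappedFile` provides the equivalent facility.)

```python
import mmap
import os
import struct

# Write a tiny "model file" of raw float32 weights to disk.
path = "weights.bin"
weights = [0.5, 1.5, 2.5, 3.5]
with open(path, "wb") as f:
    f.write(struct.pack("<4f", *weights))

# Map the file into the process's address space: the OS pages the
# data in on demand instead of copying the whole file into RAM up
# front, which is why startup is near-instantaneous for large files.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        loaded = list(struct.unpack("<4f", mm[:16]))

print(loaded)  # the same weights, read through the mapping
os.remove(path)
```

Because the mapping is shared with the page cache, several processes mapping the same model file can reuse one physical copy of the data.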

NiklasGustafsson (Contributor) commented Oct 20, 2023

I believe that torch.from_file already does this for tensors, but not for modules. In other words, the building blocks already exist, and we would use that instead of loading a state dict. It will require some mulling over in order to get it right, I think.
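The PyTorch building block mentioned above can be sketched as follows; the file name and values are illustrative, and a real loader would still need to carve individual weight tensors out of the mapped storage.

```python
import struct

import torch

# Persist four float32 values as raw bytes (a stand-in for a weight file).
path = "weights.bin"
with open(path, "wb") as f:
    f.write(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))

# Map the file as a tensor: with shared=True the tensor's storage is
# backed by the file itself rather than a private in-memory copy.
t = torch.from_file(path, shared=True, size=4, dtype=torch.float32)
print(t)
```

TorchSharp exposes the same libtorch functionality, so a module-level loader could in principle build its parameters over such mapped storages instead of materializing a full state dict.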
