How to load bfloat (float16) weight into torchsharp model #1204
Storing using binary:
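The code from this comment was not captured above, but the reply below refers to a `save_tensor_to_binary(tensor, binary_file)` helper. Here is a minimal sketch of what such a helper might look like; the rank-plus-dims header layout and the int16 reinterpretation for bf16 are assumptions for illustration, not the original code:

```python
import struct
import torch

def save_tensor_to_binary(tensor: torch.Tensor, binary_file: str) -> None:
    # Move to CPU and make the memory layout contiguous before dumping bytes.
    t = tensor.detach().contiguous().cpu()
    with open(binary_file, 'wb') as f:
        # Assumed header: rank, then each dimension, as little-endian int64.
        f.write(struct.pack('<q', t.dim()))
        for d in t.shape:
            f.write(struct.pack('<q', d))
        if t.dtype == torch.bfloat16:
            # NumPy has no bfloat16 dtype, so reinterpret the 2-byte elements
            # as int16; the reader must reverse this when rebuilding the tensor.
            f.write(t.view(torch.int16).numpy().tobytes())
        else:
            f.write(t.numpy().tobytes())
```

A matching TorchSharp reader would parse the header, read the raw payload, and reinterpret the int16 bytes back to bf16.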
Perfect, let me try it! Thanks @lintao185

Update: Hey @lintao185, I tried your solution, and it seems there are two problems. The first problem is in the Python code: it seems the tensor will be converted to double before it is written. The second problem is in …
Yes, indeed, you could change it to `save_tensor_to_binary(tensor, binary_file)`. It's worth noting that the conversion to double was initially intended for enhanced compatibility. As an alternative, you could experiment with loading the tensor into TorchSharp and subsequently saving a version of the parameters using the native APIs officially offered by TorchSharp.
Do you have the .ckpt file and want to load it? Or do you want to convert it to a file that can be read by the built-in methods in TorchSharp? If it's the former, I've just created a tool to load ckpt files directly, though I haven't tested it on BF16 yet. If you want, I can clean it up a bit and create a gist... and also test with bf16 :-) It relies on the …
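For background (this is not @phizch's tool, whose code isn't shown here): a checkpoint written by `torch.save` in the modern format is a zip archive containing a pickled description of the state dict plus one raw-storage file per tensor, which is why a standalone ckpt loader is feasible. A quick way to inspect one, using a placeholder file name:

```python
import zipfile

# 'model.ckpt' is a placeholder; any file produced by torch.save works.
with zipfile.ZipFile('model.ckpt') as z:
    # Typical entries: '<prefix>/data.pkl' plus '<prefix>/data/0', '<prefix>/data/1', ...
    print(z.namelist())
```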
@phizch Below are the steps of what I want to do. Essentially, the reason why I want to load directly from …
Also, here's the link to the loading function I currently use to load the model weights. It's modified from @lintao185's solution (thanks BTW) and requires a separate conversion from the Llama 2 ckpt to the TorchSharp format, which I'd like to get rid of. Thanks in advance for any potential solution/help!
@LittleLittleCloud I haven't tried it myself, but have you tried loading using TorchSharp.PyBridge? You can install it using NuGet:

```
Install-Package TorchSharp.PyBridge
```

And then you can load the PyTorch weights without applying any conversions:

```csharp
model.load_py("/path/to/ckpt");
```

(This should work with the regular PyTorch checkpoints, not SafeTensors.)
@shaltielshmid I just tried your package and your solution works like a charm. Here are the steps I took, in case anyone else runs into a similar problem.

Step 1: in Python, save the model state dict with bf16 as the default dtype:

```python
import torch

# use bf16 as default
# this is a requirement if you want to save llama weight in bfloat16
torch.set_default_dtype(torch.bfloat16)

# some code to load transformer

# save model state dict
with open(llama_torchsharp_weights_path, 'wb') as f:
    torch.save(model.state_dict(keep_vars=False), f)
```

Step 2: in C#, load the weights with PyBridge:

```csharp
// create transformer
transformer.load_py(llama_torchsharp_weights_path);
```

And the model size comparison (consolidate.0.pth is the original ckpt from Llama; llama-2-7b.pt is the model weight converted by …). And it seems that …
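A quick sanity check that the exported file really kept bf16 (a hypothetical check with a placeholder path, not part of the steps above):

```python
import torch

# Placeholder path; use whatever llama_torchsharp_weights_path pointed to.
sd = torch.load('llama_torchsharp_weights.pt', map_location='cpu')
print({t.dtype for t in sd.values()})  # expect {torch.bfloat16}
```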
But since the TorchSharp package already includes the CUDA binaries, you can update the package even if you don't have CUDA 12.x on your machine.
But you do need drivers that are CUDA 12 compatible.
The current convert Python script converts a tensor to a NumPy array before writing it to file. However, since NumPy arrays don't support the bf16 type, the convert script won't work if the model weights contain bf16 tensors.
My current workaround is to save the model weights in f32 and set bf16 as the default dtype before running inference. However, the cost is nearly double the size of the exported model weights. So I wonder if it's possible to 1) add a function to save bf16 weights in the Python convert script, and 2) maybe add support for loading from a PyTorch checkpoint file or the HF `.safetensors` format, to further streamline loading model weights.
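To make the limitation concrete (a self-contained sketch, not the convert script itself): calling `.numpy()` on a bf16 tensor fails because NumPy has no bfloat16 dtype, but reinterpreting the same two-byte elements as int16 round-trips losslessly:

```python
import torch

t = torch.randn(4, dtype=torch.bfloat16)

try:
    t.numpy()                        # fails: NumPy has no bfloat16 dtype
except TypeError as e:
    print(e)

raw = t.view(torch.int16).numpy()    # reinterpret the same 2-byte elements
roundtrip = torch.from_numpy(raw).view(torch.bfloat16)
assert torch.equal(t, roundtrip)     # no precision lost
```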