refactor: ModelKind with strum + derive_more #335
Conversation
Code Metrics Report

```
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                    5            9            9            0            0
 Python                 21          741          622           21           98
 TOML                   15          390          353            1           36
-------------------------------------------------------------------------------
 Jupyter Notebooks       1            0            0            0            0
 |- Markdown             1           60           30           22            8
 |- Python               1           96           87            1            8
 (Total)                            156          117           23           16
-------------------------------------------------------------------------------
 Markdown               15         1026            0          758          268
 |- BASH                 6          205          192            0           13
 |- Python               6          121          110            0           11
 |- Rust                 3          185          172            9            4
 (Total)                           1537          474          767          296
-------------------------------------------------------------------------------
 Rust                   84        27822        25504          356         1962
 |- Markdown            40          419            0          407           12
 (Total)                          28241        25504          763         1974
===============================================================================
 Total                 144        30464        26882         1136         2446
===============================================================================
```
@polarathene, thanks for proposing these changes. Currently, this code is a bit obscure, so any changes to make it more idiomatic are much appreciated. I left some comments below. It also looks like the Clippy lints are failing like #334, can you please take a look?
Yes, that is a mistake. Can you please add an
I think some methods on ModelKind would make this possible.
Sounds good!
Yes, I think we could do that. Please feel free to draft something here!
Sure, I'll tackle the feedback later today; current tasks for this project are:
@EricLBuehler how should this work for I don't know how this type/feature works, so what would be the expectation if
I think for these methods, we should return a vector of the resulting data or perhaps some sort of struct. That way, there is much more type-safe control, which really fits the spirit of this PR. For example, you can check if any are quantized with
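A minimal sketch of what such methods might look like (all names here are my assumptions, not the actual mistral.rs API): a compound kind that reports the quantization state of every underlying model, with an `is_quantized` check built on `Iterator::any`:

```rust
#[derive(Clone, Copy, PartialEq)]
enum QuantizationKind {
    None,
    Ggml,
    Gguf,
}

enum ModelKind {
    Normal,
    Quantized { quant: QuantizationKind },
    Speculative { target: Box<ModelKind>, draft: Box<ModelKind> },
}

impl ModelKind {
    // Collect the quantization state of each underlying model.
    fn quantized_kind(&self) -> Vec<QuantizationKind> {
        match self {
            ModelKind::Normal => vec![QuantizationKind::None],
            ModelKind::Quantized { quant } => vec![*quant],
            ModelKind::Speculative { target, draft } => {
                // Flatten both halves so callers see every underlying model.
                let mut v = target.quantized_kind();
                v.extend(draft.quantized_kind());
                v
            }
        }
    }

    // True if *any* underlying model is quantized.
    fn is_quantized(&self) -> bool {
        self.quantized_kind()
            .iter()
            .any(|q| *q != QuantizationKind::None)
    }
}
```

The vector return keeps `Speculative` honest: a caller can distinguish "any quantized" from "all quantized" by inspecting the elements.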
Force-pushed from 9d922cd to e7f3208
Context for review added, hope it's helpful 👍
mistralrs-bench/src/main.rs (Outdated)

```diff
-        | ModelKind::XLoraGGML
-        | ModelKind::XLoraGGUF
-    )
+    if use_flash_attn && !loader.get_kind().is_quantized()
```
This is now simpler and covers both adapter kinds as a bugfix.

NOTE: The `Speculative` kind will also trigger this now. I am not 100% sure if that's desired? This is due to the `any()` call: I assume that if either `target` or `draft` is quantized it may not be compatible, but likewise `is_quantized == true` does not guarantee that `Speculative` has both as quantized.
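A tiny illustration of the `any()` vs `all()` distinction being raised here, with plain booleans standing in for the quantization state of the `target`/`draft` pair (the function names are hypothetical):

```rust
// Stand-ins for the quantization state of each half of a Speculative pair.
fn any_quantized(halves: &[bool]) -> bool {
    // Mirrors the `any()`-based check: true if either half is quantized.
    halves.iter().any(|&q| q)
}

fn all_quantized(halves: &[bool]) -> bool {
    // The stricter check: true only when both halves are quantized.
    halves.iter().all(|&q| q)
}
```

For a pair where only the target is quantized, `any_quantized` is true while `all_quantized` is false, which is exactly the ambiguity described above.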
This comment was marked as resolved.
Force-pushed from 46da520 to fb0989a
Thanks for the PR. I think the logic is inverted on the checks for flash attention, but other than that, I am happy to accept a breaking change for the `ModelKind` enum and look forward to the quantized pipeline refactor.
Good spotting! 😬
Just to confirm this concern I raised (as it'll be hidden in the resolved review comment):
Using
Yes, as I wasn't sure about breaking changes being approved, I've been a bit cautious. A follow-up PR can remove the compat layers here when
I'll tackle that after my next PR (dependent upon changes here) is sorted out 👍
In my upcoming PR it was a little awkward/noisy with this new approach to check if the kind has an
As I'm not really familiar with these features yet, am I right to assume that future adapters may be supported that aren't "LoRA" specific? I am new to this domain, so I only have a surface understanding that LoRA is a technique to direct output towards a preferred style (I've noticed it can be applied with image, audio, and text generation models). I think it's often referred to as a light-weight approach to "fine-tuning" via separate file(s), as opposed to duplicating the model (weights?) data with the adjustments merged in (which requires much more disk space and is inefficient for changing styles/adapters at runtime, as the LoRAX project describes). I have no idea about X-LoRA, and haven't found time to look into it yet.

Not too important to resolve for now, I'll continue with my refactoring and adjust when appropriate 😅

EDIT: I just took a glance at the LoRAX docs for Adapters and noticed an Adapter type called
The `is_lora` example is only applied to `pipeline/normal.rs` for now, as a sibling PR is refactoring the `pipeline/{ggml,gguf}.rs` files.
Force-pushed from b255f24 to 771864d
I rebased and squashed the commit suggestions into the prior commit. The two

If you're new to squash merge on GitHub, it should by default provide the content of each commit message I wrote, and use the PR name as the squash commit name (with a link ref to this PR).
Yes, but the Medusa method is a bit different. Speculative Decoding is a sampling technique which uses a (smaller, faster, less precise) "draft" model to accelerate inference of a (larger, slower, more precise) "target" model.
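A rough sketch of that draft/verify loop, greatly simplified: a real implementation verifies draft tokens against the target distribution with rejection sampling and batches the target pass, whereas here acceptance is exact-match and the names are all hypothetical:

```rust
// One speculative-decoding step: the fast draft model proposes `k` tokens,
// the slow target model verifies them; the first rejected token is replaced
// by the target's own prediction.
fn speculative_step(
    draft: impl Fn(&[u32], usize) -> Vec<u32>, // propose k tokens cheaply
    target: impl Fn(&[u32]) -> u32,            // authoritative next token
    ctx: &mut Vec<u32>,
    k: usize,
) {
    let proposed = draft(ctx, k);
    for tok in proposed {
        let expected = target(ctx);
        if tok == expected {
            // Accepted: the draft guessed what the target would emit.
            ctx.push(tok);
        } else {
            // Rejected: keep the target's token and stop this step.
            ctx.push(expected);
            break;
        }
    }
}
```

The speed-up comes from accepted runs: when the draft guesses well, several tokens are committed per (batched) target pass.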
Looks good, thank you for this change! I think removing the `ModelKind` enum is not even a breaking change, actually, as it is not exported publicly. Perhaps you could remove it in the GGUF/GGML pipeline refactor?
Sure I can look into throwing that change in as a final commit 👍
This is something to consider, but it would be a bit awkward without replacing `ModelKind` as a breaking change. It was something I explored after #334 and depends on it for `strum`; likewise, I've not added the `derive_more` crate here either. Going forward with such a change for this type would be rather large, but it can be done if deemed worthwhile.

I noticed that the logic to support the different kinds is mostly repeated, with very few if any differences. Meanwhile, some logic looks like it's already falling out of sync, so reducing the noise throughout the codebase may help focus on minimizing that concern?
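For illustration, the compound shape being proposed might look as follows (variant names are my own guesses, not the real enum). The hand-written `Display` impl is exactly the boilerplate a `#[derive(strum::Display)]` or a `derive_more` derive could replace:

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq)]
enum AdapterKind {
    Lora,
    XLora,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum QuantKind {
    Ggml,
    Gguf,
}

// Compound kind collapsing flat variants like XLoraGGUF, LoraGGML, etc.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ModelKind {
    Normal,
    Quantized(QuantKind),
    Adapter(AdapterKind),
    AdapterQuantized(AdapterKind, QuantKind),
}

impl fmt::Display for ModelKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ModelKind::Normal => write!(f, "normal"),
            ModelKind::Quantized(q) => write!(f, "quantized ({q:?})"),
            ModelKind::Adapter(a) => write!(f, "adapter ({a:?})"),
            ModelKind::AdapterQuantized(a, q) => {
                write!(f, "adapter ({a:?}), quantized ({q:?})")
            }
        }
    }
}
```

Two orthogonal sub-enums mean adding a new adapter or quantization format touches one place instead of multiplying flat variants.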
I'm not sure if `Lora` quantized kinds were intentionally excluded here; could this simply be a call to `kind.is_quantized()`?

mistral.rs/mistralrs-bench/src/main.rs
Lines 316 to 326 in 1d5f9f3
There seems to be some logic to detect a difference between `lora` and `xlora` types? (The `ModelKind` could better indicate that or similar?)

mistral.rs/mistralrs-core/src/pipeline/gguf.rs
Lines 406 to 407 in 1d5f9f3
mistral.rs/mistralrs-core/src/pipeline/gguf.rs
Lines 473 to 476 in 1d5f9f3
mistral.rs/mistralrs-core/src/pipeline/gguf.rs
Lines 498 to 507 in 1d5f9f3
mistral.rs/mistralrs-core/src/pipeline/normal.rs
Line 297 in 1d5f9f3
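If that `lora` vs `xlora` distinction matters, an accessor on a compound kind could expose it directly instead of string or variant matching at each call site; a hedged sketch with assumed names:

```rust
#[derive(Clone, Copy, PartialEq)]
enum AdapterKind {
    Lora,
    XLora,
}

enum ModelKind {
    Normal,
    Adapter(AdapterKind),
    AdapterQuantized(AdapterKind),
}

impl ModelKind {
    // Returns which adapter (if any) this kind carries.
    fn adapter(&self) -> Option<AdapterKind> {
        match self {
            ModelKind::Adapter(a) | ModelKind::AdapterQuantized(a) => Some(*a),
            ModelKind::Normal => None,
        }
    }

    // Distinguishes `xlora` from plain `lora` in one place.
    fn is_x_lora(&self) -> bool {
        self.adapter() == Some(AdapterKind::XLora)
    }
}
```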
Many of these would be collapsed to the compound type proposed by the PR:
mistral.rs/mistralrs-core/src/pipeline/normal.rs
Lines 231 to 234 in 1d5f9f3
mistral.rs/mistralrs-core/src/pipeline/normal.rs
Lines 277 to 284 in 1d5f9f3
Whereas these would need additional refactoring; they all share the same parameter shape, so they could pass a struct instead?
mistral.rs/mistralrs-core/src/pipeline/normal.rs
Lines 235 to 276 in 1d5f9f3
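A sketch of that idea (the struct, field, and function names are hypothetical): gather the repeated parameters into one struct and pass it through, so the signatures can't drift apart:

```rust
// Hypothetical bundle of the parameters shared by the loading paths.
struct LoadParams {
    device_index: usize,
    use_flash_attn: bool,
}

// Instead of `fn load(device_index: usize, use_flash_attn: bool, ...)`,
// each path takes the bundle; adding a field updates every path at once.
fn describe_load(params: &LoadParams) -> String {
    format!(
        "device={} flash_attn={}",
        params.device_index, params.use_flash_attn
    )
}
```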
The builder methods also seem rather suitable for deferring to a crate to derive, and that would work better with a breaking change away from these large parameterized methods funneling data through them.
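For example, the hand-written builder below is the kind of boilerplate a crate like `derive_builder` can generate from a `#[derive(Builder)]` attribute (the struct and field names here are made up for illustration):

```rust
#[derive(Default)]
struct LoaderConfig {
    model_id: String,
    quantized: bool,
}

// What a builder derive would generate; written out by hand here.
#[derive(Default)]
struct LoaderConfigBuilder {
    inner: LoaderConfig,
}

impl LoaderConfigBuilder {
    fn model_id(mut self, id: impl Into<String>) -> Self {
        self.inner.model_id = id.into();
        self
    }

    fn quantized(mut self, q: bool) -> Self {
        self.inner.quantized = q;
        self
    }

    fn build(self) -> LoaderConfig {
        self.inner
    }
}
```

Deriving this removes the chance of the builders and the structs they build falling out of sync as fields are added.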
I won't have time to tackle this any time soon, but it's something to consider as more models and features are added, before this sprawls further 😅