
Add support for IQ quantizations #4322

Merged: 1 commit merged into main on May 23, 2024
Conversation

@BruceMacD (Contributor) commented on May 10, 2024

This change allows importing IQ-type GGUF quantizations with ollama create.

This change carries the commit from #3657 while moving its changes around to the refactored project structure.

❯ ./ollama create nous-hermes-2-mistral:IQ_4XS -f /Users/bruce/models/nous-hermes-2-mistral/Modelfile
transferring model data 
using existing layer sha256:737258efad6ba5cf7232de66715a26cadba67b0e4bdace5cf03cf49d1e4864a0 
creating new layer sha256:d7285065edcb87b4852f1144dd090812df1b00ade49f74e234066ea9407a14bc 
creating new layer sha256:d8ba2f9a17b3bbdeb5690efaa409b3fcb0b56296a777c7a69c78aa33bbddf182 
creating new layer sha256:b2c4ee0a7317771fcbe7413c369d72ea911c63e6f52b2b0d6298a5a14c8e4983 
writing manifest 
success 

❯ ./ollama run nous-hermes-2-mistral:IQ_4XS
>>> write some python

def print_fruits(fruits):
    for fruit in fruits:
        print(fruit)

Tested with:
IQ1_S
IQ1_M
IQ2_M
IQ3_XXS
IQ3_XS
IQ3_S
IQ4_NL
IQ4_XS

resolves #3622

@sammcj (Contributor) left a comment:

Looks good, This will be great to have, thank you.

@sammcj (Contributor) commented on May 15, 2024

Been running my Ollama build with this patch for the past few days without any issues, it's great to use some of the newer IQ quants!

Keen to see this merged.

@mann1x (Contributor) commented on May 16, 2024

@BruceMacD Any idea when we can get this merged?

@oldmanjk commented:

Any updates?

@sammcj (Contributor) commented on May 22, 2024

@BruceMacD would you please be able to update (rebase) your branch from main?

Co-Authored-By: ManniX-ITA <20623405+mann1x@users.noreply.github.com>
@@ -126,10 +150,26 @@ func (t fileType) String() string {
return "IQ2_XS"
case fileTypeQ2_K_S:
return "Q2_K_S"
case fileTypeQ3_K_XS:
return "Q3_K_XS"
case fileTypeIQ3_XS:
return "IQ3_XS"
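The diff above extends the `String` method on the `fileType` enum so the new IQ quantization types render with their GGUF names. A minimal self-contained sketch of that pattern — note the numeric enum values and the subset of constants shown here are illustrative assumptions, not the exact definitions in ollama's filetype code:

```go
package main

import "fmt"

// fileType mirrors the GGUF/llama.cpp file-type enum. The concrete
// numeric values below are assumptions for this sketch; the real
// project assigns them to match the GGUF header's file-type field.
type fileType uint32

const (
	fileTypeIQ2_XS fileType = iota
	fileTypeQ2_K_S
	fileTypeIQ3_XS
	fileTypeIQ4_NL
	fileTypeIQ4_XS
)

// String maps each quantization type to its display name, the same
// switch-per-case pattern the diff above adds new IQ cases to.
func (t fileType) String() string {
	switch t {
	case fileTypeIQ2_XS:
		return "IQ2_XS"
	case fileTypeQ2_K_S:
		return "Q2_K_S"
	case fileTypeIQ3_XS:
		return "IQ3_XS"
	case fileTypeIQ4_NL:
		return "IQ4_NL"
	case fileTypeIQ4_XS:
		return "IQ4_XS"
	default:
		return "unknown"
	}
}

func main() {
	// fmt.Println uses the fmt.Stringer interface automatically,
	// so the quantization name prints directly.
	fmt.Println(fileTypeIQ4_XS) // prints "IQ4_XS"
}
```

Because `fileType` implements `fmt.Stringer`, any log line or error message that formats the type gets the human-readable quantization name for free, which is why each newly supported IQ type needs its own case here.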

@BruceMacD BruceMacD changed the title Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS. IQ4_NL Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS, IQ4_NL May 23, 2024
@mxyng (Contributor) left a comment:

The test might be a flake. I suggest retrying and if it fails again, maybe ping @dhiltgen

@BruceMacD changed the title from "Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS, IQ4_NL" to "Add support for IQ quantizations" on May 23, 2024
@BruceMacD merged commit d6f692a into main on May 23, 2024
15 checks passed
@BruceMacD deleted the brucemacd/iq-quants branch on May 23, 2024
@BruceMacD (Contributor, Author) commented:
@sammcj sorry for the delay!

@sammcj (Contributor) commented on May 23, 2024

Nice work, this is awesome!

@Xanton19 commented on May 25, 2024

Awesome, great job! Just noting that IQ3_M seems to be absent from the list in filetype.go, which would mean IQ3_M does not work? All the other quants appear to be present.

@wwjCMP commented on May 26, 2024

I'm curious which version of ollama will support this feature.

@sammcj (Contributor) commented on May 26, 2024

@wwjCMP there hasn't been a final release in two weeks, but v0.1.39 which is currently marked as pre-release has it - https://github.com/ollama/ollama/releases

@wwjCMP commented on May 26, 2024

> @wwjCMP there hasn't been a final release in two weeks, but v0.1.39 which is currently marked as pre-release has it - https://github.com/ollama/ollama/releases

thanks

Successfully merging this pull request may close these issues.

Ollama fails to create models when using IQ quantized GGUFs - Error: invalid file magic
7 participants