Releases: ollama/ollama
v0.1.41
What's Changed
- Fixed issue on Windows 10 and 11 with Intel CPUs with integrated GPUs where Ollama would encounter an error
Full Changelog: v0.1.40...v0.1.41
v0.1.40
New models
- Codestral: Mistral AI’s first-ever code model, designed for code generation tasks.
- IBM Granite Code: now in 3B and 8B parameter sizes.
- Deepseek V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
What's Changed
- Fixed out of memory and incorrect token issues when running Codestral on 16GB Macs
- Fixed issue where full-width characters (e.g. Japanese, Chinese, Russian) were deleted at the end of the line when using `ollama run`
New Examples
New Contributors
- @zhewang1-intc made their first contribution in #3278
Full Changelog: v0.1.39...v0.1.40
v0.1.39
New models
- Cohere Aya 23: A new state-of-the-art, multilingual LLM covering 23 different languages.
- Mistral 7B 0.3: A new version of Mistral 7B with initial support for function calling.
- Phi-3 Medium: a 14B-parameter, lightweight, state-of-the-art open model by Microsoft.
- Phi-3 Mini 128K and Phi-3 Medium 128K: versions of the Phi-3 models that support a context window size of 128K
- Granite code: A family of open foundation models by IBM for Code Intelligence
Llama 3 import
It is now possible to import and quantize Llama 3 and its finetunes from Safetensors format to Ollama.
First, clone a Hugging Face repo with a Safetensors model:
git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
cd Meta-Llama-3-8B-Instruct
Next, create a Modelfile:
FROM .
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"""
PARAMETER stop <|start_header_id|>
PARAMETER stop <|end_header_id|>
PARAMETER stop <|eot_id|>
Then, create and quantize a model:
ollama create --quantize q4_0 -f Modelfile my-llama3
ollama run my-llama3
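Once created, the model can also be queried over Ollama's REST API; a minimal sketch, assuming `ollama serve` is running on the default port and the model was created as `my-llama3` above:

```shell
# Ask the newly imported model a question via the generate endpoint
# (assumes the server is listening on the default port 11434)
curl http://localhost:11434/api/generate -d '{
  "model": "my-llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```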
What's Changed
- Fixed issues with wide characters in languages such as Chinese, Korean, Japanese and Russian
- Added new `OLLAMA_NOHISTORY=1` environment variable that can be set to disable history when using `ollama run`
- New experimental `OLLAMA_FLASH_ATTENTION=1` flag for `ollama serve` that improves token generation speed on Apple Silicon Macs and NVIDIA graphics cards
- Fixed error that would occur on Windows when running `ollama create -f Modelfile`
- `ollama create` can now create models from I-Quant GGUF files
- Fixed `EOF` errors when resuming downloads via `ollama pull`
- Added a `Ctrl+W` shortcut to `ollama run`
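As a sketch, the two new environment variables above can be combined in a typical session (model name illustrative):

```shell
# Start the server with the experimental flash attention path enabled
OLLAMA_FLASH_ATTENTION=1 ollama serve

# In another terminal: run a model without recording readline history
OLLAMA_NOHISTORY=1 ollama run llama3
```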
New Contributors
- @rapmd73 made their first contribution in #4467
- @sammcj made their first contribution in #4120
- @likejazz made their first contribution in #4535
Full Changelog: v0.1.38...v0.1.39
v0.1.38
New Models
- Falcon 2: A new 11B-parameter causal decoder-only model built by TII and trained over 5T tokens.
- Yi 1.5: A new high-performing version of Yi, now licensed as Apache 2.0. Available in 6B, 9B and 34B sizes.
What's Changed
ollama ps
A new command is now available: `ollama ps`. This command displays currently loaded models, their memory footprint, and the processors used (GPU or CPU):
% ollama ps
NAME ID SIZE PROCESSOR UNTIL
mixtral:latest 7708c059a8bb 28 GB 47%/53% CPU/GPU Forever
llama3:latest a6990ed6be41 5.5 GB 100% GPU 4 minutes from now
all-minilm:latest 1b226e2802db 585 MB 100% GPU 4 minutes from now
/clear
To clear the chat history for a session when running `ollama run`, use `/clear`:
>>> /clear
Cleared session context
- Fixed issue where switching loaded models on Windows would take several seconds
- Running `/save` will no longer abort the chat session if an incorrect name is provided
- The `/api/tags` API endpoint will now correctly return an empty list `[]` instead of `null` if no models are present
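The `/api/tags` behavior can be checked with a quick request; a sketch assuming a freshly started server with no models pulled:

```shell
# List installed models; with none present, the "models" field
# is now an empty list rather than null
curl http://localhost:11434/api/tags
```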
New Contributors
- @fangtaosong made their first contribution in #4387
- @machimachida made their first contribution in #4424
Full Changelog: v0.1.37...v0.1.38
v0.1.37
What's Changed
- Fixed issue where models with uppercase characters in the name would not show with `ollama list`
- Fixed usage string for `ollama create`
- Fixed `finish_reason` being `""` instead of `null` in the OpenAI-compatible chat API
New Contributors
- @todashuta made their first contribution in #4362
Full Changelog: v0.1.36...v0.1.37
v0.1.36
What's Changed
- Fixed `exit status 0xc0000005` error with AMD graphics cards on Windows
- Fixed rare out of memory errors when loading a model to run with CPU
Full Changelog: v0.1.35...v0.1.36
v0.1.35
New models
- Llama 3 ChatQA: A model from NVIDIA based on Llama 3 that excels at conversational question answering (QA) and retrieval-augmented generation (RAG).
What's Changed
- Quantization: `ollama create` can now quantize models when importing them using the `--quantize` or `-q` flag:
ollama create -f Modelfile --quantize q4_0 mymodel
Note: `--quantize` works when importing `float16` or `float32` models:
- from a binary GGUF file (e.g. `FROM ./model.gguf`)
- from a library model (e.g. `FROM llama3:8b-instruct-fp16`)
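For instance, quantizing a float16 library model could look like this (a sketch; the output model name is illustrative):

```shell
# Import fp16 weights from the model library and quantize to 4-bit
cat > Modelfile <<'EOF'
FROM llama3:8b-instruct-fp16
EOF
ollama create -f Modelfile --quantize q4_0 my-quantized-llama3
ollama run my-quantized-llama3
```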
- Fixed issue where inference subprocesses wouldn't be cleaned up on shutdown
- Fixed a series of out of memory errors when loading models on multi-GPU systems
- Ctrl+J characters will now properly add newlines in `ollama run`
- Fixed issues when running `ollama show` for vision models
- `OPTIONS` requests to the Ollama API will no longer result in errors
- Fixed issue where partially downloaded files wouldn't be cleaned up
- Added a new `done_reason` field in responses describing why generation stopped
- Ollama will now more accurately estimate how much memory is available on multi-GPU systems, especially when running different models one after another
New Contributors
- @fmaclen made their first contribution in #3884
- @Renset made their first contribution in #3881
- @glumia made their first contribution in #3043
- @boessu made their first contribution in #4236
- @gaardhus made their first contribution in #2307
- @svilupp made their first contribution in #2192
- @WolfTheDeveloper made their first contribution in #4300
Full Changelog: v0.1.34...v0.1.35
v0.1.34
New models
- Llava Llama 3: A new high-performing LLaVA model fine-tuned from Llama 3 Instruct.
- Llava Phi 3: A new small LLaVA model fine-tuned from Phi 3.
- StarCoder2 15B Instruct: A new instruct fine-tune of the StarCoder2 model
- CodeGemma 1.1: A new release of the CodeGemma model.
- StableLM2 12B: A new 12B version of the StableLM 2 model from Stability AI
- Moondream 2: Moondream 2's runtime parameters have been improved for better responses
What's Changed
- Fixed issues with LLaVa models where they would respond incorrectly after the first request
- Fixed out of memory errors when running large models such as Llama 3 70B
- Fixed various issues with Nvidia GPU discovery on Linux and Windows
- Fixed a series of Modelfile errors when running `ollama create`
- Fixed `no slots available` error that occurred when cancelling a request and then sending follow-up requests
- Improved AMD GPU detection on Fedora
- Improved reliability when using the experimental `OLLAMA_NUM_PARALLEL` and `OLLAMA_MAX_LOADED` flags
- `ollama serve` will now shut down quickly, even if a model is loading
New Contributors
- @drnic made their first contribution in #4116
- @bernardo-bruning made their first contribution in #4111
- @Drlordbasil made their first contribution in #4174
- @Saif-Shines made their first contribution in #4119
- @HydenLiu made their first contribution in #4194
- @jl-codes made their first contribution in #3621
- @Nurgo made their first contribution in #3473
- @adrienbrault made their first contribution in #3129
- @Darinochka made their first contribution in #3945
Full Changelog: v0.1.33...v0.1.34
v0.1.33
New models:
- Llama 3: a new model by Meta, and the most capable openly available LLM to date
- Phi 3 Mini: a new 3.8B-parameter, lightweight, state-of-the-art open model by Microsoft.
- Moondream: a small vision language model designed to run efficiently on edge devices.
- Llama 3 Gradient 1048K: A Llama 3 fine-tune by Gradient to support up to a 1M token context window.
- Dolphin Llama 3: The uncensored Dolphin model, trained by Eric Hartford and based on Llama 3 with a variety of instruction, conversational, and coding skills.
- Qwen 110B: The first Qwen model over 100B parameters in size with outstanding performance in evaluations
What's Changed
- Fixed issues where the model would not terminate, causing the API to hang.
- Fixed a series of out of memory errors on Apple Silicon Macs
- Fixed out of memory errors when running Mixtral architecture models
Experimental concurrency features
New concurrency features are coming soon to Ollama. They are available as experimental features:
- `OLLAMA_NUM_PARALLEL`: Handle multiple requests simultaneously for a single model
- `OLLAMA_MAX_LOADED_MODELS`: Load multiple models simultaneously
To enable these features, set the environment variables when starting `ollama serve`:
OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve
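With `OLLAMA_NUM_PARALLEL` set as above, simultaneous requests to the same model are served in parallel rather than queued; a minimal sketch using `curl` (model name illustrative):

```shell
# Fire two generations concurrently against the same loaded model
curl http://localhost:11434/api/generate \
  -d '{"model":"llama3","prompt":"Why is the sky blue?","stream":false}' &
curl http://localhost:11434/api/generate \
  -d '{"model":"llama3","prompt":"What is an LLM?","stream":false}' &
wait
```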
New Contributors
- @hmartinez82 made their first contribution in #3972
- @Cephra made their first contribution in #4037
- @arpitjain099 made their first contribution in #4007
- @MarkWard0110 made their first contribution in #4031
- @alwqx made their first contribution in #4073
- @sidxt made their first contribution in #3705
- @ChengenH made their first contribution in #3789
- @secondtruth made their first contribution in #3503
- @reid41 made their first contribution in #3612
- @ericcurtin made their first contribution in #3626
- @JT2M0L3Y made their first contribution in #3633
- @datvodinh made their first contribution in #3655
- @MapleEve made their first contribution in #3817
- @swuecho made their first contribution in #3810
- @brycereitano made their first contribution in #3895
- @bsdnet made their first contribution in #3889
- @fyxtro made their first contribution in #3855
- @natalyjazzviolin made their first contribution in #3962
Full Changelog: v0.1.32...v0.1.33
v0.1.32
New models
- WizardLM 2: State of the art large language model from Microsoft AI with improved performance on complex chat, multilingual, reasoning and agent use cases.
  - `wizardlm2:8x22b`: large 8x22B model based on Mixtral 8x22B
  - `wizardlm2:7b`: fast, high-performing model based on Mistral 7B
- Snowflake Arctic Embed: A suite of text embedding models by Snowflake, optimized for performance.
- Command R+: a powerful, scalable large language model purpose-built for RAG use cases
- DBRX: A large 132B open, general-purpose LLM created by Databricks.
- Mixtral 8x22B: the new leading Mixture of Experts (MoE) base model by Mistral AI.
What's Changed
- Ollama will now better utilize available VRAM, leading to less out-of-memory errors, as well as better GPU utilization
- When running larger models that don't fit into VRAM on macOS, Ollama will now split the model between GPU and CPU to maximize performance.
- Fixed several issues where Ollama would hang upon encountering an error
- Fixed issue where using quotes in `OLLAMA_ORIGINS` would cause an error
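Relatedly, `OLLAMA_ORIGINS` takes a comma-separated list of allowed origins and does not need surrounding quotes; a sketch (origins illustrative):

```shell
# Allow browser requests from a local dev server and a deployed site
OLLAMA_ORIGINS=http://localhost:3000,https://example.com ollama serve
```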
New Contributors
- @sugarforever made their first contribution in #3400
- @yaroslavyaroslav made their first contribution in #3378
- @Nagi-ovo made their first contribution in #3423
- @ParisNeo made their first contribution in #3436
- @philippgille made their first contribution in #3437
- @cesto93 made their first contribution in #3461
- @ThomasVitale made their first contribution in #3515
- @writinwaters made their first contribution in #3539
- @alexmavr made their first contribution in #3555
Full Changelog: v0.1.31...v0.1.32