ollama

mirror of https://github.com/jmorganca/ollama synced 2025-10-06 08:42:42 +02:00

Author	SHA1	Message	Date
Patrick Devine	8b894933a7	engine: add remote proxy (#12307 )	2025-09-17 14:40:53 -07:00
Devon Rifkin	47991940d4	add qwen3-coder tool support The format qwen3-coder uses is relatively unique, both in rendering and in parsing. To implement parsing, I wrote a custom parser in similar style to harmony. For the rendering, I found that the logic would be much more difficult to follow in a template, so I introduced the concept of a built-in renderer that uses go code, rather than a template to generate prompts. I set us up for future built-in parsers and renderers by making it so they can be specified in a Modelfile like so: ``` RENDERER "qwen3-coder" PARSER "qwen3-coder" ``` These need to be provided explicitly because the architecture alone is not enough to understand what format the model expects to receive, and what format we expect it to output (e.g., qwen3-coder is `qwen3moe`, which includes other qwen3-family models as well) I haven't converted harmony to be one of these "built-ins" yet, since some of it is in flux with the changes @ParthSareen has been making to move harmony to the runner. It is likely that many other built-ins will need to move to the runner as well, but I'm able to slightly defer that decision since qwen3-coder doesn't have thinking (and therefore doesn't need to be in the runner to make structured outputs work). I expect to unify harmony with this approach very soon. Whether a particular model supports tools or thinking was previously inferred from templates, but without a template we now also use the parser itself to declare what it supports. If we have future models that re-use the same parsing format, but have different capabilities, we'll want to parameterize them and give them different names to be specified as a `PARSER`. Misc changes: - I worked on the renderer by diffing outputs from the reference implementation and ours. To make it easier to do this, I extended <https://github.com/ollama/ollama/pull/11875> to also support returning the prompt via the openai compat layer	2025-09-15 11:33:47 -07:00
Michael Yang	61aeaf7e81	remove support for multiple ggufs in a single file (#10722 ) * remove support for multiple ggufs in a single file this was an attempt to make it easier to import multimodal models into ollama. this was rarely used and error prone so remove it * fix: create fused model from blob	2025-05-21 13:55:31 -07:00
Daniel Hiltgen	1a0cfd080a	avoid kv truncation during create (#10761 )	2025-05-19 13:54:54 -07:00
Jesse Gross	94ab428e3f	ggml: Seperate tensor load from backend creation Currently, when the backend is created, the tensors are loaded at the same time, which is a slow operation. This separates them to be two steps: - Create backend, including enumerating tensors and memory allocation - Loading tensor data This allows more flexibility in managing model loading.	2025-05-19 09:54:22 -07:00
Daniel Hiltgen	ff80718e9c	fix crash in old clients with quantization progress (#10710 ) Older clients assumed the digest was at least 19 characters long so increase the size of the dummy digest to avoid array out of bounds crashes.	2025-05-14 14:54:18 -07:00
Bruce MacDonald	ad035ad595	convert: quantize from safetensors needs kv (#10675 ) When creating a quantized model from safetensors we need the array KV values to be loaded.Changing this value to -1 loads the KV values on the returned layer to be used and saved during quantization.	2025-05-12 12:04:20 -07:00
Daniel Hiltgen	424810450f	Move quantization to new backend (#10363 ) * Move quantization logic to GGML via new backend This moves the model aware logic to Go code and calls GGMLs quantization code for model creation. * Remove "add model quantizations" This is no longer needed now that quantization is implemented in Go+GGML code directly.	2025-05-06 11:20:48 -07:00
Michael Yang	340448d2d1	explicitly decode maxarraysize 1024	2025-04-25 16:59:01 -07:00
Michael Yang	88738b357b	create tempdir in models directory the models directory should have plenty of storage and also ensure there's no cross-device copy	2025-04-18 18:13:05 -07:00
Bruce MacDonald	bebb6823c0	server: validate local path on safetensor create (#9379 ) More validation during the safetensor creation process. Properly handle relative paths (like ./model.safetensors) while rejecting absolute paths Add comprehensive test coverage for various paths No functionality changes for valid inputs - existing workflows remain unaffected Leverages Go 1.24's new os.Root functionality for secure containment	2025-02-28 16:10:43 -08:00
Michael Yang	58245413f4	next ollama runner (#7913 ) feat: add new Ollama engine using ggml through cgo This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this. - `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go` - `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go` - `ml.Tensor` defines the interface for a tensor and tensor operations This is the first implementation of the new engine. Follow up PRs will implement more features: - non-greedy sampling (#8410) - integration with Ollama and KV caching (#8301) - more model support (#9080) with more coming soon Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>	2025-02-13 16:31:21 -08:00
Patrick Devine	2539f2dbf9	Fix absolute path names + gguf detection (#8428 )	2025-01-14 19:01:24 -08:00
Patrick Devine	8bccae4f92	show a more descriptive error in the client if it is newer than the server (#8351 )	2025-01-09 10:12:30 -08:00
Patrick Devine	86a622cbdc	Update the /api/create endpoint to use JSON (#7935 ) Replaces `POST /api/create` to use JSON instead of a Modelfile. This is a breaking change.	2024-12-31 18:02:30 -08:00

15 Commits