Commit Graph

  • 2c2f4deaa9 openai: refactor to split compat layer and middleware drifkin/oai-compat-refactor Devon Rifkin 2025-10-05 14:18:56 -07:00
  • 292767afb4 CI: fix win arm build (#12502) main v0.12.4-rc6 Daniel Hiltgen 2025-10-04 11:46:45 -07:00
  • ae5e0f0889 CI: replace clang compiler for windows (#12495) v0.12.4-rc5 Daniel Hiltgen 2025-10-04 09:18:42 -07:00
  • 19e6796eac llm: Support KV cache quantization with gpt-oss Jesse Gross 2025-10-03 13:50:02 -07:00
  • b91c1f6749 update tests parth/add-websearch-client ParthSareen 2025-10-03 14:49:49 -07:00
  • 33801c1597 Fixed Deepseek2 adding nil tensor error Grace 2025-10-03 14:20:06 -07:00
  • 4f45f39bc6 remove auth for tests ParthSareen 2025-10-03 14:14:28 -07:00
  • e4340667e3 Workaround broken NVIDIA iGPU free VRAM data (#12490) Daniel Hiltgen 2025-10-03 12:17:21 -07:00
  • c3fa8b2f54 clean up the renderer grace/qwen3-vl-renderer Grace Guo 2025-10-03 12:13:13 -07:00
  • 2fa1e92a99 test: add template error test (#12489) Patrick Devine 2025-10-03 12:05:34 -07:00
  • 94c41579ac restore qwen3coder Grace Guo 2025-10-03 12:05:13 -07:00
  • fc3222c99f working tests, changed code to find the first open tag Grace Guo 2025-10-03 11:38:30 -07:00
  • 07e36761c3 ci: place rocm windows in correct runner dir (#12487) v0.12.4-rc4 Daniel Hiltgen 2025-10-03 07:28:40 -07:00
  • c29fb007c0 CI: temporarily disable clang install (#12486) v0.12.4-rc3 Daniel Hiltgen 2025-10-02 20:31:18 -07:00
  • 730ed6e9e1 ci: fix windows build (#12485) v0.12.4-rc2 Daniel Hiltgen 2025-10-02 19:16:01 -07:00
  • dc06601677 ci: fix windows build (#12484) v0.12.4-rc1 Daniel Hiltgen 2025-10-02 18:59:26 -07:00
  • 1ed2881ef0 templates: fix crash in improperly defined templates (#12483) Patrick Devine 2025-10-02 17:25:55 -07:00
  • 0bda72892c llm: Enable flash attention by default for qwen3 and qwen3moe v0.12.4-rc0 Jesse Gross 2025-10-02 16:51:51 -07:00
  • 55ca827267 AMD: block running on unsupported gfx900/gfx906 (#12481) Daniel Hiltgen 2025-10-02 16:53:05 -07:00
  • c68f367ef6 Update GGML to b6646 (#12245) Daniel Hiltgen 2025-10-02 14:47:10 -07:00
  • fc55584580 testing for qwen3vl parser - toolparser is working Grace Guo 2025-10-02 14:18:34 -07:00
  • fdb109469f llm: Allow overriding flash attention setting Jesse Gross 2025-10-01 14:38:09 -07:00
  • 05a43e078a fix panic on bootstrapDevices (#12475) Daniel Hiltgen 2025-10-01 17:39:29 -07:00
  • 198e7a02d6 llm: Allow overriding flash attention setting jessegross/flash Jesse Gross 2025-10-01 14:38:09 -07:00
  • bc8909fb38 Use runners for GPU discovery (#12090) Daniel Hiltgen 2025-10-01 15:12:32 -07:00
  • 03e1d64aac add tests ParthSareen 2025-10-01 13:23:49 -07:00
  • c6f1fcfe7d Tests work, other than image tags (tests do not go through server) and tools (not in the correct order, but contents are the same) Grace Guo 2025-10-01 13:23:48 -07:00
  • f88174c55d routes/client: add web search and fetch ParthSareen 2025-10-01 13:08:57 -07:00
  • 4854ebcb68 Qwen3VL tests Grace Guo 2025-10-01 12:05:00 -07:00
  • 2047dd2b38 add tests for new code paths parth/double-req-structured-outputs ParthSareen 2025-09-30 17:56:26 -07:00
  • 9205299b7d routes: add thinking model support ParthSareen 2025-09-30 17:32:40 -07:00
  • 0d40b96cb7 harmony: rm harmony changes ParthSareen 2025-09-30 17:01:18 -07:00
  • a0d795b60f routes: do not rely on token boundaries ParthSareen 2025-09-30 17:00:51 -07:00
  • 8306081c71 routes: add structured outputs for GenerateHandler ParthSareen 2025-09-30 14:57:26 -07:00
  • 7dbed62582 routes: structured outputs for gpt-oss ParthSareen 2025-09-30 14:31:53 -07:00
  • 6b50f2b9cd Merge pull request #12461 from ollama/drifkin/qwen3-coder-tweaks Devon Rifkin 2025-09-30 19:47:44 -07:00
  • 35ac4eb12c fix keep alive Michael Yang 2025-09-30 17:12:37 -07:00
  • 3d0b1734c0 ggml: Preallocate CUDA pool memory Jesse Gross 2025-09-09 16:17:31 -07:00
  • efaee8c2d6 ggml: Backport scale kernel fixes Jesse Gross 2025-09-23 12:13:39 -07:00
  • 734b57da0e ggml: Remove allocation status reporting Jesse Gross 2025-09-22 17:27:03 -07:00
  • 83021fcf0f qwen3-coder: fix tool definition type rendering Devon Rifkin 2025-09-30 15:03:15 -07:00
  • d5877d7699 working (other than tool call is the incorrect order) for tool calls and tools Grace Guo 2025-09-30 15:00:00 -07:00
  • 0469861d9d build: call find_package to instantiate library paths Michael Yang 2025-09-30 12:58:31 -07:00
  • ff1b9bb2f3 build: call find_package to instantiate library paths mxyng/fix-build Michael Yang 2025-09-30 12:58:31 -07:00
  • f944382424 lint nicole/websearch_local nicole pardal 2025-09-29 20:10:38 -07:00
  • 3aa34ff0e6 added fetch nicole pardal 2025-09-29 20:06:53 -07:00
  • 0797490d9a added local endpoints nicole pardal 2025-09-29 18:52:57 -07:00
  • abc6a300de model: tweak renderer for qwen3coder jmorganca/qwen3-coder-updates jmorganca 2025-09-28 01:29:03 -07:00
  • 76cc9135ad ggml: Preallocate CUDA pool memory jessegross/memory Jesse Gross 2025-09-09 16:17:31 -07:00
  • 874a7e6a69 ggml: Backport scale kernel fixes Jesse Gross 2025-09-23 12:13:39 -07:00
  • cee83b4aa9 ggml: Remove allocation status reporting Jesse Gross 2025-09-22 17:27:03 -07:00
  • c5cd7fbead works for 3.1, but regression in 3??? grace/deepseek-v3.1-update gr4ceG 2025-09-26 14:35:06 -07:00
  • cd17efc9eb working with deepseekv3.1 thinking - sdpa + non-flash gr4ceG 2025-09-26 14:27:45 -07:00
  • 6248d2c226 Changed ggml backend to support mla Grace Guo 2025-09-26 12:22:39 -07:00
  • c47154c08d fix: correct condition for AMDGPU_TARGETS filtering logic (#12412) 羊撅撅 2025-09-27 02:38:47 +08:00
  • b04e46da3e bugfix: restore the current runOptions if loading fails in the CLI (#12402) v0.12.3 Patrick Devine 2025-09-25 18:30:45 -07:00
  • 34efbbd3f0 Merge pull request #12417 from ollama/drifkin/qwen3-coder-unicode Devon Rifkin 2025-09-25 15:56:34 -07:00
  • ad188ecd19 splitting kvb gr4ceG 2025-09-25 15:50:26 -07:00
  • 05ba4ca1f4 parsers: fix unicode handling for qwen3-coder Devon Rifkin 2025-09-25 15:47:46 -07:00
  • 5a56ff3cf0 cli: add device signin flow when doing ollama push (#12405) Patrick Devine 2025-09-25 15:04:43 -07:00
  • 2fba04b5fb tools: handle the case where a tool call sends "arguments" or "parameters" as a serialized json string (#12413) Gabe Goodhart 2025-09-25 15:37:39 -06:00
  • 6fd37e573f start of deepseek v3.1 stuff grace/deepseek-v3.1-additions gr4ceG 2025-09-25 10:55:21 -07:00
  • fbd82ba5bb Grace/deepseek v3 migration (#12385) Grace 2025-09-24 15:19:47 -07:00
  • 2e742544bf prefer ollama engine for qwen3moe (#12374) v0.12.2-rc0 v0.12.2 Michael Yang 2025-09-24 11:21:32 -07:00
  • bbb195a6ff Merge pull request #12393 from ollama/drifkin/fix-built-ins Devon Rifkin 2025-09-23 23:45:31 -07:00
  • fd88cd7cb0 harmony: don't sanitize built-ins Devon Rifkin 2025-09-23 23:34:55 -07:00
  • e1979c571a fix: leaf alt name (#12390) Michael Yang 2025-09-23 17:50:53 -07:00
  • bf78ed6ee9 add pre:, suf: to tags (#12274) Michael Yang 2025-09-23 16:08:57 -07:00
  • 909232168d deepseek tests grace/deepseek-v3-migration-tests gr4ceG 2025-09-23 14:08:17 -07:00
  • a40d427bce multi-regexp pretokenizer (#12325) Michael Yang 2025-09-23 13:21:47 -07:00
  • 64883e3c4c auth: fix problems with the ollama keypairs (#12373) v0.12.1-rc2 v0.12.1-rc1 v0.12.1 Patrick Devine 2025-09-22 23:20:20 -07:00
  • ffaf2e7916 update tests mxyng/fix-create Michael Yang 2025-09-22 12:57:30 -07:00
  • 41efdd4048 Merge pull request #12339 from ollama/drifkin/harmony-refactor-to-builtin Devon Rifkin 2025-09-22 13:13:40 -07:00
  • b846eacf42 fix: create with nested directories Michael Yang 2025-09-18 17:24:39 -07:00
  • c23e6f4cae tests: add single threaded history test (#12295) Daniel Hiltgen 2025-09-22 11:23:14 -07:00
  • af060eb250 docs: update cloud.md for cloud models jmorganca 2025-09-19 15:50:41 -07:00
  • ae5c33008e docs: move turbo.md to cloud.md jmorganca 2025-09-19 15:49:56 -07:00
  • 4ef2b2852d server: serve original error for remote models jmorganca/cloud-errors jmorganca 2025-09-20 16:46:32 -07:00
  • 3677842ff1 Merge pull request #12358 from ollama/drifkin/qwen3-coder-ampersands v0.12.1-rc0 Devon Rifkin 2025-09-20 12:40:33 -07:00
  • 242df70a75 parsers: fix &s in qwen3coder parameter values Devon Rifkin 2025-09-20 12:10:58 -07:00
  • dba39b2eee gemma: fix rope scaling for qat models (#12348) Patrick Devine 2025-09-19 15:04:40 -07:00
  • 220a0da37e simplify expand path mxyng/expand-path Michael Yang 2025-09-19 10:01:28 -07:00
  • 9f3a37fd36 fix: model load for unsupported embedding models (#12311) v0.12.0-rc1 v0.12.0 Michael Yang 2025-09-18 16:11:08 -07:00
  • 7460259eb3 feat: qwen3 embed (#12301) Michael Yang 2025-09-18 15:50:32 -07:00
  • 22ccdd74c2 server: add unauthorized error to remote chat handler (#12338) Jeffrey Morgan 2025-09-18 19:40:31 -03:00
  • 0c3d0e7533 build: avoid unbounded parallel builds (#12319) Daniel Hiltgen 2025-09-18 14:57:01 -07:00
  • e7f56ef3d8 harmony: remove special casing in routes.go Devon Rifkin 2025-09-18 14:55:59 -07:00
  • eb0a5d4459 auth: check the permissions on the private key to see if it's readable (#12336) Patrick Devine 2025-09-18 14:34:34 -07:00
  • ceac416ec2 fix(integration): check truncated length (#12337) Michael Yang 2025-09-18 14:00:21 -07:00
  • 2717dce6fe convert: convert bf16 vision weights to fp16 (#12324) v0.12.0-rc0 Patrick Devine 2025-09-17 17:43:17 -07:00
  • 9b8187b487 server: skip parsing initial <think> if provided in the prompt for /api/generate (#12289) frob 2025-09-18 01:39:04 +02:00
  • 8b894933a7 engine: add remote proxy (#12307) Patrick Devine 2025-09-17 14:40:53 -07:00
  • 9c5bf342bc fix: multi-cuda version skew (#12318) Daniel Hiltgen 2025-09-17 13:05:09 -07:00
  • 564b558c92 fix(llama): other llama flavours (#12308) Michael Yang 2025-09-17 12:12:21 -07:00
  • a417ac97ee prefer ollama engine for qwen3 (#12310) Michael Yang 2025-09-17 09:48:21 -07:00
  • 05d53457af refactor: use the built-in max/min to simplify the code (#12280) russcoss 2025-09-16 20:14:21 -04:00
  • b225508c9b logutil: fix source field (#12279) Michael Yang 2025-09-16 16:18:07 -07:00
  • fa1c987a29 Merge pull request #12248 from ollama/drifkin/qwen3-coder-parsing Devon Rifkin 2025-09-16 10:21:43 -07:00
  • ad95d5b30b use split activations when possible (#12293) Michael Yang 2025-09-16 09:51:19 -07:00
  • b47b9d9063 s/From*Slice/From*s/ mxyng/cleanup Michael Yang 2025-09-11 11:41:24 -07:00