ollama

mirror of https://github.com/jmorganca/ollama synced 2025-10-05 16:22:53 +02:00

Files

Jesse Gross fdb109469f llm: Allow overriding flash attention setting

As we automatically enable flash attention for more models, there
are likely some cases where we get it wrong. This allows setting
OLLAMA_FLASH_ATTENTION=0 to disable it, even for models that usually
have flash attention.
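
A minimal sketch of how an override like this might be resolved, assuming a three-state setting: explicitly on, explicitly off, or unset (fall back to the per-model default). The names `flashAttentionOverride` and `resolveFlashAttention` are hypothetical and are not taken from the Ollama source; parsing with `strconv.ParseBool` is likewise an assumption about the accepted values.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envLookup matches the signature of os.LookupEnv so tests can inject values.
type envLookup func(key string) (string, bool)

// flashAttentionOverride is a hypothetical helper. It reports the user's
// explicit choice and whether one was made. An unset, empty, or unparsable
// value counts as "no explicit choice".
func flashAttentionOverride(lookup envLookup) (enabled bool, set bool) {
	raw, ok := lookup("OLLAMA_FLASH_ATTENTION")
	if !ok || raw == "" {
		return false, false
	}
	v, err := strconv.ParseBool(raw) // accepts 1/0, t/f, true/false, etc.
	if err != nil {
		return false, false
	}
	return v, true
}

// resolveFlashAttention lets an explicit setting win over the model default,
// so OLLAMA_FLASH_ATTENTION=0 disables flash attention even for a model
// that would normally have it enabled.
func resolveFlashAttention(modelDefault bool, lookup envLookup) bool {
	if v, set := flashAttentionOverride(lookup); set {
		return v
	}
	return modelDefault
}

func main() {
	// Example: a model whose default is "enabled".
	fmt.Println("flash attention:", resolveFlashAttention(true, os.LookupEnv))
}
```

Keeping "unset" distinct from "false" is the point of the sketch: a plain boolean flag could not tell "the user turned it off" apart from "the user said nothing".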

2025-10-02 12:07:20 -07:00

llm_darwin.go

Optimize container images for startup (#6547)

2024-09-12 12:10:30 -07:00

llm_linux.go

Optimize container images for startup (#6547)

2024-09-12 12:10:30 -07:00

llm_windows.go

win: lint fix (#10571)

2025-05-05 11:08:12 -07:00

memory_test.go

Use runners for GPU discovery (#12090)

2025-10-01 15:12:32 -07:00

memory.go

llm: Allow overriding flash attention setting

2025-10-02 12:07:20 -07:00

server_test.go

Use runners for GPU discovery (#12090)

2025-10-01 15:12:32 -07:00

server.go

llm: Allow overriding flash attention setting

2025-10-02 12:07:20 -07:00

status.go

Improve crash reporting (#7728)

2024-11-19 16:26:57 -08:00