[resolved] Ollama Error: POST predict: Post EOF

This forum covers all Gentoo-related software not officially supported by Gentoo. Ebuilds/software posted here might harm the health and stability of your system(s), and are not supported by Gentoo developers. Bugs/errors caused by ebuilds from overlays.gentoo.org are covered by this forum, too.
11 posts • Page 1 of 1
seb95passionlinux
n00b
Posts: 26
Joined: Sun Oct 27, 2024 3:00 pm
Location: France


Post by seb95passionlinux » Tue Jun 10, 2025 9:28 am

Hello, I'm using Ollama from GURU. Until now it was fine: I had GPU acceleration via CUDA and everything worked. Then, recently, the Ollama build asked me to upgrade to dev-util/nvidia-cuda-toolkit-12.9.0 to satisfy its dependencies. Since then, no matter what I do, I get this:

Code: Select all

ollama run llama3.2
>>> ça va?
Error: POST predict: Post "http://127.0.0.1:43593/completion": EOF

Code: Select all

[sebastien@passionlinuxgentoo ~]$ ollama run llama3.2
>>> racontes moi l'histoire de France
Error: POST predict: Post "http://127.0.0.1:36375/completion": EOF
[sebastien@passionlinuxgentoo ~]$ 
Some info:

Code: Select all

[I] dev-util/nvidia-cuda-toolkit 
Available versions: ~11.8.0-r4(0/11.8.0)^md ~12.3.2(0/12.3.2)^md ~12.4.1(0/12.4.1)^md ~12.5.1(0/12.5.1)^md ~12.6.3-r1(0/12.6.3)^mstd ~12.8.1-r1(0/12.8.1)^mstd (~)12.9.0(0/12.9.0)^mstd {clang debugger examples nsight profiler rdma sanitizer vis-profiler PYTHON_TARGETS="python3_11 python3_12 python3_13"} 
Installed versions: 12.9.0(0/12.9.0)^mstd(01:48:51 09/06/2025)(-clang -debugger -examples -nsight -profiler -rdma -sanitizer PYTHON_TARGETS="python3_13 -python3_11 -python3_12") 
Homepage: https://developer.nvidia.com/cuda-zone 
Description: NVIDIA CUDA Toolkit (compiler and friends)


[I] sci-ml/ollama [1] 
Available versions: (~)0.6.5-r1^s (~)0.6.6^s (~)0.6.8^st (~)0.7.0^st (~)0.7.1^st **9999*l^st {blas cuda mkl rocm AMDGPU="+targets_gfx90a targets_gfx803 targets_gfx900 +targets_gfx906 +targets_gfx908 targets_gfx940 targets_gfx941 +targets_gfx942 targets_gfx1010 targets_gfx1011 targets_gfx1012 +targets_gfx1030 targets_gfx1031 +targets_gfx1100 targets_gfx1101 targets_gfx1102" CPU_FLAGS_X86="amx_int8 amx_tile avx avx2 avx512_bf16 avx512_vnni avx512f avx512vbmi avx_vnni bmi2 f16c fma3 sse4_2"} 
Installed versions: 0.7.1^st(02:46:03 09/06/2025)(cuda -blas -mkl -rocm AMDGPU="targets_gfx90a targets_gfx906 targets_gfx908 targets_gfx942 targets_gfx1030 targets_gfx1100 -targets_gfx803 -targets_gfx900 -targets_gfx940 -targets_gfx941 -targets_gfx1010 -targets_gfx1011 -targets_gfx1012 -targets_gfx1031 -targets_gfx1101 -targets_gfx1102" CPU_FLAGS_X86="avx avx2 f16c fma3 sse4_2 -avx512_vnni -avx512f -avx512vbmi -avx_vnni -bmi2") 
Homepage: https://ollama.com 
Description: Get up and running with Llama 3, Mistral, Gemma, and other language models.
I'm going to test it without CUDA support and will post the result here.
Last edited by seb95passionlinux on Fri Jun 27, 2025 5:39 pm, edited 1 time in total.
La source c'est la vie...

Post by seb95passionlinux » Tue Jun 10, 2025 9:43 am

It works without CUDA, so it's the CUDA change that makes it fail. That said, I had also tested with the old CUDA version and got the same error.

Code: Select all

ollama run llama3.2
>>> racontes moi l'histoire de France
Quelle tâche ! L'histoire de la France est riche et complexe, étendue sur des millénaires. Je vais essayer de vous donner un aperçu général, mais gardez à l'esprit que cette histoire est en constante évolution.

**Les Origines (à partir du VIIIe siècle avant J.-C.) [...]**
Here are my Cuda changes:

Code: Select all

genlop -l | grep nvidia-cuda-toolkit
     Sat Apr  5 03:13:29 2025 >>> dev-util/nvidia-cuda-toolkit-12.8.1
     Fri May  2 11:36:51 2025 >>> dev-util/nvidia-cuda-toolkit-12.8.1
     Sat Jun  7 22:43:19 2025 >>> dev-util/nvidia-cuda-toolkit-12.9.0
     Mon Jun  9 01:49:46 2025 >>> dev-util/nvidia-cuda-toolkit-12.9.0
I tried to go back to the CUDA version that worked, but nothing changed:

Code: Select all

grep nvidia-cuda-toolkit /var/log/emerge.log
1743815484: >>> emerge (6 of 7) dev-util/nvidia-cuda-toolkit-12.8.1 to /
1743815484: === (6 of 7) Cleaning (dev-util/nvidia-cuda-toolkit-12.8.1::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.8.1.ebuild)
1743815498: === (6 of 7) Compiling/Merging (dev-util/nvidia-cuda-toolkit-12.8.1::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.8.1.ebuild)
1743815596: === (6 of 7) Merging (dev-util/nvidia-cuda-toolkit-12.8.1::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.8.1.ebuild)
1743815607: >>> AUTOCLEAN: dev-util/nvidia-cuda-toolkit:0
1743815609: === (6 of 7) Post-Build Cleaning (dev-util/nvidia-cuda-toolkit-12.8.1::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.8.1.ebuild)
1743815609: ::: completed emerge (6 of 7) dev-util/nvidia-cuda-toolkit-12.8.1 to /
1746178447: >>> emerge (74 of 149) dev-util/nvidia-cuda-toolkit-12.8.1 to /
1746178447: === (74 of 149) Cleaning (dev-util/nvidia-cuda-toolkit-12.8.1::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.8.1.ebuild)
1746178466: === (74 of 149) Compiling/Merging (dev-util/nvidia-cuda-toolkit-12.8.1::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.8.1.ebuild)
1746178580: === (74 of 149) Merging (dev-util/nvidia-cuda-toolkit-12.8.1::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.8.1.ebuild)
1746178606: >>> AUTOCLEAN: dev-util/nvidia-cuda-toolkit:0
1746178606: === Unmerging... (dev-util/nvidia-cuda-toolkit-12.8.1)
1746178608: >>> unmerge success: dev-util/nvidia-cuda-toolkit-12.8.1
1746178611: === (74 of 149) Post-Build Cleaning (dev-util/nvidia-cuda-toolkit-12.8.1::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.8.1.ebuild)
1746178611: ::: completed emerge (74 of 149) dev-util/nvidia-cuda-toolkit-12.8.1 to /
1749328829: >>> emerge (1 of 4) dev-util/nvidia-cuda-toolkit-12.9.0 to /
1749328896: === (1 of 4) Cleaning (dev-util/nvidia-cuda-toolkit-12.9.0::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.9.0.ebuild)
1749328912: === (1 of 4) Compiling/Merging (dev-util/nvidia-cuda-toolkit-12.9.0::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.9.0.ebuild)
1749328974: === (1 of 4) Merging (dev-util/nvidia-cuda-toolkit-12.9.0::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.9.0.ebuild)
1749328993: >>> AUTOCLEAN: dev-util/nvidia-cuda-toolkit:0
1749328993: === Unmerging... (dev-util/nvidia-cuda-toolkit-12.8.1)
1749328995: >>> unmerge success: dev-util/nvidia-cuda-toolkit-12.8.1
1749328999: === (1 of 4) Post-Build Cleaning (dev-util/nvidia-cuda-toolkit-12.9.0::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.9.0.ebuild)
1749328999: ::: completed emerge (1 of 4) dev-util/nvidia-cuda-toolkit-12.9.0 to /
1749426232: *** emerge --ask --autounmask --autounmask-license=y --autounmask-use=y --complete-graph --deep --jobs=3 --keep-going --with-bdeps=y --getbinpkg --quiet --regex-search-auto=y --verbose --usepkg dev-util/nvidia-cuda-toolkit
1749426316: *** emerge --ask --autounmask --autounmask-license=y --autounmask-use=y --complete-graph --deep --jobs=3 --keep-going --with-bdeps=y --getbinpkg --quiet --regex-search-auto=y --verbose --usepkg dev-util/nvidia-cuda-toolkit
1749426426: *** emerge --ask --autounmask --autounmask-license=y --autounmask-use=y --complete-graph --deep --jobs=3 --keep-going --with-bdeps=y --getbinpkg --quiet --regex-search-auto=y --verbose --usepkg dev-util/nvidia-cuda-toolkit
1749426450: >>> emerge (1 of 1) dev-util/nvidia-cuda-toolkit-12.9.0 to /
1749426450: === (1 of 1) Cleaning (dev-util/nvidia-cuda-toolkit-12.9.0::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.9.0.ebuild)
1749426471: === (1 of 1) Compiling/Merging (dev-util/nvidia-cuda-toolkit-12.9.0::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.9.0.ebuild)
1749426535: === (1 of 1) Merging (dev-util/nvidia-cuda-toolkit-12.9.0::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.9.0.ebuild)
1749426581: >>> AUTOCLEAN: dev-util/nvidia-cuda-toolkit:0
1749426581: === Unmerging... (dev-util/nvidia-cuda-toolkit-12.9.0)
1749426583: >>> unmerge success: dev-util/nvidia-cuda-toolkit-12.9.0
1749426586: === (1 of 1) Updating world file (dev-util/nvidia-cuda-toolkit-12.9.0)
1749426586: === (1 of 1) Post-Build Cleaning (dev-util/nvidia-cuda-toolkit-12.9.0::/var/db/repos/gentoo/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-12.9.0.ebuild)
1749426586: ::: completed emerge (1 of 1) dev-util/nvidia-cuda-toolkit-12.9.0 to /
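Those emerge.log prefixes are plain Unix timestamps (seconds since the epoch); a quick throwaway sketch (the helper name is mine, not part of any tool) to turn them into readable dates:

```python
from datetime import datetime, timezone

def decode_emerge_line(line: str) -> str:
    """Replace the leading epoch timestamp of an emerge.log line
    with a human-readable UTC date."""
    stamp, _, rest = line.partition(": ")
    when = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
    return f"{when:%Y-%m-%d %H:%M:%S} UTC: {rest}"

print(decode_emerge_line(
    "1749328829: >>> emerge (1 of 4) dev-util/nvidia-cuda-toolkit-12.9.0 to /"
))
```

genlop does the same conversion (in local time), which is why its dates line up with these log entries.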

Post by seb95passionlinux » Thu Jun 12, 2025 6:57 am

Hello, am I the only one with this problem? Should I file a bug report? Kind regards
logrusx
Advocate
Posts: 3529
Joined: Thu Feb 22, 2018 2:29 pm

Post by logrusx » Thu Jun 26, 2025 9:13 am

After

Code: Select all

USE="cuda" ACCEPT_KEYWORDS="~amd64" AMDGPU_TARGETS="" emerge -av =ollama-0.9.2
and

Code: Select all

systemctl start ollama
And in the system log:

Code: Select all

ollama[224518]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3060 Laptop GPU) - 5610 MiB free
then:

Code: Select all

$ ollama run llama3.2
pulling manifest 
pulling dde5aa3fc5ff: 100% ▕████████████████████████████████████████████████████████████████████████████████▏ 2.0 GB                         
pulling 966de95ca8a6: 100% ▕████████████████████████████████████████████████████████████████████████████████▏ 1.4 KB                         
pulling fcc5a6bec9da: 100% ▕████████████████████████████████████████████████████████████████████████████████▏ 7.7 KB                         
pulling a70ff7e570d9: 100% ▕████████████████████████████████████████████████████████████████████████████████▏ 6.0 KB                         
pulling 56bb8bd477a5: 100% ▕████████████████████████████████████████████████████████████████████████████████▏   96 B                         
pulling 34bb5ab01051: 100% ▕████████████████████████████████████████████████████████████████████████████████▏  561 B                         
verifying sha256 digest 
writing manifest 
success 
>>> ça va? 
C'est très bien, merci ! Comment puis-je vous aider aujourd'hui ?

>>> Send a message (/? for help)
With nvidia-cuda-toolkit-12.9.0 and nvidia-drivers-575.64

Is your user part of the video group?
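One way to check (a minimal sketch using the standard pwd/grp modules; `video` is whatever group owns your /dev/nvidia* nodes on a given setup):

```python
import getpass
import grp
import os
import pwd

def user_groups(user: str) -> set:
    """Return all group names the given account belongs to,
    primary group included."""
    primary_gid = pwd.getpwnam(user).pw_gid
    gids = os.getgrouplist(user, primary_gid)
    return {grp.getgrgid(g).gr_name for g in gids}

me = getpass.getuser()
# If this prints False: gpasswd -a $USER video, then log out and back in.
print("video" in user_groups(me))
```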

Best Regards,
Georgi

Post by seb95passionlinux » Fri Jun 27, 2025 5:16 pm

Hi, thanks for your reply.
I did what was mentioned in the post viewtopic-t-1173644-highlight-.html:

Code: Select all

The real solution is to unmask it:
/etc/portage/profile/package.use.mask:
acct-user/ollama -cuda

It might be worth asking the author of the GURU ebuild why it is masked (and perhaps to unmask it for this package).
Alternatively, disable the cuda USE flag on sci-ml/ollama to avoid all of this entirely.
You haven't done this yourself; should I remove it?


I restarted the installation:

Code: Select all

[ebuild R] sci-ml/ollama-0.9.2 USE="cuda -blas -mkl -rocm" AMDGPU_TARGETS="gfx90a gfx908 gfx942 gfx1030 gfx1100 -gfx803 -gfx900 -gfx906 -gfx940 -gfx941 -gfx1010 -gfx1011 -gfx1012 -gfx1031 -gfx1101 -gfx1102 -gfx1200 -gfx1201" CPU_FLAGS_X86="avx avx2 f16c fma3 sse4_2 -avx512_vnni -avx512f -avx512vbmi -avx_vnni -bmi2"
I enabled the service, but it was already active:

Code: Select all

● ollama.service - Ollama Service 
Loaded: loaded (/usr/lib/systemd/system/ollama.service; enabled; preset: disabled) 
Active: active (running) since Fri 2025-06-27 17:53:46 CEST; 1h 13min ago 
Invocation: 19582fd86c6f43c788653c7c2473d0e6 
Main PID: 646 (ollama) 
Tasks: 12 (limit: 38314) 
Memory: 54.4M (peak: 63.9M) 
CPU: 370ms 
CGroup: /system.slice/ollama.service 
└─646 /usr/bin/ollama serve
Here's the result:

Code: Select all

$ ollama run llama3.2
>>> ça va?
Error: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details
juin 27 19:16:16 passionlinuxgentoo ollama[646]: time=2025-06-27T19:16:16.383+02:00 level=ERROR source=server.go:800 msg="post predict" error="Post \"http://127.0.0.1:42045/completion\": EOF"
juin 27 19:16:16 passionlinuxgentoo ollama[646]: [GIN] 2025/06/27 - 19:16:16 | 200 | 799.081919ms | 127.0.0.1 | POST "/api/chat"
juin 27 19:16:16 passionlinuxgentoo ollama[646]: time=2025-06-27T19:16:16.405+02:00 level=ERROR source=server.go:457 msg="llama runner terminated" error="exit status 2"
with dev-util/nvidia-cuda-toolkit 12.9.0 and x11-drivers/nvidia-drivers 570.153.02-r1
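Incidentally, those journal lines are logfmt-style; a throwaway sketch (regexes are mine, matched against the line quoted above) to pull out the level and error fields when skimming many of them:

```python
import re

# The journal line quoted above, with its escaped inner quotes.
LINE = ('time=2025-06-27T19:16:16.383+02:00 level=ERROR source=server.go:800 '
        'msg="post predict" error="Post \\"http://127.0.0.1:42045/completion\\": EOF"')

def extract_error(line):
    """Pull the level=... and error="..." fields from an ollama log line."""
    level = re.search(r'level=(\w+)', line)
    error = re.search(r'error="((?:[^"\\]|\\.)*)"', line)
    return (level.group(1) if level else None,
            error.group(1) if error else None)

print(extract_error(LINE))
```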

Post by seb95passionlinux » Fri Jun 27, 2025 5:38 pm

After installing the nvidia 575.64 driver, everything works. Thank you! But is it normal that it doesn't work with the nvidia driver from the stable channel?

Post by logrusx » Fri Jun 27, 2025 6:30 pm

seb95passionlinux wrote: Hi, thanks for your reply.
I did what was mentioned in the post viewtopic-t-1173644-highlight-.html:

Code: Select all

The real solution is to unmask it:
/etc/portage/profile/package.use.mask:
acct-user/ollama -cuda

It might be worth asking the author of the GURU ebuild why it is masked (and perhaps to unmask it for this package).
Alternatively, disable the cuda USE flag on sci-ml/ollama to avoid all of this entirely.
You haven't done this yourself; should I remove it?
That's no longer relevant.
seb95passionlinux wrote:is it normal that it doesn't work with the nvidia driver from the stable channel?
None of this is from the stable channel. It's also possible you updated your kernel without emerging the driver again, or that you emerged it against the old kernel.
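That mismatch is quick to spot; a small sketch (nothing Gentoo-specific, it just compares the running kernel release against the module trees present under /lib/modules):

```python
import os
import platform

def kernel_module_match(moddir="/lib/modules"):
    """Compare the running kernel release with the installed module trees."""
    running = platform.uname().release
    trees = sorted(os.listdir(moddir)) if os.path.isdir(moddir) else []
    return running, trees, running in trees

running, trees, ok = kernel_module_match()
print(f"running: {running}, module trees: {trees}")
if not ok:
    print("no module tree for the running kernel -- rebuild nvidia-drivers")
```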

Best Regards,
Georgi

Post by seb95passionlinux » Sat Jun 28, 2025 7:36 am

Hello, and thanks again. I didn't think the nvidia driver could be at fault rather than CUDA; let me explain. I use a kernel from the stable channel, 6.12.31 (6.12.31^tu), and I was using the nvidia driver from the same channel, 570.153.02-r1. It worked well with Ollama (coming from GURU and therefore considered unstable) in its versions up to and including 0.7.1. Then came Ollama 0.9.0, or maybe 0.8.0, and with it a request to bump CUDA to 12.9.0, and from there nothing worked. What surprises me is that Debian ships CUDA 11.8.89 with nvidia-driver 535.247.01 and kernel 6.1, and NixOS uses kernel 6.12.34, NVIDIA driver 570.124.06, CUDA 12.8, and Ollama 0.7.0.

All this to say, I followed the recommendations (during kernel updates, etc.). I tried going back to an older version of Ollama (0.7.0) with the previous version of CUDA, and still nothing.

Thank you so much!

Post by logrusx » Sat Jun 28, 2025 9:30 am

There's always the chance some dependencies weren't captured properly in the ebuild and a rebuild was needed, or some other change required a rebuild. I added the nvidia driver to package.accept_keywords because I needed it to avoid an issue in Hyprland and never removed it, so I'm staying on unstable. Plus it's pretty stable nowadays. Note that this is Gentoo's perception of unstable, not NVIDIA's.

Also, it seems I had an unversioned entry for the cuda toolkit as well, so coincidentally I was on the latest unstable versions of everything. If it weren't for that, I don't think we would have found the problem so easily.

I'm not sure why I didn't have ollama there; maybe I didn't do a complete cleanup after that. Also, I think I needed the cuda toolkit for llama-cpp; that's why it was in package.accept_keywords.

The rest of the system is mostly stable though. I only add packages to testing on a per-package basis; I don't do it without a reason, be it a mundane one.

Best Regards,
Georgi
tim1724
n00b
Posts: 1
Joined: Wed Jul 09, 2025 10:25 pm
Location: Southern California

Post by tim1724 » Wed Jul 09, 2025 11:13 pm

On my RTX 3090 machine it was working with CUDA 12.6 and driver 550. I upgraded it to CUDA 12.9 and I believe ollama was still working. Then when I upgraded to driver 570 it stopped working. Upgrading to driver 575 made it work again.

I just tried installing ollama on my machine with a pair of RTX 4090s and it didn't work with driver 570 and CUDA 12.9. I upgraded it to driver 575 and it started working there too.

I also run a machine with a pair of H200 cards, but those cards are currently attached to VMs running Ubuntu. (Not my choice; the faculty running those projects insisted on Ubuntu.) So I can't do any reasonable testing on those at the moment. But I imagine I'd see similar results.

Post by logrusx » Fri Jul 11, 2025 12:49 pm

Hello and welcome to Gentoo Forums!

Thanks for sharing your observations. Quite likely your guess is correct. On the rare occasions I run AI, I use llama.cpp, so I can't say anything about the stability or usability of the drivers and toolkit. But in the past I've seen ollama fail to pick up acceleration despite correctly reporting the hardware. It didn't cross my mind back then that it might have been a driver issue.

Best Regards,
Georgi