Random emerge failures on updates [SOLVED]

OldTango · l33t Joined: 21 Feb 2004 Posts: 737

During major package emerges (system updates) I get segfaults, general protection faults, and "waiting for unfinished jobs" messages. Sometimes I even get an out-of-memory error, which seems very unlikely in my case.

The issues always happen when a large package such as LLVM, or another large package (e.g. gcc or clang), is among the updates. That is why I have added --keep-going to my emerge commands: it lets me get through most of the packages, and afterwards I check the logs for the failed packages and their errors. I can recover from the failed emerges only by rebooting the system and restarting the emerge. Rebooting solves the problem until it surfaces again. I have not been able to pin down the cause yet, but it seems as if the RAM is not being cleared properly.

I am not sure what is causing this ongoing issue but I am looking into the possible causes.

It may be due to heat buildup; however, my system never exceeds 75°C. My cooler is an older (about 4 years old) Corsair 360 AIO, and it barely keeps my system from throttling.

I do not have a swap drive. I use a tmpfs, and it's possible its settings are insufficient.

Josef.95 · Advocate Joined: 03 Sep 2007 Posts: 4839 Location: Germany

Hm, for a first check, I would try it with only two memory modules.
If it works, then check it with the other two memory modules as well.

OldTango · l33t Joined: 21 Feb 2004 Posts: 737

Hu · Administrator Joined: 06 Mar 2007 Posts: 23671

This sounds like hardware failure to me. In my opinion, you should use a cooler that can do better than "barely keeps [ ... ] from throttling."

Rebooting probably rearranges memory usage so that the failing module is not used, though it could also be that the reboot somehow brings the system temperature down enough to stabilize the CPU.

An improper tmpfs configuration cannot cause this.

What diagnostics have you done to rule out hardware failure?

pietinger · Posted: Tue Jun 24, 2025 12:25 am

OldTango,

do you have EMERGE_DEFAULT_OPTS="--jobs X" set in your make.conf?

(see more here: https://wiki.gentoo.org/wiki/User:Pietinger/Tutorials/Optimize_compile_times#Using_EMERGE_DEFAULT_OPTS )

(Maybe show us your make.conf settings?)

If so, it could be that 16 GB for /var/tmp/portage is not sufficient. True, no single package really needs 16 GB:
https://wiki.gentoo.org/wiki/Portage_TMPDIR_on_tmpfs#Considering_tmpfs_size

... but if you emerge several packages at the same time, together they can reach this 16 GB limit.
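A rough way to picture this is to add up the build-directory sizes of the packages that may be building at the same time and compare the total with the tmpfs size. This is only a sketch under stated assumptions: the per-package sizes below are made-up numbers, the helper names `fits_in_tmpfs` and `tmpfs_usage_gib` are hypothetical, and it assumes the worst case where every parallel job is at its peak size at once. `shutil.disk_usage` is used only to read the current size and usage of the mount, if it exists.

```python
from __future__ import annotations

import os
import shutil

GIB = 1024 ** 3


def fits_in_tmpfs(build_sizes_gib: list[float], tmpfs_size_gib: float) -> bool:
    """Hypothetical helper: True if all concurrent builds fit at once.

    Assumes the worst case, i.e. every job's build directory is at its
    peak size at the same moment.
    """
    return sum(build_sizes_gib) <= tmpfs_size_gib


def tmpfs_usage_gib(path: str = "/var/tmp/portage") -> tuple[float, float]:
    """Hypothetical helper: return (used, total) in GiB for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / GIB, usage.total / GIB


if __name__ == "__main__":
    # Made-up peak build-directory sizes (GiB) for three parallel jobs,
    # e.g. EMERGE_DEFAULT_OPTS="--jobs 3" with one large package in the mix.
    concurrent_jobs = [9.0, 5.0, 4.0]
    print(fits_in_tmpfs(concurrent_jobs, 16.0))  # False: 18 GiB > 16 GiB

    # Only query the real mount if it exists on this machine.
    if os.path.isdir("/var/tmp/portage"):
        used, total = tmpfs_usage_gib()
        print(f"/var/tmp/portage: {used:.1f} of {total:.1f} GiB used")
```

The worst-case sum is pessimistic, since parallel jobs rarely all peak at the same moment, but it shows how a few concurrent builds can exceed a limit that no single package reaches on its own.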

If you have a 128 GB machine, you can safely set a higher value; don't worry, the kernel will allocate this memory only if needed. See my "df":

Hu · Administrator Joined: 06 Mar 2007 Posts: 23671

At one time, with certain CFLAGS, clang needed ~30 GiB of space in the build directory.

OldTango · l33t Joined: 21 Feb 2004 Posts: 737

Hu · Administrator Joined: 06 Mar 2007 Posts: 23671

In my opinion, if you're using a cooler that was near the top end when new, and it's properly installed and used in a good environment, then it ought to be doing better than yours seems to be. However, before modifying hardware, I would rule out faulty RAM. Run a memory test. If you find errors, they do not necessarily mean the RAM sticks are defective, but they are a sign of a serious problem. A properly operating system should be able to run memtest indefinitely without reporting any errors.

niderecha · n00b Joined: 10 Nov 2024 Posts: 66

Since we are talking about possibly failing memory modules: did you stress-test the memory with something like memtest or memtester?

OldTango · l33t Joined: 21 Feb 2004 Posts: 737

After 4 days of testing my RAM, I found 2 bad sticks: one with a single but repeatable error, and one on which memtest86 can't even complete a full pass without aborting. I'm hoping the RMA gets approved; until then I'll be running with reduced RAM.

Thanks to all.

Tango