JustAnother Apprentice

Joined: 23 Sep 2016 Posts: 209
|
Posted: Wed Apr 09, 2025 10:30 pm Post subject: Removed PYTHON_TARGETS, and computer freezes |
|
|
I read the eselect section about PYTHON_TARGETS.
I decided (on 04-07) to go ahead after a fresh update and remove any PYTHON_*
from make.conf and update.
Emerge said it had to rebuild 290 packages, even packages that seemed to
have nothing to do with python.
I said go ahead.
Then the computer started hard freezing every few hours. No ssh, ping, no nothing.
No log messages. Frozen.
So I'm having to reboot and emerge --resume every few hours to try to
get through this update cycle.
I did the same process with another laptop computer (with nvidia gpu),
and it is still working on llvm after two days, but still running.
Q: why does llvm depend on PYTHON_TARGETS?
Before this situation happened, there were occasional hard freezes, not very
recently, and not nearly as often.
Is anybody else having issues like this or have any insights?
Code: | Portage 3.0.67 (python 3.12.9-final-0, default/linux/amd64/23.0/split-usr/desktop, gcc-14, glibc-2.40-r8, 6.12.21-gentoo_2025-04-07 x86_64)
=================================================================
System uname: Linux-6.12.21-gentoo_2025-04-07-x86_64-AMD_Athlon-tm-_64_X2_Dual_Core_Processor_6000+-with-glibc2.40
KiB Mem: 3885124 total, 50156 free
KiB Swap: 16777212 total, 16016888 free
Timestamp of repository gentoo: Wed, 09 Apr 2025 17:45:00 +0000
Head commit of repository gentoo: becc959316536ef611a853aad8a0b8a42bcb6028
Head commit of repository brother-overlay: c7e774490529149a447a06da85da595dc0ba4615
Timestamp of repository fordfrog: Wed, 02 Apr 2025 07:20:39 +0000
Head commit of repository fordfrog: e9ef92e5a7d8de7c7c75f723d7a76b784a904d71
Timestamp of repository gentoobr: Fri, 04 Apr 2025 17:22:41 +0000
Head commit of repository gentoobr: 78220894b69d296c02a84df49833be697b053d37
sh bash 5.2_p37
ld GNU ld (Gentoo 2.44 p1) 2.44.0
app-misc/pax-utils: 1.3.8::gentoo
app-shells/bash: 5.2_p37::gentoo
dev-build/autoconf: 2.72-r1::gentoo
dev-build/automake: 1.17-r1::gentoo
dev-build/cmake: 3.31.5::gentoo
dev-build/libtool: 2.5.4::gentoo
dev-build/make: 4.4.1-r100::gentoo
dev-build/meson: 1.7.0::gentoo
dev-java/java-config: 2.3.4::gentoo
dev-lang/perl: 5.40.0-r1::gentoo
dev-lang/python: 3.11.11_p2::gentoo, 3.12.9::gentoo, 3.13.2::gentoo
dev-lang/rust-bin: 1.84.1-r2::gentoo
llvm-core/clang: 18.1.8-r6::gentoo, 19.1.7::gentoo
llvm-core/llvm: 18.1.8-r6::gentoo, 19.1.7::gentoo
sys-apps/baselayout: 2.17::gentoo
sys-apps/openrc: 0.56::gentoo
sys-apps/sandbox: 2.39::gentoo
sys-devel/binutils: 2.44::gentoo
sys-devel/binutils-config: 5.5.2::gentoo
sys-devel/gcc: 14.2.1_p20241221::gentoo
sys-devel/gcc-config: 2.12.1::gentoo
sys-kernel/linux-headers: 6.12::gentoo (virtual/os-headers)
sys-libs/glibc: 2.40-r8::gentoo
Repositories:
gentoo
location: /usr/portage
sync-type: rsync
sync-uri: rsync://rsync.gentoo.org/gentoo-portage
priority: -1000
volatile: True
sync-rsync-verify-jobs: 1
sync-rsync-extra-opts:
sync-rsync-verify-max-age: 3
sync-rsync-verify-metamanifest: yes
brother-overlay
location: /usr/local/overlay/brother-overlay
sync-type: git
sync-uri: https://github.com/stefan-langenmaier/brother-overlay.git
masters: gentoo
volatile: True
fordfrog
location: /var/db/repos/fordfrog
sync-type: git
sync-uri: https://github.com/gentoo-mirror/fordfrog.git
masters: gentoo
volatile: False
gentoobr
location: /var/db/repos/gentoobr
sync-type: git
sync-uri: https://github.com/gentoo-mirror/gentoobr.git
masters: gentoo
volatile: False
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="@FREE Vivaldi"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe -ggdb"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt /var/bind"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php8.2/ext-active/ /etc/php/cgi-php8.2/ext-active/ /etc/php/cli-php8.2/ext-active/ /etc/php/fpm-php8.2/ext-active/ /etc/php/phpdbg-php8.2/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=native -O2 -pipe -ggdb"
DISTDIR="/usr/portage/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GDK_PIXBUF_MODULE_FILE GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR XDG_STATE_HOME"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs buildpkg-live clean-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync merge-wait multilib-strict network-sandbox news parallel-fetch pid-sandbox pkgdir-index-trusted preserve-libs protect-owned qa-unresolved-soname-deps sandbox split-log strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://mirror.rackspace.com/gentoo/ https://mirror.rackspace.com/gentoo/ http://www.gtlib.gatech.edu/pub/gentoo"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,-z,pack-relative-relocs"
LEX="flex"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
PYTHONPATH="/home/me/app/code/python"
SHELL="/bin/bash"
USE="X a52 aac acl acpi alsa amd64 bash-completion bluetooth branding bzip2 cairo cdda cdr cet crypt cscope cups dbus dri dts dvd dvdr egl elogind encode exif flac gdbm gif gpm gtk gui hddtemp iconv icu ipv6 jpeg kf6compat lcms libgda libinput libnotify libtirpc lm-sensors mad mng mp3 mp4 mpeg multilib ncurses nls objc ogg opengl openmp pam pango pcre pdf perl php png policykit postscript ppds python qml qt5 qt6 raw readline samba sdl seccomp sockets sound spell split-usr sqlite ssl startup-notification svg symlink syslog test-rust tiff tokenizer truetype udev udisks unicode upower usb vorbis vulkan wayland wxwidgets x264 xattr xcb xft xml xv xvid zlib" ABI_X86="64" ADA_TARGET="gcc_14" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_anon authn_dbm authn_file authz_dbm authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir env expires ext_filter file_cache filter headers include info log_config logio mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="3dnow 3dnowext mmx mmxext sse sse2 sse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax navcom oceanserver oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 tsip tripmate tnt ublox" GUILE_SINGLE_TARGET="3-0" GUILE_TARGETS="3-0" INPUT_DEVICES="libinput" KERNEL="linux" LCD_DEVICES="bayrad cfontz glk hd44780 lb216 lcdm001 mtxorb text" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php8-2" POSTGRES_TARGETS="postgres17" PYTHON_SINGLE_TARGET="python3_12" PYTHON_TARGETS="python3_12" RUBY_TARGETS="ruby32" VIDEO_CARDS="radeon r300" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipp2p iface geoip fuzzy condition tarpit sysrq proto logmark 
ipmark dhcpmac delude chaos account"
Unset: ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EMERGE_DEFAULT_OPTS, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LC_ALL, LD, LFLAGS, LIBTOOL, LINGUAS, MAKE, MAKEFLAGS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, RANLIB, READELF, RUSTFLAGS, SIZE, STRINGS, STRIP, YACC, YFLAGS |
[Moderator edit: added [code] tags to preserve output layout. -- pietinger] |
|
 |
Banana Moderator


Joined: 21 May 2004 Posts: 2023 Location: Germany
|
Posted: Thu Apr 10, 2025 5:49 am Post subject: |
|
|
Quote: | I said go ahead.
Then the computer started hard freezing every few hours. No ssh, ping, no nothing.
No log messages. Frozen.
So I'm having to reboot and emerge --resume every few hours to try to
get through this update cycle. |
Does this mean the required emerge command after the make.conf change got stuck?
Please provide the command you are executing. _________________ Forum Guidelines
PFL - Portage file list - find which package a file or command belongs to.
My delta-labs.org snippets do expire |
|
 |
Zucca Moderator


Joined: 14 Jun 2007 Posts: 4074 Location: Rasi, Finland
|
Posted: Thu Apr 10, 2025 8:52 am Post subject: |
|
|
Hm. I'm not sure whether this has any effect, but why do you have -ggdb enabled globally? _________________ ..: Zucca :..
My gentoo installs: | init=/sbin/openrc-init
-systemd -logind -elogind seatd |
Quote: | I am NaN! I am a man! |
|
|
 |
Hu Administrator

Joined: 06 Mar 2007 Posts: 23449
|
Posted: Thu Apr 10, 2025 3:21 pm Post subject: |
|
|
If I recall correctly, -ggdb will greatly increase the memory/disk requirements, due to all the debug symbols. The requirement is notably worse for template-heavy C++ programs, relative to plain C programs.
It is not normal for a system to ever suffer a "hard freeze", so if this system has been intermittently failing like that even before this last round of updates, I would start with the idea that the system has an underlying fault and that the load of these updates is provoking that fault more frequently. |
|
 |
eccerr0r Watchman

Joined: 01 Jul 2004 Posts: 10020 Location: almost Mile High in the USA
|
Posted: Thu Apr 10, 2025 4:19 pm Post subject: |
|
|
At 4GiB RAM, running -j2 for MAKEOPTS, and having X running at the same time, you're probably really stressing your swap, and it's possible that makes it look like the machine hangs. Keeping the gdb symbols around probably exacerbates the issue (and I thought portage strips binaries by default (FEATURES=nostrip || FEATURES=splitdebug?) so -ggdb basically gets thrown away?) _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
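To put rough numbers on the RAM-versus-jobs point: a commonly quoted Gentoo rule of thumb (an assumption here, not something stated in this thread) is to allow about 2 GiB of RAM per make job and take the smaller of that figure and the CPU thread count. A minimal Python sketch, using the MemTotal figure and the dual-core CPU from the emerge --info output in the first post; the function name suggested_jobs is made up for illustration:

```python
GIB_IN_KIB = 1024 * 1024


def suggested_jobs(mem_total_kib: int, cpu_threads: int, gib_per_job: float = 2.0) -> int:
    """Estimate a make job count from RAM and CPU threads (rule of thumb only)."""
    # How many jobs fit if each one is allowed gib_per_job GiB of RAM.
    by_ram = int(mem_total_kib / (gib_per_job * GIB_IN_KIB))
    # Never suggest fewer than one job, never more than the CPU has threads.
    return max(1, min(cpu_threads, by_ram))


# Figures from the emerge --info output: 3885124 KiB of RAM, two cores.
print(suggested_jobs(3885124, 2))  # 1
```

By that estimate this machine would sit at -j1 rather than -j2, and debug-heavy flags such as -ggdb push the per-job footprint up further.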
|
 |
JustAnother Apprentice

Joined: 23 Sep 2016 Posts: 209
|
Posted: Thu Apr 10, 2025 10:46 pm Post subject: |
|
|
I'll try to describe this in more detail, since I left out a few things, and
there are ongoing changes.
I'll deal with the -ggdb issue shortly and fix that. But first, this.
Up until Monday, I had very occasional freezes, starting a few months ago. I just cussed and rebooted.
Here is what I do every week:
Code: | emaint sync -a | tee ./sync/$(date "+%Y-%m-%d")_sync.log && eix-update
emerge -uDUva --with-bdeps=y @world 2>&1 | tee emerge.log; date
emerge --depclean
eclean-dist --deep && eclean-pkg --deep |
By the way, these don't seem to do the same thing as the wiki pages say:
Code: | eclean-dist --deep && eclean-pkg --deep
eclean -v --nocolor distfiles |
They delete different files. Just so you might know.
On Monday (04-07) I did the usual updates, with no problems as usual.
Then I read the eselect blurb about Python, which said to get rid of any
PYTHON_* settings in make.conf.
So I got rid of these lines and made sure they were the only lines:
Code: | ## 2024-06-03: temporary?
#PYTHON_TARGETS="python3_11 python3_12"
#PYTHON_SINGLE_TARGET=python3_11 |
The idea was to make only this change and do the above update. I only made
these changes because I'm a little wary from past experience of touching
these variables unless it is truly necessary.
I did not run emaint first for the purpose of not entangling any updated
ebuild files with the package updates.
So I directly ran
Code: | emerge -uDUva --with-bdeps=y @world 2>&1 | tee emerge.log; date |
To my surprise it said it wanted to rebuild 290 packages, even packages like
boost and clang which I thought were basic libraries.
I don't get this part. Python is a high level language with a lot of baggage,
and has its share of hiccups. If python is directly coupled to many basic libraries,
isn't that kind of a single point of failure issue?
Anyway, I hit yes, and left the computer to start a long emerge session.
And the computer froze after -- guessing -- a couple of hours.
A second laptop computer (nvidia) was undergoing (almost) the exact same process.
I checked the other machine and it had
Code: | CFLAGS="-march=native -O2 -pipe" |
My freezing machine has
Code: | CFLAGS="-march=native -O2 -pipe -ggdb" |
So that is a vote for the possibility of -ggdb being a problem on the
freezing machine.
The second (nvidia) computer made it through the update after a long time.
It is worth pointing out that the aberrant -ggdb flag has been there for a long
time and there has never been this kind of problem.
That is a vote against the -ggdb possibility.
So I rebooted the frozen computer and did an emerge --resume.
emerge seemed to know what was going on and started to resume.
Then if I remember this correctly boost failed.
If I had known what was unfolding I would have captured all the details, but
I didn't. That's how these things unfold: you don't know what you're being pulled
into until you've been pulled into it and are trying to swim out of it.
I remember that the boost fail seemed to say there was a bad instruction.
So I resumed again with --skipfirst to skip boost.
At this point I got into a pattern of freezes after ~1 hour, followed by reboot/resume attempts.
At one point clang made some real progress and crashed, only this time I did
capture the output:
https://pastebin.com/8eib8KLU
By this time I knew I was in a predicament.
I had seen the discussion about freezes with Radeon chips and how they pointed the
finger at the latest mesa version, saying that mesa sent some kind of command to the radeon
driver that caused the driver to step on a landmine.
While all this mud-against-the-wall stuff was happening, I realized that boost had built,
even though it had crashed before.
So, being stuck, I decided to downgrade mesa:
package.mask:
Code: | >media-libs/mesa-24.1.7-r1 |
The computer froze before mesa completed, so I rebooted and this time it finished
mesa. I then directly tried to run revdep-rebuild as emerge said.
The computer froze in inkscape. I didn't make much of this, as I had not
rebooted yet and halfway expected this. So I rebooted and resumed.
It froze again in inkscape. I checked to ensure mesa was right:
Quote: | media-libs/mesa
Available versions: 24.1.7-r1^t [m]24.3.4-r1^t [m]~25.0.0^t [m]~25.0.1^t [m]~25.0.2^t [m]~25.0.3^t [m]**9999*l^t {+X d3d9 debug +llvm lm-sensors opencl +opengl osmesa +proprietary-codecs selinux test unwind vaapi valgrind vdpau vulkan vulkan-overlay wayland xa +zstd ABI_MIPS="n32 n64 o32" ABI_S390="32 64" ABI_X86="32 64 x32" CPU_FLAGS_X86="sse2" LLVM_SLOT="15 16 17 (+)18 +19" VIDEO_CARDS="d3d12 freedreno intel lavapipe lima nouveau nvk panfrost r300 r600 radeon radeonsi v3d vc4 virgl vivante vmware zink"}
Installed versions: 24.1.7-r1^t(10:15:35 PM 04/09/2025)(X llvm lm-sensors opengl proprietary-codecs vulkan wayland zstd -d3d9 -debug -opencl -osmesa -selinux -test -unwind -vaapi -valgrind -vdpau -vulkan-overlay -xa ABI_MIPS="-n32 -n64 -o32" ABI_S390="-32 -64" ABI_X86="64 -32 -x32" CPU_FLAGS_X86="sse2" LLVM_SLOT="18 -15 -16 -17" VIDEO_CARDS="r300 radeon -d3d12 -freedreno -intel -lavapipe -lima -nouveau -nvk -panfrost -r600 -radeonsi -v3d -vc4 -virgl -vivante -vmware -zink") |
It was right. If the mesa downgrade had really worked, the latest freeze should not have happened.
But on the other hand, inkscape counts the files as it builds, and both freezes happened somewhere
around the 900/1100 range. Is there an outside chance that the build froze twice in the same place?
Who knows, but to not get stuck on such a situation I decided to go ahead and try to do a
standard update (with emaint) as shown at the top, knowing that clang would rear its head and be the litmus test.
(clang all but hangs my computer as it heads for the finish line, even on a good day.)
So I started the update about eight hours ago, and emerge is chomping on clang right now,
at 975/2009 files.
Eight hours of uptime seems to argue (maybe) that mesa did the job. The inkscape double crash seems to argue
that mesa did not do the job. But this is only the third inning.
There is always the chance that this is some kind of hardware issue, but the major problems
only appeared precisely when I started the PYTHON_TARGETS run.
I'll update this when there is more to say, and then yank out the -ggdb factor.
This situation is worth following to some kind of point of clarity.
So far the conjectures above add up to little. But that is how these things unfold. |
|
 |
JustAnother Apprentice

Joined: 23 Sep 2016 Posts: 209
|
Posted: Thu Apr 10, 2025 11:29 pm Post subject: |
|
|
Update: clang crashed. See https://pastebin.com/gaK2bK0Q
The last time it crashed at file 969. This time it crashed at file 1258.
The computer has been working on this for ~ 10 hours and there is no freeze.
I'm going to try
Code: | emerge --skipfirst --resume | and see if the other packages build.
Right now it's starting on inkscape, which crashed before.
Update: inkscape crashed. See https://pastebin.com/gAhKbPZ0
At this point I'm going to get rid of the -ggdb flag and do
emerge --resume
which should, I believe, crash inkscape in exactly the same way.
Update again: still no freezes today. Inkscape crashed again, in a different place.
This time I'll change nothing and do a --resume, and inkscape should positively crash in the same place as before.
Note how the errors involve bad characters in files and misspelled variable names. Weird.
See https://pastebin.com/wGMamvPa
Update again: inkscape crashed in still another place.
See https://pastebin.com/CXMRhLpw
So now we have a seemingly impossible situation: attempts to do what should be exactly the same thing are getting different results.
Now look at the inkscape run that crashed on file 418. Here is the complaint:
Quote: | In file included from /usr/include/boost/limits.hpp:19,
from /usr/include/boost/iterator/iterator_concepts.hpp:23,
from /usr/include/boost/range/concepts.hpp:20,
from /usr/include/boost/range/algorithm/equal.hpp:14,
from /var/tmp/portage/media-gfx/inkscape-1.4-r1/work/inkscape-1.4/src/3rdparty/2geom/include/2geom/pathvector.h:39,
from /var/tmp/portage/media-gfx/inkscape-1.4-r1/work/inkscape-1.4/src/live_effects/parameter/parameter.h:15,
from /var/tmp/portage/media-gfx/inkscape-1.4-r1/work/inkscape-1.4/src/live_effects/parameter/bool.h:15,
from /var/tmp/portage/media-gfx/inkscape-1.4-r1/work/inkscape-1.4/src/live_effects/effect.h:13,
from /var/tmp/portage/media-gfx/inkscape-1.4-r1/work/inkscape-1.4/src/live_effects/parameter/enum.h:15,
from /var/tmp/portage/media-gfx/inkscape-1.4-r1/work/inkscape-1.4/src/live_effects/lpe-perspective-envelope.h:21,
from /var/tmp/portage/media-gfx/inkscape-1.4-r1/work/inkscape-1.4/src/live_effects/lpe-perspective-envelope.cpp:17:
/usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14/limits:1784:62: error: ‘has_quied_NaN’ was not declared in this scope; did you mean ‘has_quiet_NaN’?
1784 | static _GLIBCXX_USE_CONSTEXPR bool has_signaling_NaN = has_quiet_NaN;
| ^~~~~~~~~~~~~
| has_quiet_NaN
|
I looked at the file
Code: | /usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14/limits |
and there is no "quied" anything, plenty of "quiet" stuff.
So it looks like the version of the file being fed to the compiler is being corrupted.
But the corruption does not seem to be random in characters or packages. The same quied and bogus ` character complaint
keeps showing up.
Update: I tried this:
Code: | emerge --skipfirst --resume |
and got this list of packages:
Quote: | skipped: [ebuild rR ] media-gfx/inkscape-1.4-r1::gentoo USE="X cdr exif jpeg openmp postscript readline spell -dia -graphicsmagick -imagemagick -inkjar -sourceview -svg2 -test -visio -wpg" PYTHON_SINGLE_TARGET="python3_12 -python3_10 -python3_11 -python3_13" 0 KiB
[ebuild rR ] app-text/zathura-pdf-poppler-0.3.3::gentoo 0 KiB
[ebuild rR ] net-print/libcupsfilters-2.1.0::gentoo USE="dbus exif jpeg pdf png poppler postscript tiff -test" 0 KiB
[ebuild U ] net-analyzer/wireshark-4.4.5-r1:0/4.4.5::gentoo [4.4.5:0/4.4.5::gentoo] USE="capinfos captype dftest dumpcap editcap filecaps gui mergecap minizip netlink pcap plugins randpkt randpktdump reordercap sharkd ssl text2pcap tshark udpdump zlib zstd -androiddump -bcg729 -brotli -ciscodump -doc -dpauxmon -http2 -http3 -ilbc -kerberos -libxml2 -lua -lz4 -maxminddb -opus -sbc -sdjournal (-selinux) -smi -snappy -spandsp -sshdump -test -verify-sig -wifi" LUA_SINGLE_TARGET="-lua5-3 -lua5-4" 0 KiB
[ebuild U ] www-client/firefox-bin-137.0.1:rapid::gentoo [137.0:rapid::gentoo] USE="gmp-autoupdate wayland (-selinux)" L10N="-ach -af -an -ar -ast -az -be -bg -bn -br -bs -ca -ca-valencia -cak -cs -cy -da -de -dsb -el -en-CA -en-GB -eo -es-AR -es-CL -es-ES -es-MX -et -eu -fa -ff -fi -fr -fy -ga -gd -gl -gn -gu -he -hi -hr -hsb -hu -hy -ia -id -is -it -ja -ka -kab -kk -km -kn -ko -lij -lt -lv -mk -mr -ms -my -nb -ne -nl -nn -oc -pa -pl -pt-BR -pt-PT -rm -ro -ru -sco -si -sk -skr -sl -son -sq -sr -sv -ta -te -th -tl -tr -trs -uk -ur -uz -vi -xh -zh-CN -zh-TW" 70,998 KiB |
The computer froze in wireshark, after being up and mostly compiling for 13 hours.
Update: I rebooted and resumed, which started on wireshark again. The computer froze at file 248.
So I rebooted and tried again, to see if the freeze happens at the same place. It doesn't.
Instead, no freeze, but wireshark crashes. See
https://pastebin.com/GKi6XY2u
At this point I'll refrain from any emerges and just use the computer, and see if it freezes or not over night. |
|
 |
eccerr0r Watchman

Joined: 01 Jul 2004 Posts: 10020 Location: almost Mile High in the USA
|
Posted: Fri Apr 11, 2025 6:00 am Post subject: |
|
|
-ggdb should not cause your computer to crash.
Looks like some of your pastebin expired already, but since you're getting random behavior, likely it is hardware related.
How is your cooling? How old is the PSU? I'd expect bad RAM to give you exceptions, but it's also worth testing.
Actually you should go check ram with memtest86+ or something. Getting errors like
Code: | {standard input}:25865: Error: bad register name `%b15' |
is abnormal. _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
 |
Hu Administrator

Joined: 06 Mar 2007 Posts: 23449
|
Posted: Fri Apr 11, 2025 2:55 pm Post subject: |
|
|
I concur with eccerr0r regarding register %b15, but that is interesting. %r15 is a valid register on amd64. Per man ascii, b is 0x62, and r is 0x72. Thus, they are only one bit away in representation. If an r was stored into a RAM cell that changed bit 4 from on to off, you would change that r into b, and get the reported error.
Likewise, t changing bit 4 to off would produce d, hence quiet becomes quied. This is further supported by how the compiler's own output shows the string was quiet, yet the error text complains about quied. |
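The single-bit arithmetic above can be checked with a few lines of Python. This is only a sketch of the reasoning; the helper names flip_bit and differing_bits are made up for illustration, and bits are numbered from 0 at the least significant end, as in the post.

```python
def flip_bit(text: str, index: int, bit: int) -> str:
    """Return text with one bit of the character at position index inverted."""
    chars = list(text)
    chars[index] = chr(ord(chars[index]) ^ (1 << bit))
    return "".join(chars)


def differing_bits(a: str, b: str) -> list:
    """List (character index, bit number) for each bit where two equal-length strings differ."""
    result = []
    for i, (x, y) in enumerate(zip(a, b)):
        diff = ord(x) ^ ord(y)
        for bit in range(8):
            if (diff >> bit) & 1:
                result.append((i, bit))
    return result


print(flip_bit("%r15", 1, 4))            # %b15
print(flip_bit("has_quiet_NaN", 8, 4))   # has_quied_NaN
print(differing_bits("quiet", "quied"))  # [(4, 4)]
```

Both reported corruptions are the same bit, bit 4, going from on to off. By the same arithmetic, p (0x70) with bit 4 cleared is a backtick (0x60); whether that is where the bogus ` complaints came from is a guess, since the thread does not show that error text.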
|
 |
eccerr0r Watchman

Joined: 01 Jul 2004 Posts: 10020 Location: almost Mile High in the USA
|
Posted: Fri Apr 11, 2025 5:02 pm Post subject: |
|
|
Funny... I've never had a computer with memory bad enough to cause human-noticeable errors while the computer still runs...
Usually it's so subtle that I lose bits only when copying large files, or so bad the computer constantly segfaults. But this time it's bad enough to be visible while working with strings (and silent data corruption too) but doesn't constantly segfault.
So yeah, check your ram. Might need to replace or stop overclocking or underclock RAM to see if it helps. At 4GiB RAM I wouldn't recommend blocking off bad blocks but that was an option I had when dealing with bad RAM because the machine had more than necessary (blocking off 512MB RAM on 64GiB is no big deal, but 512MB on 4GB is a huge deal.)
Then I did use memtest86+ to notice a bad resistor and bad clock lines on a few DIMMs I had once (apparently I bought mishandled DIMMs), had to do DIMM surgery to fix them. The test patterns gave a really good sign at what to look for and I found them! _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55255 Location: 56N 3W
|
Posted: Fri Apr 11, 2025 5:08 pm Post subject: |
|
|
/me puts a few shillings 'on the nose' for a hardware related problem. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
 |
JustAnother Apprentice

Joined: 23 Sep 2016 Posts: 209
|
Posted: Fri Apr 11, 2025 9:20 pm Post subject: |
|
|
The points about the bit flips are well taken. But there is still a cloud of confusion here.
But first an update. I let the computer idle last night. It was still up after ~15 hours.
So I got the idea to do an emaint sync and a regular update, excluding clang, inkscape, and wireshark,
which have caused crashes and freezes.
The computer rebooted this time. Thus far in this saga emerge has either failed or frozen the computer.
I ran a quick memtest which showed nothing, but a good memtest must run overnight, so I will do that.
Anyway I get the memo about this computer needing a date with the glue factory. I let this slide
because of the hassles of researching the hardware to prevent this kind of frustration, and my case is 3" taller
than the new ones. I need the same height.
As for the PSU, memory, etc. the motherboard is ~2008 and the PSU is ~2016, and the hard drive is ~2017, so
everything is, ahem, "mature", like a patient over 60.
I have dealt with bad power supplies, seen bad memory, bad motherboards, but this is weird.
But this situation raises several tough questions. To reiterate:
: Occasional freezes and reboots over the last few months.
: A switch of PYTHON_TARGETS greatly accelerates the freezes.
: A downgrade of mesa greatly attenuates the freezes, but not completely.
: Packages fail in a way that suggests bit flips in source code.
: But the package failures are not random. Some packages seem much more prone.
The same errors (quied, bogus ` characters, %b15, etc.) occur, but at seemingly random places in the sequence of files to build.
If this is a pure hardware problem, it is a weird one.
What scenario could explain this contradictory set of randomness and non-randomness?
I think there is still a chance there is some bug involved with this, but that chance appears slim.
By the way has anybody ever seen any evidence that the screensaver has any relationship to freezes?
Just asking.
Anyway, I'll get about a week to play with this while a new computer is in transit.
I'll update after a long memtest. |
|
 |
pjp Administrator


Joined: 16 Apr 2002 Posts: 20609
|
Posted: Sat Apr 12, 2025 4:43 am Post subject: |
|
|
JustAnother wrote: | If this is a pure hardware problem, it is a weird one. | Sounds textbook to me.
JustAnother wrote: | What scenario could explain this contradictory set of randomness and non-randomness? | Perception. I had a problem that ONLY caused a system to reboot when I was trying to compile a hardened kernel. memtest eventually showed that it was a problem in a high section of the 2nd channel. It was in the motherboard, not the ram. More recently, another problem was in a section of RAM "high enough" that it only occurred when compiling bigger packages. In both instances, there were zero other indications of a problem.
PSU issues can also cause random problems. As well as overheating.
Until memtest reports 0 problems after a sufficiently long run, that's where I'd put my money. FYI, "overnight" may not be enough.
Also, if you mentioned it, I missed it... did you check logs for errors? _________________ Quis separabit? Quo animo? |
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55255 Location: 56N 3W
|
Posted: Sat Apr 12, 2025 10:12 am Post subject: |
|
|
With a motherboard of that vintage, the capacitors on the 12v CPU regulators need to be looked at.
Domed, tilted or leaking examples mean that they all need to be replaced with low ESR parts.
It's not too bad a job if you already have moderate skills with a soldering iron.
It all works when things are stable, or change slowly.
As soon as the CPU does a big speed change, that equals a big power step, the capacitors can't cope, voltages go low and anything can happen. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
 |
eccerr0r Watchman

Joined: 01 Jul 2004 Posts: 10020 Location: almost Mile High in the USA
|
Posted: Sat Apr 12, 2025 1:34 pm Post subject: |
|
|
I've also had bad disk controllers and bad chipsets but nowadays those are integrated onto the motherboard and warranted replacement.... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
 |
pjp Administrator


Joined: 16 Apr 2002 Posts: 20609
|
Posted: Sat Apr 12, 2025 8:05 pm Post subject: |
|
|
NeddySeagoon wrote: | With a motherboard of that vintage, the capacitors on the 12v CPU regulators need to be looked at.
Domed, tilted or leaking examples mean that they all need to be replaced with low ESR parts. | Bad capacitors were one of the single most difficult "what's causing erratic problems for a user" cases. And the problems were only somewhat worse than "standard" problems users encountered at the time. Fortunately that was during the "leaky capacitor" problem days and others knew of the recall. _________________ Quis separabit? Quo animo? |
|
 |
JustAnother Apprentice

Joined: 23 Sep 2016 Posts: 209
|
Posted: Sat Apr 12, 2025 8:49 pm Post subject: |
|
|
Update: I ran a full memtest last night and a few hours of memtest today. It passed.
More attempts at emerge --resume froze it.
Just to fully rule out the portage build process itself, I just happen to have the build process for openwrt on the
computer, which exercises things pretty hard. So I tried that, and the computer froze. Oddly, if I just do ordinary
stuff like web browsing and file editing, the computer is stable.
So this does indeed look like a hardware problem. As for capacitors, this motherboard circa 2009 post-dates the
capacitor debacle of ~ 2003, but they do go bad, and other parts go bad.
I have an old ~2004 hp laptop sitting around waiting for a funeral. It boots erratically. I think the gridballs on the nvidia gpu
are the problem - they got sued for this, but it could be the capacitors. I'll eyeball the caps before I toss the thing. I'm still fond of it.
Never let a good piece of junk go to waste.
The ultimate scapegoat for failing motherboards is of course the word electromigration. In the early 1960s the IC industry freaked
out over this. But it turned out that a small doping amount of copper in the aluminum strips would slow down the problem quite a bit.
Until recently, when the currents are much larger. |
|
 |
eccerr0r Watchman

Joined: 01 Jul 2004 Posts: 10020 Location: almost Mile High in the USA
|
Posted: Sun Apr 13, 2025 6:42 am Post subject: |
|
|
Perhaps try a different disk controller? (USB vs PATA vs SATA?)
I have a computer that constantly corrupts stuff over ethernet but not wifi...
I've had very few boards fail to EM. Most were GPUs of all sorts. I had one CPU fail due to overclocking (and overvolting). And had one chipset on a m/b that kept corrupting stuff as it shuffled data through, but that board was acquired second hand so unsure of its history. But all in all, RAM was the most likely culprit of errors, but most were acquired bad versus failed over time.
In any case I'll need to check all my machines once in a while with a thorough system test... but knock on wood, no recent failures other than power supplies and cooling fans... And have yet to have a hard drive return corrupted stuff (other than the bad disk controller) - hard drives have so far only given me what I wrote or nothing at all. _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55255 Location: 56N 3W
|
Posted: Sun Apr 13, 2025 9:26 am Post subject: |
|
|
JustAnother,
My money is on power supply (not always the metal box) transient response.
When the CPU goes from idle to flat out, there is a huge step change in the input current to the CPU: 100A or more, in a CPU clock cycle. The PSU has to cope with this step and keep the voltage stable within a few mV.
As parts age, particularly capacitors, the PSU transient response gets worse. You don't need the infamous 2003 capacitor problem.
It's not something you can test at home.
Try running prime95. That's a horrible CPU stress test. In turn that will stress the CPU PSU.
If prime95 crashes on start, you have a pointer. If not, we have found another thing that it isn't. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
JustAnother Apprentice

Joined: 23 Sep 2016 Posts: 209
|
Posted: Sun Apr 13, 2025 10:06 pm Post subject: |
|
|
Here's another attempt to fit these pieces together.
To reiterate my personal take on this:
: Portage seems to be the problem, freezing and crashing the computer, but it really isn't.
: Ordinary usage of the computer is not a problem, but that isn't quite the case.
: Within portage,
: non-random with respect to packages
: With packages that do cause freeze/fail problems, random with respect to
file number, non-random with respect to specific failures.
: memtest catches nothing on the time scale of a day.
So what is going on here? Is there one pattern behind all of this?
Ask the following simple question: what does portage actually "do"?
The answer: portage grinds on gcc and copies a bunch of files all over the place.
Another question: what are the two types of packages on a build system?
The answer: one type grinds on gcc and copies a bunch of files all over the place,
and the other type just copies a bunch of files all over the place.
Take all these things mentioned above and put them on a table, and then ask:
can the "things" be separated into two sets that have one key difference?
I think the answer is yes, and the key difference is the amount of heat being
generated within the substrate of the cpu and northbridge, and to a lesser extent
the substrates within the dimm's.
In other words, a conditional intermittent connection within a chip substrate,
with the condition being the thermal stress on the chip substrate -- i.e. the temperature.
This explains two of the portage mysteries. Packages like clang and inkscape that
grind on gcc put more thermal stress on the silicon and once the temperature of the
substrate hits a certain level the chance of an intermittent rises dramatically.
Packages like firefox-bin don't grind on gcc, and the thermal stress is much lower,
so the packages have a much higher chance of building.
Once a grinder package induces a high substrate temperature and activates the
condition, a failure (mostly) is statistical and only a matter of time.
As for the other part about the specific failures: not obvious.
One thing I have noticed over time is that the sounds from the computer cpu fan
are a decent (kind of) indication of the thermal stress on the cpu.
In the morning, if I see the hard drive light on and the computer is quiet, I know
the computer is wrapped around the axle with the swap file.
And what is the computer doing? It is copying a bunch of files all over the place -- low thermal stress.
If the computer is making noise, I know it is grinding on gcc and making real progress -- high thermal stress.
Once I got memtest running, I noticed that the fan was making some noise (and uniformly over time),
but not a whole lot. Kind of like a stove on medium low - a low simmer, but not a boil.
If any of this funny business about intermittent connections sounds strange,
every TV repairman used to have to deal with stuff like this.
I went through this with the wiring harness on a 75 Volvo.
What I am saying is that if all the stuff mentioned in this topic is seen within the context
of thermal stress, it all fits together better.
Consider this question: if this scenario has any merit, who is the likely culprit --
the cpu, the northbridge, or the dimm's?
There are discussions about this. People warn not to directly compare specs between cpu's
and dimm's.
DDR2's (like mine) seem to not get much notice. DDR4 gets complaints about the need for heatsinks.
DDR5 is faster but is more power efficient, so there are fewer complaints.
There is active research being published on this, but the papers seem to mostly figure out what the
designers already know.
People point out that with dimm's the heat diffusion area is spread out over 8 chips.
With a cpu the heat is coming out of an area significantly smaller than the area of the cpu package.
In other words, if you have very little in the way of hard facts and have to start pointing
the finger, the cpu/northbridge may be the better bet.
Concerning memtest, another important dimension to the whole process comes into play: current
fluctuations on various time scales, and their relationship to thermal fluctuations.
What does memtest do? It is one process that generates a lot of very fast current
fluctuations, but on a longer time scale generates zero fluctuations, because it is in a
tight loop doing just one thing, kind of like a workout that only exercises a
couple of muscles. Portage on the other hand is more like a full body workout, and it is
bringing down the machine in minutes.
Memtest should be changed to have another test that exercises the memory but adjusts the
overall thermal loading over time to approximate some power spectral density, with 1/f being a
good candidate.
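A minimal sketch of what such a load schedule could look like, assuming the Voss algorithm as the 1/f approximation (that choice, the function name `pink_duty_cycles` and its parameters are illustrative assumptions, not anything memtest provides). It only generates a sequence of duty-cycle values between 0 and 1; it does not exercise memory itself.

```python
import random


def pink_duty_cycles(n_steps, n_sources=8, seed=None):
    """Return n_steps load levels in [0, 1] with a roughly 1/f spectrum.

    Voss algorithm: average several random sources, where source k is
    redrawn only every 2**k steps. The slow sources give long-term drift,
    the fast ones give short-term jitter.
    """
    rng = random.Random(seed)
    sources = [rng.random() for _ in range(n_sources)]
    levels = []
    for step in range(n_steps):
        for k in range(n_sources):
            # Source k changes on steps that are multiples of 2**k.
            if step % (1 << k) == 0:
                sources[k] = rng.random()
        levels.append(sum(sources) / n_sources)
    return levels
```

Each value could then be used as the fraction of a fixed time slice spent in a busy memory loop, with the rest of the slice spent idle, so that the thermal load drifts on many time scales instead of staying constant.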
And there is a lot more to consider, but I'll stop here for now. |
eccerr0r Watchman

Joined: 01 Jul 2004 Posts: 10020 Location: almost Mile High in the USA
|
Posted: Sun Apr 13, 2025 11:53 pm Post subject: |
|
|
That is a problem with most people in the world: they don't have an appreciation of what's actually in a computer and how it all needs to work together to get a seamless system. This goes for software too, as well as the hardware-software interface.
Portage is merely a python script that runs gcc among lots of other stuff. It's software just like any other software.
Computers nowadays try to use less power when they're not doing anything. When a computer starts doing something, it demands power, and if the power is not there, it will likely do something unintended. That's why we have to blame power supplies, but again everything works together as a system and you can't always instantly blame one part or another.
I still run machines with DDR2 and DDR3. I recently got one machine with DDR4. I've used all sorts of ram from SRAM to the original DRAM, FPM, EDO, SDRAM, DDR. I have not used DDR5 or RAMBUS. However they all at a software level are the same: write data in, expect to read data back out. Main reason why RAM is blamed first: statistically, most of the transistors on your computer are used for RAM.
The same goes for pretty much any hardware: no matter what software is thrown at it, it should be reliable within the constraints of the hardware. One of those constraints is clock rate, which you have to change yourself in firmware settings or board settings; it is usually protected from random control, as going too far can cause crashes too. And as said earlier, there's a whole bunch of parts that need to work together to get you a sane running environment. If you break any part of it, the whole thing comes crashing down, like your computer.
That DDR2 machine of mine is a Core2 Quad and I run it 24/7. The machine is probably 15+ years old now, yeah it is newer than an Athlon 64, but no matter - I verified the machine is stable and that's why it runs and still runs fine... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
logrusx Advocate


Joined: 22 Feb 2018 Posts: 2985
|
Posted: Mon Apr 14, 2025 4:18 am Post subject: |
|
|
I had a Core 2 Quad like eccerr0r. It could fit max 8GB on 4 slots. At one point it was hard to find 2GB pieces, so I got what I found. That stick had a defect, and it only manifested itself during emerge because it was at the end of the range.
How many passes did memtest do? Mine needed a lot. I don't remember well but it might well have been over 8, which is the recommended minimum. It took maybe 11 hours and generated a lot of heat. Especially on the memory chips.
I doubt the problem is heat because you would have noticed it way before things get heated. The system generates heat at idle and any increase in temperature would be enough, once such issues start manifesting.
Best Regards,
Georgi |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55255 Location: 56N 3W
|
Posted: Mon Apr 14, 2025 8:37 am Post subject: |
|
|
JustAnother,
Run with one stick of memory at a time until you have tested all your RAM.
Memtest is not flawless.
Swapping the sticks also means that you will 'wipe the contacts' on the RAM as a bonus. That's been known to fix problems too.
Run prime95 to stress your CPU, or even cpuburn. The latter is in the ::gentoo repo.
Keep an eye on your CPU temperatures. Both drive your CPU (and motherboard VRM) hard. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
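For keeping an eye on temperatures while a stress test runs, a minimal sketch follows. It assumes the kernel exposes sensors through the hwmon sysfs layout (`/sys/class/hwmon/hwmonN/tempM_input`, in millidegrees Celsius) and that a suitable sensor driver is loaded for the board; the function name `read_hwmon_temps` is made up for illustration.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"hwmonN/chip/tempM": degrees_celsius} for readable sensors."""
    temps = {}
    for temp_file in sorted(Path(root).glob("hwmon*/temp*_input")):
        chip_dir = temp_file.parent
        name_file = chip_dir / "name"
        # The "name" file holds the driver/chip name when present.
        chip = name_file.read_text().strip() if name_file.is_file() else "unknown"
        try:
            millideg = int(temp_file.read_text().strip())
        except (OSError, ValueError):
            # Some sensors fail to read or return junk; skip them.
            continue
        label = temp_file.name[: -len("_input")]
        temps[f"{chip_dir.name}/{chip}/{label}"] = millideg / 1000.0
    return temps
```

Calling `read_hwmon_temps()` every few seconds from a second terminal or over ssh, and appending the result to a file, leaves a record of the last temperatures seen before a hard freeze.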
eccerr0r Watchman

Joined: 01 Jul 2004 Posts: 10020 Location: almost Mile High in the USA
|
Posted: Mon Apr 14, 2025 9:38 am Post subject: |
|
|
I had a Pentium II motherboard whose RAM ran fine but if I passed a lot of data through ATAPI it eventually would come up with errors while copying data.
I threw that board away. Kept CPU for no real reason. CPU, hdd, and RAM worked fine on other boards. I detected the errors by md5summing the copied data.
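A minimal sketch of that kind of check, assuming the copy is a plain directory tree and that comparing per-file digests between source and destination is enough (the helper names `file_digest` and `find_mismatches` are made up for illustration):

```python
import hashlib
from pathlib import Path


def file_digest(path, algo="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in RAM."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(src_dir, dst_dir, algo="md5"):
    """Return relative paths that are missing or differ in dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    bad = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(src_dir)
        dst = dst_dir / rel
        if not dst.is_file() or file_digest(src, algo) != file_digest(dst, algo):
            bad.append(rel.as_posix())
    return bad
```

One caveat: a file read back right after copying may be served from the page cache rather than from the disk, so the comparison is more meaningful after a remount or reboot.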
Another weird issue. I had an 8GB DDR3 DIMM I got for free and stuck it into a machine with another 8GB and four 4GB. A few of the memory locations readily and consistently showed up as bad on memtest86+ so I noted them. I then swapped out all the 4GB DIMMs and filled the rest of the slots with 8GB DIMMs for 64GiB. I retested the RAM and still saw the bad locations. Then a few months later I swapped/upgraded to a new CPU and retested the RAM... the bad RAM disappeared! Unsure if it was the CPU or firmware, as I did have to reset the CMOS settings, which I did not do prior to this last change... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55255 Location: 56N 3W
|
Posted: Mon Apr 14, 2025 10:07 am Post subject: |
|
|
eccerr0r,
DDR3 and later is a nest of vipers.
The CPU (memory controller) sets up the timings and signal drive strengths by trial and error.
It's called training. As things drift with temperature changes, the training may not be quite right any more. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |