Author |
Message |
Kresp Tux's lil' helper

Joined: 17 Oct 2016 Posts: 77
|
Posted: Sun Apr 02, 2017 7:23 am Post subject: Segfaults during compilation on AMD Ryzen. |
|
|
I often encounter segfaults during emerge builds of heavy packages like curl, hgc, llvm:
Code: | Apr 2 16:55:42 wagner kernel: [ 2188.416231] sh[5627]: segfault at 34 ip 0000000000406215 sp 00007ffdadd984c8 error 6 in bash[400000+a8000]
Apr 2 16:57:08 wagner kernel: [ 2273.706264] sh[15390]: segfault at e ip 0000000000406215 sp 00007ffc1c9a0d78 error 6 in bash[400000+a8000]
Apr 2 17:00:16 wagner kernel: [ 2461.767997] sh[19903]: segfault at 8 ip 0000000000406215 sp 00007ffd970f79c8 error 6 in bash[400000+a8000] |
Usually just trying again is enough to finish it, even though sometimes it takes a few retries.
I ran memtest on all 4 RAM sticks just a few days ago, before the Gentoo installation, so memory should be fine. The CPU is not overclocked and the RAM runs at 2133, nothing too crazy.
I don't really stress the system yet, and the CPU heatsink is always cold. I can't watch temps yet, since Linux does not yet support the new AMD R7 sensors, but according to the BIOS the CPU fans never go above about 300+ RPM, with an idle temp of about 44 C.
Gcc is 4.9.4, kernel - 4.10.6. CPU - AMD R7 1800X, motherboard - MSI X370 Titanium, BIOS 1.30 stable.
Any tips on where I should start looking?
Think I'm going to try running sysbench for some time and see if it crashes for starters, but I don't think this is hardware related.
Cpu flags set, from cpuinfo2cpuflags-x86:
Code: | CPU_FLAGS_X86="aes avx avx2 fma3 mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3" |
Last edited by Kresp on Mon Apr 03, 2017 4:00 am; edited 1 time in total |
|
|
 |
Logicien Veteran


Joined: 16 Sep 2005 Posts: 1555 Location: Montréal
|
Posted: Sun Apr 02, 2017 7:36 am Post subject: |
|
|
How do you optimise Gcc compile time in /etc/portage/make.conf? _________________ Paul |
|
|
 |
Kresp Tux's lil' helper

Joined: 17 Oct 2016 Posts: 77
|
Posted: Sun Apr 02, 2017 7:40 am Post subject: |
|
|
Logicien wrote: | How do you optimise Gcc compile time in /etc/portage/make.conf? |
I'm not sure what you mean. Are you talking about MAKEOPTS?
I'll just post the full make.conf for completeness:
Code: | # These settings were set by the catalyst build script that automatically
# built this stage.
# Please consult /usr/share/portage/config/make.conf.example for a more
# detailed example.
CFLAGS="-march=native -O2 -pipe"
CXXFLAGS="${CFLAGS}"
MAKEOPTS="-j16"
GRUB_PLATFORMS="efi-64"
VIDEO_CARDS="nouveau"
# WARNING: Changing your CHOST is not something that should be done lightly.
# Please consult http://www.gentoo.org/doc/en/change-chost.xml before changing.
CHOST="x86_64-pc-linux-gnu"
# These are the USE and USE_EXPAND flags that were used for
# buidling in addition to what is provided by the profile.
USE="X aac alsa asm bash-completion cli crypt cups emacs encode exif fbcon ffmpeg flac fontconfig gif git gnome-keyring gtk idn -ieee1394 imap ipv6 -java javascript jit jpeg lame lm_sensors lzma mad matroska mime mng modules mozilla mp3 mp4 mpeg multilib ncurses offensive ogg opengl openmp pdf png policykit posix -pulseaudio python quicktime raw rdp readline rss samba scanner smp sockets socks5 sound sqlite ssl svg -systemd theora threads truetype unicode usb vdpau vorbis wav wavpack x264 xattr xml xvid zlib"
CPU_FLAGS_X86="aes avx avx2 f16c fma3 mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3"
PORTDIR="/usr/portage"
DISTDIR="${PORTDIR}/distfiles"
PKGDIR="${PORTDIR}/packages"
|
|
|
|
 |
Logicien Veteran


Joined: 16 Sep 2005 Posts: 1555 Location: Montréal
|
Posted: Sun Apr 02, 2017 11:09 am Post subject: |
|
|
Kresp wrote: | #CFLAGS="-march=native -O2 -pipe"
#CXXFLAGS="${CFLAGS}"
#MAKEOPTS="-j16"
|
I would revert those variables to their defaults by commenting them out. I am not an expert in Gcc and make, but -j16 is very excessive. It is known that overly aggressive optimisations lead to errors during compilation. _________________ Paul |
|
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55027 Location: 56N 3W
|
Posted: Sun Apr 02, 2017 11:31 am Post subject: |
|
|
Kresp,
Segfaults during build normally indicate a hardware issue.
However, with such an old gcc on such new hardware, I would not rule out other things.
gcc-4.9.4 does not understand -march=native for Ryzen. That needs gcc-6.3 which is still hard masked in Gentoo.
I'm not suggesting that you upgrade to that. The update is not trivial and gcc-6.3 does not work for everything yet.
That the problem is intermittent points to hardware.
Is your BIOS the latest available version?
There has been a rash of BIOS updates since Ryzen was released.
Update your BIOS as a first step, if there is a newer one for your motherboard. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
 |
Kresp Tux's lil' helper

Joined: 17 Oct 2016 Posts: 77
|
Posted: Sun Apr 02, 2017 12:51 pm Post subject: |
|
|
NeddySeagoon wrote: |
Is your BIOS the latest available version?
|
It is the latest stable, 1.30. There was a 1.41 beta, but it was pulled due to some issues - apparently it bricked some people's motherboards.
I ran sysbench for cpu with 16 threads for three hours, everything was fine. Heatsink got warm, but barely - about 35-37 C I'd say, so probably not an overheating issue.
I upgraded to gcc 5.4.0-r3. revdep-rebuild ended up rebuilding 60 packages, including a few heavy ones like thunderbird. Also installed emacs, gdb and openmw with dependencies - not a single segfault so far.
I'll continue watching emerges closely for the next few days, but this upgrade to gcc5 seems to have fixed the problem.
I've read in the Ryzen thread that gcc6 is desirable with this CPU, but decided against using it, since there is still a slew of open tickets for it on the bug tracker. |
|
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55027 Location: 56N 3W
|
Posted: Sun Apr 02, 2017 2:20 pm Post subject: |
|
|
Kresp,
The rebuilds between gcc-4.x and gcc-5.x are due to the C++ ABI change.
It's not required for every gcc major version update.
gcc-5.x to gcc-6.x is painless except for packages that have problems with gcc-6.
You are quite right to take things slowly. |
|
|
 |
limn l33t

Joined: 13 May 2005 Posts: 997
|
Posted: Sun Apr 02, 2017 3:59 pm Post subject: |
|
|
Those are not compiler faults, at least not directly. It's your shell the kernel is complaining about. |
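For anyone reading those kernel lines, here is a minimal sketch of how the fields decode on x86. The bit meanings are those of the x86 page-fault error code; the function names are made up for illustration and are not part of any tool.

```shell
# Decode the "error N" field of a kernel "segfault at ..." line on x86.
# Bit 0: 0 = page not present, 1 = protection violation
# Bit 1: 0 = read access,      1 = write access
# Bit 2: 0 = kernel mode,      1 = user mode
decode_pf_error() {
    code=$1
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    echo "$mode-mode $access, $cause"
}

# Offset of the faulting instruction inside the mapped binary:
# the "ip" value minus the base printed in "bash[400000+a8000]".
# Both arguments are hex strings without the 0x prefix.
fault_offset() {
    printf '0x%x\n' $(( 0x$1 - 0x$2 ))
}

decode_pf_error 6                        # user-mode write, page not present
fault_offset 0000000000406215 400000     # 0x6215
```

For the three lines posted above, error 6 decodes as a user-mode write to an unmapped page, the fault addresses (34, e, 8) are all close to zero, and the instruction offset inside bash is the same every time.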
|
|
 |
c1pherx n00b

Joined: 02 Apr 2017 Posts: 7
|
Posted: Sun Apr 02, 2017 4:50 pm Post subject: |
|
|
I'm running a Ryzen 1700X with the ASRock Taichi and 32GB of RAM. I've tried GCC 4.9.4, 5.4.0, and 6.3.0 and I've seen some sporadic segmentation faults on all three. Also an up-to-date BIOS, Memtest returns no issues, and stress doesn't seem to cause issues. So far I haven't been able to isolate exactly what is causing it, but I don't think it's hardware. Some things I do know:
gcc-5.4.0 with -march=native is -march=bdver4.
gcc-6.3.0 with -march=native is -march=znver1
I know there are some significant differences between Bulldozer and Zen, but 6.3.0 has also produced the occasional segfault despite the newer -march. Currently re-emerging all of world with 6.3.0 to see if that helps matters.
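One way to check what -march=native resolves to on a given compiler is to ask gcc to print its target options. A small sketch, assuming the output contains a line whose first field is "-march=" followed by the selected value; the helper name resolved_march is made up for illustration.

```shell
# Print the architecture GCC selects for -march=native.
# Reads the text produced by `gcc -march=native -Q --help=target` on stdin
# and prints the value from the line whose first field is "-march=".
resolved_march() {
    awk '$1 == "-march=" { print $2; exit }'
}

# Run it against the local compiler, if one is installed.
if command -v gcc >/dev/null 2>&1; then
    gcc -march=native -Q --help=target 2>/dev/null | resolved_march
fi
```

Running this once per installed gcc version (after switching with gcc-config) shows whether the compiler knows the CPU or falls back to an older target.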
Last edited by c1pherx on Sun Apr 02, 2017 9:37 pm; edited 1 time in total |
|
|
 |
toralf Developer


Joined: 01 Feb 2004 Posts: 3943 Location: Hamburg
|
Posted: Sun Apr 02, 2017 4:53 pm Post subject: Re: Segfaults during compilation. |
|
|
Kresp wrote: | I often encounter segfaults during emerge builds of heavy packages like curl, hgc, llvm:
Code: | Apr 2 16:55:42 wagner kernel: [ 2188.416231] sh[5627]: segfault at 34 ip 0000000000406215 sp 00007ffdadd984c8 error 6 in bash[400000+a8000]
Apr 2 16:57:08 wagner kernel: [ 2273.706264] sh[15390]: segfault at e ip 0000000000406215 sp 00007ffc1c9a0d78 error 6 in bash[400000+a8000]
Apr 2 17:00:16 wagner kernel: [ 2461.767997] sh[19903]: segfault at 8 ip 0000000000406215 sp 00007ffd970f79c8 error 6 in bash[400000+a8000] |
Usually just trying again is enough to finish it, even though sometimes it takes few retries. | I'd change the parallel make jobs from -j16 to -j8 and would check whether the compile issues go away or not. |
|
|
 |
Jaglover Watchman


Joined: 29 May 2005 Posts: 8291 Location: Saint Amant, Acadiana
|
|
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55027 Location: 56N 3W
|
Posted: Sun Apr 02, 2017 5:35 pm Post subject: |
|
|
Jaglover,
It's worse than that.
Only if memtest tells you the RAM is bad at the same address on several cycles is there a good chance it's the RAM.
It can also be the memory controller (in the processor on Ryzen) or the local voltage regulator (on the motherboard) for the RAM.
Errors at random addresses are probably not RAM. |
|
|
 |
limn l33t

Joined: 13 May 2005 Posts: 997
|
Posted: Sun Apr 02, 2017 5:38 pm Post subject: |
|
|
FMA3 instruction problem?
It is described as intermittent, but also as not affecting Linux. |
|
|
 |
Kresp Tux's lil' helper

Joined: 17 Oct 2016 Posts: 77
|
Posted: Mon Apr 03, 2017 10:41 am Post subject: |
|
|
Well, gcc5 was not a panacea - segfaults still happen.
I removed the fma3 flag to check if this is related to that CPU bug, but sudo emerge --ask --newuse --update @world did not rebuild anything.
Will now try disabling SMT in UEFI and changing MAKEOPTS to -j8. SMT/HT is a marketing gimmick anyway. |
|
|
 |
limn l33t

Joined: 13 May 2005 Posts: 997
|
Posted: Mon Apr 03, 2017 11:15 am Post subject: |
|
|
emerge --newuse --update does not consider changes to CPU_FLAGS_X86.
Recompile a package that has failed, with the CPU flag set, as many times as it takes to get at least one failure. Then remove the flag and compile it at least as many times.
Even then you may not know until you apply the FMA3 fix. |
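A rough sketch of that comparison as a shell helper: run the same build a fixed number of times and count the failures, once with the flag and once without. The helper name count_failures, the run count, and the package atom in the example are placeholders, and the build output is simply discarded here.

```shell
# Run a command N times and print how many of the runs failed.
# Usage: count_failures N command [args...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # A non-zero exit status counts as one failed run.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (as root; run once with the flag set, then again with it removed):
#   count_failures 10 emerge --oneshot net-misc/curl
```

With an intermittent fault, a small number of clean runs proves little, so the second count is only meaningful if it covers at least as many runs as the first.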
|
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55027 Location: 56N 3W
|
Posted: Mon Apr 03, 2017 2:23 pm Post subject: |
|
|
Kresp, limn,
CPU_FLAGS_X86 applies only to hand optimised code segments where a package advertises that such speedups are available for user selection.
It does not affect the code emitted by gcc.
To stop gcc using FMA3, you need to find the name of the option and add it to CFLAGS. looks promising.
You cannot tell where gcc has used FMA3, if anywhere, so to be sure it's not used, you need to do Code: | emerge -e @world --with-bdeps=y |
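For reference, the GCC x86 option that switches off FMA3 code generation is -mno-fma. A make.conf fragment along those lines might look like the sketch below; it simply reuses the flags posted earlier in the thread and has not been tested on the affected machines.

```shell
# /etc/portage/make.conf fragment (sketch, untested on the hardware in this thread).
# -march=native as before, with FMA3 instruction generation disabled.
CFLAGS="-march=native -mno-fma -O2 -pipe"
CXXFLAGS="${CFLAGS}"
```

Whether the option took effect can be checked by dumping the predefined macros with gcc -march=native -mno-fma -dM -E - < /dev/null and confirming that __FMA__ is not among them.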
|
|
|
 |
limn l33t

Joined: 13 May 2005 Posts: 997
|
Posted: Mon Apr 03, 2017 4:47 pm Post subject: |
|
|
Neddy,
Are you saying that CPU_FLAGS_X86 activates optimizations in the binary at run time? |
|
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55027 Location: 56N 3W
|
Posted: Mon Apr 03, 2017 5:53 pm Post subject: |
|
|
limn,
No. In the source at build time. They are exactly like USE flags, which is what they were at one time.
CPU_FLAGS_X86="mmx" will include sections of optional code in the source that have been hand optimised to make use of the mmx instruction set.
CFLAGS="-mmmx" (is that right?) allows gcc to emit mmx instructions in the course of any build. |
|
|
 |
liewyec n00b

Joined: 03 Apr 2017 Posts: 9
|
Posted: Mon Apr 03, 2017 6:08 pm Post subject: |
|
|
I've got the same problem. I have a Ryzen 1800X and an ASUS Prime X370 Pro. I have only two sticks of RAM; memtest shows no errors either.
I can sometimes compile the entire chromium with no errors, and sometimes it crashes multiple times in a row compiling just a few packages, and it is not even running 16 threads. It is really strange. I tried disabling optimization but it didn't help.
In the end I wrote a script that restarts emerge a few times if it fails. |
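The script itself was not posted. A minimal sketch of such a retry wrapper, as an assumption about what it could look like rather than the actual script, with a made-up function name and retry count:

```shell
# Retry a command up to N times, stopping at the first success.
# Usage: retry N command [args...]
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
    done
    # Every attempt failed.
    return 1
}

# Example (as root): retry 5 emerge --update --deep --newuse @world
```

This only works around the symptom: a build that happens to pass on a later attempt says nothing about why the earlier ones crashed.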
|
|
 |
c1pherx n00b

Joined: 02 Apr 2017 Posts: 7
|
Posted: Tue Apr 04, 2017 5:52 pm Post subject: |
|
|
Just a quick update.
I switched to GCC-6.3.0 and the segfaults continued.
Then I switched from 4x8GB of CMK16GX4M2B3000C15 to 4x8GB of CMK16GX4M2B3200C16. The Segfaults became slightly less frequent (but this may just have been luck). Then I switched down to 2x8GB of CMK16GX4M2B3200C16 and OC'd it to 3200MHz 16-15-15-15-36 @ 1.35V and I haven't seen a Segfault since. I've compiled GHC, Chromium, and Firefox multiple times each.
I have some Ripjaws V arriving tomorrow. Going to see if I can get 32GB stable. |
|
|
 |
Keepco n00b

Joined: 02 Apr 2017 Posts: 5
|
Posted: Tue Apr 04, 2017 6:43 pm Post subject: |
|
|
Seems like I'm plagued by this as well. My Specs:
Ryzen 7 1700
MSI X370 XPower Titanium
16GB of Corsair Dominator RAM (CMD16GX4M2B3000C15), currently running @ 2133MHz
Just a few "big" packages fail for me, namely:
chromium webkit-gtk electron libreoffice
Is there any other way to get rid of this other than trying different RAM sticks? |
|
|
 |
c1pherx n00b

Joined: 02 Apr 2017 Posts: 7
|
Posted: Tue Apr 04, 2017 7:50 pm Post subject: |
|
|
Yea. I spoke too soon. I've reduced the frequency of it happening, but it is still happening. On to the next ideas.
One pattern I'm noticing is that now it seems to be happening with builds that use libtool. This may just be a correlation, but my most recent failures were gnutls (first time that's happened) and libseccomp (first time here too). Both use Libtool. |
|
|
 |
Keepco n00b

Joined: 02 Apr 2017 Posts: 5
|
Posted: Tue Apr 04, 2017 8:40 pm Post subject: |
|
|
What irritates me is that my chromium build always segfaults with at least the last 2 lines being exactly the same (can't recall the other ones, should've written those down somewhere..)
Code: | In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/string:52:0,
from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/stdexcept:39,
from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/array:39,
from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/tuple:39,
from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/bits/stl_map.h:63,
from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/map:61,
from ../../ppapi/shared_impl/tracked_callback.h:10,
from ../../ppapi/thunk/ppb_output_protection_private_thunk.cc:13:
/usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/bits/basic_string.h:1316:59: internal compiler error: Segmentation fault
insert(const_iterator __p, size_type __n, _CharT __c)
^
|
Even though it seems to be at different places during the compilation (but always during the main part of chromium, not during the sandbox etc.)
I also tried just using one of my RAM sticks (recompiled gcc after swapping them) and increased their voltage to the recommended value (1.35V, was 1.2V by default). |
|
|
 |
c1pherx n00b

Joined: 02 Apr 2017 Posts: 7
|
Posted: Tue Apr 04, 2017 9:00 pm Post subject: |
|
|
Keepco wrote: | What irritates me is that my chromium build always segfaults with at least the last 2 lines being exactly the same (can't recall the other ones, should've written those down somewhere..)
Code: | In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/string:52:0,
from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/stdexcept:39,
from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/array:39,
from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/tuple:39,
from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/bits/stl_map.h:63,
from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/map:61,
from ../../ppapi/shared_impl/tracked_callback.h:10,
from ../../ppapi/thunk/ppb_output_protection_private_thunk.cc:13:
/usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/bits/basic_string.h:1316:59: internal compiler error: Segmentation fault
insert(const_iterator __p, size_type __n, _CharT __c)
^
|
Even though it seems to be at different places during the compilation (but always during the main part of chromium, not during the sandbox etc.)
I also tried just using one of my RAM sticks (recompiled gcc after swapping them) and increased their voltage to the recommended value (1.35V, was 1.2V by default). |
Did you go from GCC-4.8 right up to GCC-6.3? If yes, did you remember to re-emerge libtool and run revdep-rebuild --library 'libstdc++\.so\.5'? |
|
|
 |
Keepco n00b

Joined: 02 Apr 2017 Posts: 5
|
Posted: Tue Apr 04, 2017 9:01 pm Post subject: |
|
|
c1pherx wrote: | Yea. I spoke too soon. I've reduced the frequency of it happening, but it is still happening. On to the next ideas.
One pattern I'm noticing is that now it seems to be happening with builds that use libtool. This may just be a correlation, but my most recent failures were gnutls (first time that's happened) and libseccomp (first time here too). Both use Libtool. |
Can't seem to reproduce the gnutls failure; just tried recompiling it 15 times and it worked every time. Guess my problem is elsewhere.
EDIT: Just re-emerged GCC without -march=native, and it seems like that did the job.
Last edited by Keepco on Wed Apr 05, 2017 5:35 am; edited 1 time in total |
|
|
 |
|