Gentoo Forums
Segfaults during compilation on AMD Ryzen.
Kresp
Tux's lil' helper

Joined: 17 Oct 2016
Posts: 77

Posted: Sun Apr 02, 2017 7:23 am    Post subject: Segfaults during compilation on AMD Ryzen.

I often encounter segfaults during emerge builds of heavy packages like curl, hgc, llvm:
Code:
Apr  2 16:55:42 wagner kernel: [ 2188.416231] sh[5627]: segfault at 34 ip 0000000000406215 sp 00007ffdadd984c8 error 6 in bash[400000+a8000]
Apr  2 16:57:08 wagner kernel: [ 2273.706264] sh[15390]: segfault at e ip 0000000000406215 sp 00007ffc1c9a0d78 error 6 in bash[400000+a8000]
Apr  2 17:00:16 wagner kernel: [ 2461.767997] sh[19903]: segfault at 8 ip 0000000000406215 sp 00007ffd970f79c8 error 6 in bash[400000+a8000]

Usually just trying again is enough to finish it, though sometimes it takes a few retries.
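To see at a glance which binaries are crashing and how often, the kernel log can be tallied with something like the sketch below. The function name count_segfaults is made up for illustration, and it assumes each segfault line ends with the binary[address+size] field, as in the log above; kernels that append extra text after that field would need a different pattern.

```shell
# Count kernel-reported segfaults per binary.
# Reads kernel log lines on stdin and prints "<count> <binary>" per binary.
count_segfaults() {
    # The last field of a "segfault at" line looks like "bash[400000+a8000]"
    # (assumption based on the log format shown above). Strip the bracketed
    # mapping and tally what is left.
    awk '/segfault at/ { name = $NF; sub(/\[.*/, "", name); n[name]++ }
         END { for (b in n) print n[b], b }' | sort -rn
}
```

It can be fed from the live kernel log, for example with `dmesg | count_segfaults`.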

I ran memtest on all 4 RAM sticks just a few days ago, before the Gentoo installation, so memory should be fine. The CPU is not overclocked and RAM runs at 2133, nothing too crazy.
I don't really stress the system yet, and the CPU heatsink is always cold. I cannot watch temps yet, since Linux does not yet support the new AMD R7 sensors, but according to the BIOS, the CPU fans never go much above 300 RPM, with an idle temp of about 44 C.

GCC is 4.9.4, kernel is 4.10.6. CPU is an AMD R7 1800X, motherboard is an MSI X370 Titanium, BIOS 1.30 stable.

Any tips on where I should start looking?
Think I'm going to try running sysbench for some time and see if it crashes for starters, but I don't think this is hardware related.

Cpu flags set, from cpuinfo2cpuflags-x86:
Code:
CPU_FLAGS_X86="aes avx avx2 fma3 mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3"


Last edited by Kresp on Mon Apr 03, 2017 4:00 am; edited 1 time in total

Logicien
Veteran

Joined: 16 Sep 2005
Posts: 1555
Location: Montréal

Posted: Sun Apr 02, 2017 7:36 am

How do you optimise Gcc compile time in /etc/portage/make.conf?
_________________
Paul

Kresp
Tux's lil' helper

Joined: 17 Oct 2016
Posts: 77

Posted: Sun Apr 02, 2017 7:40 am

Logicien wrote:
How do you optimise Gcc compile time in /etc/portage/make.conf?

I'm not sure what you mean. Are you talking about MAKEOPTS?
I'll just post full make.conf for completeness:
Code:
# These settings were set by the catalyst build script that automatically
# built this stage.
# Please consult /usr/share/portage/config/make.conf.example for a more
# detailed example.
CFLAGS="-march=native -O2 -pipe"
CXXFLAGS="${CFLAGS}"
MAKEOPTS="-j16"
GRUB_PLATFORMS="efi-64"
VIDEO_CARDS="nouveau"
# WARNING: Changing your CHOST is not something that should be done lightly.
# Please consult http://www.gentoo.org/doc/en/change-chost.xml before changing.
CHOST="x86_64-pc-linux-gnu"
# These are the USE and USE_EXPAND flags that were used for
# buidling in addition to what is provided by the profile.
USE="X aac alsa asm bash-completion cli crypt cups emacs encode exif fbcon ffmpeg flac fontconfig gif git gnome-keyring gtk idn -ieee1394 imap ipv6 -java javascript jit jpeg lame lm_sensors lzma  mad matroska mime mng modules mozilla mp3 mp4 mpeg multilib ncurses offensive ogg opengl openmp pdf png policykit posix -pulseaudio python quicktime raw rdp readline rss samba scanner smp sockets socks5 sound sqlite ssl svg -systemd theora threads truetype unicode usb vdpau vorbis wav wavpack x264 xattr xml xvid zlib"
CPU_FLAGS_X86="aes avx avx2 f16c fma3 mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3"
PORTDIR="/usr/portage"
DISTDIR="${PORTDIR}/distfiles"
PKGDIR="${PORTDIR}/packages"

Logicien
Veteran

Joined: 16 Sep 2005
Posts: 1555
Location: Montréal

Posted: Sun Apr 02, 2017 11:09 am

Kresp wrote:
#CFLAGS="-march=native -O2 -pipe"
#CXXFLAGS="${CFLAGS}"
#MAKEOPTS="-j16"


I would revert those variables to their defaults by commenting them out. I am not an expert on GCC and make, but -j16 is very excessive. Overly aggressive optimisations are known to lead to errors during compilation.
_________________
Paul

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Sun Apr 02, 2017 11:31 am

Kresp,

Segfaults during build normally indicate a hardware issue.

However, with such an old gcc on such new hardware, I would not rule out other things.
gcc-4.9.4 does not understand -march=native for Ryzen. That needs gcc-6.3 which is still hard masked in Gentoo.
I'm not suggesting that you upgrade to that. The update is not trivial and gcc-6.3 does not work for everything yet.

That the problem is intermittent points to hardware.

Is your BIOS the latest available version?
There has been a rash of BIOS updates since Ryzen was released.
Update your BIOS as a first step, if there is a newer one for your motherboard.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Kresp
Tux's lil' helper

Joined: 17 Oct 2016
Posts: 77

Posted: Sun Apr 02, 2017 12:51 pm

NeddySeagoon wrote:

Is your BIOS the latest available version?

It is the latest stable, 1.30. There was a 1.41 beta, but it was pulled due to some issues: apparently some people bricked their motherboards by applying it.

I ran sysbench for cpu with 16 threads for three hours, everything was fine. Heatsink got warm, but barely - about 35-37 C I'd say, so probably not an overheating issue.



I upgraded to gcc 5.4.0-r3. revdep-rebuild ended up rebuilding 60 packages, including a few heavy ones like thunderbird. I also installed emacs, gdb and openmw with dependencies, and there has not been a single segfault so far.
I'll continue watching emerges closely for the next few days, but this upgrade to gcc5 seems to have fixed the problem.

I've read in the Ryzen thread that gcc6 is desirable with this CPU, but decided against using it, since there is still a slew of open tickets for it on the bug tracker.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Sun Apr 02, 2017 2:20 pm

Kresp,

The rebuilds between gcc-4.x and gcc-5.x are due to the C++ ABI change.
It's not required for every gcc major version update.
gcc-5.x to gcc-6.x is painless except for packages that have problems with gcc-6.

You are quite right to take things slowly.

limn
l33t

Joined: 13 May 2005
Posts: 997

Posted: Sun Apr 02, 2017 3:59 pm

Those are not compiler faults, at least not directly. It's your shell the kernel is complaining about.

c1pherx
n00b

Joined: 02 Apr 2017
Posts: 7

Posted: Sun Apr 02, 2017 4:50 pm

I'm running a Ryzen 1700X with the ASRock Taichi and 32GB of RAM. I've tried GCC 4.9.4, 5.4.0, and 6.3.0 and I've seen some sporadic segmentation faults on all three. Also an up-to-date BIOS, Memtest returns no issues, and stress doesn't seem to cause issues. So far I haven't been able to isolate exactly what is causing it, but I don't think it's hardware. Some things I do know:

gcc-5.4.0 with -march=native is -march=bdver4.
gcc-6.3.0 with -march=native is -march=znver1
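
Those values can be checked on any box by asking GCC what -march=native expands to. A minimal sketch, assuming the `-march=` line of `gcc -Q --help=target` output carries the value in its second column (the helper name resolved_march is just for illustration):

```shell
# Print the value GCC reports for -march=.
# Reads the output of `gcc -march=native -Q --help=target` on stdin.
resolved_march() {
    # Assumption: the relevant line starts with "-march=" followed by
    # whitespace and the resolved architecture name.
    awk '$1 == "-march=" { print $2; exit }'
}

# Usage (needs gcc installed):
#   gcc -march=native -Q --help=target | resolved_march
```

Running it under each installed compiler version shows whether that GCC recognises the CPU or falls back to an older architecture name.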

I know there are some significant differences between Bulldozer and Zen, but 6.3.0 also produces the occasional segfault despite the newer -march. Currently re-emerging all of world with 6.3.0 to see if that helps matters.


Last edited by c1pherx on Sun Apr 02, 2017 9:37 pm; edited 1 time in total

toralf
Developer

Joined: 01 Feb 2004
Posts: 3919
Location: Hamburg

Posted: Sun Apr 02, 2017 4:53 pm    Post subject: Re: Segfaults during compilation.

Kresp wrote:
I often encounter segfaults during emerge builds of heavy packages like curl, hgc, llvm:
Code:
Apr  2 16:55:42 wagner kernel: [ 2188.416231] sh[5627]: segfault at 34 ip 0000000000406215 sp 00007ffdadd984c8 error 6 in bash[400000+a8000]
Apr  2 16:57:08 wagner kernel: [ 2273.706264] sh[15390]: segfault at e ip 0000000000406215 sp 00007ffc1c9a0d78 error 6 in bash[400000+a8000]
Apr  2 17:00:16 wagner kernel: [ 2461.767997] sh[19903]: segfault at 8 ip 0000000000406215 sp 00007ffd970f79c8 error 6 in bash[400000+a8000]

Usually just trying again is enough to finish it, even though sometimes it takes few retries.
I'd change the parallel make jobs from -j16 to -j8 and would check whether the compile issues go away or not.

Jaglover
Watchman

Joined: 29 May 2005
Posts: 8291
Location: Saint Amant, Acadiana

Posted: Sun Apr 02, 2017 5:16 pm

Memtest results are not conclusive: it can run for days without error and the memory can still be faulty. Only if it tells you the RAM is bad can you believe it.
_________________
My Gentoo installation notes.
Please learn how to denote units correctly!

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Sun Apr 02, 2017 5:35 pm

Jaglover,

It's worse than that.

Only if memtest tells you the RAM is bad at the same address on several cycles is there a good chance it's the RAM.
It can also be the memory controller (in the processor on Ryzen) or the local voltage regulator (on the motherboard) for the RAM.

Errors at random addresses are probably not RAM.

limn
l33t

Joined: 13 May 2005
Posts: 997

Posted: Sun Apr 02, 2017 5:38 pm

FMA3 instruction problem?
It is described as intermittent, but also as not affecting Linux.

Kresp
Tux's lil' helper

Joined: 17 Oct 2016
Posts: 77

Posted: Mon Apr 03, 2017 10:41 am

Well, gcc5 was not a panacea - segfaults still happen.

I removed fma3 flag to check if this is related to that CPU bug, but sudo emerge --ask --newuse --update @world did not rebuild anything.


Will now try disabling SMT in UEFI and changing MAKEOPTS to -j8. SMT/HT is a marketing gimmick anyway.

limn
l33t

Joined: 13 May 2005
Posts: 997

Posted: Mon Apr 03, 2017 11:15 am

emerge --newuse --update does not consider changes to CPU_FLAGS_X86.

Take a package that has failed and recompile it with the CPU flag set, as many times as it takes to get at least one failure. Then remove the flag and compile it at least as many times.

Even then you may not know until you apply the FMA3 fix.
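
The comparison described above could be scripted roughly as follows. This is only a sketch: count_failures is a made-up helper, and the package atom in the usage comment is an example, not a recommendation.

```shell
# Run a command N times and report how many runs failed.
# Usage: count_failures N command [args...]
count_failures() {
    local runs=$1 failed=0 i=0
    shift
    while [ "$i" -lt "$runs" ]; do
        # Command output is discarded here to keep the summary readable;
        # redirect it to a file instead if the build output is needed.
        "$@" >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed/$runs runs failed"
}

# Hypothetical usage, once with and once without the suspect setting:
#   count_failures 10 emerge --oneshot net-misc/curl
```

With an intermittent fault the two counts only mean something if the number of runs is large enough that the "bad" configuration fails at least a few times.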

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Mon Apr 03, 2017 2:23 pm

Kresp, limn,

CPU_FLAGS_X86 applies only to hand optimised code segments where a package advertises that such speedups are available for user selection.
They do not affect the code emitted by gcc.

To stop gcc using FMA3, you need to find the name of the option and add it to CFLAGS.
Code:
 -mno-fma
looks promising.
You cannot tell where gcc has used FMA3, if anywhere, so to be sure it's not used, you need to do
Code:
emerge -e @world --with-bdeps=y
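
Applied to the make.conf posted earlier in this thread, that would look something like the fragment below. Treat it as a sketch: -mno-fma is assumed to be the right switch (the post above only says it looks promising), and nobody in this thread has confirmed that it changes anything.

```shell
# /etc/portage/make.conf (fragment)
# Same flags as the make.conf posted earlier, with -mno-fma added after
# -march=native so that gcc is told not to emit FMA instructions.
CFLAGS="-march=native -mno-fma -O2 -pipe"
CXXFLAGS="${CFLAGS}"
```

Afterwards, `gcc -march=native -mno-fma -Q --help=target | grep fma` should be one way to see whether the compiler reports the option as disabled.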


limn
l33t

Joined: 13 May 2005
Posts: 997

Posted: Mon Apr 03, 2017 4:47 pm

Neddy,

Are you saying that CPU_FLAGS_X86 activates optimizations in the binary at run time?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Mon Apr 03, 2017 5:53 pm

limn,

No. In the source at build time. They are exactly like USE flags, which is what they were at one time.

CPU_FLAGS_X86="mmx" will include sections of optional code in the source that have been hand optimised to make use of the mmx instruction set.
CFLAGS="-mmmx" (is that right?) allows gcc to emit mmx instructions in the course of any build.

liewyec
n00b

Joined: 03 Apr 2017
Posts: 9

Posted: Mon Apr 03, 2017 6:08 pm

I have the same problem. I have a Ryzen 1800X on an ASUS Prime X370 Pro. I have only two sticks of RAM, and memtest also shows no errors.

Sometimes I can compile all of chromium with no errors, and sometimes it crashes multiple times in a row compiling just a few packages, without even running 16 threads. It is really strange. I tried disabling optimization but it didn't help.

In the end I wrote a script that restarts emerge a few times if it fails.
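
The script itself was not posted. A minimal sketch of that kind of retry wrapper, with a made-up function name (retry) and an example emerge command line that is an assumption, might be:

```shell
# Retry a command until it succeeds or the attempt limit is reached.
# Usage: retry MAX_TRIES command [args...]
retry() {
    local max=$1 try=1
    shift
    until "$@"; do
        if [ "$try" -ge "$max" ]; then
            echo "giving up after $try attempts" >&2
            return 1
        fi
        try=$((try + 1))
        echo "attempt failed, retrying ($try/$max)" >&2
    done
}

# Hypothetical usage:
#   retry 5 emerge --update --deep --newuse @world
```

This only papers over the crashes so a long build can finish unattended; it does nothing about whatever is causing them.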

c1pherx
n00b

Joined: 02 Apr 2017
Posts: 7

Posted: Tue Apr 04, 2017 5:52 pm

Just a quick update.

I switched to GCC-6.3.0 and the segfaults continued.

Then I switched from 4x8GB of CMK16GX4M2B3000C15 to 4x8GB of CMK16GX4M2B3200C16. The Segfaults became slightly less frequent (but this may just have been luck). Then I switched down to 2x8GB of CMK16GX4M2B3200C16 and OC'd it to 3200MHz 16-15-15-15-36 @ 1.35V and I haven't seen a Segfault since. I've compiled GHC, Chromium, and Firefox multiple times each.

I have some Ripjaws V arriving tomorrow. Going to see if I can get 32GB stable.

Keepco
n00b

Joined: 02 Apr 2017
Posts: 5

Posted: Tue Apr 04, 2017 6:43 pm

Seems like I'm plagued by this as well. My Specs:

Ryzen 7 1700
MSI X370 XPower Titanium
16GB of Corsair Dominator RAM (CMD16GX4M2B3000C15), currently running @ 2133MHz

Just a few "big" packages fail for me, namely:

chromium webkit-gtk electron libreoffice

Is there any other way to get rid of this other than trying different RAM sticks?

c1pherx
n00b

Joined: 02 Apr 2017
Posts: 7

Posted: Tue Apr 04, 2017 7:50 pm

Yea. I spoke too soon. I've reduced the frequency of it happening, but it is still happening. On to the next ideas.

One pattern I'm noticing is that now it seems to be happening with builds that use libtool. This may just be a correlation, but my most recent failures were gnutls (first time that's happened) and libseccomp (first time here too). Both use Libtool.

Keepco
n00b

Joined: 02 Apr 2017
Posts: 5

Posted: Tue Apr 04, 2017 8:40 pm

What irritates me is that my chromium build always segfaults with at least the last two lines being exactly the same (can't recall the other ones, should've written those down somewhere):

Code:
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/string:52:0,
                 from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/stdexcept:39,
                 from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/array:39,
                 from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/tuple:39,
                 from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/bits/stl_map.h:63,
                 from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/map:61,
                 from ../../ppapi/shared_impl/tracked_callback.h:10,
                 from ../../ppapi/thunk/ppb_output_protection_private_thunk.cc:13:
/usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/bits/basic_string.h:1316:59: internal compiler error: Segmentation fault
       insert(const_iterator __p, size_type __n, _CharT __c)
                                                                                              ^


Even though it seems to be at different places during the compilation (but always during the main part of chromium, not during the sandbox etc.)
I also tried just using one of my RAM sticks (recompiled gcc after swapping them) and increased their voltage to the recommended value (1.35V, was 1.2V by default).

c1pherx
n00b

Joined: 02 Apr 2017
Posts: 7

Posted: Tue Apr 04, 2017 9:00 pm

Keepco wrote:
What irritates me is that my chromium build always segfaults with at least the last two lines being exactly the same (can't recall the other ones, should've written those down somewhere):

Code:
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/string:52:0,
                 from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/stdexcept:39,
                 from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/array:39,
                 from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/tuple:39,
                 from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/bits/stl_map.h:63,
                 from /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/map:61,
                 from ../../ppapi/shared_impl/tracked_callback.h:10,
                 from ../../ppapi/thunk/ppb_output_protection_private_thunk.cc:13:
/usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/include/g++-v6/bits/basic_string.h:1316:59: internal compiler error: Segmentation fault
       insert(const_iterator __p, size_type __n, _CharT __c)
                                                                                              ^


Even though it seems to be at different places during the compilation (but always during the main part of chromium, not during the sandbox etc.)
I also tried just using one of my RAM sticks (recompiled gcc after swapping them) and increased their voltage to the recommended value (1.35V, was 1.2V by default).


Did you go from GCC-4.8 right up to GCC-6.3? If yes, did you remember to re-emerge libtool and run revdep-rebuild --library 'libstdc++\.so\.5'?

Keepco
n00b

Joined: 02 Apr 2017
Posts: 5

Posted: Tue Apr 04, 2017 9:01 pm

c1pherx wrote:
Yea. I spoke too soon. I've reduced the frequency of it happening, but it is still happening. On to the next ideas.

One pattern I'm noticing is that now it seems to be happening with builds that use libtool. This may just be a correlation, but my most recent failures were gnutls (first time that's happened) and libseccomp (first time here too). Both use Libtool.


Can't seem to reproduce the gnutls failure; I just tried recompiling it 15 times and it worked every time. Guess my problem is elsewhere.

EDIT: Just re-emerged GCC without -march=native; it seems like that did the job.


Last edited by Keepco on Wed Apr 05, 2017 5:35 am; edited 1 time in total

All times are GMT
Page 1 of 11