Gentoo Forums
Segfaults during compilation on AMD Ryzen.
drizzt
Guru

Joined: 21 Jul 2002
Posts: 428

Posted: Mon Apr 17, 2017 8:51 am

Nothing changed:
- R5 1600 with -march=haswell, gcc-5.4.0 shows segfaults
- R7 1700 with -march=native, gcc-6.3.0 also shows segfaults

For testing I disabled the IOMMU on the R7 => still segfaults
_________________
People don't have to earn my respect. I offer my respect to them, but be careful to lose my respect...

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54220
Location: 56N 3W

Posted: Mon Apr 17, 2017 9:04 am

drizzt,

You need to rebuild the entire toolchain (if you can), since one small section could have been built using unsupported opcodes.
It's likely that something in the toolchain is affected rather than the input it's processing. However, a few packages do compile code and then try to run it as a part of the build system.

Start off by running
Code:
/usr/portage/scripts/bootstrap.sh
then reboot, as it's going to rebuild glibc.
If that won't run cleanly (it has to complete with no interruptions), unpick a stage3 for the toolchain components and start with something you know is clean.

You are trying to build something that works correctly using something that may be faulty.

If you have lots of time, you can replace the toolchain components one at a time, then test ...
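
The one-at-a-time approach could be scripted roughly as follows. This is only a sketch: the package list is an assumption for illustration (the authoritative list is whatever bootstrap.sh actually rebuilds), and EMERGE is a hypothetical override variable, used here so the loop can be dry-run with echo.

```shell
#!/bin/sh
# Rebuild toolchain packages one at a time, stopping at the first failure.
# Assumption: this package list is illustrative, not taken from bootstrap.sh.
# EMERGE is a hypothetical override, e.g. EMERGE=echo for a dry run.
rebuild_one_at_a_time() {
    for pkg in sys-devel/binutils sys-devel/gcc sys-libs/glibc sys-devel/libtool; do
        ${EMERGE:-emerge} --oneshot "$pkg" || {
            echo "rebuild failed at $pkg" >&2
            return 1
        }
        # Run your usual test build here before moving on to the next package.
    done
}
```

Calling it as EMERGE=echo rebuild_one_at_a_time only prints the emerge arguments, which is a cheap way to check the order before committing hours of compile time.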
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

drizzt
Guru

Joined: 21 Jul 2002
Posts: 428

Posted: Mon Apr 17, 2017 9:12 am

Just to confirm:
I run the script with gcc-6.3.0 and -march=native, right?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54220
Location: 56N 3W

Posted: Mon Apr 17, 2017 10:16 am

drizzt,

You run the script with your existing gcc and whatever -march you have set.
That script is the first step in a stage1 install. It gets you from the stage1 that you (used to) download, to stage2.

In days of old, there were i386 and i686 stage1 tarballs.
If you were installing on i486 or i586, you would use the i386 stage1 and edit the CHOST in make.conf to get the toolchain optimised for your CPU.

Other threads here suggest that -march=haswell is the least worst choice for Ryzen for now, as gcc has few, if any, optimisations for Ryzen yet.

drizzt
Guru

Joined: 21 Jul 2002
Posts: 428

Posted: Mon Apr 17, 2017 10:20 am

Yeah,
I remember those days with stage1.
I have now run the script on the R5 (gcc-5.4.0, -march=haswell) and it went through. It told me to run emerge -e system, which I am doing now.

Update
Same steps repeated with the R7. Let's see if anything works.

Just out of curiosity:
Which programs make up the toolchain?
I assume:
- gcc
- binutils
- libtool
- glibc

Another question:
Do I need multiple versions of binutils? I think portage tells me that 2.27 and 2.25 are installed. Maybe something is wrong there?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54220
Location: 56N 3W

Posted: Mon Apr 17, 2017 10:58 am

drizzt,

Q1 ... Read the script :)

binutils is slotted. You can choose which one is used.
Code:
$ eselect binutils list
 [1] aarch64-unknown-linux-gnu-2.27 *

 [2] armv6j-hardfloat-linux-gnueabi-2.26.1
 [3] armv6j-hardfloat-linux-gnueabi-2.27 *

 [4] armv7a-hardfloat-linux-gnueabi-2.27 *

 [5] i686-pc-linux-gnu-2.27 *

 [6] x86_64-pc-linux-gnu-2.27 *
I have several as I cross compile things.

bgamari
n00b

Joined: 11 Apr 2017
Posts: 9

Posted: Mon Apr 17, 2017 1:08 pm

For what it's worth, I highly doubt that this has anything to do with compiler optimizations. I have also seen crashes of the Glasgow Haskell Compiler, the native code generator of which implements essentially no microarchitecture-specific optimizations. Moreover, I have seen segmentation faults of otherwise stable long-running processes after starting a compilation workload (e.g. mprime runs for hours on end alone, but crashes within an hour after a build is started).

A hypothesis I have been meaning to test is that the fluctuating nature of compilation workloads might be in part responsible for the instability.

c1pherx
n00b

Joined: 02 Apr 2017
Posts: 7

Posted: Mon Apr 17, 2017 2:37 pm

bgamari wrote:
For what it's worth, I highly doubt that this has anything to do with compiler optimizations. I have also seen crashes of the Glasgow Haskell Compiler, the native code generator of which implements essentially no microarchitecture-specific optimizations. Moreover, I have seen segmentation faults of otherwise stable long-running processes after starting a compilation workload (e.g. mprime runs for hours on end alone, but crashes within an hour after a build is started).

A hypothesis I have been meaning to test is that the fluctuating nature of compilation workloads might be in part responsible for the instability.


I can add some supporting evidence to this hypothesis.

I have had my system slightly overclocked and undervolted since I was able to cause a Segfault running at stock frequency / voltage. As it turns out, I was able to cause failures in Prime95 in Windows when I added other workloads into the equation (I was trying to emulate compiling while running P95 which I've seen cause issues in Linux). After removing the overclock / undervolt, I could run Prime95 and some other CPU heavy task in Windows for 2 hours without failure. After that I switched back to Gentoo and ran Prime95 and a constant emerge of GHC for 5 hours before I induced an error. I'll need to give Windows a similar shake and see if I can cause it there too.

Other things I've figured out recently: Cool'n'Quiet needs to be enabled in the BIOS if you want your CPU frequency governor to apply. Switching to the "performance" governor seems to increase stability a bit.
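
To check which governor is actually in effect, something like the following may help. It is a sketch that assumes the standard cpufreq sysfs layout under /sys/devices/system/cpu; the function name show_governors and its optional base-directory argument are made up for this example.

```shell
#!/bin/sh
# Print "<cpuN> <governor>" for every CPU that exposes a cpufreq governor.
# The optional first argument (hypothetical, for testing) overrides the sysfs base.
show_governors() {
    base="${1:-/sys/devices/system/cpu}"
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue
        cpu="${f#"$base"/}"
        printf '%s %s\n' "${cpu%%/*}" "$(cat "$f")"
    done
}

# To switch every core to "performance" (as root, assuming cpupower is installed):
#   cpupower frequency-set -g performance
```

If nothing is printed, no cpufreq driver is bound, which would be consistent with Cool'n'Quiet being disabled in the BIOS.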

Tony0945
Watchman

Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

Posted: Mon Apr 17, 2017 4:05 pm

Following this old link http://gentoo-what-did-you-say.blogspot.com/2011/07/finding-cpu-flags-using-gcc.html, I find the following on my Kaveri machine with gcc 4.9.4:
Code:
COLLECT_GCC_OPTIONS='-e' '-v' '-march=native'
 /usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.4/cc1 -quiet /usr/include/stdlib.h "-march=bdver3" -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mno-movbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mlwp -mfma -mfma4 -mxop -mbmi -mno-bmi2 -mtbm -mavx -mno-avx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mno-rdrnd -mf16c -mfsgsbase -mno-rdseed -mprfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 --param "l1-cache-size=16" --param "l1-cache-line-size=64" --param "l2-cache-size=2048" "-mtune=bdver3" -quiet -dumpbase stdlib.h -auxbase stdlib -o /tmp/ccGEroKo.s "--output-pch=/usr/include/stdlib.h.gch"

COLLECT_GCC_OPTIONS='-e' '-v' '-march=kaveri'
 /usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.4/cc1 -quiet /usr/include/stdlib.h -quiet -dumpbase stdlib.h "-march=kaveri" -auxbase stdlib -o /tmp/cckDdkXv.s "--output-pch=/usr/include/stdlib.h.gch"

COLLECT_GCC_OPTIONS='-e' '-v' '-march=znver1'
 /usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.4/cc1 -quiet -v /usr/include/stdlib.h -quiet -dumpbase stdlib.h "-march=COLLECT_GCC_OPTIONS=-e" "-march=znver1" -auxbase stdlib -version -o /tmp/ccyO3gUd.s "--output-pch=/usr/include/stdlib.h.gch"

COLLECT_GCC_OPTIONS='-e' '-v' '-march= '-march=haswell'
 /usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.4/cc1 -quiet -v /usr/include/stdlib.h -quiet -dumpbase stdlib.h "-march=COLLECT_GCC_OPTIONS=-e" "-march=haswell" -auxbase stdlib -version -o /tmp/ccN2RAAN.s "--output-pch=/usr/include/stdlib.h.gch"


On my Athlon II machine with gcc 6.3.0:
Code:
COLLECT_GCC_OPTIONS='-e' '-v' '-march=COLLECT_GCC_OPTIONS=-e' '-v' '-march=znver1'
 /usr/libexec/gcc/x86_64-pc-linux-gnu/6.3.0/cc1 -quiet -v /usr/include/stdlib.h -quiet -dumpbase stdlib.h "-march=COLLECT_GCC_OPTIONS=-e" "-march=znver1" -auxbase stdlib -version -o /tmp/cc17GWpy.s "--output-pch=/usr/include/stdlib.h.gch"

COLLECT_GCC_OPTIONS='-e' '-v' '-march=COLLECT_GCC_OPTIONS=-e' '-v' '-march=haswell'
 /usr/libexec/gcc/x86_64-pc-linux-gnu/6.3.0/cc1 -quiet -v /usr/include/stdlib.h -quiet -dumpbase stdlib.h "-march=COLLECT_GCC_OPTIONS=-e" "-march=haswell" -auxbase stdlib -version -o /tmp/cc1ZOHNR.s "--output-pch=/usr/include/stdlib.h.gch"


I don't think the link does what we want. I'll be happy to rerun these tests if someone has a better test command.
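
One possible replacement test command is sketched below. It relies on gcc -E -v printing the cc1 command line to stderr and reads its input from /dev/null, so no header file is involved; extract_m_flags is a hypothetical helper name, and the filter assumes the cc1 line looks like the ones pasted above.

```shell
#!/bin/sh
# Read gcc's verbose driver output on stdin and print every -m switch
# that appears on the cc1 command line, one per line, quotes removed.
extract_m_flags() {
    grep '/cc1 ' | tr ' ' '\n' | tr -d '"' | grep -E '^-m'
}

# Usage (assumes gcc is installed; swap native for haswell, znver1, ...):
#   gcc -march=native -E -v - </dev/null 2>&1 | extract_m_flags
```

An alternative that needs no filtering is gcc -march=native -Q --help=target, which lists each target option together with its enabled or disabled state.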

IMHO "haswell" avoids the segfaults because it avoids the instructions that have a bug in Ryzen. I have a hard time believing that AMD copied their architecture.

Naib
Watchman

Joined: 21 May 2004
Posts: 6051
Location: Removed by Neddy

Posted: Mon Apr 17, 2017 4:13 pm

Tony0945 wrote:


IMHO "haswell" avoids the segfaults because it avoids the instructions that have a bug in Ryzen. I have a hard time believing that AMD copied their architecture.
I agree, so it is either a bug in the uarch or a bug in gcc.
Haswell is a "fine for now" option but obviously not the final solution.

What does cat /proc/cpuinfo | grep -m 1 flags show when compared to the gcc options?
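
A rough way to make that comparison is sketched here. The helper names cpu_flags and missing_from are invented for the example, and the "wanted" list has to be typed by hand in /proc/cpuinfo spelling, because the kernel's names and gcc's switch names do not map one to one (sse4_1 versus -msse4.1, for instance).

```shell
#!/bin/sh
# Print the feature flags the kernel reports for the first CPU.
cpu_flags() {
    grep -m 1 '^flags' /proc/cpuinfo | cut -d: -f2
}

# Print every word of list $1 that does not occur in list $2.
missing_from() {
    for item in $1; do
        case " $2 " in
            *" $item "*) ;;
            *) printf '%s\n' "$item" ;;
        esac
    done
}

# Usage: which features that a given -march relies on does this CPU lack?
#   wanted="avx avx2 fma bmi1 bmi2"      # hand-written, cpuinfo spelling
#   missing_from "$wanted" "$(cpu_flags)"
```

An empty result only says that the listed features are advertised by the CPU; it says nothing about whether they work correctly.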
_________________
Quote:
Removed by Chiitoo

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54220
Location: 56N 3W

Posted: Mon Apr 17, 2017 4:28 pm

Tony0945,

What does app-portage/cpuid2cpuflags say about a Ryzen too?

trippels
Tux's lil' helper

Joined: 24 Nov 2010
Posts: 137
Location: Berlin

Posted: Mon Apr 17, 2017 4:59 pm

Naib wrote:
Tony0945 wrote:


IMHO "haswell" avoids the segfaults because it avoids the instructions that have a bug in Ryzen. I have a hard time believing that AMD copied their architecture.
I agree, so it is either a bug in the uarch or a bug in gcc.


If the CPU hits an illegal instruction, you will get "trap invalid opcode" errors in dmesg, and gcc will die with an illegal instruction error. gcc will not randomly segfault in this case.
So this is a red herring.
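
To tell the two failure modes apart in the kernel log, a small counter like this may be enough. It is a sketch that assumes the usual x86 kernel message wording ("trap invalid opcode" for SIGILL, "segfault at" for SIGSEGV); classify_crashes is a made-up name.

```shell
#!/bin/sh
# Read kernel log lines on stdin and count invalid-opcode traps
# separately from ordinary segfaults.
classify_crashes() {
    awk '
        /trap invalid opcode/ { ill++ }
        /segfault at/         { segv++ }
        END { printf "invalid_opcode=%d segfault=%d\n", ill, segv }
    '
}

# Usage:
#   dmesg | classify_crashes
```

A count of zero invalid opcodes alongside many segfaults would support the point that unsupported instructions are not what is killing gcc.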

drizzt
Guru

Joined: 21 Jul 2002
Posts: 428

Posted: Mon Apr 17, 2017 5:17 pm

Thank you all for your help and suggestions,
after rebuilding the toolchain with "correct" march several times I still encounter random segfaults during
Code:
emerge -e system

At the moment I suspect an incompatibility between RAM and Motherboard since I read Ryzen is (again) especially picky about RAM. I will try to get other Asus approved RAM as soon as possible.

Btw., according to the Asus BIOS description and AMD, my latest BIOS has the AGESA update fixing the "FMA3" bug.

Thank you all again for trying to help, and if anybody has a good idea I'm always open for testing.

For now I have clocked my RAM down from 2400 to 2133 (I read somewhere that most 2400 RAM is not stable in a 2-chip configuration) and I am testing the famous
Code:
emerge -e system

I'll post my results here.

drizzt
Guru

Joined: 21 Jul 2002
Posts: 428

Posted: Mon Apr 17, 2017 5:26 pm

NeddySeagoon wrote:
Tony0945,

What does app-portage/cpuid2cpuflags say about a Ryzen too?


Here are the flags for the Ryzen 7 1700:

Code:
CPU_FLAGS_X86: aes avx avx2 f16c fma3 mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3


and for Ryzen 5 1600:
Code:
CPU_FLAGS_X86="aes avx avx2 fma3 mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3"


drizzt
Guru

Joined: 21 Jul 2002
Posts: 428

Posted: Mon Apr 17, 2017 5:39 pm

Setting RAM to 2133 doesn't help either. I will order new RAM and see if this helps.

bgamari
n00b

Joined: 11 Apr 2017
Posts: 9

Posted: Tue Apr 18, 2017 12:12 am

drizzt wrote:
Setting RAM to 2133 doesn't help either. I will order new RAM and see if this helps.


If your experience reflects mine, it will make no difference. I have tried three different sets of memory to no avail.

drizzt
Guru

Joined: 21 Jul 2002
Posts: 428

Posted: Tue Apr 18, 2017 7:31 am

bgamari wrote:
drizzt wrote:
Setting RAM to 2133 doesn't help either. I will order new RAM and see if this helps.


If your experience reflects mine, it will make no difference. I have tried three different sets of memory to no avail.


Do you mean the system still does not work correctly?

Oh man, another problem:

Has anybody got the Gentoo minimal AMD64 boot CD up and running?

If I try to boot from the CD, I can enter the kernel (or just press Enter) and then the system just resets.

I can boot Clonezilla from USB though.

Update
OK, it seems to be a CD-booting thing. I can boot Gentoo fine from a USB stick.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54220
Location: 56N 3W

Posted: Tue Apr 18, 2017 8:23 am

drizzt,

The Gentoo minimal CDs do not support UEFI booting yet (unless it's changed recently).
Use System Rescue CD instead. It's Gentoo based.

bgamari
n00b

Joined: 11 Apr 2017
Posts: 9

Posted: Tue Apr 18, 2017 11:54 am

drizzt wrote:
bgamari wrote:
drizzt wrote:
Setting RAM to 2133 doesn't help either. I will order new RAM and see if this helps.


If your experience reflects mine, it will make no difference. I have tried three different sets of memory to no avail.


Do you mean the system still does not work correctly?


Correct, I was able to reproduce the crashes with all three sets of memory.

Naib
Watchman

Joined: 21 May 2004
Posts: 6051
Location: Removed by Neddy

Posted: Tue Apr 18, 2017 12:11 pm

There are other threads where people's Ryzen builds are functional; this would imply subtleties with your setup.
Could you provide:

1) CPU type (1800, 1700...)
2) Motherboard,
3) BIOS version
4) RAM
5) BIOS settings w.r.t. RAM (voltage, timings, freq)

drizzt
Guru

Joined: 21 Jul 2002
Posts: 428

Posted: Tue Apr 18, 2017 3:18 pm

NeddySeagoon wrote:
drizzt,

Q1 ... Read the script :)

binutils is slotted. You can choose which one is used.
Code:
$ eselect binutils list
 [1] aarch64-unknown-linux-gnu-2.27 *

 [2] armv6j-hardfloat-linux-gnueabi-2.26.1
 [3] armv6j-hardfloat-linux-gnueabi-2.27 *

 [4] armv7a-hardfloat-linux-gnueabi-2.27 *

 [5] i686-pc-linux-gnu-2.27 *

 [6] x86_64-pc-linux-gnu-2.27 *
I have several as I cross compile things.


NeddySeagoon, you are a genius!
I had 3 versions of binutils for the same architecture on my system. Guess what happened:
The newest one always got rebuilt, but the oldest one was used.
I cleaned this mess up and have been compiling like crazy the whole day for testing. No segfaults so far on both systems.
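
For anyone wanting to check for the same situation, the sketch below scans the output of eselect binutils list for target triplets that have more than one version installed. The function name duplicate_binutils is invented, and the parsing assumes entries of the form "[N] <triplet>-<version>" as in the listing quoted above.

```shell
#!/bin/sh
# Read `eselect binutils list` output on stdin and print "<triplet> <count>"
# for every target triplet with more than one installed binutils version.
duplicate_binutils() {
    awk '
        /^ *\[[0-9]+\]/ {
            name = $2
            sub(/-[0-9][0-9.]*$/, "", name)   # strip the trailing version
            n[name]++
        }
        END { for (t in n) if (n[t] > 1) print t, n[t] }
    '
}

# Usage:
#   eselect binutils list | duplicate_binutils
#   eselect binutils set <number>   # then pick the entry you actually want
#   ld --version                    # confirm which linker is now in use
```

Entries marked with an asterisk in the list are the active ones, so a stale version carrying the asterisk is the thing to look for.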

Thank you all for your help and suggestions. Let's see if things are sorted out.

liewyec
n00b

Joined: 03 Apr 2017
Posts: 9

Posted: Tue Apr 18, 2017 4:31 pm

drizzt wrote:

NeddySeagoon, you are a genius !
I had 3 versions of binutils for the same architecture on my system. Guess what happened:
The newest one always got rebuilt, but the oldest one was used.
I cleaned this mess up and I am compiling like crazy the whole day for testing. No segfaults so far on both systems.

Thank you all for your help and suggestions. Let's see if things are sorted out.


Well, I have only one version of binutils, 2.26.1. Do you still get segfaults?

drizzt
Guru

Joined: 21 Jul 2002
Posts: 428

Posted: Tue Apr 18, 2017 5:49 pm

liewyec wrote:
drizzt wrote:

NeddySeagoon, you are a genius !
I had 3 versions of binutils for the same architecture on my system. Guess what happened:
The newest one always got rebuilt, but the oldest one was used.
I cleaned this mess up and I am compiling like crazy the whole day for testing. No segfaults so far on both systems.

Thank you all for your help and suggestions. Let's see if things are sorted out.


Well, I have only one version of binutils, 2.26.1. Do you still get segfaults?


No, still compiling like a maniac on both systems and no segfaults.

My Systems:
- R7 1700, 16GB RAM, gcc-5.4.0 (march=haswell), binutils 2.27, Kernel 4.10.8
- R5 1600, 16GB RAM, gcc-5.4.0 (march=haswell), binutils 2.27, Kernel 4.10.8

If you "upgraded" an existing system like I did: I recompiled the toolchain at least 10 times. Looking back, I think I should have started fresh.

liewyec
n00b

Joined: 03 Apr 2017
Posts: 9

Posted: Tue Apr 18, 2017 6:01 pm

drizzt wrote:
liewyec wrote:
drizzt wrote:

NeddySeagoon, you are a genius !
I had 3 versions of binutils for the same architecture on my system. Guess what happened:
The newest one always got rebuilt, but the oldest one was used.
I cleaned this mess up and I am compiling like crazy the whole day for testing. No segfaults so far on both systems.

Thank you all for your help and suggestions. Let's see if things are sorted out.


Well, I have only one version of binutils, 2.26.1. Do you still get segfaults?


No, still compiling like a maniac on both systems and no segfaults.

My Systems:
- R7 1700, 16GB RAM, gcc-5.4.0 (march=haswell), binutils 2.27, Kernel 4.10.8
- R5 1600, 16GB RAM, gcc-5.4.0 (march=haswell), binutils 2.27, Kernel 4.10.8

If you "upgraded" an existing system like I did: I recompiled the toolchain at least 10 times. Looking back, I think I should have started fresh.


I recompiled my existing system, and I also tried a new installation because of the segfaults. Today I upgraded to binutils 2.27 and kernel 4.11-rc7 and I will test this. My system is an R7 1800X, 32 GB RAM, gcc-6.3.0.

roarinelk
Guru

Joined: 04 Mar 2004
Posts: 520

Posted: Tue Apr 18, 2017 7:21 pm

With gcc-6.3, I use "-march=znver1 -mtune=broadwell -fno-delete-null-pointer-checks" as CFLAGS. -march=znver1 enables use of all instruction sets available on Zen, and the broadwell tuning model produces notably faster-running code (than -mtune=znver1). -fno-delete-null-pointer-checks gets rid of a few segfaults in readline/ncurses/bash.
Upgrading from an older AMD system to Zen is tricky, because Zen dropped support for a few instructions that AMD introduced previously (3dnow, xop, fma4, tbm), and that does cause tons of crashes (SIGILL). Starting fresh or upgrading from a haswell-based system is easier in this case.
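
In make.conf terms, those flags might look like the fragment below. The -O2 -pipe part is an assumption added for the example (it is a common baseline, not something stated above); the remaining switches are the ones quoted.

```shell
# /etc/portage/make.conf (fragment)
# -O2 -pipe is an assumed baseline; the other flags are the ones quoted above.
CFLAGS="-O2 -pipe -march=znver1 -mtune=broadwell -fno-delete-null-pointer-checks"
CXXFLAGS="${CFLAGS}"
```

Changing these only affects packages built afterwards, so anything compiled with the old flags has to be re-emerged before the new setting means much.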

All times are GMT
Page 3 of 11

 