Author |
Message |
drizzt Guru
Joined: 21 Jul 2002 Posts: 428
|
Posted: Mon Apr 17, 2017 8:51 am Post subject: |
|
|
Nothing changed:
- R5 1600 with -march=haswell, gcc-5.4.0 shows segfaults
- R7 1700 with -march=native, gcc-6.3.0 also shows segfaults
For testing I disabled the iommu on the R7 => segfaults _________________ People don't have to earn my respect. I offer my respect to them, but be careful to lose my respect... |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54033 Location: 56N 3W
|
Posted: Mon Apr 17, 2017 9:04 am Post subject: |
|
|
drizzt,
You need to rebuild the entire toolchain (if you can), since one small section could have been built using unsupported opcodes.
It's likely that something in the toolchain is affected rather than the input it's processing. However, a few packages do compile code and then try to run it as part of the build system.
Start off by running Code: | /usr/portage/scripts/bootstrap.sh | then reboot, as it's going to rebuild glibc.
If that won't run cleanly (it has to complete with no interruptions), unpick a stage3 for the toolchain components and start with something you know is clean.
You are trying to build something that works correctly using something that may be faulty.
If you have lots of time, you can replace the toolchain components one at a time, then test ... _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
drizzt Guru
Joined: 21 Jul 2002 Posts: 428
|
Posted: Mon Apr 17, 2017 9:12 am Post subject: |
|
|
Just to confirm:
I run the script with gcc-6.3.0 and march=native, right? |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54033 Location: 56N 3W
|
Posted: Mon Apr 17, 2017 10:16 am Post subject: |
|
|
drizzt,
You run the script with your existing gcc and whatever -march you have set.
That script is the first step in a stage1 install. It gets you from the stage1 that you (used to) download, to stage2.
In days of old, there were i386 and i686 stage1 tarballs.
If you were installing on i486 or i586, you would use the i386 stage1 and edit the CHOST in make.conf to get the toolchain optimised for your CPU.
Other threads here suggest that -march=haswell is the least worst choice for Ryzen for the time being, as gcc has few, if any, optimisations for Ryzen yet. |
|
|
|
drizzt Guru
Joined: 21 Jul 2002 Posts: 428
|
Posted: Mon Apr 17, 2017 10:20 am Post subject: |
|
|
Yeah,
I remember those days with stage1.
I have now run the script on the R5 (gcc-5.4.0, march=haswell) and it went through. It told me to run emerge -e system, which I am doing now.
Update
Same steps repeated with the R7. Let's see if anything works
Just out of curiosity:
Which programs make up the toolchain?
I assume:
- gcc
- binutils
- libtool
- glibc
Another question:
Do I need multiple versions of binutils? I think portage tells me that 2.27 and 2.25 are both installed. Maybe something is wrong there? |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54033 Location: 56N 3W
|
Posted: Mon Apr 17, 2017 10:58 am Post subject: |
|
|
drizzt,
Q1 ... Read the script :)
binutils is slotted. You can choose which one is used.
Code: | $ eselect binutils list
[1] aarch64-unknown-linux-gnu-2.27 *
[2] armv6j-hardfloat-linux-gnueabi-2.26.1
[3] armv6j-hardfloat-linux-gnueabi-2.27 *
[4] armv7a-hardfloat-linux-gnueabi-2.27 *
[5] i686-pc-linux-gnu-2.27 *
[6] x86_64-pc-linux-gnu-2.27 * | I have several as I cross compile things. |
|
|
|
bgamari n00b
Joined: 11 Apr 2017 Posts: 9
|
Posted: Mon Apr 17, 2017 1:08 pm Post subject: |
|
|
For what it's worth, I highly doubt that this has anything to do with compiler optimizations. I have also seen crashes of the Glasgow Haskell Compiler, the native code generator of which implements essentially no microarchitecture-specific optimizations. Moreover, I have seen segmentation faults of otherwise stable long-running processes after starting a compilation workload (e.g. mprime runs for hours on end alone, but crashes within an hour after a build is started).
A hypothesis I have been meaning to test is that the fluctuating nature of compilation workloads might be in part responsible for the instability. |
|
|
|
c1pherx n00b
Joined: 02 Apr 2017 Posts: 7
|
Posted: Mon Apr 17, 2017 2:37 pm Post subject: |
|
|
bgamari wrote: | For what it's worth, I highly doubt that this has anything to do with compiler optimizations. I have also seen crashes of the Glasgow Haskell Compiler, the native code generator of which implements essentially no microarchitecture-specific optimizations. Moreover, I have seen segmentation faults of otherwise stable long-running processes after starting a compilation workload (e.g. mprime runs for hours on end alone, but crashes within an hour after a build is started).
A hypothesis I have been meaning to test is that the fluctuating nature of compilation workloads might be in part responsible for the instability. |
I can add some supporting evidence to this hypothesis.
I have had my system slightly overclocked and undervolted since I was able to cause a Segfault running at stock frequency / voltage. As it turns out, I was able to cause failures in Prime95 in Windows when I added other workloads into the equation (I was trying to emulate compiling while running P95 which I've seen cause issues in Linux). After removing the overclock / undervolt, I could run Prime95 and some other CPU heavy task in Windows for 2 hours without failure. After that I switched back to Gentoo and ran Prime95 and a constant emerge of GHC for 5 hours before I induced an error. I'll need to give Windows a similar shake and see if I can cause it there too.
Other things I've figured out recently: Cool'n'Quiet needs to be enabled in the BIOS if you want your CPU frequency governor to apply. Switching to the "performance" governor seems to increase stability a bit. |
|
|
|
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
|
Posted: Mon Apr 17, 2017 4:05 pm Post subject: |
|
|
Following this old link http://gentoo-what-did-you-say.blogspot.com/2011/07/finding-cpu-flags-using-gcc.html, I find the following on my 4.9.4 kaveri machine Code: | COLLECT_GCC_OPTIONS='-e' '-v' '-march=native'
/usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.4/cc1 -quiet /usr/include/stdlib.h "-march=bdver3" -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mno-movbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mlwp -mfma -mfma4 -mxop -mbmi -mno-bmi2 -mtbm -mavx -mno-avx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mno-rdrnd -mf16c -mfsgsbase -mno-rdseed -mprfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 --param "l1-cache-size=16" --param "l1-cache-line-size=64" --param "l2-cache-size=2048" "-mtune=bdver3" -quiet -dumpbase stdlib.h -auxbase stdlib -o /tmp/ccGEroKo.s "--output-pch=/usr/include/stdlib.h.gch"
COLLECT_GCC_OPTIONS='-e' '-v' '-march=kaveri'
/usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.4/cc1 -quiet /usr/include/stdlib.h -quiet -dumpbase stdlib.h "-march=kaveri" -auxbase stdlib -o /tmp/cckDdkXv.s "--output-pch=/usr/include/stdlib.h.gch"
COLLECT_GCC_OPTIONS='-e' '-v' '-march=znver1'
/usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.4/cc1 -quiet -v /usr/include/stdlib.h -quiet -dumpbase stdlib.h "-march=COLLECT_GCC_OPTIONS=-e" "-march=znver1" -auxbase stdlib -version -o /tmp/ccyO3gUd.s "--output-pch=/usr/include/stdlib.h.gch"
COLLECT_GCC_OPTIONS='-e' '-v' '-march= '-march=haswell'
/usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.4/cc1 -quiet -v /usr/include/stdlib.h -quiet -dumpbase stdlib.h "-march=COLLECT_GCC_OPTIONS=-e" "-march=haswell" -auxbase stdlib -version -o /tmp/ccN2RAAN.s "--output-pch=/usr/include/stdlib.h.gch"
|
On my 6.3.0 Athlon II machine: Code: | COLLECT_GCC_OPTIONS='-e' '-v' '-march=COLLECT_GCC_OPTIONS=-e' '-v' '-march=znver1'
/usr/libexec/gcc/x86_64-pc-linux-gnu/6.3.0/cc1 -quiet -v /usr/include/stdlib.h -quiet -dumpbase stdlib.h "-march=COLLECT_GCC_OPTIONS=-e" "-march=znver1" -auxbase stdlib -version -o /tmp/cc17GWpy.s "--output-pch=/usr/include/stdlib.h.gch"
COLLECT_GCC_OPTIONS='-e' '-v' '-march=COLLECT_GCC_OPTIONS=-e' '-v' '-march=haswell'
/usr/libexec/gcc/x86_64-pc-linux-gnu/6.3.0/cc1 -quiet -v /usr/include/stdlib.h -quiet -dumpbase stdlib.h "-march=COLLECT_GCC_OPTIONS=-e" "-march=haswell" -auxbase stdlib -version -o /tmp/cc1ZOHNR.s "--output-pch=/usr/include/stdlib.h.gch"
|
I don't think the link does what we want. I'll be happy to rerun these tests if someone has a better test command.
IMHO "haswell" avoids the segfaults because it avoids the instructions that have a bug in Ryzen. I have a hard time believing that AMD copied their architecture. |
|
|
|
Naib Watchman
Joined: 21 May 2004 Posts: 6050 Location: Removed by Neddy
|
Posted: Mon Apr 17, 2017 4:13 pm Post subject: |
|
|
Tony0945 wrote: |
IMHO "haswell" avoids the segfaults because it avoids the instructions that have a bug in Ryzen. I have a hard time believing that AMD copied their architecture. | I agree, so it is either a bug in the uarch or a bug in gcc.
Haswell is a "fine for now" option but obviously not the final solution.
What does cat /proc/cpuinfo | grep -m 1 flags show when compared to the gcc options? _________________
Quote: | Removed by Chiitoo |
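The comparison asked for above can be sketched roughly as follows. This is only an illustration: the alias table is a partial, hand-written assumption (the kernel and gcc spell some features differently, and gcc options such as -mlzcnt may have no identically named cpuinfo flag), the function names are made up for this example, and the sample strings are shortened hypothetical inputs rather than real output from any machine in this thread.

```python
import shlex

# Partial, hand-written map from /proc/cpuinfo names to gcc -m option names.
# Assumption: only a handful of differing names are covered here.
CPUINFO_TO_GCC = {
    "pni": "sse3",
    "sse4_1": "sse4.1",
    "sse4_2": "sse4.2",
    "lahf_lm": "sahf",
    "pclmulqdq": "pclmul",
    "rdrand": "rdrnd",
    "bmi1": "bmi",
    "3dnowprefetch": "prfchw",
}


def cpuinfo_features(flags_line):
    """Return gcc-style feature names from one 'flags : ...' cpuinfo line."""
    names = flags_line.split(":", 1)[1].split()
    return {CPUINFO_TO_GCC.get(name, name) for name in names}


def gcc_features(cc1_line):
    """Split the -m options of a cc1 command line into (enabled, disabled)."""
    enabled, disabled = set(), set()
    for token in shlex.split(cc1_line):
        if not token.startswith("-m") or "=" in token:
            continue  # skips -march=..., -mtune=... and every non -m option
        if token.startswith("-mno-"):
            disabled.add(token[len("-mno-"):])
        else:
            enabled.add(token[len("-m"):])
    return enabled, disabled


def not_advertised(flags_line, cc1_line):
    """Features gcc would enable that the CPU flags line does not list."""
    enabled, _ = gcc_features(cc1_line)
    return sorted(enabled - cpuinfo_features(flags_line))


# Hypothetical, shortened inputs for illustration only.
sample_flags = "flags\t\t: fpu sse sse2 pni ssse3 sse4_1 sse4_2 avx aes"
sample_cc1 = 'cc1 -quiet "-march=bdver3" -msse -msse2 -msse3 -mavx -maes -mfma4 -mno-avx2'
print(not_advertised(sample_flags, sample_cc1))  # ['fma4']
```

Anything reported this way is only a hint: a name may be missing from the alias table rather than from the CPU.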
|
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54033 Location: 56N 3W
|
Posted: Mon Apr 17, 2017 4:28 pm Post subject: |
|
|
Tony0945,
What does app-portage/cpuid2cpuflags say about a Ryzen too? |
|
|
|
trippels Tux's lil' helper
Joined: 24 Nov 2010 Posts: 137 Location: Berlin
|
Posted: Mon Apr 17, 2017 4:59 pm Post subject: |
|
|
Naib wrote: | Tony0945 wrote: |
IMHO "haswell" avoids the segfaults because it avoids the instructions that have a bug in Ryzen. I have a hard time believing that AMD copied their architecture. | I agree, so it is either a bug in the uarch or a bug in gcc. |
If the kernel encounters an illegal instruction, you will get "trap invalid opcode" errors in dmesg and gcc. gcc will not randomly segfault in this case.
So this is a red herring. |
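The distinction drawn above can be checked against a saved kernel log. The sketch below is a minimal illustration that counts the two kinds of message by substring; it assumes the kernel wording "trap invalid opcode" and "segfault at", the function names are made up for this example, and the sample lines are invented rather than taken from any machine in this thread.

```python
def classify_kernel_line(line):
    """Label one kernel log line by the kind of fault it reports."""
    if "trap invalid opcode" in line:
        return "illegal-instruction"  # SIGILL: CPU rejected an opcode
    if "segfault at" in line:
        return "segfault"  # SIGSEGV: bad memory access
    return "other"


def summarize(log_text):
    """Count fault kinds over a whole log, one line at a time."""
    counts = {"illegal-instruction": 0, "segfault": 0, "other": 0}
    for line in log_text.splitlines():
        counts[classify_kernel_line(line)] += 1
    return counts


# Invented sample lines, for illustration only.
sample_log = "\n".join([
    "traps: cc1[101] trap invalid opcode ip:0 sp:0 error:0",
    "bash[102]: segfault at 0 ip 0 sp 0 error 4",
    "usb 1-1: new high-speed USB device",
])
print(summarize(sample_log))
```

If a log from an affected machine shows only segfault entries and no invalid opcode traps, that would be consistent with the point made in this post.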
|
|
|
drizzt Guru
Joined: 21 Jul 2002 Posts: 428
|
Posted: Mon Apr 17, 2017 5:17 pm Post subject: |
|
|
Thank you all for your help and suggestions,
after rebuilding the toolchain with "correct" march several times I still encounter random segfaults during
At the moment I suspect an incompatibility between RAM and Motherboard since I read Ryzen is (again) especially picky about RAM. I will try to get other Asus approved RAM as soon as possible.
Btw. as per Asus BIOS description and AMD my latest BIOS has the AGESA updates fixing the "FMA3"-Bug.
Thank you all again for trying to help and if anybody has a good idea I'm always open for testing.
At the moment I clocked down my RAM from 2400 to 2133 (read somewhere that most 2400 are not stable in 2 chips configuration) and I am testing the famous .
I'll post my results here. |
|
|
|
drizzt Guru
Joined: 21 Jul 2002 Posts: 428
|
Posted: Mon Apr 17, 2017 5:26 pm Post subject: |
|
|
NeddySeagoon wrote: | Tony0945,
What does app-portage/cpuid2cpuflags say about a Ryzen too? |
Here the flags for Ryzen 7 1700:
Code: | CPU_FLAGS_X86: aes avx avx2 f16c fma3 mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 |
and for Ryzen 5 1600:
Code: | CPU_FLAGS_X86="aes avx avx2 fma3 mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3" |
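The two lines above are easier to compare mechanically than by eye. A minimal sketch, using the two lines exactly as posted (the function name is made up for this example):

```python
import re


def parse_cpu_flags(line):
    """Return the set of flags from a CPU_FLAGS_X86 line in either posted format."""
    value = re.split(r"[:=]", line, maxsplit=1)[1]
    return set(value.replace('"', " ").split())


r7_1700 = parse_cpu_flags(
    "CPU_FLAGS_X86: aes avx avx2 f16c fma3 mmx mmxext popcnt sse sse2 "
    "sse3 sse4_1 sse4_2 sse4a ssse3"
)
r5_1600 = parse_cpu_flags(
    'CPU_FLAGS_X86="aes avx avx2 fma3 mmx mmxext popcnt sse sse2 '
    'sse3 sse4_1 sse4_2 sse4a ssse3"'
)

print(sorted(r7_1700 - r5_1600))  # ['f16c']
print(sorted(r5_1600 - r7_1700))  # []
```

The only difference is f16c. Since the two lines also differ in output format, it is possible (an assumption, not something confirmed in this thread) that the two machines ran different versions of the tool rather than having different hardware features.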
|
|
|
drizzt Guru
Joined: 21 Jul 2002 Posts: 428
|
Posted: Mon Apr 17, 2017 5:39 pm Post subject: |
|
|
Setting RAM to 2133 doesn't help either. I will order new RAM and see if this helps. |
|
|
|
bgamari n00b
Joined: 11 Apr 2017 Posts: 9
|
Posted: Tue Apr 18, 2017 12:12 am Post subject: |
|
|
drizzt wrote: | Setting RAM to 2133 doesn't help either. I will order new RAM and see if this helps. |
If your experience reflects mine, it will make no difference. I have tried three different sets of memory to no avail. |
|
|
|
drizzt Guru
Joined: 21 Jul 2002 Posts: 428
|
Posted: Tue Apr 18, 2017 7:31 am Post subject: |
|
|
bgamari wrote: | drizzt wrote: | Setting RAM to 2133 doesn't help either. I will order new RAM and see if this helps. |
If your experience reflects mine, it will make no difference. I have tried three different sets of memory to no avail. |
Do you mean the system still does not work correctly?
Oh man, another problem:
Has anybody got the Gentoo minimal AMD64 boot CD up and running?
If I try to boot from the CD, I can enter the kernel (or just press Enter) and then the system just resets.
I can boot Clonezilla from USB though.
Update
OK, it seems to be a CD-booting thing. I can boot Gentoo fine from a USB stick. |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54033 Location: 56N 3W
|
Posted: Tue Apr 18, 2017 8:23 am Post subject: |
|
|
drizzt,
The Gentoo minimal CDs do not support UEFI booting yet (unless that has changed recently).
Use System Rescue CD instead. It's Gentoo based. |
|
|
|
bgamari n00b
Joined: 11 Apr 2017 Posts: 9
|
Posted: Tue Apr 18, 2017 11:54 am Post subject: |
|
|
drizzt wrote: | bgamari wrote: | drizzt wrote: | Setting RAM to 2133 doesn't help either. I will order new RAM and see if this helps. |
If your experience reflects mine, it will make no difference. I have tried three different sets of memory to no avail. |
Do you mean, the system still does not work correctly ? |
Correct, I was able to reproduce the crashes with all three sets of memory. |
|
|
|
Naib Watchman
Joined: 21 May 2004 Posts: 6050 Location: Removed by Neddy
|
Posted: Tue Apr 18, 2017 12:11 pm Post subject: |
|
|
There are other threads where people's Ryzen builds are functional; this would imply subtleties with your setup.
Could you provide:
1) CPU type (1800, 1700...)
2) Motherboard,
3) BIOS version
4) RAM
5) BIOS settings w.r.t. RAM (voltage, timings, freq)
|
|
|
|
drizzt Guru
Joined: 21 Jul 2002 Posts: 428
|
Posted: Tue Apr 18, 2017 3:18 pm Post subject: |
|
|
NeddySeagoon wrote: | drizzt,
Q1 ... Read the script
binutils is slotted. You can choose which one is used.
Code: | $ eselect binutils list
[1] aarch64-unknown-linux-gnu-2.27 *
[2] armv6j-hardfloat-linux-gnueabi-2.26.1
[3] armv6j-hardfloat-linux-gnueabi-2.27 *
[4] armv7a-hardfloat-linux-gnueabi-2.27 *
[5] i686-pc-linux-gnu-2.27 *
[6] x86_64-pc-linux-gnu-2.27 * | I have several as I cross compile things. |
NeddySeagoon, you are a genius !
I had 3 versions of binutils for the same architecture on my system. Guess what happened:
The newest one always got rebuilt, but the oldest one was used.
I cleaned this mess up and I have been compiling like crazy the whole day for testing. No segfaults so far on both systems.
Thank you all for your help and suggestions. Let's see if things are sorted out. |
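Spotting this situation (several binutils versions for one CHOST, with an unexpected one selected) can be sketched as a small parser over saved `eselect binutils list` output. The layout is assumed to match the listing quoted earlier in this thread, which is reused as sample input; the function names are made up for this example.

```python
import re
from collections import defaultdict

# One entry per line: "[n] <chost>-<version>", with a trailing "*" when active.
ENTRY = re.compile(r"^\s*\[\d+\]\s+(\S+)(\s+\*)?\s*$")


def installed_binutils(listing):
    """Map each CHOST to a list of (version, is_active) tuples."""
    by_chost = defaultdict(list)
    for line in listing.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue  # skips the prompt line and anything unexpected
        chost, version = match.group(1).rsplit("-", 1)
        by_chost[chost].append((version, match.group(2) is not None))
    return dict(by_chost)


def multiple_versions(listing):
    """Keep only the CHOSTs that have more than one version installed."""
    return {
        chost: entries
        for chost, entries in installed_binutils(listing).items()
        if len(entries) > 1
    }


LISTING = """$ eselect binutils list
[1] aarch64-unknown-linux-gnu-2.27 *
[2] armv6j-hardfloat-linux-gnueabi-2.26.1
[3] armv6j-hardfloat-linux-gnueabi-2.27 *
[4] armv7a-hardfloat-linux-gnueabi-2.27 *
[5] i686-pc-linux-gnu-2.27 *
[6] x86_64-pc-linux-gnu-2.27 *
"""
print(multiple_versions(LISTING))
```

For the quoted listing only the armv6j target has two versions, and the newer one is active. On a system like the one described in this post, the native CHOST would presumably appear with three entries and the marker on the oldest.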
|
|
|
liewyec n00b
Joined: 03 Apr 2017 Posts: 9
|
Posted: Tue Apr 18, 2017 4:31 pm Post subject: |
|
|
drizzt wrote: |
NeddySeagoon, you are a genius !
I had 3 versions of binutils for the same architecture on my system. Guess what happened:
The newest one got always rebuild, but the oldest one was used.
I cleaned this mess up and I am compiling like crazy the whole day for testing. No segfaults so far on both systems.
Thank you all for your help and suggestions. Let's see if things are sorted out. |
Well, I have only one version of binutils, 2.26.1. Do you still get segfaults? |
|
|
|
drizzt Guru
Joined: 21 Jul 2002 Posts: 428
|
Posted: Tue Apr 18, 2017 5:49 pm Post subject: |
|
|
liewyec wrote: | drizzt wrote: |
NeddySeagoon, you are a genius !
I had 3 versions of binutils for the same architecture on my system. Guess what happened:
The newest one got always rebuild, but the oldest one was used.
I cleaned this mess up and I am compiling like crazy the whole day for testing. No segfaults so far on both systems.
Thank you all for your help and suggestions. Let's see if things are sorted out. |
well i have only one version of binutils 2.26.1. Do you still get segfaults? |
No, still compiling like a maniac on both systems and no segfaults.
My Systems:
- R7 1700, 16GB RAM, gcc-5.4.0 (march=haswell), binutils 2.27, Kernel 4.10.8
- R5 1600, 16GB RAM, gcc-5.4.0 (march=haswell), binutils 2.27, Kernel 4.10.8
If you "upgraded" an existing system like me: I recompiled the toolchain at least 10 times. Looking back, I think I should have started fresh. |
|
|
|
liewyec n00b
Joined: 03 Apr 2017 Posts: 9
|
Posted: Tue Apr 18, 2017 6:01 pm Post subject: |
|
|
drizzt wrote: | liewyec wrote: | drizzt wrote: |
NeddySeagoon, you are a genius !
I had 3 versions of binutils for the same architecture on my system. Guess what happened:
The newest one got always rebuild, but the oldest one was used.
I cleaned this mess up and I am compiling like crazy the whole day for testing. No segfaults so far on both systems.
Thank you all for your help and suggestions. Let's see if things are sorted out. |
well i have only one version of binutils 2.26.1. Do you still get segfaults? |
No, still compiling like a maniac on both systems and no segfaults.
My Systems:
- R7 1700, 16GB RAM, gcc-5.4.0 (march=haswell), binutils 2.27, Kernel 4.10.8
- R5 1600, 16GB RAM, gcc-5.4.0 (march=haswell), binutils 2.27, Kernel 4.10.8
If you "upgraded" an existing system like me => I recompiled the toolchain at least 10 times. Looking back I think I should have started fresh. |
I recompiled my existing system, but I also tried a new installation because of the segfaults. Today I upgraded to binutils 2.27 and kernel 4.11-rc7 and I will test this. My system is an R7 1800X, 32 GB RAM, gcc-6.3.0 |
|
|
|
roarinelk Guru
Joined: 04 Mar 2004 Posts: 520
|
Posted: Tue Apr 18, 2017 7:21 pm Post subject: |
|
|
With gcc-6.3, I use "-march=znver1 -mtune=broadwell -fno-delete-null-pointer-checks" as CFLAGS. -march=znver1 enables use of all instruction sets available on Zen, and the broadwell tuning model produces notably faster-running code (than -mtune=znver1). -fno-delete-null-pointer-checks gets rid of a few segfaults in readline/ncurses/bash.
Upgrading from an older AMD system to Zen is tricky, because Zen dropped support for a few instructions which were introduced previously by AMD (3dnow, xop, fma4, tbm), and that does cause tons of crashes (SIGILL). Starting fresh or upgrading from a haswell-based system is easier in this case. |
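A rough way to see whether an old set of compiler flags touches the instruction sets named above is to intersect the enabled -m options with that list. This is only a sketch: the set of dropped extensions is taken from this post, the function name is made up for this example, and the sample string is an excerpt of the bdver3 option list posted earlier in this thread.

```python
# Extensions this post lists as no longer supported on Zen.
ZEN_DROPPED = {"3dnow", "xop", "fma4", "tbm"}


def unsupported_on_zen(cflags):
    """Return enabled -m options that fall in the dropped set."""
    enabled = set()
    for token in cflags.split():
        if not token.startswith("-m") or "=" in token:
            continue  # ignore -march=..., -mtune=... and non -m options
        if token.startswith("-mno-"):
            continue  # explicitly disabled, so not a problem
        enabled.add(token[len("-m"):])
    return sorted(enabled & ZEN_DROPPED)


old_flags = "-mno-3dnow -mfma -mfma4 -mxop -mbmi -mno-bmi2 -mtbm -mavx"
print(unsupported_on_zen(old_flags))  # ['fma4', 'tbm', 'xop']
```

Note that a bare -march=bdver3 (or -march=native on such a machine) enables these extensions implicitly, so an empty result does not prove that existing binaries are free of them.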
|
|
|
|