groeck n00b
Joined: 05 Apr 2017 Posts: 7
|
Posted: Wed Apr 05, 2017 3:42 am Post subject: gcc compile errors with Ryzen |
|
|
I see similar failures with various gcc versions, all the way from 4.4.7 up to 6.3, and with various cross-compile architectures. I tried three motherboards (Gigabyte AB350 Gaming, Gigabyte AB350 Gaming 3, and MSI Tomahawk B350), two different sets of 4x8GB RAM (3000), two different Ryzen 1700X CPUs, various DRAM speed settings, and every BIOS version I could get my hands on. The problem is always the same: random internal compiler errors due to segmentation faults in different source files.
Setting /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor to "performance" seems to improve the situation a little, but not much.
Guenter |
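For anyone who wants to try the governor change on every core rather than just cpu0, here is a minimal sketch. It assumes the usual cpufreq sysfs layout under /sys/devices/system/cpu and root privileges; the function names are made up for illustration and are not part of any tool mentioned in this thread.

```python
from __future__ import annotations

from pathlib import Path

# Assumed sysfs root; the post above names cpu0, the glob below covers every CPU.
CPU_ROOT = Path("/sys/devices/system/cpu")


def governor_files(root: Path = CPU_ROOT) -> list[Path]:
    """Return the scaling_governor file of every CPU that exposes cpufreq."""
    return sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor"))


def set_governor(name: str, root: Path = CPU_ROOT) -> dict[str, str]:
    """Write `name` to each governor file and return what each file reads back."""
    result = {}
    for path in governor_files(root):
        path.write_text(name + "\n")  # needs root on a real system
        # path is .../cpuN/cpufreq/scaling_governor, so two parents up is cpuN
        result[path.parent.parent.name] = path.read_text().strip()
    return result
```

Calling set_governor("performance") as root should report "performance" for each CPU if the kernel accepted the write. As noted above, this only seemed to reduce the failures a little.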
|
c1pherx n00b
Joined: 02 Apr 2017 Posts: 7
|
Posted: Fri Apr 07, 2017 11:46 am Post subject: |
|
|
Keepco wrote: | c1pherx wrote: | Yea. I spoke too soon. I've reduced the frequency of it happening, but it is still happening. On to the next ideas.
One pattern I'm noticing is that now it seems to be happening with builds that use libtool. This may just be a correlation, but my most recent failures were gnutls (first time that's happened) and libseccomp (first time here too). Both use Libtool. |
Can't seem to reproduce the gnutls failure; I just tried recompiling it 15 times and it worked every time. Guess my problem is elsewhere.
EDIT: Just re-emerged GCC without -march=native; it seems like that did the job. |
I tried a fresh stage 3 in a chroot without any optimization at all, with gcc-5.4.0 and stable versions of almost everything else, and I was still seeing them.
I did get a BIOS update yesterday morning that seems to have helped, but I still see the occasional segfault. What's fascinating is that it's always during /bin/sh libtool ..... This could be a coincidence, simply because many large packages use libtool, but maybe it will be a clue for someone else. |
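Since the failures are intermittent, a single clean build says little. Below is a rough sketch of a repeat-and-count loop, similar in spirit to the "recompile it 15 times" test quoted above. The function name and the tally keys are invented for illustration, and matching on the text "internal compiler error" in stderr is an assumption about how the failing tool reports the crash.

```python
from __future__ import annotations

import subprocess


def count_failures(cmd: list[str], runs: int = 15) -> dict[str, int]:
    """Run `cmd` `runs` times; tally clean runs, failures, and crash-like failures."""
    tally = {"ok": 0, "failed": 0, "ice": 0}
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
        if proc.returncode == 0:
            tally["ok"] += 1
            continue
        tally["failed"] += 1
        # gcc reports a crashed compiler pass as "internal compiler error";
        # a negative return code means the command itself died from a signal.
        if "internal compiler error" in proc.stderr or proc.returncode < 0:
            tally["ice"] += 1
    return tally
```

For example, count_failures(["emerge", "--oneshot", "gnutls"]) would rebuild gnutls 15 times and report how many runs failed. If the build tool hides compiler output in its own log, the "ice" count will undercount, so treat "failed" as the more reliable number.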
|
groeck n00b
Joined: 05 Apr 2017 Posts: 7
|
Posted: Tue Apr 11, 2017 5:20 am Post subject: bios update recommended |
|
|
I just got a BIOS update for my Gigabyte board (AB350 Gaming 3, BIOS F6). With that BIOS, the segmentation faults seem to be gone. I suggest that everyone install the latest BIOS (it seems that several vendors released an update today) and check whether that fixes the problems.
Update: I spoke too early. It still happens, but less often. Oh well. |
|
bgamari n00b
Joined: 11 Apr 2017 Posts: 9
|
Posted: Tue Apr 11, 2017 3:13 pm Post subject: |
|
|
I am a Debian user but I have observed very similar behavior on my 1800X running on an Asus B350 Plus. This has been the case for every BIOS release available, including the beta 0605 release. I have tried two processors, two sets of memory, replacing the motherboard with a Gigabyte AB350, and a different PSU. Neither the CPU nor the memory is overclocked, and temperatures are around 60 Celsius under load. Strangely enough, the machine can run mprime for days on end without any trouble. However, an average run of the Glasgow Haskell Compiler's testsuite exhibits a handful of failures (typically segmentation faults). Even stranger, if I run a few mprime threads alongside a run of GHC's testsuite, mprime will itself sometimes crash with a segmentation fault.
This sort of spooky action at a distance leads me to suspect that there is a rather nasty hardware bug lurking in this chip. I'm very glad to hear I'm not the only one seeing this behavior; I was beginning to think that I was just cursed. |
|
groeck n00b
Joined: 05 Apr 2017 Posts: 7
|
Posted: Wed Apr 12, 2017 2:00 am Post subject: |
|
|
I wonder if anyone is able to reproduce the problems under Windows. So far all feedback I have received from board vendors is "we don't support Linux", with an optional "we'll be happy to help you if you can reproduce the problem with Windows". |
|
Naib Watchman
Joined: 21 May 2004 Posts: 6050 Location: Removed by Neddy
|
Posted: Wed Apr 12, 2017 6:59 am Post subject: |
|
|
The recent wave of BIOS updates improves RAM timing and fixes an opcode error (one that does cause Windows to BSOD).
If you are saying a recent (i.e. last couple of days) BIOS update has improved stability, I would not be surprised. As Gentoo is a source distribution, we are more likely to be hit by these things via gcc. |
|
|
liewyec n00b
Joined: 03 Apr 2017 Posts: 9
|
Posted: Wed Apr 12, 2017 7:16 am Post subject: |
|
|
groeck wrote: | I wonder if anyone is able to reproduce the problems under Windows. So far all feedback I have received from board vendors is "we don't support Linux", with an optional "we'll be happy to help you if you can reproduce the problem with Windows". |
This is just great; I don't even have Windows at home. How am I supposed to reproduce this in Windows? |
|
trippels Tux's lil' helper
Joined: 24 Nov 2010 Posts: 137 Location: Berlin
|
Posted: Wed Apr 12, 2017 9:13 am Post subject: |
|
|
What kernel version are you guys running?
I would give latest git a try. |
|
liewyec n00b
Joined: 03 Apr 2017 Posts: 9
|
Posted: Wed Apr 12, 2017 9:14 am Post subject: |
|
|
trippels wrote: | What kernel version are you guys running?
I would give latest git a try. |
With 4.11-rc5 it crashes; I haven't tested 4.11-rc6 yet. |
|
bgamari n00b
Joined: 11 Apr 2017 Posts: 9
|
Posted: Wed Apr 12, 2017 1:55 pm Post subject: |
|
|
liewyec wrote: | With 4.11-rc5 it crashes; I haven't tested 4.11-rc6 yet. |
I have tried 4.11-rc6; it makes no difference. |
|
groeck n00b
Joined: 05 Apr 2017 Posts: 7
|
Posted: Fri Apr 14, 2017 4:12 am Post subject: |
|
|
bgamari wrote: | liewyec wrote: | With 4.11-rc5 it crashes; I haven't tested 4.11-rc6 yet. |
I have tried 4.11-rc6; it makes no difference. |
Same with 4.10.10. |
|
groeck n00b
Joined: 05 Apr 2017 Posts: 7
|
Posted: Fri Apr 14, 2017 4:18 am Post subject: |
|
|
Naib wrote: | The recent wave of BIOS updates improves RAM timing and fixes an opcode error (one that does cause Windows to BSOD).
If you are saying a recent (i.e. last couple of days) BIOS update has improved stability, I would not be surprised. As Gentoo is a source distribution, we are more likely to be hit by these things via gcc. |
I see the problem with literally dozens of different gcc versions, including "Ubuntu 5.4.0-6ubuntu1~16.04.4", which is the latest version available for the 16.04 release. I don't think the gcc version or the Linux distribution makes any difference. |
|
bgamari n00b
Joined: 11 Apr 2017 Posts: 9
|
Posted: Fri Apr 14, 2017 5:06 am Post subject: |
|
|
For what it's worth, I opened a support ticket with AMD yesterday. I've not heard back yet but I'll let you know what I hear. Even just an acknowledgement of the issue would put me at ease. |
|
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
|
Posted: Fri Apr 14, 2017 1:20 pm Post subject: |
|
|
groeck wrote: | I don't think the gcc version or the Linux distribution makes any difference. |
Gcc does not have explicit Zen support until gcc 6. I'm running gcc 6.3.0 on an Athlon II box that I had planned to convert to ryzen until this segfault business surfaced. It's a deal breaker for me. Perhaps gcc 6.4 will fix it. But first they have to figure out why. |
|
trippels Tux's lil' helper
Joined: 24 Nov 2010 Posts: 137 Location: Berlin
|
Posted: Sat Apr 15, 2017 9:20 am Post subject: |
|
|
Tony0945 wrote: | groeck wrote: | I don't think the gcc version or the Linux distribution makes any difference. |
Gcc does not have explicit Zen support until gcc 6. I'm running gcc 6.3.0 on an Athlon II box that I had planned to convert to ryzen until this segfault business surfaced. It's a deal breaker for me. Perhaps gcc 6.4 will fix it. But first they have to figure out why. |
Please note that -march=znver1 is currently not tuned at all.
It is mostly a copy of bdver* and will generate unnecessarily slow code in many cases.
I would not recommend using it until it gets properly tuned by AMD. |
|
groeck n00b
Joined: 05 Apr 2017 Posts: 7
|
Posted: Sat Apr 15, 2017 1:11 pm Post subject: |
|
|
Tony0945 wrote: | groeck wrote: | I don't think the gcc version or the Linux distribution makes any difference. |
Gcc does not have explicit Zen support until gcc 6. I'm running gcc 6.3.0 on an Athlon II box that I had planned to convert to ryzen until this segfault business surfaced. It's a deal breaker for me. Perhaps gcc 6.4 will fix it. But first they have to figure out why. |
I see the problem when cross compiling. Also, even if there is no explicit zen support, gcc should not crash. |
|
trippels Tux's lil' helper
Joined: 24 Nov 2010 Posts: 137 Location: Berlin
|
Posted: Sat Apr 15, 2017 2:02 pm Post subject: |
|
|
gcc crashing at random points is almost always due to memory issues.
I would try ECC memory; then you would at least see every failure in the logs.
My guess would be that buggy memory training in the BIOS is the root cause. |
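If ECC memory is installed and the kernel's EDAC driver supports the memory controller (an assumption; nobody in this thread has confirmed that for Ryzen), the corrected and uncorrected error counters should show up under /sys/devices/system/edac/mc. A small sketch for reading them, with a made-up function name:

```python
from __future__ import annotations

from pathlib import Path

# Assumed location of the EDAC memory-controller counters in sysfs.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_ROOT) -> dict[str, dict[str, int]]:
    """Return corrected (ce) and uncorrected (ue) error counts per memory controller."""
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        counts[mc.name] = entry
    return counts
```

An empty result would mean no memory controller is registered with EDAC, which says nothing either way about the memory itself. A ce_count that climbs during compile runs would support the memory theory.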
|
drizzt Guru
Joined: 21 Jul 2002 Posts: 428
|
Posted: Sun Apr 16, 2017 11:24 pm Post subject: |
|
|
At least I'm not alone. I have two systems generating occasional segfaults during compiling:
- Asus B350M-A with Ryzen 5 1600 // 16GB (2x8GB), Kernel 4.10.8, latest BIOS
- Asus B350M-A with Ryzen 7 1700 // 16GB (2x8GB), Kernel 4.10.8, latest BIOS
Both systems show the same symptoms.
The systems do run fine, even under heavy load for hours. It seems only the compiling causes the segfaults. _________________ People don't have to earn my respect. I offer my respect to them, but be careful to lose my respect... |
|
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
|
Posted: Sun Apr 16, 2017 11:32 pm Post subject: |
|
|
drizzt wrote: | The systems do run fine, even under heavy load for hours. It seems only the compiling causes the segfaults. |
What compiler version? And what CFLAGS? |
|
drizzt Guru
Joined: 21 Jul 2002 Posts: 428
|
Posted: Sun Apr 16, 2017 11:36 pm Post subject: |
|
|
Sorry,
both systems:
- gcc-5.4.0
- CFLAGS="-O2 -pipe -march=native" |
|
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
|
Posted: Sun Apr 16, 2017 11:43 pm Post subject: |
|
|
See this: http://www.phoronix.com/scan.php?page=article&item=amd-ryzen-znver1&num=1
There are two Bulldozer instructions that Ryzen does not support, and Phoronix reports "compilation failures" (segfaults?).
I don't know what 5.4 detects for native on a Ryzen. I think there are some gcc commands to find out. Or try something like -march=k8-sse3.
I would think that the flags gcc itself was compiled with would be the significant ones. |
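One way to find out what "native" resolves to is to ask gcc to print its target options with -march=native applied, i.e. gcc -march=native -Q --help=target, and read the -march= line. The sketch below wraps that in Python; the helper names are made up, and the parsing assumes the usual layout where the option name is followed by whitespace and then its value.

```python
from __future__ import annotations

import re
import subprocess


def parse_march(help_target_output: str) -> str | None:
    """Pull the value of the -march= line out of `gcc -Q --help=target` output."""
    for line in help_target_output.splitlines():
        match = re.match(r"\s*-march=\s+(\S+)", line)
        if match:
            return match.group(1)
    return None  # no -march line, or the line carries no value


def native_march(gcc: str = "gcc") -> str | None:
    """Ask the given gcc binary what -march=native expands to on this machine."""
    proc = subprocess.run(
        [gcc, "-march=native", "-Q", "--help=target"],
        capture_output=True, text=True, check=True,
    )
    return parse_march(proc.stdout)
```

Running native_march() on the affected box with gcc-5.4.0 would show which architecture name that compiler picks for the Ryzen, which is the open question above. It does not show which flags gcc itself was built with.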
|
drizzt Guru
Joined: 21 Jul 2002 Posts: 428
|
Posted: Sun Apr 16, 2017 11:46 pm Post subject: |
|
|
Thank you for your help.
In the meantime I found this page: https://wiki.gentoo.org/wiki/User:Maffblaster/Drafts/Ryzen.
They suggest Code: | CFLAGS="-O2 -march=haswell" |
I'll do two things now:
1) I'll try the "haswell" approach on the R5
2) I'll try gcc-6.3.0 on the R7.
I'll report back as soon as I have results. |
|
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
|
Posted: Sun Apr 16, 2017 11:49 pm Post subject: |
|
|
I really don't see why AMD's latest processor would have Intel optimizations.
IIRC there was a bug report that talked of changing some tables in gcc. Scary stuff for me.
EDIT:
Comment 3 here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80313 |
|
Naib Watchman
Joined: 21 May 2004 Posts: 6050 Location: Removed by Neddy
|
Posted: Mon Apr 17, 2017 7:34 am Post subject: |
|
|
drizzt wrote: | Thank you for your help.
In the meantime I found this page: https://wiki.gentoo.org/wiki/User:Maffblaster/Drafts/Ryzen.
They suggest Code: | CFLAGS="-O2 -march=haswell" |
I'll do two things now:
1) I'll try the "haswell" approach on the R5
2) I'll try gcc-6.3.0 on the R7.
I'll report back as soon as I have results. |
That was my edit, based upon the Gentoo Chat Ryzen thread: https://forums.gentoo.org/viewtopic-p-8056840.html#8056840
My Ryzen 5 parts arrive on Friday, so I wanted to make sure all the bits of info I need exist. |
Last edited by Naib on Mon Apr 17, 2017 7:40 am; edited 1 time in total |
|
Naib Watchman
Joined: 21 May 2004 Posts: 6050 Location: Removed by Neddy
|
Posted: Mon Apr 17, 2017 7:37 am Post subject: |
|
|
groeck wrote: | Naib wrote: | The recent wave of BIOS updates improves RAM timing and fixes an opcode error (one that does cause Windows to BSOD).
If you are saying a recent (i.e. last couple of days) BIOS update has improved stability, I would not be surprised. As Gentoo is a source distribution, we are more likely to be hit by these things via gcc. |
I see the problem with literally dozens of different gcc versions, including "Ubuntu 5.4.0-6ubuntu1~16.04.4", which is the latest version available for the 16.04 release. I don't think the gcc version or the Linux distribution makes any difference. |
Which -march are you using? gcc-6.3 has Zen support, but it is poorly optimised. Prior to gcc-6.3, -march=haswell appears to be the best choice.
If you pick something different, gcc might emit opcodes your CPU does not have. |
|
|
|