Gentoo Forums
Segfaults during compilation on AMD Ryzen.

Gentoo Forums Forum Index Portage & Programming
groeck
n00b


Joined: 05 Apr 2017
Posts: 7

Posted: Wed Apr 05, 2017 3:42 am    Post subject: gcc compile errors with Ryzen

I see similar failures with various gcc versions, all the way from 4.4.7 up to 6.3, and with various cross-compile architectures. I tried three motherboards (Gigabyte AB350 Gaming, Gigabyte AB350 Gaming 3, and MSI Tomahawk B350), two different sets of 4x8GB RAM (3000), two different Ryzen 1700X CPUs, various DRAM speed settings, and all available BIOS versions I could get my hands on. The problem is always the same: random internal compiler errors due to segmentation faults in different source files.

Setting devices/system/cpu/cpu0/cpufreq/scaling_governor to "performance" seems to improve the situation a little, but not much.

Guenter
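For reference, a minimal Python sketch of the governor change described above. It assumes the standard Linux cpufreq sysfs layout (/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor) and has to run as root; unlike the post, which mentions only cpu0, it writes every CPU's file. The helper name set_governor is hypothetical.

```python
import glob
import os

# Standard sysfs location of the per-CPU cpufreq controls on Linux (assumption:
# the cpufreq driver exposes one scaling_governor file per CPU).
SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def set_governor(governor, root=SYSFS_CPU_ROOT):
    """Write `governor` into every cpuN/cpufreq/scaling_governor under `root`.

    Returns the sorted list of files that were written.
    """
    pattern = os.path.join(root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    paths = sorted(glob.glob(pattern))
    for path in paths:
        with open(path, "w") as f:
            f.write(governor + "\n")
    return paths
```

For example, call `set_governor("performance")` as root. If cpufreq is not exposed at all, the glob matches nothing and an empty list is returned.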
c1pherx
n00b


Joined: 02 Apr 2017
Posts: 7

Posted: Fri Apr 07, 2017 11:46 am

Keepco wrote:
c1pherx wrote:
Yea. I spoke too soon. I've reduced the frequency of it happening, but it is still happening. On to the next ideas.

One pattern I'm noticing is that now it seems to be happening with builds that use libtool. This may just be a correlation, but my most recent failures were gnutls (first time that's happened) and libseccomp (first time here too). Both use libtool.


Can't seem to reproduce the gnutls failure; I just tried recompiling it 15 times and it worked every time. Guess my problem is elsewhere.

EDIT: I just re-emerged GCC without -march=native, and it seems like that did the job.


I tried a fresh stage3 in a chroot without any optimization at all, with gcc-5.4.0 and stable versions of almost everything else, and I was still seeing them.

I did get a BIOS update yesterday morning that seems to have helped, but I still see the occasional segfault. What's fascinating is that it's always during /bin/sh libtool ..... This could be a coincidence, since many large packages use libtool, but maybe it will be a clue for someone else.
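To check whether a failure is really intermittent (as with the 15 gnutls rebuilds above), a small harness can rerun the same build command and count how many runs print gcc's "internal compiler error" message. A minimal Python sketch; the command you pass in and the helper name count_ice_failures are assumptions for illustration, not something used in the thread.

```python
import subprocess


def count_ice_failures(cmd, runs):
    """Run `cmd` (a list of argv strings) `runs` times and classify each run.

    "ice"   - combined output contained gcc's "internal compiler error" marker
    "ok"    - no marker and exit status 0
    "other" - no marker but a non-zero exit status
    """
    counts = {"ok": 0, "ice": 0, "other": 0}
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # gcc prints diagnostics on stderr
            text=True,
        )
        if "internal compiler error" in proc.stdout:
            counts["ice"] += 1
        elif proc.returncode == 0:
            counts["ok"] += 1
        else:
            counts["other"] += 1
    return counts
```

The command has to rebuild from scratch on every run (for example by cleaning the tree first); otherwise later runs have nothing left to compile and will trivially succeed.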
groeck
n00b


Joined: 05 Apr 2017
Posts: 7

Posted: Tue Apr 11, 2017 5:20 am    Post subject: bios update recommended

I just got a BIOS update for my Gigabyte board (AB350 Gaming 3, BIOS F6). With that BIOS, the segmentation faults seem to be gone. I would suggest that everyone install the latest BIOS (it seems that several vendors released an update today) and check whether that fixes the problems.

Update: I spoke too early. Still happens, but less often. Oh well :-(.
bgamari
n00b


Joined: 11 Apr 2017
Posts: 9

Posted: Tue Apr 11, 2017 3:13 pm

I am a Debian user, but I have observed very similar behavior on my 1800X running on an Asus B350 Plus. This has been the case for every BIOS release available, including the beta 0605 release. I have tried two processors, two sets of memory, replacing the motherboard with a Gigabyte AB350, and a different PSU. Neither the CPU nor the memory is overclocked, and temperatures are around 60 Celsius under load. Strangely enough, the machine can run mprime for days on end without any trouble. However, an average run of the Glasgow Haskell Compiler's testsuite exhibits a handful of failures (typically segmentation faults). Even stranger, if I run a few mprime threads alongside a run of GHC's testsuite, mprime itself will sometimes crash with a segmentation fault.

This sort of spooky action at a distance leads me to suspect that there is a rather nasty hardware bug lurking in this chip. I'm very glad to hear I'm not the only one seeing this behavior; I was beginning to think that I was just cursed.
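Since the crashes seem to show up mainly under concurrent load, one way to probe this is to start several copies of a workload at once and look for processes killed by SIGSEGV. A minimal Python sketch under that assumption; the workload command is whatever you choose (a compile job, a test suite), and the helper names run_concurrently and segfault_count are hypothetical.

```python
import signal
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(cmd, copies):
    """Start `copies` instances of `cmd` at the same time and wait for all.

    Returns the list of return codes. On POSIX, subprocess reports a child
    killed by a signal as the negated signal number, so a segfault shows up
    as -signal.SIGSEGV.
    """
    with ThreadPoolExecutor(max_workers=copies) as pool:
        futures = [
            pool.submit(subprocess.call, cmd,
                        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            for _ in range(copies)
        ]
        return [f.result() for f in futures]


def segfault_count(returncodes):
    """Count how many return codes correspond to death by SIGSEGV."""
    return sum(1 for rc in returncodes if rc == -signal.SIGSEGV)
```

A segfault inside a child of the workload (for example cc1 under make) is reported by the parent rather than as a signal, so for builds the output-scanning approach is still needed alongside this.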
groeck
n00b


Joined: 05 Apr 2017
Posts: 7

Posted: Wed Apr 12, 2017 2:00 am

I wonder if anyone is able to reproduce the problems under Windows. So far all feedback I have received from board vendors is "we don't support Linux", with an optional "we'll be happy to help you if you can reproduce the problem with Windows".
Naib
Watchman


Joined: 21 May 2004
Posts: 6050
Location: Removed by Neddy

Posted: Wed Apr 12, 2017 6:59 am

The recent wave of BIOS updates improves RAM timing and fixes an opcode error (one that does cause Windows to BSOD).

If you are saying that a recent (i.e. last couple of days) BIOS update has improved stability, I would not be surprised. As Gentoo is a source-based distribution, we are more likely to be hit by these things via gcc.
_________________
Quote:
Removed by Chiitoo
liewyec
n00b


Joined: 03 Apr 2017
Posts: 9

Posted: Wed Apr 12, 2017 7:16 am

groeck wrote:
I wonder if anyone is able to reproduce the problems under Windows. So far all feedback I have received from board vendors is "we don't support Linux", with an optional "we'll be happy to help you if you can reproduce the problem with Windows".


This is just great; I don't even have Windows at home. How am I supposed to reproduce this in Windows?
trippels
Tux's lil' helper


Joined: 24 Nov 2010
Posts: 137
Location: Berlin

Posted: Wed Apr 12, 2017 9:13 am

What kernel version are you guys running?
I would give the latest git kernel a try.
liewyec
n00b


Joined: 03 Apr 2017
Posts: 9

Posted: Wed Apr 12, 2017 9:14 am

trippels wrote:
What kernel version are you guys running?
I would give latest git a try.


It crashes with 4.11-rc5; I haven't tested 4.11-rc6 yet.
bgamari
n00b


Joined: 11 Apr 2017
Posts: 9

Posted: Wed Apr 12, 2017 1:55 pm

liewyec wrote:
It crashes with 4.11-rc5; I haven't tested 4.11-rc6 yet.


I have tried 4.11-rc6; it makes no difference.
groeck
n00b


Joined: 05 Apr 2017
Posts: 7

Posted: Fri Apr 14, 2017 4:12 am

bgamari wrote:
liewyec wrote:
It crashes with 4.11-rc5; I haven't tested 4.11-rc6 yet.


I have tried 4.11-rc6; it makes no difference.


Same with 4.10.10.
groeck
n00b


Joined: 05 Apr 2017
Posts: 7

Posted: Fri Apr 14, 2017 4:18 am

Naib wrote:
The recent wave of BIOS updates improves RAM timing and fixes an opcode error (one that does cause Windows to BSOD).

If you are saying that a recent (i.e. last couple of days) BIOS update has improved stability, I would not be surprised. As Gentoo is a source-based distribution, we are more likely to be hit by these things via gcc.


I see the problem with literally dozens of different gcc versions, including "Ubuntu 5.4.0-6ubuntu1~16.04.4", which is the latest version available for the 16.04 release. I don't think the gcc version or the Linux distribution makes any difference.
bgamari
n00b


Joined: 11 Apr 2017
Posts: 9

Posted: Fri Apr 14, 2017 5:06 am

For what it's worth, I opened a support ticket with AMD yesterday. I've not heard back yet but I'll let you know what I hear. Even just an acknowledgement of the issue would put me at ease.
Tony0945
Watchman


Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

Posted: Fri Apr 14, 2017 1:20 pm

groeck wrote:
I don't think the gcc version or the Linux distribution makes any difference.


GCC does not have explicit Zen support before gcc 6. I'm running gcc 6.3.0 on an Athlon II box that I had planned to convert to Ryzen until this segfault business surfaced. It's a deal breaker for me. Perhaps gcc 6.4 will fix it, but first they have to figure out why.
trippels
Tux's lil' helper


Joined: 24 Nov 2010
Posts: 137
Location: Berlin

Posted: Sat Apr 15, 2017 9:20 am

Tony0945 wrote:
groeck wrote:
I don't think the gcc version or the Linux distribution makes any difference.


GCC does not have explicit Zen support before gcc 6. I'm running gcc 6.3.0 on an Athlon II box that I had planned to convert to Ryzen until this segfault business surfaced. It's a deal breaker for me. Perhaps gcc 6.4 will fix it, but first they have to figure out why.


Please note that currently -march=znver1 is not tuned at all.
It is mostly a copy of bdver* and will generate unnecessarily slow code in many cases.
I would not recommend using it until it gets properly tuned by AMD.
groeck
n00b


Joined: 05 Apr 2017
Posts: 7

Posted: Sat Apr 15, 2017 1:11 pm

Tony0945 wrote:
groeck wrote:
I don't think the gcc version or the Linux distribution makes any difference.


GCC does not have explicit Zen support before gcc 6. I'm running gcc 6.3.0 on an Athlon II box that I had planned to convert to Ryzen until this segfault business surfaced. It's a deal breaker for me. Perhaps gcc 6.4 will fix it, but first they have to figure out why.


I see the problem when cross compiling. Also, even if there is no explicit zen support, gcc should not crash.
trippels
Tux's lil' helper


Joined: 24 Nov 2010
Posts: 137
Location: Berlin

Posted: Sat Apr 15, 2017 2:02 pm

gcc crashing at random points is almost always due to memory issues.
I would try ECC memory, then you will at least see every failure in the logs.
My guess would be that buggy memory training in the BIOS is the root cause.
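With ECC memory, the Linux EDAC subsystem exposes per-memory-controller error counters in sysfs: ce_count (corrected) and ue_count (uncorrected) under /sys/devices/system/edac/mc/mcN/. A minimal Python sketch that totals them, assuming an EDAC driver for your memory controller is loaded; the helper name edac_error_totals is hypothetical.

```python
import glob
import os

# Standard sysfs directory for EDAC memory controllers (mc0, mc1, ...).
EDAC_MC_ROOT = "/sys/devices/system/edac/mc"


def edac_error_totals(root=EDAC_MC_ROOT):
    """Sum corrected ("ce") and uncorrected ("ue") error counts over all mcN.

    Controllers that lack one of the counter files are skipped for that
    counter; if no controllers exist, both totals stay at zero.
    """
    totals = {"ce": 0, "ue": 0}
    for mc in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        for key in ("ce", "ue"):
            path = os.path.join(mc, key + "_count")
            if os.path.exists(path):
                with open(path) as f:
                    totals[key] += int(f.read().strip())
    return totals
```

Comparing the totals before and after a failing build would show whether the memory controller logged any errors during that run. All zeros with no controllers present simply means EDAC is not reporting anything, not that the memory is fine.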
drizzt
Guru


Joined: 21 Jul 2002
Posts: 428

Posted: Sun Apr 16, 2017 11:24 pm

At least I'm not alone. I have two systems generating occasional segfaults during compiling:
- Asus B350M-A with Ryzen 5 1600 // 16GB (2x8GB), Kernel 4.10.8, latest BIOS
- Asus B350M-A with Ryzen 7 1700 // 16GB (2x8GB), Kernel 4.10.8, latest BIOS

Both systems show the same symptoms.

The systems do run fine, even under heavy load for hours. It seems only the compiling causes the segfaults.
_________________
People don't have to earn my respect. I offer my respect to them, but be careful to lose my respect...
Tony0945
Watchman


Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

Posted: Sun Apr 16, 2017 11:32 pm

drizzt wrote:
The systems do run fine, even under heavy load for hours. It seems only the compiling causes the segfaults.


What compiler version? And what CFLAGS?
drizzt
Guru


Joined: 21 Jul 2002
Posts: 428

Posted: Sun Apr 16, 2017 11:36 pm

Sorry,
both systems:

- gcc-5.4.0
- CFLAGS="-O2 -pipe -march=native"
Tony0945
Watchman


Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

Posted: Sun Apr 16, 2017 11:43 pm

See this: http://www.phoronix.com/scan.php?page=article&item=amd-ryzen-znver1&num=1

There are two Bulldozer instructions that Ryzen does not support, and Phoronix reports "compilation failures" (segfaults?).

We (I) don't know what gcc 5.4 detects for -march=native on a Ryzen. I think there are some gcc commands to find out. Or try something like -march=k8-sse3.

I would think that the flags gcc was compiled with would be the significant ones.
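One such command is `gcc -march=native -Q --help=target`, which lists the target options gcc resolves, including the effective -march= value. A minimal Python sketch that runs it and extracts that value; the output layout varies somewhat between gcc versions, so the parsing rule (a line consisting of exactly "-march=" followed by one value) is an assumption, and the helper names are hypothetical.

```python
import subprocess


def parse_march(help_target_output):
    """Return the value on the `-march=` line of `gcc -Q --help=target` output."""
    for line in help_target_output.splitlines():
        fields = line.split()
        # Option name and resolved value are separated by whitespace; this
        # skips prose lines such as "Known valid arguments for -march= option:".
        if len(fields) == 2 and fields[0] == "-march=":
            return fields[1]
    return None


def native_march(gcc="gcc"):
    """Ask `gcc` what -march=native resolves to on this machine."""
    out = subprocess.run(
        [gcc, "-march=native", "-Q", "--help=target"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        check=True,
    ).stdout
    return parse_march(out)
```

Running `native_march()` on the Ryzen box with gcc 5.4 and again with gcc 6.3 would show whether the two versions resolve -march=native differently.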
drizzt
Guru


Joined: 21 Jul 2002
Posts: 428

Posted: Sun Apr 16, 2017 11:46 pm

Thank you for your help.
In the meantime I found this page: https://wiki.gentoo.org/wiki/User:Maffblaster/Drafts/Ryzen.
They suggest
Code:
CFLAGS="-O2 -march=haswell"


I'll do two things now:
1) I'll try the "haswell" approach on the R5
2) I'll try gcc-6.3.0 on the R7.

I'll report back as soon as I have results.
Tony0945
Watchman


Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

Posted: Sun Apr 16, 2017 11:49 pm

I really don't see why AMD's latest processor would have Intel optimizations.

IIRC there was a bug report that talked about changing some tables in gcc. Scary stuff for me.

EDIT:

Comment 3 here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80313
Naib
Watchman


Joined: 21 May 2004
Posts: 6050
Location: Removed by Neddy

Posted: Mon Apr 17, 2017 7:34 am

drizzt wrote:
Thank you for your help.
In the meantime I found this page: https://wiki.gentoo.org/wiki/User:Maffblaster/Drafts/Ryzen.
They suggest
Code:
CFLAGS="-O2 -march=haswell"


I'll do two things now:
1) I'll try the "haswell" approach on the R5
2) I'll try gcc-6.3.0 on the R7.

I'll report back as soon as I have results.
That was my edit, based on the Gentoo Chat Ryzen thread https://forums.gentoo.org/viewtopic-p-8056840.html#8056840
My Ryzen 5 parts arrive on Friday, so I wanted to make sure all the information I need is there.
Last edited by Naib on Mon Apr 17, 2017 7:40 am; edited 1 time in total
Naib
Watchman


Joined: 21 May 2004
Posts: 6050
Location: Removed by Neddy

Posted: Mon Apr 17, 2017 7:37 am

groeck wrote:
Naib wrote:
The recent wave of BIOS updates improves RAM timing and fixes an opcode error (one that does cause Windows to BSOD).

If you are saying that a recent (i.e. last couple of days) BIOS update has improved stability, I would not be surprised. As Gentoo is a source-based distribution, we are more likely to be hit by these things via gcc.


I see the problem with literally dozens of different gcc versions, including "Ubuntu 5.4.0-6ubuntu1~16.04.4", which is the latest version available for the 16.04 release. I don't think the gcc version or the Linux distribution makes any difference.
What -march are you using? gcc-6.3 has Zen support (znver1), but it is poorly optimised. Prior to gcc-6.3, -march=haswell appears to be the best choice.
If you pick something different, gcc might emit opcodes your CPU does not support.
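To see which instruction-set extensions the kernel reports for your CPU, you can inspect the flags line of /proc/cpuinfo. A minimal Python sketch; the helper names are hypothetical, the flag names in the usage example are placeholders rather than the specific instructions mentioned earlier in the thread, and a flag missing from /proc/cpuinfo is only a hint, not proof, that code using it will fail.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first `flags` line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the x86 "flags" line exactly (not e.g. "vmx flags").
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted):
    """Return, sorted, the flags in `wanted` that the kernel does not report."""
    return sorted(set(wanted) - cpu_flags(cpuinfo_text))
```

Usage, with placeholder flag names: `missing_flags(open("/proc/cpuinfo").read(), ["avx2", "fma4"])`.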
All times are GMT
Page 2 of 11