Segfaults during compilation on AMD Ryzen.

alfonsor · n00b Joined: 13 Oct 2007 Posts: 16

I tried the Debian 4.8.15-1 kernel with its own configuration and initramfs; nothing changes.

aspinx · n00b Joined: 03 Jun 2017 Posts: 2

I was also having this issue (found this thread via a Google search). In my case gcc kept segfaulting at random.
My system is a Ryzen 1600 CPU, Gigabyte B350 mobo, 16 GB of G.Skill 2133 MHz standard RAM. No overclocking. I had a pretty standard x64 Gentoo system copied from the old Intel box (without any arch-specific compiler flags).

I tried changing CFLAGS to the Ryzen ones (-O2 -march=bdver4 -mno-fma4 -mno-tbm -mno-xop -mno-lwp -pipe), but it would segfault during the toolchain recompile. Reducing the number of parallel jobs from 12 to 6 let binutils recompile, but gcc always crashed when I recompiled it.

Then I noticed a post here mentioning multiple binutils packages. I checked my setup and found that I was using the old version of binutils. Changed that via

Naib · Posted: Sat Jun 03, 2017 11:10 am Post subject:

I would suspect the binutils.

I did a fresh install, so the chances of me falling into this are slimmer.

When I changed from GCC-5.x to GCC-6.x to GCC-7.x I made sure I did emerge libtool glibc binutils gcc, checked eselect binutils list and did an emerge -e @system followed by an emerge -e @world
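For anyone who wants that order written down, here is a rough dry-run sketch of the sequence described above. The commands are the ones named in the post; DRY_RUN and the run() helper are illustrative names added here, not part of Portage. By default it only prints what it would run.

```shell
#!/bin/sh
# Toolchain rebuild order from the post above, as a dry-run sketch.
# DRY_RUN and run() are illustrative helpers, not Portage features.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Only show the command instead of executing it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

rebuild_toolchain() {
    run emerge libtool glibc binutils gcc || return 1
    # Check which binutils version is currently selected.
    run eselect binutils list || return 1
    run emerge -e @system || return 1
    run emerge -e @world
}

rebuild_toolchain
```

Running it with DRY_RUN=0 as root would execute the commands for real, stopping at the first one that fails.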

Those who ran into the binutils issue must have been moving an old install, so there was always a risk that some not fully compatible binary was still around, OR they used a very old stage3:
2.26 added 2016-07-13
2.27 added 2016-11-15
2.28 added 2017-03-03

aspinx · n00b Joined: 03 Jun 2017 Posts: 2

My system was not too far from "fresh" - just a year-old installation with almost no customization, and I thought it should run just fine on Ryzen... but for some reason it didn't.

Just in case, here is what I did (more or less):

Bigfoot77 · n00b Joined: 15 Dec 2006 Posts: 16

mblnx · n00b Joined: 04 Mar 2008 Posts: 15

Getting segfaults isn't really the main problem. The main problem is when you start compiling python modules and they generate a 0 byte file, and you then have to track down which one got "silently" corrupted and is messing with everything else.
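A rough sketch of one way to hunt for those 0 byte files. The helper name find_empty_files and the choice of extensions (*.pyc, *.pyo, *.so) are assumptions made here; plain *.py files are skipped on purpose because an empty __init__.py is normal.

```shell
#!/bin/sh
# List zero-byte compiled Python artifacts under a directory tree.
# find_empty_files is an illustrative helper; the extension list is an
# assumption about which generated files are worth checking.
find_empty_files() {
    find "$1" -type f \( -name '*.pyc' -o -name '*.pyo' -o -name '*.so' \) -size 0
}

# Example (the path is a placeholder, adjust to your Python version):
#   find_empty_files /usr/lib/python3.4/site-packages
```

Each hit can then be mapped back to its package (equery belongs from gentoolkit should do that) and the package re-emerged.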

The OPCache code option really makes a big difference for me, but I still got at least 2 errors. On the community.amd.com forums, someone suggested that running the memory kits in 2T with the new AGESA code fixed a different problem. It could be worth trying.

PixieDust · n00b Joined: 04 Jun 2017 Posts: 1

I'm just coming back to Gentoo after an absence of a little over 12 years.

I've been having some issues, but they're the sort of issues that come from having to dig through layers of dust and cobwebs to remember what to do.

At this point, I've built and installed Gentoo since getting my R7 1700 probably 10 times. My issues have been my own doing. Screwing up the kernel, accidentally wiping out the wrong partition (I'm a menace!), running things in the wrong environment (chroot vs not), etc. There's a reason I've been installing this to a flash drive ;-)

BTW, shoutout to #Gentoo for the help over the last few weeks while I've futzed with this!

I never had ANY issues getting anything to compile. I did have occasional package blocks, conflicts, etc, but the compiles themselves never had an issue. The first install I started with -j8, and the last install was @ -j16 (most were between -j12 and -j14). I had read in several places people suggesting to use -march=haswell or the bulldozer flags, but I didn't quite feel comfortable with that. I didn't want to take a chance of running into weird architectural stuff while trying to figure things out again.

At this point I've gotten things down pretty well. Comfortable enough that at this point I've wiped an old laptop and gotten Gentoo up and running on it and working great. This way I can watch Netflix while I'm working on my main system.

With this recent experience behind me, I thought maybe it was time to try doing another run through on my main system (still on a flash drive though). This time I was going to try to grab gcc 7, rebuild the toolchain, and try to actually optimize the system and build packages properly.

Now it is entirely possible that I just screwed that whole process up, but that's when I started having issues compiling. I haven't had any segfaults (that I've seen), but I have had a multitude of compiler failures. I was able to get gcc up to 7, and after that all hades broke loose.

Has anyone else tried without any architecture specific optimizations and just compiled for generic 64-bit?

I'd be curious to see if issues were popping up without any special flags or optimizations (I was always using -O2 and -pipe, but that's it). Could be I just got really lucky, but I would have expected to run into issues WAAAAY earlier, especially with as many times as I've compiled everything at this point.

FWIW, I'm running an overclock of 3.8 GHz @ 1.28 V LLC1 (Crosshair 6 Hero motherboard). I briefly attempted a memory overclock with my current BIOS (latest official one, not the beta with AGESA 1.0.0.6), but it was very unstable so I went back to default settings (which has me stuck @ 2133 for memory).

I don't know if any of this is helpful, or if it's completely useless. If this did not contribute, please accept my apologies.

Thanks.

alfonsor · n00b Joined: 13 Oct 2007 Posts: 16

on the Phoronix forum, someone suggested trying
echo 0 >/proc/sys/kernel/randomize_va_space

and it seems to do the trick for me
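For reference, a small sketch for checking what the setting currently is before changing it. The describe_aslr helper is an illustrative name; the meaning of the values 0, 1 and 2 follows the kernel's sysctl documentation.

```shell
#!/bin/sh
# Report the current ASLR setting.
# describe_aslr is an illustrative helper, not a standard tool.
describe_aslr() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "stack, mmap base and VDSO randomized" ;;
        2) echo "as 1, plus heap (brk) randomized" ;;
        *) echo "unknown" ;;
    esac
}

ASLR_FILE=/proc/sys/kernel/randomize_va_space
if [ -r "$ASLR_FILE" ]; then
    value=$(cat "$ASLR_FILE")
    echo "randomize_va_space=$value: $(describe_aslr "$value")"
fi

# To turn ASLR off until the next reboot (as root):
#   echo 0 > /proc/sys/kernel/randomize_va_space
# or equivalently:
#   sysctl -w kernel.randomize_va_space=0
```

Note that this weakens a security feature system-wide, so it is better treated as a diagnostic step than a permanent fix.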

the usual test I use, continuous parallel emerging of gcc in a shell and mesa in another, usually fails at the first or the second mesa compilation

with randomize_va_space set to 0 (not 1 nor 2), the test went on for hours, mesa was emerged about 80 times with no problem at all
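A test like the one described above could be scripted roughly as follows. This is only a sketch: repeat_until_fail is a made-up helper name, and the limit of 80 runs simply mirrors the number mentioned in the post.

```shell
#!/bin/sh
# Run a command repeatedly until it fails or a run limit is reached.
# repeat_until_fail is an illustrative helper, not a Portage tool.
repeat_until_fail() {
    max=$1
    shift
    i=0
    while [ "$i" -lt "$max" ]; do
        if ! "$@"; then
            echo "failed on run $((i + 1))"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $i runs"
}

# Example (as root, one command per shell so both run in parallel):
#   repeat_until_fail 80 emerge --oneshot media-libs/mesa
#   repeat_until_fail 80 emerge --oneshot sys-devel/gcc
```

Since the failures are random, a run that completes is not proof the problem is gone; it only makes a failure within that many builds less likely.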

dryatu · n00b Joined: 26 May 2017 Posts: 2

Managed to reproduce the issue with clang by hammering mesa like crazy.

core is from bash - compiled with clang.

chrisrot · n00b Joined: 01 Apr 2004 Posts: 25

Hi,

my problems compiling code seem to have disappeared with the latest unofficial beta BIOS for my Asus Crosshair 6 Hero.
After the update to BIOS 9945 (AGESA 1.0.0.6) I was able to run

trippels · Posted: Sun Jun 04, 2017 11:36 am Post subject:

@dryatu: What you describe confirms what user inuwashidesu reported on reddit:
https://www.reddit.com/r/programming/comments/6f08mb/compiling_with_ryzen_cpus_on_linux_causing_random/dieuoad/

So it appears that all these segfaults happen in regions of dense test/jmp instructions.
In your case I suspect that the following conditional jump gets corrupted:

Seek · n00b Joined: 22 Jul 2007 Posts: 47 Location: Austria

boudin · n00b Joined: 15 May 2017 Posts: 4

trippels · Posted: Wed Jun 07, 2017 8:21 am Post subject:

krinn · Watchman Joined: 02 May 2003 Posts: 7470

drizzt · Guru Joined: 21 Jul 2002 Posts: 428

Seems luatex doesn't like ryzen:

yardbird · l33t Joined: 20 Apr 2002 Posts: 689 Location: nl.leiden

drizzt · Guru Joined: 21 Jul 2002 Posts: 428

mv · Watchman Joined: 20 Apr 2005 Posts: 6747

sat · n00b Joined: 26 Apr 2017 Posts: 3

# Multipost to Phoronix and Gentoo forum

Hey guys, please refer to my (id:sat) posts in the AMD community's thread about this problem.
Based on my analysis of the reproduction on Windows Subsystem for Linux (WSL) and of kernel-level
trace information, I consider it highly likely that this is a hardware problem in Ryzen.

The thread about this problem in AMD support community:
https://community.amd.com/message/2801909

* Reproduction on other OSes such as Windows, more precisely Windows Subsystem for Linux (WSL), the
so-called Bash on Ubuntu on Windows.
=> My post beginning with "I ran my reproducer, building linux kernel with make -j16, on WSL
and it failed at random...."

* The result of analyzing what caused the SEGVs by setting a tracer in the Linux kernel
=> My post beginning with "I did the above mentioned investigation and got some more information
from other Ryzen users. Here is the summary(details are below)...."

* Why I consider Ryzen the prime suspect rather than other hardware/software
=> My post beginning with 'Please let me summarize "what component is wrong (I bet it's a Ryzen)"
by taking account of my past analysis and the facts that has reported here, because information
gets complicated...'

mblnx · n00b Joined: 04 Mar 2008 Posts: 15

Hey folks,

If you haven't opened a ticket with AMD yet, open one. They are tracking and trying to figure out what the problem is.
I received a new CPU today and will be able to run some tests during the weekend.

For now, with the old CPU, running the latest AGESA + OPCache disabled + ASLR disabled, I have not seen any errors. Not perfect but stable.

ozhdfw · n00b Joined: 21 Jun 2017 Posts: 7

If anyone has been getting segfaults compiling on Ryzen with -j16 (or the maximum -j for your CPU) and does not want to use the workaround of disabling SMT, ASLR, and OpCache, please try increasing your CPU SOC Voltage, which is right under the CPU Core Voltage setting. User Shon on the AMD forums said that setting the CPU SOC Voltage to 1.185 seemed to help with the segfaults. Please try these various workarounds and post the outcome here to help your fellow comrades.

ozhdfw · n00b Joined: 21 Jun 2017 Posts: 7

drizzt · Guru Joined: 21 Jul 2002 Posts: 428

Short Feedback:
- SoC Voltage was already set to 1.192 V, so "increasing" it to 1.185 V is not possible for me. It still segfaults anyway.
- Disabling ASLR (echo 0 >/proc/sys/kernel/randomize_va_space) doesn't fix the segfaults for me either.

mrostu · n00b Joined: 26 Jun 2017 Posts: 2 Location: Moscow

I had the same issue: segfaults appeared at random places during compilation. At first I noticed that I couldn't build mesa even after a few tries. Some other packages compiled, but only on the second or third try. My unsystematic efforts did the job: building more than 281 packages, including mesa, proceeded without segfaults.

My current setup:
CPU: R5 1400 @ 3200 MHz
MB: MSI X370 SLI PLUS
DRAM: 2x8 GB @ 2933 MHz CL16-18-18-36 Corsair CMK16GX4M2B3200C16

Kernel: gentoo-sources-4.11.7 (genkernel)
gcc: 5.4.0
glibc: 2.23-r4
binutils: 2.28-r2
libtool: 2.4.6-r3
llvm: 4.0.0-r2
clang: 4.0.0

Building options:
MAKEOPTS="-j8"
CFLAGS="-march=x86-64 -O2 -pipe"
CXXFLAGS="${CFLAGS}"
CPU_FLAGS_X86="aes avx avx2 fma3 mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3"

After building the new PC I just inserted my HDD with the Gentoo installation, which had been built with -march=native (core2, I guess). Initially, I didn't change -march to x86-64.
Flashing the BIOS with a fresh version didn't help, and neither did changing the DRAM frequency. I updated gentoo-sources to 4.11.7 and rebuilt some other packages, but it didn't help at all. What helped:
1. setting -march=x86-64;
2. removing old binutils;
3. rebuilding glibc, binutils and libtool;
4. rebuilding llvm and clang;
5. rebuilding the whole dependency tree for mesa: emerge -ea --exclude="gcc glibc binutils libtool llvm clang" mesa

Edit: gentoo-sources was built using genkernel

Edit 2: Like many others here, I was premature with my conclusion. Segfaults still occur, but more rarely.