Gentoo Forums
Segfaults during compilation on AMD Ryzen.

alfonsor
n00b
Joined: 13 Oct 2007
Posts: 16
Posted: Sat Jun 03, 2017 10:26 am

Tried the Debian 4.8.15-1 kernel with its configuration and initramfs: nothing changes.

aspinx
n00b
Joined: 03 Jun 2017
Posts: 2
Posted: Sat Jun 03, 2017 10:57 am

I was also having this issue (found this thread via a Google search). In my case gcc was segfaulting at random.
My system is a Ryzen 1600 CPU, a Gigabyte B350 mobo, and 16 GB of G.Skill 2133 MHz standard RAM. No overclocking. I had a pretty standard x86_64 Gentoo system copied from the old Intel box (without any arch-specific compiler flags).

I tried changing CFLAGS to the Ryzen ones (-O2 -march=bdver4 -mno-fma4 -mno-tbm -mno-xop -mno-lwp -pipe), but it would segfault during the toolchain recompile. Reducing the number of parallel jobs from 12 to 6 allowed binutils to recompile, but gcc always crashed when recompiling.

Then I noticed a post here mentioning multiple binutils packages. I checked my setup and found out that I was using the old version of binutils. Changed that via
Code:
eselect binutils list
eselect binutils set <number of new binutils>


Then I found that Gigabyte released a beta BIOS update with AGESA 1.0.0.6 a couple of days ago (which has improvements for RAM support), so I updated to it.
Then I went back to recompiling the toolchain and it didn't crash! I changed back to -j12 and recompiled system and world: no crash either. I don't know if it was the BIOS update or the binutils change, but one of those things clearly helped.

Just in case, I flashed back the older BIOS (AGESA 1.0.0.4) and tried recompiling gcc: no crash. It looks like in my case the issue was the broken binutils.

Naib
Watchman
Joined: 21 May 2004
Posts: 5040
Location: Removed by Neddy
Posted: Sat Jun 03, 2017 11:10 am

I would suspect the binutils.

I did a fresh install, so the chances of me falling into this are slimmer.

When I changed from GCC-5.x to GCC-6.x to GCC-7.x I made sure I did emerge libtool glibc binutils gcc, checked eselect binutils list and did an emerge -e @system followed by an emerge -e @world

Those that ran into the binutils issue must have been moving an old install and THUS there was always a risk some binary not fully compatible existed OR they used a very old stage3:
2.26 added 2016-07-13
2.27 added 2016-11-15
2.28 added 2017-03-03
_________________
The best argument against democracy is a five-minute conversation with the average voter
Great Britain is a republic, with a hereditary president, while the United States is a monarchy with an elective king

aspinx
n00b
Joined: 03 Jun 2017
Posts: 2
Posted: Sat Jun 03, 2017 11:41 am

My system was not too far from "fresh": just a year-old installation with almost no customization, and I thought it should run just fine on Ryzen... but for some reason it didn't.

Just in case, here is what I did (more or less):
Code:

emerge -av linux-headers glibc binutils gcc-config libtool gcc
eselect binutils set <number of new binutils>
gcc-config <number of new gcc>   # switch to new gcc
source /etc/profile
emerge -avb glibc binutils gcc libtool
emerge -avbke system
emerge -avbke world


binutils-2.26.1
glibc-2.23-r3
gcc-5.4.0-r3


PS: I still have the old binutils, so I can try switching back to it and see if it will start segfaulting again... Would that be helpful?

Bigfoot77
n00b
Joined: 15 Dec 2006
Posts: 16
Posted: Sat Jun 03, 2017 2:47 pm

alfonsor wrote:
Bigfoot77, yes I tried 3 times a complete gentoo installation and the problem is still there. But your story made me think I always used the same kernel configuration from my main installation... Did you change your kernel config during re-installing?


My kernel config and the kernel version itself stayed the same for both installs. I can't remember exactly which minor version I was running at the time, but it was in the 4.10 series (I *believe* 4.10.0 itself).

EDIT:

Just for reference, I was using gentoo-sources specifically (not vanilla)

mblnx
n00b
Joined: 04 Mar 2008
Posts: 10
Posted: Sun Jun 04, 2017 12:15 am

Getting segfaults isn't really the main problem. The main problem is when you compile Python modules and they generate a 0-byte file, and you then have to track down which one got "silently" corrupted and is messing with everything else.
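
One way to hunt for the silently truncated files described above is to list zero-byte files under the install tree. This is only a minimal sketch: the function name find_empty_files and the default path are assumptions, not something from this thread, and some empty files (for example __init__.py) are perfectly legitimate, so the output is a list of candidates rather than a verdict.

```shell
#!/bin/sh
# Hypothetical helper: list zero-byte regular files below a directory.
# Empty __init__.py files are normal in Python packages, so they are skipped.
find_empty_files() {
    dir="${1:-/usr/lib}"
    # -size 0 matches only files that occupy zero 512-byte blocks, i.e. empty files.
    find "$dir" -type f -size 0 ! -name '__init__.py' 2>/dev/null | sort
}
```

Running find_empty_files /usr/lib after a large rebuild gives a short list to inspect. On Gentoo, equery belongs <file> (gentoolkit) or qfile <file> (portage-utils) can then name the owning package to re-emerge.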

The OPCache code option really makes a big difference for me, but I still got at least 2 errors. On the community.amd.com forums, someone suggested that running the memory kits at 2T with the new AGESA code fixed a different problem. It could be worth trying.

PixieDust
n00b
Joined: 04 Jun 2017
Posts: 1
Posted: Sun Jun 04, 2017 8:19 am

I'm just coming back to Gentoo after an absence of a little over 12 years.

I've been having some issues, but they're the sort of issues that come from having to dig through layers of dust and cobwebs to remember what to do.

At this point, I've built and installed Gentoo since getting my R7 1700 probably 10 times. My issues have been my own doing. Screwing up the kernel, accidentally wiping out the wrong partition (I'm a menace!), running things in the wrong environment (chroot vs not), etc. There's a reason I've been installing this to a flash drive ;-)

BTW, shoutout to #Gentoo for the help over the last few weeks while I've futzed with this!

I never had ANY issues getting anything to compile. I did have occasional package blocks, conflicts, etc., but the compiles themselves never had an issue. The first install I started with -j8, and the last install was at -j16 (most were between -j12 and -j14). I had read in several places people suggesting -march=haswell or the Bulldozer flags, but I didn't quite feel comfortable with that. I didn't want to take a chance of running into weird architectural stuff while trying to figure things out again.

At this point I've gotten things down pretty well. Comfortable enough that at this point I've wiped an old laptop and gotten Gentoo up and running on it and working great. This way I can watch Netflix while I'm working on my main system.

With this recent experience behind me, I thought maybe it was time to try doing another run through on my main system (still on a flash drive though). This time I was going to try to grab gcc 7, rebuild the toolchain, and try to actually optimize the system and build packages properly.

Now it is entirely possible that I just screwed that whole process up, but that's when I started having issues compiling. I haven't had any segfaults (that I've seen), but I have had a multitude of compiler failures. I was able to get gcc up to 7, and after that all hades broke loose.

Has anyone else tried without any architecture specific optimizations and just compiled for generic 64-bit?

I'd be curious to see if issues were popping up without any special flags or optimizations (I was always using -O2 and -pipe, but that's it). Could be I just got really lucky, but I would have expected to run into issues WAAAAY earlier, especially with as many times as I've compiled everything at this point.

FWIW, I'm running an overclock of 3.8 GHz @ 1.28 V, LLC1 (Crosshair 6 Hero motherboard). I briefly attempted a memory overclock with my current BIOS (the latest official one, not the beta with AGESA 1.0.0.6), but it was very unstable, so I went back to default settings (which has me stuck at 2133 for memory).

I don't know if any of this is helpful, or if it's completely useless. If this did not contribute, please accept my apologies.

Thanks.

alfonsor
n00b
Joined: 13 Oct 2007
Posts: 16
Posted: Sun Jun 04, 2017 8:34 am

on phoronix forum, someone suggested to try
echo 0 >/proc/sys/kernel/randomize_va_space

and it seems to do the trick for me

the usual test I use, continuous parallel emerging of gcc in a shell and mesa in another, usually fails at the first or the second mesa compilation

with randomize_va_space set to 0 (not 1 nor 2), the test went on for hours, mesa was emerged about 80 times with no problem at all
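
The setting and the repeat-until-it-breaks test described above can be sketched as a pair of small shell helpers. The function names are made up for illustration; the meaning of the values 0, 1 and 2 follows the kernel's sysctl documentation as far as I know it.

```shell
#!/bin/sh
# Describe a randomize_va_space value.
aslr_describe() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "partial (stack, mmap base, VDSO)" ;;
        2) echo "full (also heap/brk)" ;;
        *) echo "unknown" ;;
    esac
}

# Print the current setting, if the sysctl file is readable.
aslr_current() {
    f=/proc/sys/kernel/randomize_va_space
    [ -r "$f" ] && aslr_describe "$(cat "$f")"
}

# Run a command up to MAX times and stop at the first failure.
# Prints the number of successful runs; returns non-zero if a run failed.
repeat_until_fail() {
    max="$1"; shift
    ok=0
    while [ "$ok" -lt "$max" ]; do
        "$@" >/dev/null 2>&1 || { echo "$ok"; return 1; }
        ok=$((ok + 1))
    done
    echo "$ok"
}
```

For example, repeat_until_fail 80 emerge -1 mesa would mirror the mesa loop. As root, sysctl -w kernel.randomize_va_space=0 is equivalent to the echo above and lasts until reboot. To limit the change to one build instead of the whole system, setarch "$(uname -m)" -R <command> runs a single command with randomization off.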

dryatu
n00b
Joined: 26 May 2017
Posts: 2
Posted: Sun Jun 04, 2017 11:05 am

Managed to reproduce the issue with clang by hammering mesa like crazy.

core is from bash - compiled with clang.

Code:

[Sun Jun  4 13:32:26 2017] sh[25959]: segfault at 3f3850 ip 00000000003f3850 sp 00007ffca8cf4cb0 error 14 in bash[400000+a9000]
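
Two things can be read off the kernel line above with plain shell arithmetic. This is a sketch with hypothetical function names; it assumes the usual x86 page-fault error-code bits and that the kernel prints the code in hex.

```shell
#!/bin/sh
# Is instruction pointer IP inside the mapping [BASE, BASE+SIZE)?
# All three arguments are hex without the 0x prefix, as printed in the log.
ip_in_mapping() {
    ip=$((0x$1)); base=$((0x$2)); size=$((0x$3))
    if [ "$ip" -ge "$base" ] && [ "$ip" -lt $((base + size)) ]; then
        echo "inside"
    else
        echo "outside"
    fi
}

# Decode an x86 page-fault error code (hex, as shown after "error").
# bit 0: protection violation, bit 1: write, bit 2: user mode, bit 4: instruction fetch
decode_pf_error() {
    code=$((0x$1))
    out=""
    [ $((code & 1)) -ne 0 ]  && out="$out protection-violation" || out="$out page-not-present"
    [ $((code & 2)) -ne 0 ]  && out="$out write" || out="$out read"
    [ $((code & 4)) -ne 0 ]  && out="$out user-mode" || out="$out kernel-mode"
    [ $((code & 16)) -ne 0 ] && out="$out instruction-fetch"
    echo "${out# }"
}
```

With the numbers from the log, ip_in_mapping 3f3850 400000 a9000 reports that the instruction pointer lies outside bash's mapping (just below its start at 0x400000), and decode_pf_error 14 reports a user-mode instruction fetch from a page that is not present. In other words, the process jumped to an address where there is no code at all, which matches frame #0 in the backtrace.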


bt

Code:

#0  0x00000000003f3850 in ?? ()
No symbol table info available.
#1  0x000000000041411c in alloc_word_desc () at make_cmd.c:85
        temp = <optimized out>
#2  make_bare_word (string=0x14482d0 "p\202D\001") at make_cmd.c:97
        temp = <optimized out>
#3  0x0000000000000000 in ?? ()
No symbol table info available.


registers

Code:

rax            0x1440670   21235312
rbx            0x1441ba0   21240736
rcx            0x136f420   20378656
rdx            0x141b180   21082496
rsi            0x1448270   21267056
rdi            0x142c2f0   21152496
rbp            0x13855f0   0x13855f0
rsp            0x7ffca8cf4cb0   0x7ffca8cf4cb0
r8             0x20   32
r9             0x4d5f54494d494c5f   5575267537513958495
r10            0xfffffffffffffd97   -617
r11            0x143d3e0   21222368
r12            0x1385700   20469504
r13            0x0   0
r14            0x14482d0   21267152
r15            0x141b360   21082976
rip            0x3f3850   0x3f3850
eflags         0x10206   [ PF IF RF ]
cs             0x33   51
ss             0x2b   43
ds             0x0   0
es             0x0   0
fs             0x0   0
gs             0x0   0



Perhaps someone understands how rip ended up at 0x3f3850 from frame #1 (f1).

Code:

Dump of assembler code for function make_bare_word:
   0x0000000000414110 <+0>:   push   %r14
   0x0000000000414112 <+2>:   push   %rbx
   0x0000000000414113 <+3>:   push   %rax
   0x0000000000414114 <+4>:   mov    %rdi,%r14
   0x0000000000414117 <+7>:   movslq 0x29e23e(%rip),%rax        # 0x6b235c <wdcache+12>
   0x000000000041411e <+14>:   test   %rax,%rax
   0x0000000000414121 <+17>:   jle    0x41413b <make_bare_word+43>
   0x0000000000414123 <+19>:   lea    -0x1(%rax),%rcx
   0x0000000000414127 <+23>:   mov    %ecx,0x29e22f(%rip)        # 0x6b235c <wdcache+12>
   0x000000000041412d <+29>:   mov    0x29e21c(%rip),%rcx        # 0x6b2350 <wdcache>
   0x0000000000414134 <+36>:   mov    -0x8(%rcx,%rax,8),%rbx
   0x0000000000414139 <+41>:   jmp    0x414148 <make_bare_word+56>
   0x000000000041413b <+43>:   mov    $0x10,%edi
   0x0000000000414140 <+48>:   callq  0x458a20 <xmalloc>
   0x0000000000414145 <+53>:   mov    %rax,%rbx
   0x0000000000414148 <+56>:   movl   $0x0,0x8(%rbx)
   0x000000000041414f <+63>:   movq   $0x0,(%rbx)
   0x0000000000414156 <+70>:   cmpb   $0x0,(%r14)
   0x000000000041415a <+74>:   je     0x41417d <make_bare_word+109>
   0x000000000041415c <+76>:   mov    %r14,%rdi
   0x000000000041415f <+79>:   callq  0x406470 <strlen@plt>
   0x0000000000414164 <+84>:   lea    0x1(%rax),%rdi
   0x0000000000414168 <+88>:   callq  0x458a20 <xmalloc>
   0x000000000041416d <+93>:   mov    %rax,%rdi
   0x0000000000414170 <+96>:   mov    %r14,%rsi
   0x0000000000414173 <+99>:   callq  0x406140 <strcpy@plt>
   0x0000000000414178 <+104>:   mov    %rax,(%rbx)
   0x000000000041417b <+107>:   jmp    0x41418d <make_bare_word+125>
   0x000000000041417d <+109>:   mov    $0x1,%edi
   0x0000000000414182 <+114>:   callq  0x458a20 <xmalloc>
   0x0000000000414187 <+119>:   mov    %rax,(%rbx)
   0x000000000041418a <+122>:   movb   $0x0,(%rax)
   0x000000000041418d <+125>:   mov    %rbx,%rax
   0x0000000000414190 <+128>:   add    $0x8,%rsp
   0x0000000000414194 <+132>:   pop    %rbx
   0x0000000000414195 <+133>:   pop    %r14
   0x0000000000414197 <+135>:   retq   
End of assembler dump.



edit:

build log https://drive.google.com/file/d/0B-2TEzisIWNaRDJjUUc2UzE4Rnc/view


Last edited by dryatu on Sun Jun 04, 2017 12:21 pm; edited 1 time in total

chrisrot
n00b
Joined: 01 Apr 2004
Posts: 25
Posted: Sun Jun 04, 2017 11:28 am

Hi,

my problems compiling code seem to have disappeared with the latest unofficial beta BIOS for my Asus Crosshair 6 Hero.
After the update to BIOS 9945 (AGESA 1.0.0.6) I was able to run
Code:
emerge -j8 -e @world
with about 1000+ packages (KDE, Firefox, Thunderbird, ...) at stock settings successfully.
With older BIOSes I had to downclock in order to compile successfully.

I still occasionally get
Code:

[ 6231.347183] mce: [Hardware Error]: Machine check events logged
[ 6231.347191] [Hardware Error]: Corrected error, no action required.
[ 6231.347193] [Hardware Error]: CPU:1 (17:1:1) MC3_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd820000000000150
[ 6231.347196] , Syndrome: 0x000000002a000503, IPID: 0x000300b000000000
[ 6231.347199] [Hardware Error]: Decode Unit Extended Error Code: 0
[ 6231.347199] [Hardware Error]: Decode Unit Error: uop cache tag parity error.
[ 6231.347201] [Hardware Error]: cache level: RESV, tx: INSN, mem-tx: IRD


If you have access to a beta BIOS with AGESA 1.0.0.6, you might give it a try.

ciao
Christoph

trippels
Tux's lil' helper
Joined: 24 Nov 2010
Posts: 137
Location: Berlin
Posted: Sun Jun 04, 2017 11:36 am

@dryatu: What you describe confirms what user inuwashidesu reported on reddit:
https://www.reddit.com/r/programming/comments/6f08mb/compiling_with_ryzen_cpus_on_linux_causing_random/dieuoad/

So it appears that all these segfaults happen in regions of dense test/jmp
instructions.
In your case I suspect that the following conditional jump gets corrupted:
Code:
   0x000000000041411e <+14>:   test   %rax,%rax
   0x0000000000414121 <+17>:   jle    0x41413b <make_bare_word+43> 

See: https://github.com/bminor/bash/blob/master/make_cmd.c
(alloc_word_desc() gets inlined into make_bare_word (), "test/jle" correspond
to "ocache_alloc (wdcache, WORD_DESC, temp);" on line 88.)

(Unfortunately you two are the only ones that have provided any relevant
segfault info thus far.)

All this points to a possible bug in Ryzen's micro-op cache perhaps triggered
by "CMP/TEST conditional jump" instruction fusion μops.


Last edited by trippels on Sun Jun 04, 2017 12:12 pm; edited 1 time in total

Seek
n00b
Joined: 22 Jul 2007
Posts: 47
Location: Austria
Posted: Tue Jun 06, 2017 8:09 pm

alfonsor wrote:
on phoronix forum, someone suggested to try
echo 0 >/proc/sys/kernel/randomize_va_space

and it seems to do the trick for me
the usual test I use, continuous parallel emerging of gcc in a shell and mesa in another, usually fails at the first or the second mesa compilation
with randomize_va_space set to 0 (not 1 nor 2), the test went on for hours, mesa was emerged about 80 times with no problem at all

I can confirm that turning off ASLR completely eliminates any random segmentation faults during compilation.
Certain packages (gcc, mesa, systemd) consistently failed before when using -j16; after this I could finally fully update my system again.
Also, several hours of Yocto builds worked without issues.

Another workaround for certain failing packages is to only use -j1 for those.
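
The -j1 workaround can be made per-package through Portage's package.env mechanism rather than by editing MAKEOPTS globally. A minimal sketch follows: the helper name write_serial_env and the file name serial.conf are arbitrary choices for illustration, and the code assumes package.env is a plain file rather than a directory.

```shell
#!/bin/sh
# Hypothetical helper: make the listed packages build with -j1.
# ROOT is a path prefix so the helper can be tried in a scratch directory;
# pass "" (empty) to write to the live /etc/portage.
write_serial_env() {
    root="$1"; shift
    mkdir -p "$root/etc/portage/env" || return 1
    # Per-package environment file; it overrides MAKEOPTS from make.conf.
    echo 'MAKEOPTS="-j1"' > "$root/etc/portage/env/serial.conf"
    # One "category/package file" line per package.
    for pkg in "$@"; do
        echo "$pkg serial.conf" >> "$root/etc/portage/package.env"
    done
}
```

For the packages named above this would be write_serial_env "" sys-devel/gcc media-libs/mesa sys-apps/systemd, run as root. Everything else keeps building with the global -j value.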

boudin
n00b
Joined: 15 May 2017
Posts: 4
Posted: Wed Jun 07, 2017 8:07 am

trippels wrote:
@dryatu: What you describe confirms what user inuwashidesu reported on reddit:
https://www.reddit.com/r/programming/comments/6f08mb/compiling_with_ryzen_cpus_on_linux_causing_random/dieuoad/

So it appears that all these segfaults happen in regions of dense test/jmp
instructions.
In your case I suspect that the following conditional jump gets corrupted:
Code:
   0x000000000041411e <+14>:   test   %rax,%rax
   0x0000000000414121 <+17>:   jle    0x41413b <make_bare_word+43> 

See: https://github.com/bminor/bash/blob/master/make_cmd.c
(alloc_word_desc() gets inlined into make_bare_word (), "test/jle" correspond
to "ocache_alloc (wdcache, WORD_DESC, temp);" on line 88.)

(Unfortunately you two are the only ones that have provided any relevant
segfault info thus far.)

All this points to a possible bug in Ryzen's micro-op cache perhaps triggered
by "CMP/TEST conditional jump" instruction fusion μops.


I don't know how to get this kind of information, but I'm eager to learn. How do you get the backtrace? By building bash with debug information and using a debugger somehow, or through some kernel debugging features?

trippels
Tux's lil' helper
Joined: 24 Nov 2010
Posts: 137
Location: Berlin
Posted: Wed Jun 07, 2017 8:21 am

boudin wrote:
I don't know how to get this kind of information, but I'm eager to learn. How do you get the backtrace? By building bash with debug information and using a debugger somehow, or through some kernel debugging features?


By enabling core-dumps and looking at them in gdb afterwards.
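
A rough sketch of that workflow, for anyone who has not done it before. The helper name gdb_core_cmd is made up; it only prints the gdb command line so it can be inspected before running. The paths in the comments are examples, not values taken from this thread.

```shell
#!/bin/sh
# Build (but do not run) a gdb batch command that prints the backtrace with
# local variables ("bt full") and the register contents for a core file.
gdb_core_cmd() {
    binary="$1"; core="$2"
    printf 'gdb -batch -ex "bt full" -ex "info registers" %s %s\n' "$binary" "$core"
}

# Typical session (left as comments: it changes shell limits and needs a core file):
#   ulimit -c unlimited                  # allow core files in this shell
#   cat /proc/sys/kernel/core_pattern    # shows where the kernel writes them
#   ... reproduce the crash ...
#   eval "$(gdb_core_cmd /bin/bash ./core)"
```

Inside an interactive gdb session, disassemble <function> prints a listing like the make_bare_word dump posted earlier. Symbol names and line numbers only show up if the binary keeps its debug information; on Gentoo that usually means rebuilding the package with -ggdb in CFLAGS and FEATURES="splitdebug" (or nostrip), as far as I know.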

krinn
Watchman
Joined: 02 May 2003
Posts: 5804
Posted: Wed Jun 07, 2017 10:43 am

boudin wrote:
I don't know how to get this kind of information, but I'm eager to learn. How do you get the backtrace?

https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces

drizzt
Guru
Joined: 21 Jul 2002
Posts: 402
Posted: Thu Jun 08, 2017 6:35 pm

Seems luatex doesn't like ryzen:
Code:
[10449.358930] traps: luatex[16617] general protection ip:60dd44 sp:7ffc3d586b10 error:0 in luatex[400000+54d000]
[10450.467092] traps: luatex[16635] general protection ip:60dd44 sp:7ffc2e146290 error:0 in luatex[400000+54d000]
[10451.103767] traps: luatex[16663] general protection ip:60dd44 sp:7ffc0b58dc80 error:0 in luatex[400000+54d000]
[10451.808861] traps: luatex[16684] general protection ip:60dd44 sp:7ffccfcd48e0 error:0 in luatex[400000+54d000]
[10462.622464] traps: luatex[20606] general protection ip:60dd44 sp:7ffe0a231c00 error:0 in luatex[400000+54d000]
[10463.731521] traps: luatex[20630] general protection ip:60dd44 sp:7ffe17014d10 error:0 in luatex[400000+54d000]

_________________
People don't have to earn my respect. I offer my respect to them, but be careful to lose my respect...

yardbird
l33t
Joined: 20 Apr 2002
Posts: 689
Location: nl.leiden
Posted: Fri Jun 09, 2017 9:11 am

drizzt wrote:
Seems luatex doesn't like ryzen:
Code:
[10449.358930] traps: luatex[16617] general protection ip:60dd44 sp:7ffc3d586b10 error:0 in luatex[400000+54d000]
[10450.467092] traps: luatex[16635] general protection ip:60dd44 sp:7ffc2e146290 error:0 in luatex[400000+54d000]
[10451.103767] traps: luatex[16663] general protection ip:60dd44 sp:7ffc0b58dc80 error:0 in luatex[400000+54d000]
[10451.808861] traps: luatex[16684] general protection ip:60dd44 sp:7ffccfcd48e0 error:0 in luatex[400000+54d000]
[10462.622464] traps: luatex[20606] general protection ip:60dd44 sp:7ffe0a231c00 error:0 in luatex[400000+54d000]
[10463.731521] traps: luatex[20630] general protection ip:60dd44 sp:7ffe17014d10 error:0 in luatex[400000+54d000]


That's actually a GCC 7 issue:

https://bugs.gentoo.org/show_bug.cgi?id=621252
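
A quick heuristic for telling the two cases apart: all six luatex traps above report the same ip (60dd44), which fits a deterministic software bug better than a random hardware fault, where the crash address tends to vary from run to run. The sketch below counts distinct instruction pointers in kernel log lines. The function name is hypothetical, and the result is only a hint, not proof either way.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print how many distinct instruction
# pointers appear in "ip:<hex>" (traps) or "ip <hex>" (segfault) fields.
count_distinct_ips() {
    sed -n 's/.* ip[: ]\([0-9a-fA-F]\{1,\}\) .*/\1/p' | sort -u | wc -l | tr -d ' '
}
```

Usage would be something like dmesg | grep luatex | count_distinct_ips. A result of 1 over many crashes suggests looking for a compiler or package bug first.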
_________________
Albert Einstein wrote:
I consider it [...] urgently necessary for [...] workers to get together, both to protect their own economic status and [...] to secure their influence in the political field.


http://www.bluescarni.info

drizzt
Guru
Joined: 21 Jul 2002
Posts: 402
Posted: Fri Jun 09, 2017 4:20 pm

yardbird wrote:
drizzt wrote:
Seems luatex doesn't like ryzen:
Code:
[10449.358930] traps: luatex[16617] general protection ip:60dd44 sp:7ffc3d586b10 error:0 in luatex[400000+54d000]
[10450.467092] traps: luatex[16635] general protection ip:60dd44 sp:7ffc2e146290 error:0 in luatex[400000+54d000]
[10451.103767] traps: luatex[16663] general protection ip:60dd44 sp:7ffc0b58dc80 error:0 in luatex[400000+54d000]
[10451.808861] traps: luatex[16684] general protection ip:60dd44 sp:7ffccfcd48e0 error:0 in luatex[400000+54d000]
[10462.622464] traps: luatex[20606] general protection ip:60dd44 sp:7ffe0a231c00 error:0 in luatex[400000+54d000]
[10463.731521] traps: luatex[20630] general protection ip:60dd44 sp:7ffe17014d10 error:0 in luatex[400000+54d000]


That's actually a GCC 7 issue:

https://bugs.gentoo.org/show_bug.cgi?id=621252


I definitely need to improve my search skills. :oops:

Thank you for the info.

mv
Watchman
Joined: 20 Apr 2005
Posts: 5686
Posted: Sat Jun 10, 2017 4:05 am

drizzt wrote:
I definitely need to improve my search skills. :oops:

I missed that bug, too: Both relevant package names had the wrong category, that's probably why it didn't pop up in the search. I fixed this now.

sat
n00b
Joined: 26 Apr 2017
Posts: 3
Posted: Fri Jun 16, 2017 1:18 am

# Multipost to Phoronix and Gentoo forum

Hey guys, please refer to my (id:sat) posts in the AMD community thread about this problem.
Based on my analysis, I consider it highly likely that this is a Ryzen hardware problem.
The analysis rests on reproducing the problem on Windows Subsystem for Linux (WSL) and on
kernel-level trace information.

The thread about this problem in AMD support community:
https://community.amd.com/message/2801909

* Reproduction on other OSes such as Windows, more precisely on Windows Subsystem for Linux (WSL),
the so-called Bash on Ubuntu on Windows.
=> My post beginning with "I ran my reproducer, building linux kernel with make -j16, on WSL
and it failed at random...."

* The result of analyzing what caused the SEGVs by setting a tracer in the Linux kernel

=> My post beginning with "I did the above mentioned investigation and got some more information
from other Ryzen users. Here is the summary(details are below)...."


* Why I consider the prime suspect to be Ryzen rather than other hardware or software

=> My post beginning with 'Please let me summarize "what component is wrong (I bet it's a Ryzen)"
by taking account of my past analysis and the facts that has reported here, because information
gets complicated...'

mblnx
n00b
Joined: 04 Mar 2008
Posts: 10
Posted: Tue Jun 20, 2017 12:53 am

Hey folks,

If you haven't opened a ticket with AMD yet, open one. They are tracking and trying to figure out what the problem is.
I received a new CPU today and will be able to run some tests during the weekend.

For now, with the old CPU, running the latest AGESA + OPCache disabled + ASLR disabled, I have not seen any errors. Not perfect but stable.

ozhdfw
n00b
Joined: 21 Jun 2017
Posts: 7
Posted: Wed Jun 21, 2017 5:51 am    Post subject: Compiling with Ryzen

If anyone has been getting segfaults compiling on Ryzen using -j16 or -j(max for your CPU) and does not want to use the workaround of disabling SMT, ASLR, and OpCache, please try increasing your CPU SOC Voltage, which is right under the CPU Core Voltage setting. User Shon on the AMD forums said setting the CPU SOC Voltage to 1.185 V seemed to help with the segfaults. Please try these various workarounds and post the outcome here to help your fellow comrades.

ozhdfw
n00b
Joined: 21 Jun 2017
Posts: 7
Posted: Wed Jun 21, 2017 5:58 am

alfonsor wrote:
on phoronix forum, someone suggested to try
echo 0 >/proc/sys/kernel/randomize_va_space

and it seems to do the trick for me

the usual test I use, continuous parallel emerging of gcc in a shell and mesa in another, usually fails at the first or the second mesa compilation

with randomize_va_space set to 0 (not 1 nor 2), the test went on for hours, mesa was emerged about 80 times with no problem at all



I just wanted to point out a helpful post again: alfonsor pointed out that setting randomize_va_space to 0, which disables ASLR, helped.

drizzt
Guru
Joined: 21 Jul 2002
Posts: 402
Posted: Wed Jun 21, 2017 6:26 pm

Short Feedback:
- SoC Voltage was already set to 1.192 V, so "increasing" it to 1.185 V is not possible for me. Anyway, it still segfaults.
- Disabling ASLR (echo 0 >/proc/sys/kernel/randomize_va_space) also doesn't fix the segfaults for me.

mrostu
n00b
Joined: 26 Jun 2017
Posts: 2
Location: Moscow
Posted: Mon Jun 26, 2017 8:44 am

I had the same issue: segfaults appeared at random places during compilation. At first, I noticed that I couldn't build mesa even after a few tries. Some other packages could be compiled, but only on the second or third try. My unsystematic efforts did the job: building > 281 packages, including mesa, proceeded without segfaults.

My current setup:
CPU: R5 1400 @ 3200 MHz
MB: MSI X370 SLI PLUS
DRAM: 2x8 GB @ 2933 MHz CL16-18-18-36 Corsair CMK16GX4M2B3200C16

Kernel: gentoo-sources-4.11.7 (genkernel)
gcc: 5.4.0
glibc: 2.23-r4
binutils: 2.28-r2
libtool: 2.4.6-r3
llvm: 4.0.0-r2
clang: 4.0.0

Building options:
MAKEOPTS="-j8"
CFLAGS="-march=x86-64 -O2 -pipe"
CXXFLAGS="${CFLAGS}"
CPU_FLAGS_X86="aes avx avx2 fma3 mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3"

After building the new PC I just inserted my HDD with the Gentoo installation, which was built with -march=native (core2, I guess). Initially, I didn't change -march to x86-64.
Flashing the BIOS with a fresh version didn't help, and neither did changing the DRAM frequency. I updated gentoo-sources to 4.11.7 and rebuilt some other packages, but it didn't help at all. What helped:
1. setting -march=x86-64;
2. removing old binutils;
3. rebuilding glibc, binutils and libtool;
4. rebuilding llvm and clang;
5. rebuilding all dependency tree for mesa: emerge -ea --exclude="gcc glibc binutils libtool llvm clang" mesa

Edit: gentoo-sources was built using genkernel

Edit 2: Like many others here, I was premature with my conclusion. Segfaults still occur, but more rarely.


Last edited by mrostu on Sun Aug 06, 2017 10:04 pm; edited 1 time in total

All times are GMT