likewhoa l33t
Joined: 04 Oct 2006 Posts: 778 Location: Brooklyn, New York
Posted: Sun May 28, 2017 5:39 am
Naib wrote: |
Like you, I used the XMP RAM settings. I noticed all my RAM timings were off until I enabled XMP, which then set them correctly. I still think those running into random segfaults have RAM-related issues, as it has also been shown that Ryzen is very RAM-dependent due to its CCX design.
I should add an option to see whether people are setting their RAM to the RAM stick's rated settings OR to what the BIOS thinks it should be. I am 99% sure this is the issue (assuming the hardware is good).
AGESA 1.0.0.6 is due out within the next 2 weeks, so that should sort out some more RAM issues (i.e. defaults as well as overclocking). |
Yes, most issues are probably bad RAM timings. I am currently stable at 3.9GHz with 1.3626v CPU, 1.0750v NB and 3.5v DRAM. Anything higher would require more RAM than I can handle on air, so 3.9GHz will stay for now \o/
I can probably also assume that gcc-7, a >=4.9 Linux kernel and proper RAM timings according to your kit's specifications should give users a stable setup. XMP is not enabled by default, and most people ignore BIOS settings and expect things to just work™, but with Ryzen you do need to pay attention, as with anything good, like Gentoo.
Can you post the output of 'for i in gcc libreoffice llvm chromium; do genlop -t $i|tail -3;done'? I like to compare with what others have.
Code: | for i in gcc libreoffice llvm chromium; do genlop -t $i|tail -3;done
Fri May 26 00:31:35 2017 >>> sys-devel/gcc-7.1.0-r1
merge time: 15 minutes and 49 seconds.
Sun May 28 01:00:56 2017 >>> app-office/libreoffice-5.3.3.2
merge time: 34 minutes and 1 second.
Thu May 25 16:38:00 2017 >>> sys-devel/llvm-4.0.0-r2
merge time: 10 minutes and 23 seconds.
Fri Apr 28 08:29:01 2017 >>> www-client/chromium-58.0.3029.81
merge time: 57 minutes and 32 seconds. |
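To compare timings like these across machines, it can help to reduce each "merge time" line to a single number of seconds. Below is a minimal bash/awk sketch; it assumes each line is a sequence of "<number> <unit>" pairs with units of hours, minutes or seconds, as in the genlop output quoted above. The function name merge_time_to_seconds is made up for this example.

```shell
#!/usr/bin/env bash
# merge_time_to_seconds (hypothetical helper): read genlop-style lines on stdin
# and print the total number of seconds for each "merge time:" line.
# Assumption: every unit word (hour(s), minute(s), second(s)) follows a number.
merge_time_to_seconds() {
  awk '/merge time:/ {
    total = 0
    for (i = 1; i < NF; i++) {
      if ($i ~ /^[0-9]+$/) {
        unit = $(i + 1)
        if (unit ~ /^hour/)        total += $i * 3600
        else if (unit ~ /^minute/) total += $i * 60
        else if (unit ~ /^second/) total += $i
      }
    }
    print total
  }'
}

# Example with lines copied from the output above:
printf '%s\n' 'merge time: 15 minutes and 49 seconds.' 'merge time: 17 minutes.' \
  | merge_time_to_seconds
```

On a live system something like `genlop -t gcc | merge_time_to_seconds` should work, provided your genlop version prints the same wording.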
Enjoy your new rig, man! -j16 FTW. Oh, have you tried 'MAKEOPTS="-j -l" emerge ceph'? |
|
Naib Watchman
Joined: 21 May 2004 Posts: 6059 Location: Removed by Neddy
Posted: Sun May 28, 2017 8:36 am
Code: |
for i in gcc libreoffice llvm chromium; do genlop -t $i|tail -3;done
Sun May 28 08:39:21 2017 >>> sys-devel/gcc-7.1.0-r1
merge time: 26 minutes and 19 seconds.
Wed May 17 21:34:22 2017 >>> app-office/libreoffice-5.2.7.2
merge time: 37 minutes and 58 seconds.
Sun May 28 08:56:21 2017 >>> sys-devel/llvm-4.0.0-r2
merge time: 17 minutes.
!!! Error: no merge found for 'chromium'
|
_________________
Quote: | Removed by Chiitoo |
|
drizzt Guru
Joined: 21 Jul 2002 Posts: 428
Posted: Sun May 28, 2017 9:38 am
trippels wrote: | drizzt wrote: | ruby crashed rather nastily during emerge:
./template/verconf.h.tmpl:3: [BUG] Segmentation fault at 0x00000000000000
ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-linux]
Update:
Hurray, looks like there is another source of trouble coming along:
I was able to build the above ruby version (dev-lang/ruby-2.3.4-r2) flawlessly with gcc-6. gcc-7 always triggers the error shown. *sigh* |
It is a known ruby bug (undefined behavior due to misaligned loads). This has nothing to do with Ryzen.
See: https://bugs.ruby-lang.org/issues/11831
(It also could be the following GC bug: https://bugs.ruby-lang.org/issues/13150)
Both issues are fixed in 2.4, so I would suggest simply upgrading ruby. |
Thank you for the information.
_________________
People don't have to earn my respect. I offer my respect to them, but be careful to lose my respect... |
|
alfonsor n00b
Joined: 13 Oct 2007 Posts: 16
Posted: Tue May 30, 2017 4:05 pm
nitm wrote: | If anyone at AMD is interested in core files, I can gather them.
But I doubt they'll be of much use; it is totally random.
And I've never seen an ICE in GCC; all my crashes are in bash and head. |
Here, dmesg shows "error 6 in bash[400000+a7000]" 99% of the time.
Error 6 means a user-mode write to a page that is not present, so I suspect the CPU cache has a problem, silicon or whatever (architectural). |
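For reference, the number after "error" in these kernel segfault lines is the x86 page-fault error code, which is a bit field. A small bash sketch that decodes its three lowest bits; the helper name decode_pf_error is made up, and higher bits (reserved-bit violation, instruction fetch and so on) are deliberately ignored here.

```shell
#!/usr/bin/env bash
# decode_pf_error (hypothetical helper): decode the low three bits of an x86
# page-fault error code, as printed in "... error N in prog[...]" kernel lines.
#   bit 0: 0 = page not present, 1 = protection violation
#   bit 1: 0 = read access,      1 = write access
#   bit 2: 0 = kernel mode,      1 = user mode
decode_pf_error() {
  local code=$(( $1 ))
  local present access mode
  if (( code & 1 )); then present="protection violation"; else present="page not present"; fi
  if (( code & 2 )); then access="write"; else access="read"; fi
  if (( code & 4 )); then mode="user"; else mode="kernel"; fi
  printf '%s-mode %s, %s\n' "$mode" "$access" "$present"
}

decode_pf_error 6
```

With this reading, error 6 (binary 110) is a user-mode write to a page that is not present, while error 4 would be a user-mode read of a page that is not present.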
|
boudin n00b
Joined: 15 May 2017 Posts: 4
Posted: Tue May 30, 2017 9:01 pm
I've installed the latest AGESA (1006) through this beta BIOS: http://forum.gigabyte.us/thread/886/am4-beta-bios-thread?page=1
The situation is much better, but I still had a few compilation errors (weird ones, though). I'm now running with the XMP profile of my DDR4 without changing the voltage, Cool'n'Quiet on, and -j13 for MAKEOPTS. Before, I wasn't able to build anything in this configuration, and my whole system would become unstable once segfaults started to appear.
Since then I've rebuilt GCC and mesa a few times, plus some other stuff. I did have two compilation errors with mesa, but overall it's much more stable. I'm going to start toying with my RAM timings and voltage again to see if I can reach a stable point. |
|
alfonsor n00b
Joined: 13 Oct 2007 Posts: 16
Posted: Wed May 31, 2017 7:46 am
boudin wrote: | I've installed the latest AGESA (1006) through this beta BIOS: http://forum.gigabyte.us/thread/886/am4-beta-bios-thread?page=1
The situation is much better, but I still had a few compilation errors (weird ones, though). I'm now running with the XMP profile of my DDR4 without changing the voltage, Cool'n'Quiet on, and -j13 for MAKEOPTS. Before, I wasn't able to build anything in this configuration, and my whole system would become unstable once segfaults started to appear.
Since then I've rebuilt GCC and mesa a few times, plus some other stuff. I did have two compilation errors with mesa, but overall it's much more stable. I'm going to start toying with my RAM timings and voltage again to see if I can reach a stable point. |
AGESA 1006 didn't change a thing here. |
|
mark_lagace Tux's lil' helper
Joined: 19 Nov 2002 Posts: 77 Location: Ottawa, Canada
Posted: Wed May 31, 2017 8:35 pm
TL;DR: Does anyone have a stable Ryzen 7 system?
I've been pulling my hair out trying to get the system stable for compiling and every option mentioned in this thread has so far not been successful.
My system:
CPU: Ryzen 7 1700 (stock cooler - wraith spire)
RAM: (CMK16GX4M2B3200C16R): 2x8GB DDR4 3200 (running at 2933)
MB: MSI X370 Gaming Pro Carbon, latest BIOS (7A32v15)
Fresh install of Gentoo from a recent stage3 tar (kernel 4.9, gcc 5.4)
Initially used "-O2 -pipe -march=native", but recompiled the system with "-O2 -pipe -march=bdver4 -mno-fma4 -mno-tbm -mno-xop -mno-lwp" after first running into problems and reading that gcc 5.4 uses bdver4 for Ryzen processors.
MAKEOPTS is set to -j16
I've tried:
RAM settings - JEDEC (2133MHz, 1.2V; CL 15); XMP profile 1 (2933 MHz, 1.35V; CL16); XMP profile 2 (3200 MHz, 1.35V; CL16)
Pulling one stick of RAM and just using 1x8GB (only at 2133) with either stick of RAM just in case one is bad.
Turning SMT on or off
OP Codes - no options in my BIOS to change
CPU frequency governor on Performance vs ondemand
Compiling GCC 6.3 and using -march=znver1
Reducing makeopts to -j13 or -j8
In all cases, I get random segfaults during compiles. They happen more frequently with large packages and higher levels of threading (i.e. -j16 segfaults more than -j8), but even with memory at 2133, SMT off and -j8, compiles will segfault. I've just compiled kernel 4.11.3 and will see if that makes any difference, but I'm not holding my breath.
Having gone through all of this, my question at this point is whether ANYONE has a stable Ryzen 7 system.
FWIW, memtest86+ runs through multiple passes with no issues. Prime95 under Win10 in stress test mode with 16 helpers running also doesn't crash (at least for the 2-3 hours that I let it run). |
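One way to put numbers on "-j16 segfaults more than -j8" is to repeat the same build several times at each -j level and count the failures. A rough sketch follows; count_failures is a made-up helper, and the kernel-source path in the commented example is an assumption. A real test should clean the tree between runs so that every iteration does the full build.

```shell
#!/usr/bin/env bash
# count_failures (hypothetical helper): run a command N times and report how
# many runs exited with a non-zero status. Output of the command is discarded.
count_failures() {
  local runs=$1
  shift
  local failures=0 i
  for (( i = 1; i <= runs; i++ )); do
    if ! "$@" >/dev/null 2>&1; then
      failures=$(( failures + 1 ))
    fi
  done
  printf '%d/%d runs failed\n' "$failures" "$runs"
}

# Assumed example: rebuild a kernel tree from scratch 5 times per -j level.
# for j in 8 16; do
#   echo "-j$j: $(count_failures 5 sh -c "make -C /usr/src/linux clean && make -C /usr/src/linux -j$j")"
# done
```

Counting failed runs rather than watching for a single crash makes it easier to compare settings such as SMT on/off or different RAM speeds on equal terms.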
|
alfonsor n00b
Joined: 13 Oct 2007 Posts: 16
Posted: Wed May 31, 2017 8:49 pm
mark_lagace wrote: | TL;DR: Does anyone have a stable Ryzen 7 system?
|
It's a lottery: some people have stable systems with various Ryzens, and others have unstable systems with various CPUs. It's in the water.
PS: 4.11.3 makes no difference. |
|
trippels Tux's lil' helper
Joined: 24 Nov 2010 Posts: 137 Location: Berlin
Posted: Thu Jun 01, 2017 4:31 am
It should also be noted that the crashing compiler is just a noisy symptom of silent CPU corruption.
I have also seen git projects get corrupted on Ryzen (SHA checksum errors).
So the point is you cannot trust your data on this CPU. It might be OK for a pure gaming PC,
but for everything else it is utterly unacceptable. |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54539 Location: 56N 3W
Posted: Thu Jun 01, 2017 7:33 am
trippels,
The only separation between code and data in a PC is context.
Why would any PC CPU operate correctly when executing code but not when processing data?
I have seen that on more exotic architectures that have separate buses and memory arrangements for CPU instructions and data, but there the separation is at the hardware level.
_________________
Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
trippels Tux's lil' helper
Joined: 24 Nov 2010 Posts: 137 Location: Berlin
Posted: Thu Jun 01, 2017 8:39 am
NeddySeagoon wrote: | trippels,
The only separation between code and data in a PC is context.
Why would any PC CPU operate correctly when executing code but not when processing data?
|
I cannot parse this question. The CPU apparently flips random bits in memory/cache.
If the memory area contains code the process may crash. If it contains only data
the user may never notice. |
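As a small illustration of why corrupted data goes unnoticed unless something checks it: flipping a single bit changes a SHA-1 digest completely, which is why git, which names its objects by SHA-1 hashes, notices such corruption while a plain data file would not. The helper flip_first_bit is hypothetical and exists only for this demo; sha1sum is from GNU coreutils.

```shell
#!/usr/bin/env bash
# flip_first_bit (hypothetical helper): flip bit number $2 (0 = least
# significant) of the first character of string $1 and print the result.
# Only meant for plain ASCII letters, where the flipped byte stays printable.
flip_first_bit() {
  local s=$1 bit=$2 code
  code=$(printf '%d' "'${s:0:1}")   # ASCII code of the first character
  code=$(( code ^ (1 << bit) ))     # XOR toggles exactly one bit
  printf "\\$(printf '%03o' "$code")%s" "${s:1}"
}

original='hello world'
corrupted=$(flip_first_bit "$original" 0)   # 'h' (0x68) becomes 'i' (0x69)
printf '%s\n' "$original" "$corrupted"
printf '%s' "$original"  | sha1sum
printf '%s' "$corrupted" | sha1sum
```

The two digests share no visible pattern even though the inputs differ by one bit, so any checksum-verifying tool flags the change; without such a check, the flipped byte would simply be read back as valid data.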
|
Naib Watchman
Joined: 21 May 2004 Posts: 6059 Location: Removed by Neddy
Posted: Thu Jun 01, 2017 8:53 am
Harvard vs von Neumann architecture |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54539 Location: 56N 3W
Posted: Thu Jun 01, 2017 9:00 am
trippels,
Quote: | The CPU apparently flips random bits in memory/cache.
If the memory area contains code the process may crash. |
Hold that thought...
I can't reconcile it with your earlier statement, Quote: | It might be OK for a pure gaming PC |
The inference being that it's OK for games to crash.
I'm not aware of segfaults in gcc. It's usually in bash, during a build, not gcc itself.
We only know that AM4 systems containing Ryzen processors can generate segfaults under load.
It's a feature of the system, not the CPU. At least, it's not been demonstrated that it's the CPU.
AMD may know more, but they don't have a fix yet.
Tweaking the CPU may fix the system problem, but that does not imply the CPU was the root cause.
My money is on the Vcore or Vram PSU transient response behaviour causing brownouts.
They are the hardest bits of a PC to get right, and issues there only appear as system load changes. |
|
trippels Tux's lil' helper
Joined: 24 Nov 2010 Posts: 137 Location: Berlin
Posted: Thu Jun 01, 2017 9:16 am
NeddySeagoon wrote: | trippels,
Quote: | The CPU apparently flips random bits in memory/cache.
If the memory area contains code the process may crash. |
Hold that thought...
I can't reconcile it with your earlier statement, Quote: | It might be OK for a pure gaming PC |
The inference being that it's OK for games to crash.
I'm not aware of segfaults in gcc. It's usually in bash, during a build, not gcc itself. |
No. I've seen several non-reproducible gcc segfaults myself.
Games don't use all cores at 100% like compiling with -j16 does.
This is a CPU bug. I wish AMD would officially acknowledge it. |
|
alfonsor n00b
Joined: 13 Oct 2007 Posts: 16
Posted: Thu Jun 01, 2017 10:44 am
NeddySeagoon wrote: |
My money is on the Vcore or Vram PSU transient response behaviour causing brownouts.
They are the hardest bits of a PC to get right and issues there only appear as system load changes. |
Many BIOS options were suggested in this thread to alleviate the problem, and none worked for me. Then I discovered that setting LLC (load-line calibration) to "high" in the BIOS reduces the errors during an emerge -e @world from 10-20 to about 5.
So your money may well be placed right, but I don't like the consequences: if so, there may be no solution at all for this line of CPUs. The problem is that AMD is selling very appealing many-core/many-thread CPUs that fail at exactly the job that makes them appealing. |
|
roarinelk Guru
Joined: 04 Mar 2004 Posts: 520
Posted: Thu Jun 01, 2017 10:48 am
trippels wrote: | NeddySeagoon wrote: | trippels,
Quote: | The CPU apparently flips random bits in memory/cache.
If the memory area contains code the process may crash. |
Hold that thought...
I can't reconcile it with your earlier statement, Quote: | It might be OK for a pure gaming PC |
The inference being that it's OK for games to crash.
I'm not aware of segfaults in gcc. It's usually in bash, during a build, not gcc itself. |
No. I've seen several non-reproducible gcc segfaults myself.
Games don't use all cores at 100% like compiling with -j16 does.
This is a CPU bug. I wish AMD would officially acknowledge it. |
I see random MCEs on a core ("u-op cache tag parity error") when all 16 are fully taxed for a few hours (e.g. encoding a DVD with x264 with all options set to max). The latest round of BIOS updates has reduced them significantly, though.
Reducing the memory frequency to the minimum also cut down on random segfaults a lot.
EDIT: Oh, and there's a far more annoying bug: sometimes processes just stop. They don't crash or anything,
but they don't make any progress either (i.e. RIP doesn't move); they can be killed easily, though.
EDIT:
Code: |
kernel: [Hardware Error]: Corrected error, no action required.
kernel: [Hardware Error]: CPU:5 (17:1:1) MC3_STATUS[-|CE|MiscV|-|-|-|-|SyndV|-]: 0x9820000000000150
kernel: [Hardware Error]: IPID: 0x000300b000000000, Syndrome: 0x000000002a000503
kernel: [Hardware Error]: Decode Unit Extended Error Code: 0
kernel: [Hardware Error]: Decode Unit Error: uop cache tag parity error.
kernel: [Hardware Error]: cache level: RESV, tx: INSN, mem-tx: IRD
|
Last edited by roarinelk on Sat Jun 03, 2017 10:21 am; edited 3 times in total |
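The kernel's EDAC decoder already turned the raw MC3_STATUS value into the readable lines above. As a rough sketch of where the bracketed flags come from, the bash function below (decode_mci_status, a made-up name) tests the architectural MCi_STATUS flag bits 57 to 63 and the AMD-specific SyndV bit 53, then prints the low 16-bit MCA error code and the extended error code in bits 16 to 21. Treat it as an illustration under those assumptions, not a replacement for the kernel's decoder.

```shell
#!/usr/bin/env bash
# decode_mci_status (hypothetical helper): print which MCi_STATUS flag bits are
# set in a 64-bit machine-check status value, plus its error-code fields.
# Bash arithmetic is signed 64-bit, so a value with bit 63 set becomes negative;
# "(value >> n) & 1" still extracts bit n because the shift is arithmetic.
decode_mci_status() {
  local status=$(( $1 ))
  local -a names=(VAL OVER UC EN MISCV ADDRV PCC)   # architectural bits 63..57
  local -a bits=(63 62 61 60 59 58 57)
  local -a flags=()
  local i
  for i in "${!bits[@]}"; do
    if (( (status >> bits[i]) & 1 )); then
      flags+=("${names[i]}")
    fi
  done
  # Bit 53 (SyndV: syndrome register valid) is specific to AMD's SMCA banks.
  if (( (status >> 53) & 1 )); then
    flags+=(SyndV)
  fi
  printf 'flags: %s\n' "${flags[*]}"
  printf 'MCA error code: 0x%04x\n' $(( status & 0xffff ))
  printf 'extended error code: %d\n' $(( (status >> 16) & 0x3f ))
}

decode_mci_status 0x9820000000000150
```

For 0x9820000000000150 this reports VAL, EN, MISCV and SyndV set and UC clear (hence a corrected error), with extended error code 0, which lines up with the kernel's own "CE|MiscV|...|SyndV" decoding and "Extended Error Code: 0".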
|
krinn Watchman
Joined: 02 May 2003 Posts: 7470
Posted: Thu Jun 01, 2017 12:00 pm
Isn't it that the early Ryzens are just all defective? Have a look (sorry guys, you will have to click here and there; zillions of ads! They deserve an award for most awful website). |
|
alfonsor n00b
Joined: 13 Oct 2007 Posts: 16
Posted: Fri Jun 02, 2017 8:44 am
Here is a gcc segfault, with no dmesg info; it happened while compiling gcc 6.3.0 itself (with mesa emerging in parallel):
../../gcc-6.3.0/lto-plugin/lto-plugin.c: In function ‘process_symtab’:
../../gcc-6.3.0/lto-plugin/lto-plugin.c:945:1: internal compiler error: Segmentation fault |
|
mark_lagace Tux's lil' helper
Joined: 19 Nov 2002 Posts: 77 Location: Ottawa, Canada
Posted: Fri Jun 02, 2017 5:28 pm
alfonsor wrote: | here it is a gcc segfault, with no dmesg info; it happened during compilation of gcc 6.3.0 itself (with a parallel mesa emerging)
../../gcc-6.3.0/lto-plugin/lto-plugin.c: In function ‘process_symtab’:
../../gcc-6.3.0/lto-plugin/lto-plugin.c:945:1: internal compiler error: Segmentation fault |
I can confirm that I have had segfaults in the compiler as well, though far less frequently than bash segfaults. |
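To put numbers on "bash segfaults vs. compiler segfaults", one could tally the kernel's segfault reports per process name. A minimal sketch (count_segfaults is a made-up name), assuming the usual x86 kernel message format "name[pid]: segfault at ..."; process names containing spaces are not handled. Note that GCC installs its own SIGSEGV handler and prints "internal compiler error: Segmentation fault" instead, and the kernel only logs unhandled segfaults, so compiler crashes like the one quoted above will usually not show up in dmesg at all, which fits the "no dmesg info" observation.

```shell
#!/usr/bin/env bash
# count_segfaults (hypothetical helper): read kernel log lines on stdin and
# print how many "segfault at" reports each process name produced, most first.
count_segfaults() {
  grep -o '[^[:space:]]*\[[0-9]*]: segfault at' \
    | sed 's/\[[0-9]*]: segfault at$//' \
    | sort | uniq -c | sort -rn
}

# Example input in the format of the kernel messages quoted in this thread:
printf '%s\n' \
  '[  100.1] bash[1201]: segfault at 0 ip 1 sp 2 error 6 in bash[400000+a7000]' \
  '[  200.2] bash[1302]: segfault at 0 ip 1 sp 2 error 6 in bash[400000+a7000]' \
  '[  300.3] head[1403]: segfault at 0 ip 1 sp 2 error 4 in head[400000+a000]' \
  | count_segfaults
```

On a live system, `dmesg | count_segfaults` should give the same kind of summary (reading the kernel log may require root).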
|
tuggbuss Apprentice
Joined: 20 Mar 2017 Posts: 222
|
|
|
trippels Tux's lil' helper
Joined: 24 Nov 2010 Posts: 137 Location: Berlin
Posted: Sat Jun 03, 2017 4:57 am
Looking at the https://community.amd.com/message/2796982 thread,
it seems to be an issue with the new micro-op cache. The AMD representative recommends
disabling it in the BIOS (OPCache Control).
roarinelk also reported "u-op cache crc mismatch" MCEs. |
|
Bigfoot77 n00b
Joined: 15 Dec 2006 Posts: 16
Posted: Sat Jun 03, 2017 5:42 am
Hmm, well I haven't read through every post, but I've gotten through at least half of them. Not sure if anyone else has mentioned this themselves yet, but I figured I'd at least throw it out there in case it's useful:
I built my Ryzen 7 1800X machine a week after launch day. Nothing was overclocked; everything ran at stock speeds. My mobo's BIOS version at the time of the initial install was v1.0. I (very sporadically) ran into the same segfaults when building larger packages on that initial install (identical to the first post in this thread), but for the most part everything seemed to build without issue and I didn't think much of the few segfaults. Not long after, I upgraded to BIOS v1.3 for my mobo, which included the updated AGESA 1.0.0.4a code.
While running v1.3 of the BIOS, I needed to re-emerge world, and during that re-emerge I started seeing the segfaults constantly. They happened more and more often as I got through the package list. By the end, I could no longer build mesa without segfaulting, which became my standard test for the segfaults. *Sometimes* gcc would build without segfaulting, but pretty much no large package could make it through. This went on for a few weeks. I was about to RMA the CPU, but as a last-ditch effort I decided to try a fresh install on a new partition. Interestingly, that install worked flawlessly and built every package (mesa included). For the past 1.5 - 2 months, that install has continued to work without issue, and it's what I'm still using. I update almost every day and haven't had any segfaults no matter what gets built.
Has anyone tried a second install running a brand-new BIOS to see what happens? I don't know the inner workings of GCC all that well, but is it possible that something during those early buggy BIOS releases caused something (GCC, libtool, something?) to build incorrectly, and that this then propagated to everything else built afterwards? The only difference between my initial install when I first built the machine and my current install is the BIOS version. No other hardware has changed. The BIOSes before v1.3 were pretty flaky for me and I didn't use the machine much, so I can't say whether the segfaults were actually happening a lot during that time. I did see a few segfaults with v1.0 initially, so at least something was going on from the beginning.
For reference, my system specs are:
Ryzen 7 1800X
MSI Tomahawk B350 mobo
Kingston HyperX Fury 2400 MHz RAM
gcc 6.3.0 (CFLAGS=-O2 -pipe -march=native -fno-stack-protector)
Please let me know if anyone wants any other hardware/software info about my setup. I know this isn't a ton to go on, but it's worth mentioning (I sure hope I didn't jinx myself!) |
|
alfonsor n00b
Joined: 13 Oct 2007 Posts: 16
Posted: Sat Jun 03, 2017 9:25 am
Bigfoot77, yes, I tried a complete Gentoo installation 3 times and the problem is still there. But your story made me realize that I always used the same kernel configuration from my main installation... Did you change your kernel config when reinstalling? |
|
Naib Watchman
Joined: 21 May 2004 Posts: 6059 Location: Removed by Neddy
Posted: Sat Jun 03, 2017 9:28 am
alfonsor wrote: | Bigfoot77, yes, I tried a complete Gentoo installation 3 times and the problem is still there. But your story made me realize that I always used the same kernel configuration from my main installation... Did you change your kernel config when reinstalling? | When I did my install I started completely fresh: fresh install, fresh make.conf (kept USE) and a freshly configured kernel. |
|
alfonsor n00b
Joined: 13 Oct 2007 Posts: 16
Posted: Sat Jun 03, 2017 9:52 am
Yup, everything was "fresh" here except the kernel configuration.
I am pretty sure it is not a kernel problem; my fixation is on cache coherence. Anyway, let's give it a try. What could be better than spending the weekend hacking Ryzen segfaults? |
|
|