Gentoo Forums
Segfaults during compilation on AMD Ryzen.

Gentoo Forums Forum Index :: Portage & Programming
drizzt
Guru


Joined: 21 Jul 2002
Posts: 428

Posted: Wed Apr 19, 2017 9:55 am

liewyec wrote:
drizzt wrote:
liewyec wrote:
drizzt wrote:

NeddySeagoon, you are a genius!
I had 3 versions of binutils for the same architecture on my system. Guess what happened:
The newest one always got rebuilt, but the oldest one was the one being used.
I cleaned up this mess and have been compiling like crazy all day for testing. No segfaults so far on either system.

Thank you all for your help and suggestions. Let's see if things are sorted out.


Well, I have only one version of binutils (2.26.1). Do you still get segfaults?


No, still compiling like a maniac on both systems and no segfaults.

My Systems:
- R7 1700, 16GB RAM, gcc-5.4.0 (march=haswell), binutils 2.27, Kernel 4.10.8
- R5 1600, 16GB RAM, gcc-5.4.0 (march=haswell), binutils 2.27, Kernel 4.10.8

If you "upgraded" an existing system like I did: I recompiled the toolchain at least 10 times. Looking back, I think I should have started fresh.


I recompiled my existing system, but I also tried a fresh installation because of the segfaults. Today I upgraded to binutils 2.27 and kernel 4.11-rc7, and I will test this. My system is an R7 1800X, 32 GB RAM, gcc-6.3.0.


Oh, I forgot one (I think) really important thing:
I set the RAM speed to 1866. I suspect RAM incompatibilities, as I stated some posts ago, and have already ordered different ASUS-approved RAM to check.
_________________
People don't have to earn my respect. I offer my respect to them, but be careful to lose my respect...
Tony0945
Watchman


Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

Posted: Wed Apr 19, 2017 2:38 pm

drizzt wrote:
Oh, I forgot one (I think) really important thing:
I set the RAM speed to 1866.
I'm thinking the whole problem was overclocked RAM.
drizzt
Guru


Joined: 21 Jul 2002
Posts: 428

Posted: Wed Apr 19, 2017 2:48 pm

Tony0945 wrote:
drizzt wrote:
Oh, I forgot one (I think) really important thing:
I set the RAM speed to 1866.
I'm thinking the whole problem was overclocked RAM.


Nope, I set the clock exactly to the given specifications in the first place; 1866 is "massive" underclocking.

Besides, today I got one segfault on each machine. Interestingly, it was the same ebuild.

I was able to avoid the segfault on both by reducing the parallel jobs from 16(12) to 8(6). I suspect the massive parallelism is causing certain ebuilds to become unstable.

Update
New RAM, approved by ASUS and set to the specifications given by ASUS: the same package segfaulted with "-j16". After reducing MAKEOPTS to "-j8", the package built fine.
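A minimal sketch of the workaround above, assuming a POSIX shell and coreutils' `nproc`. The helper name `reduced_makeopts` is hypothetical; it simply halves the thread count (16 to 8 on the R7, 12 to 6 on the R5), which is the ratio used in this post rather than a known fix, so the value can be tried for a single emerge without editing make.conf.

```shell
#!/bin/sh
# Hypothetical helper: print a MAKEOPTS value using half of the given
# thread count (defaults to the output of `nproc`), never below -j1.
reduced_makeopts() {
    threads=${1:-$(nproc)}
    jobs=$((threads / 2))
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    printf '%s\n' "-j$jobs"
}

# Example (not run here): override MAKEOPTS for one build only.
#   MAKEOPTS="$(reduced_makeopts)" emerge --oneshot media-libs/mesa
```

Setting MAKEOPTS in the environment of a single emerge leaves the global make.conf value untouched, which makes it easy to flip between -j16 and -j8 while testing.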

Btw, putting the machine under stress with e.g. video encoding, number crunching, or creating a large 7zip archive does not produce any errors. So at least for day-to-day use it works.
_________________
People don't have to earn my respect. I offer my respect to them, but be careful to lose my respect...
Tony0945
Watchman


Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

Posted: Wed Apr 19, 2017 4:25 pm

drizzt wrote:
I was able to avoid the segfault on both by reducing the parallel jobs from 16(12) to 8(6). I suspect the massive parallelism is causing certain ebuilds to become unstable.
That seems likely as well. I'm not sure why, but some ebuilds won't build in parallel at all.
c1pherx
n00b


Joined: 02 Apr 2017
Posts: 7

Posted: Wed Apr 19, 2017 5:43 pm

Tony0945 wrote:
drizzt wrote:
I was able to avoid the segfault on both by reducing the parallel jobs from 16(12) to 8(6). I suspect the massive parallelism is causing certain ebuilds to become unstable.
That seems likely as well. I'm not sure why, but some ebuilds won't build in parallel at all.


Parallel build issues rarely manifest as segfaults. Typically you will see "file not found" or something similar, due to the Makefile's dependencies being incorrect. The only two packages I know of right now that ran into issues due to parallelism were efitools and dsniff (I'm working on patches and bug reports for both).

Based on all of the testing I did, I think this is an interaction between the CPU's MMU and the RAM itself. As near as I was able to figure out, the segfaults frequently happened during library loading, *not* compilation itself. (/bin/sh libtool * would segfault during /bin/sh initialization, not during GCC execution.) I think this is why we see it so heavily during compilation. Lots of library loading going on, new memory allocations, etc.
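One way to check which programs are actually crashing (the /bin/sh versus GCC distinction above) is to tally the kernel's segfault messages. A minimal sketch, assuming the usual `name[pid]: segfault at ...` lines in the kernel log; `segfault_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print
# "<program> <count>" for every program that logged "segfault at",
# sorted by program name.
segfault_summary() {
    awk '
    {
        for (i = 1; i < NF; i++) {
            # Expected shape: "name[pid]: segfault at <addr> ip ... sp ..."
            if ($(i + 1) == "segfault" && $i ~ /\[[0-9]+]:$/) {
                name = $i
                sub(/\[[0-9]+]:$/, "", name)
                count[name]++
            }
        }
    }
    END { for (n in count) print n, count[n] }
    ' | sort
}

# Example (reading the kernel log may require root):
#   dmesg | segfault_summary
```

If the tally is dominated by `sh` rather than the compiler binaries, that is at least consistent with the library-loading observation above.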
daemon32
n00b


Joined: 28 Apr 2017
Posts: 2

Posted: Fri Apr 28, 2017 4:04 am

c1pherx wrote:
Parallel build issues rarely manifest as segfaults. Typically you will see "file not found" or something similar, due to the Makefile's dependencies being incorrect.


You're absolutely correct.

In addition, if the compiler were to spit out bad instructions, the resulting program's processes would be killed with SIGILL (ILLegal instruction); they would not segfault, and the compiler certainly wouldn't segfault either.
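By shell convention, a process killed by signal N is reported with exit status 128 + N, so the exit status of a failed build step can tell these cases apart. A minimal sketch covering the Linux x86 signal numbers relevant here (SIGILL = 4, SIGABRT = 6, SIGSEGV = 11); `describe_status` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: translate a shell exit status into a short
# description. By shell convention, a status above 128 means the
# process was killed by signal (status - 128).
describe_status() {
    status=$1
    if [ "$status" -le 128 ]; then
        echo "exited with status $status"
        return
    fi
    case $((status - 128)) in
        4)  echo "killed by SIGILL" ;;   # illegal instruction
        6)  echo "killed by SIGABRT" ;;  # abort(), e.g. glibc heap checks
        11) echo "killed by SIGSEGV" ;;  # segmentation fault
        *)  echo "killed by signal $((status - 128))" ;;
    esac
}

# Example (not run here): a child shell that kills itself with SIGSEGV.
#   sh -c 'kill -SEGV $$'; describe_status "$?"
```

Note that make itself exits with status 2 when a recipe fails, so the status to inspect is that of the crashing command, not of make.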

c1pherx wrote:
As near as I was able to figure out, the segfaults frequently happened during library loading, *not* compilation itself. (/bin/sh libtool * would segfault during /bin/sh initialization, not during GCC execution.) I think this is why we see it so heavily during compilation. Lots of library loading going on, new memory allocations, etc.


I too get segfaults when building any package that uses libtool, and it is indeed limited to /bin/sh (bash).

I have a Ryzen 7 1700 on an ASRock X370 Fatal1ty Gaming Professional (same PCB as the Taichi) with f4-3000c15d-16gvr memory.

I'm at completely stock settings (full CMOS reset, with the jumper and all). Sure,
I'm not using memory from the QVL, but it is memory from Samsung (which Ryzen is known to behave better with), so that shouldn't be the issue.

I'm currently running the Gentoo LiveDVD from July 4, 2016, but I actually first encountered the issue after I had already installed Gentoo to disk (booting kernel 4.9.24).
I tried to reproduce this sort of issue in Windows by running Gentoo in WSL (Windows Subsystem for Linux),
but the Windows build I was running was just too old to use: due to some missing syscalls, configure scripts would fail to detect even the presence of 'size_t' in /usr/include. :oops:

After that, I tried FreeBSD (running under Hyper-V) and compiled a ton of random ports (including Chromium) without a hitch.
I even set Windows to use 'Power saving' mode, which downclocks the processor like 'ondemand' does on Linux.
I forgot to disable sleep mode, and even that didn't kill the build in the guest OS. 8O

I then turned off Cool 'n Quiet and C-State control in the EFI settings (and verified with a multimeter that the voltage was constant; I read 1.114 volts).
Then I went back to the LiveDVD in an attempt to finish installing Gentoo again, and the segfaults still happened,
but only when I mounted a tmpfs on /var/tmp/portage (which reduces the time the compiler has to wait when loading source files, allowing for more CPU load).
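Since the segfaults only showed up here with a tmpfs on /var/tmp/portage, it may help to confirm what is actually mounted there before comparing runs. A minimal sketch, assuming the Linux /proc/mounts layout (device, mount point, type, ...); `fs_type_of` is a hypothetical helper and only matches exact mount points.

```shell
#!/bin/sh
# Hypothetical helper: print the filesystem type mounted exactly at $1,
# reading /proc/mounts-style lines from stdin. If several mounts share
# the same mount point, the last one listed (the active one) wins.
fs_type_of() {
    awk -v t="$1" '
        $2 == t { type = $3 }
        END { if (type != "") print type; else print "not a mount point" }
    '
}

# Example (not run here):
#   fs_type_of /var/tmp/portage < /proc/mounts
```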

So I'm at my wit's end, but my last-ditch effort, currently running, is compiling the entire system (or at least glibc and the kernel) with debug symbols to see if I can gather any more information.

... or maybe disabling the op-cache?
Those new features tend to bite us early adopters :P

Or even try musl libc with busybox??!!

Here's the only trace I could get out of this mess:

Code:
/bin/sh ../../../../libtool  --tag=CC   --mode=compile x86_64-pc-linux-gnu-gcc -DPACKAGE_NAME=\"Mesa\" -DPACKAGE_TARNAME=\"mesa\" -DPACKAGE_VERSION=\"17.0.4\" -DPACKAGE_STRING=\"Mesa\ 17.0.4\" -DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi\?product=Mesa\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesa\" -DVERSION=\"17.0.4\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DYYTEXT_POINTER=1 -DHAVE___BUILTIN_BSWAP32=1 -DHAVE___BUILTIN_BSWAP64=1 -DHAVE___BUILTIN_CLZ=1 -DHAVE___BUILTIN_CLZLL=1 -DHAVE___BUILTIN_CTZ=1 -DHAVE___BUILTIN_EXPECT=1 -DHAVE___BUILTIN_FFS=1 -DHAVE___BUILTIN_FFSLL=1 -DHAVE___BUILTIN_POPCOUNT=1 -DHAVE___BUILTIN_POPCOUNTLL=1 -DHAVE___BUILTIN_UNREACHABLE=1 -DHAVE_FUNC_ATTRIBUTE_CONST=1 -DHAVE_FUNC_ATTRIBUTE_FLATTEN=1 -DHAVE_FUNC_ATTRIBUTE_FORMAT=1 -DHAVE_FUNC_ATTRIBUTE_MALLOC=1 -DHAVE_FUNC_ATTRIBUTE_PACKED=1 -DHAVE_FUNC_ATTRIBUTE_PURE=1 -DHAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL=1 -DHAVE_FUNC_ATTRIBUTE_UNUSED=1 -DHAVE_FUNC_ATTRIBUTE_VISIBILITY=1 -DHAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT=1 -DHAVE_FUNC_ATTRIBUTE_WEAK=1 -DHAVE_FUNC_ATTRIBUTE_ALIAS=1 -DMAJOR_IN_SYSMACROS=1 -DHAVE_DLADDR=1 -DHAVE_CLOCK_GETTIME=1 -DHAVE_PTHREAD=1 -I. 
-I/var/tmp/portage/media-libs/mesa-17.0.4/work/mesa-17.0.4/src/gallium/drivers/r300    -I/var/tmp/portage/media-libs/mesa-17.0.4/work/mesa-17.0.4/src -I/var/tmp/portage/media-libs/mesa-17.0.4/work/mesa-17.0.4/src/mesa/program -I/var/tmp/portage/media-libs/mesa-17.0.4/work/mesa-17.0.4/src/mesa -I/var/tmp/portage/media-libs/mesa-17.0.4/work/mesa-17.0.4/src/glsl -I/var/tmp/portage/media-libs/mesa-17.0.4/work/mesa-17.0.4/src/mapi -I/var/tmp/portage/media-libs/mesa-17.0.4/work/mesa-17.0.4/src/gallium/drivers/r300/include -I/var/tmp/portage/media-libs/mesa-17.0.4/work/mesa-17.0.4/src -I/var/tmp/portage/media-libs/mesa-17.0.4/work/mesa-17.0.4/include -I/var/tmp/portage/media-libs/mesa-17.0.4/work/mesa-17.0.4/src/gallium/include -I/var/tmp/portage/media-libs/mesa-17.0.4/work/mesa-17.0.4/src/gallium/auxiliary -I/var/tmp/portage/media-libs/mesa-17.0.4/work/mesa-17.0.4/src/gallium/drivers -I/var/tmp/portage/media-libs/mesa-17.0.4/work/mesa-17.0.4/src/gallium/winsys -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -D_GNU_SOURCE -DUSE_SSE41 -DUSE_GCC_ATOMIC_BUILTINS -DNDEBUG -DUSE_X86_64_ASM -DHAVE_XLOCALE_H -DHAVE_SYS_SYSCTL_H -DHAVE_STRTOF -DHAVE_MKOSTEMP -DHAVE_DLOPEN -DHAVE_POSIX_MEMALIGN -DHAVE_LIBDRM -DGLX_USE_DRM -DGLX_INDIRECT_RENDERING -DGLX_DIRECT_RENDERING -DGLX_USE_TLS -DHAVE_DRI3 -DENABLE_SHADER_CACHE -DHAVE_MINCORE -DHAVE_LLVM=0x0309 -DMESA_LLVM_VERSION_PATCH=1 -fvisibility=hidden -I/usr/include -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/usr/include/libdrm  -O2 -march=ivybridge -pipe -Wall -std=c99 -Werror=implicit-function-declaration -Werror=missing-prototypes -fno-math-errno -fno-trapping-math  -c -o compiler/radeon_pair_translate.lo /var/tmp/portage/media-libs/mesa-17.0.4/work/mesa-17.0.4/src/gallium/drivers/r300/compiler/radeon_pair_translate.c

*** Error in `/bin/sh': munmap_chunk(): invalid pointer: 0x0000000000469330 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x6fbfb)[0x7f688b061bfb]
/lib64/libc.so.6(+0x75486)[0x7f688b067486]
/bin/sh[0x444bcb]
[0x78c290]
======= Memory map: ========
00400000-004ac000 r-xp 00000000 00:24 352                                /bin/bash
006ab000-006ac000 r--p 000ab000 00:24 352                                /bin/bash
006ac000-006b0000 rw-p 000ac000 00:24 352                                /bin/bash
006b0000-0079e000 rw-p 00000000 00:00 0                                  [heap]
7f688abd7000-7f688abed000 r-xp 00000000 00:24 40750                      /usr/lib64/gcc/x86_64-pc-linux-gnu/5.4.0/libgcc_s.so.1
7f688abed000-7f688adec000 ---p 00016000 00:24 40750                      /usr/lib64/gcc/x86_64-pc-linux-gnu/5.4.0/libgcc_s.so.1
7f688adec000-7f688aded000 r--p 00015000 00:24 40750                      /usr/lib64/gcc/x86_64-pc-linux-gnu/5.4.0/libgcc_s.so.1
7f688aded000-7f688adee000 rw-p 00016000 00:24 40750                      /usr/lib64/gcc/x86_64-pc-linux-gnu/5.4.0/libgcc_s.so.1
7f688adee000-7f688adf0000 r-xp 00000000 00:24 47925                      /lib64/libdl-2.23.so
7f688adf0000-7f688aff0000 ---p 00002000 00:24 47925                      /lib64/libdl-2.23.so
7f688aff0000-7f688aff1000 r--p 00002000 00:24 47925                      /lib64/libdl-2.23.so
7f688aff1000-7f688aff2000 rw-p 00003000 00:24 47925                      /lib64/libdl-2.23.so
7f688aff2000-7f688b182000 r-xp 00000000 00:24 48037                      /lib64/libc-2.23.so
7f688b182000-7f688b381000 ---p 00190000 00:24 48037                      /lib64/libc-2.23.so
7f688b381000-7f688b385000 r--p 0018f000 00:24 48037                      /lib64/libc-2.23.so
7f688b385000-7f688b387000 rw-p 00193000 00:24 48037                      /lib64/libc-2.23.so
7f688b387000-7f688b38b000 rw-p 00000000 00:00 0
7f688b38b000-7f688b3e2000 r-xp 00000000 00:24 48031                      /lib64/libncurses.so.6.0
7f688b3e2000-7f688b5e1000 ---p 00057000 00:24 48031                      /lib64/libncurses.so.6.0
7f688b5e1000-7f688b5e5000 r--p 00056000 00:24 48031                      /lib64/libncurses.so.6.0
7f688b5e5000-7f688b5e6000 rw-p 0005a000 00:24 48031                      /lib64/libncurses.so.6.0
7f688b5e6000-7f688b5e7000 rw-p 00000000 00:00 0
7f688b5e7000-7f688b627000 r-xp 00000000 00:24 47774                      /lib64/libreadline.so.6.3
7f688b627000-7f688b827000 ---p 00040000 00:24 47774                      /lib64/libreadline.so.6.3
7f688b827000-7f688b829000 r--p 00040000 00:24 47774                      /lib64/libreadline.so.6.3
7f688b829000-7f688b82f000 rw-p 00042000 00:24 47774                      /lib64/libreadline.so.6.3
7f688b82f000-7f688b831000 rw-p 00000000 00:00 0
7f688b831000-7f688b844000 r-xp 00000000 00:24 30990                      /usr/lib64/libsandbox.so
7f688b844000-7f688ba44000 ---p 00013000 00:24 30990                      /usr/lib64/libsandbox.so
7f688ba44000-7f688ba45000 r--p 00013000 00:24 30990                      /usr/lib64/libsandbox.so
7f688ba45000-7f688ba46000 rw-p 00014000 00:24 30990                      /usr/lib64/libsandbox.so
7f688ba46000-7f688ba4e000 rw-p 00000000 00:00 0
7f688ba4e000-7f688ba71000 r-xp 00000000 00:24 48093                      /lib64/ld-2.23.so
7f688bc06000-7f688bc66000 rw-p 00000000 00:00 0
7f688bc66000-7f688bc71000 rw-p 00000000 00:00 0
7f688bc71000-7f688bc72000 r--p 00023000 00:24 48093                      /lib64/ld-2.23.so
7f688bc72000-7f688bc73000 rw-p 00024000 00:24 48093                      /lib64/ld-2.23.so
7f688bc73000-7f688bc74000 rw-p 00000000 00:00 0
7ffdc5cbc000-7ffdc5ce1000 rw-p 00000000 00:00 0                          [stack]
7ffdc5d9b000-7ffdc5d9e000 r--p 00000000 00:00 0                          [vvar]
7ffdc5d9e000-7ffdc5da0000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
make[4]: *** [Makefile:1195: compiler/radeon_pair_translate.lo] Aborted
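To relate addresses in a glibc backtrace like this one to the memory map printed beneath it, one can check which mapping contains each address. A minimal sketch, assuming the `start-end perms offset dev inode path` layout shown above and a shell with 64-bit arithmetic; `find_mapping` is a hypothetical helper. For example, the `/bin/sh[0x444bcb]` frame falls inside the first r-xp mapping of /bin/bash (00400000-004ac000), and so does the rejected pointer 0x469330, rather than the [heap] range 006b0000-0079e000.

```shell
#!/bin/sh
# Hypothetical helper: given an address (e.g. 0x444bcb) as $1, read
# /proc/PID/maps-style lines on stdin and print the path and permissions
# of the mapping that contains it. Anonymous mappings print "[anonymous]".
find_mapping() {
    addr=$(($1))
    while read -r range perms offset dev inode path; do
        start=$((0x${range%-*}))
        end=$((0x${range#*-}))
        if [ "$addr" -ge "$start" ] && [ "$addr" -lt "$end" ]; then
            printf '%s %s\n' "${path:-[anonymous]}" "$perms"
            return 0
        fi
    done
    echo "unmapped"
    return 1
}

# Example (not run here), with the map above saved to maps.txt:
#   find_mapping 0x469330 < maps.txt
```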
tholin
Apprentice


Joined: 04 Oct 2008
Posts: 203

Posted: Fri Apr 28, 2017 9:36 am

c1pherx wrote:
Based on all of the testing I did, I think this is an interaction between the CPU's MMU and the RAM itself. As near as I was able to figure out, the segfaults frequently happened during library loading, *not* compilation itself. (/bin/sh libtool * would segfault during /bin/sh initialization, not during GCC execution.) I think this is why we see it so heavily during compilation. Lots of library loading going on, new memory allocations, etc.


That would be my guess as well: some problem with the handling of page tables or something. app-benchmarks/stress-ng could be useful to narrow down the problem; it stress-tests a lot of kernel subsystems. Run all tests in a loop with 32 parallel workers and see if something segfaults. Some of the tests are basically fork bombs, so be careful.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54237
Location: 56N 3W

Posted: Fri Apr 28, 2017 10:04 am

It would be interesting to see if this problem correlates to particular motherboards or even motherboard vendors.

I've not seen issues like this for a long time, not since the K6-2 and old P2 days ...
That proved to be failing Vcore power supplies, which allowed the Vcore and RAM voltages to 'brown out' (go transiently out of spec) when the CPU switched from a low-power to a high-power state.
I didn't have the test equipment to make measurements to confirm that, but replacing all the capacitors in the motherboard regulator fixed the problem.

Designing for the required transient response in the Vcore regulator is difficult. The CPU can go from almost nothing to 100A in one clock cycle and the voltage must be held within a few millivolts.

Some correlation across motherboards or motherboard vendors could indicate that the Vcore regulators aren't quite up to the job.
There have been one or two reports that switching to the performance governor mitigated the issue.
That supports the above speculation, as the 'almost nothing' starts from a higher value, so the transient to full power step is smaller.

Note that while a positive correlation would be interesting, it would not establish cause and effect.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Naib
Watchman


Joined: 21 May 2004
Posts: 6051
Location: Removed by Neddy

Posted: Fri Apr 28, 2017 12:11 pm

Exactly. While correlation does not imply causation, statistical information to narrow down any common aggravating factors is of interest.

I had a RAM voltage issue when I built a Core 2 system years ago (the thread is still in the amd64 section).
_________________
Quote:
Removed by Chiitoo
daemon32
n00b


Joined: 28 Apr 2017
Posts: 2

Posted: Fri Apr 28, 2017 6:24 pm

I tried what I had suggested in my previous post and turned off the 'OP Cache' setting in the EFI...
Then I ran `emerge mesa` in a loop that would terminate on a non-zero exit status, and it ran for two and a half hours without interruption.
I then went back and turned 'OP Cache' back on, and the loop failed on the first build.
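A minimal sketch of that kind of test loop, assuming a POSIX shell; `loop_until_fail` is a hypothetical helper that re-runs a command until it fails (or a maximum number of runs is reached) and reports how many runs succeeded, so results with the setting on and off can be compared.

```shell
#!/bin/sh
# Hypothetical helper: run "$@" up to $1 times, stopping at the first
# non-zero exit status. Prints a one-line summary; returns 1 on failure.
loop_until_fail() {
    max=$1
    shift
    runs=0
    while [ "$runs" -lt "$max" ]; do
        if ! "$@"; then
            echo "failed on run $((runs + 1)) after $runs successful run(s)"
            return 1
        fi
        runs=$((runs + 1))
    done
    echo "$runs run(s) completed without failure"
}

# Example (not run here):
#   loop_until_fail 100 emerge --oneshot media-libs/mesa
```

Capping the number of runs keeps an otherwise stable configuration from looping forever.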

I really should've learned my lesson from the last time I was an early adopter :P
drizzt
Guru


Joined: 21 Jul 2002
Posts: 428

Posted: Sun May 07, 2017 9:41 am

NeddySeagoon wrote:
It would be interesting to see if this problem correlates to particular motherboards or even motherboard vendors.

I've not seen issues like this for a long time, not since the K6-2 and old P2 days ...
That proved to be failing Vcore power supplies, which allowed the Vcore and RAM voltages to 'brown out' (go transiently out of spec) when the CPU switched from a low-power to a high-power state.
I didn't have the test equipment to make measurements to confirm that, but replacing all the capacitors in the motherboard regulator fixed the problem.

Designing for the required transient response in the Vcore regulator is difficult. The CPU can go from almost nothing to 100A in one clock cycle and the voltage must be held within a few millivolts.

Some correlation across motherboards or motherboard vendors could indicate that the Vcore regulators aren't quite up to the job.
There have been one or two reports that switching to the performance governor mitigated the issue.
That supports the above speculation, as the 'almost nothing' starts from a higher value, so the transient to full power step is smaller.

Note that while a positive correlation would be interesting, it would not establish cause and effect.


Currently I'm running the "ondemand" governor. Which governor would you suggest I test?

Maybe you're right with your voltage regulator hypothesis, since I bought boards with the "cheaper" B350 chipset for both machines (I don't need the X370 features). Maybe I will get an X370 mobo and run some tests, but first let's try the governor.
_________________
People don't have to earn my respect. I offer my respect to them, but be careful to lose my respect...
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54237
Location: 56N 3W

Posted: Sun May 07, 2017 10:32 am

drizzt,

Try the performance governor. The high power level is the same, the CPU is running flat out, but the low power level is higher.
The power transients switching from one to the other are thus smaller.

I'm reluctant to suggest the powersave governor, as it runs the CPU at its lowest clock frequency, although the power transients would be smaller still.
Reducing the CPU clock like this brings in so many other variables too, so it's not worth the test.
Nobody buys a Ryzen expecting to run it at its minimum clock speed for its useful life.
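Before comparing governors it is worth confirming which one is actually in effect on every core. A minimal sketch, assuming the standard cpufreq sysfs layout (/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor); `governor_summary` is a hypothetical helper that takes the sysfs directory as an optional argument.

```shell
#!/bin/sh
# Hypothetical helper: print "<governor> <number of CPUs>" for the
# governors found under $1 (default /sys/devices/system/cpu).
governor_summary() {
    root=${1:-/sys/devices/system/cpu}
    set -- "$root"/cpu[0-9]*/cpufreq/scaling_governor
    # If the glob matched nothing, $1 is the unexpanded pattern.
    if [ ! -r "$1" ]; then
        echo "no cpufreq governors found under $root"
        return 1
    fi
    cat "$@" | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Switching governors needs root; with sys-power/cpupower installed, `cpupower frequency-set -g performance` is one way to do it, after which `governor_summary` should report `performance` for every CPU.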
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
drizzt
Guru


Joined: 21 Jul 2002
Posts: 428

Posted: Sun May 07, 2017 10:44 am

NeddySeagoon wrote:
drizzt,

Try the performance governor. The high power level is the same, the CPU is running flat out, but the low power level is higher.
The power transients switching from one to the other are thus smaller.

I'm reluctant to suggest the powersave governor, as it runs the CPU at its lowest clock frequency, although the power transients would be smaller still.
Reducing the CPU clock like this brings in so many other variables too, so it's not worth the test.
Nobody buys a Ryzen expecting to run it at its minimum clock speed for its useful life.


Agreed, I also think "powersave" is not the way to go. I just tested with the "conservative" governor and got a segfault again.
I will try "performance" again.

Wouldn't raising/lowering the Vcore help if brownouts caused by large swings in current draw are the culprit?
_________________
People don't have to earn my respect. I offer my respect to them, but be careful to lose my respect...
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54237
Location: 56N 3W

Posted: Sun May 07, 2017 11:08 am

drizzt,

Raising the core voltage might help but I'm very reluctant to suggest that.

If CPU Vcore brownouts are the issue, they can come from several sources: the Vcore regulator on the motherboard or the upstream 12 V supply that feeds it.
Changing Vcore may help if the problem is due to the Vcore regulator, but not if it's from further upstream.

It's all still speculation.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
drizzt
Guru


Joined: 21 Jul 2002
Posts: 428

Posted: Sun May 07, 2017 2:00 pm

Is it possible that the BIOS is blocking the power governor controls?
I tested with three different governors (ondemand, conservative, performance), and the output atop gives me (avgf and avgscal) looks nearly identical over the course of the emerge on every run.
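One way to probe the "BIOS is blocking governor controls" idea is to check whether the kernel has a cpufreq driver at all, and whether the reported clocks move when the governor changes. A minimal sketch, assuming the standard cpufreq sysfs files scaling_driver and scaling_cur_freq (the latter in kHz); `cpufreq_report` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: report the cpufreq driver and the current clock of
# each CPU under $1 (default /sys/devices/system/cpu). Without a cpufreq
# directory the kernel has no policy for a governor to act on.
cpufreq_report() {
    root=${1:-/sys/devices/system/cpu}
    if [ ! -d "$root/cpu0/cpufreq" ]; then
        echo "no cpufreq driver loaded"
        return 1
    fi
    echo "driver: $(cat "$root/cpu0/cpufreq/scaling_driver")"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        cpu=${dir%/cpufreq}
        cpu=${cpu##*/}
        khz=$(cat "$dir/scaling_cur_freq")
        echo "$cpu $((khz / 1000)) MHz"
    done
}
```

Running it a few times during an emerge under each governor gives a second opinion next to atop's avgf/avgscal readings.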
_________________
People don't have to earn my respect. I offer my respect to them, but be careful to lose my respect...
Tony0945
Watchman


Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

Posted: Sun May 07, 2017 2:46 pm

Is this your board? https://www.newegg.com/Product/Product.aspx?Item=N82E16813132965

I see it gets terrible reviews. The review comparing two Linices on the board was very interesting.
drizzt
Guru


Joined: 21 Jul 2002
Posts: 428

Posted: Sun May 07, 2017 3:34 pm

Tony0945 wrote:
Is this your board? https://www.newegg.com/Product/Product.aspx?Item=N82E16813132965

I see it gets terrible reviews. The review comparing two Linices on the board was very interesting.


Sad but true, I have this one: https://www.newegg.com/Product/Product.aspx?Item=N82E16813132966&cm_re=asus_b350m-_-13-132-966-_-Product

But at least my experience with two boards in two systems is not half as bad as these guys' experiences. Mine work for normal workloads ;)

But these reviews push me more and more toward trying another board: https://www.newegg.com/Product/Product.aspx?Item=9SIA1UH5N95687&cm_re=msi_arctic_b350-_-13-144-046-_-Product
_________________
People don't have to earn my respect. I offer my respect to them, but be careful to lose my respect...
Tony0945
Watchman


Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

Posted: Sun May 07, 2017 3:56 pm

Been thinking of this https://www.newegg.com/Product/Product.aspx?Item=9SIA2F85F29679&cm_re=b350_tomahawk-_-13-144-028-_-Product and its not-so-glitzy (but rarely available) cousin, https://www.newegg.com/Product/Product.aspx?Item=N82E16813144018

I usually buy Gigabyte, but it seems like MSI is more aggressive for Zen at least in providing timely BIOS updates. Gigabyte emphasizes Windows based tools that are useless on a Linux only system. Also, they seem to emphasize their Intel products.

I have only had one Biostar board in my life but it was surprisingly reliable.
Naib
Watchman


Joined: 21 May 2004
Posts: 6051
Location: Removed by Neddy

Posted: Sun May 07, 2017 4:37 pm

Tony0945 wrote:
Been thinking of this https://www.newegg.com/Product/Product.aspx?Item=9SIA2F85F29679&cm_re=b350_tomahawk-_-13-144-028-_-Product and its not-so-glitzy (but rarely available) cousin, https://www.newegg.com/Product/Product.aspx?Item=N82E16813144018

I usually buy Gigabyte, but it seems like MSI is more aggressive for Zen at least in providing timely BIOS updates. Gigabyte emphasizes Windows based tools that are useless on a Linux only system. Also, they seem to emphasize their Intel products.

I have only had one Biostar board in my life but it was surprisingly reliable.

The most aggressive with BIOS updates is ASUS, and it has resulted in booting issues.
MSI are actually the slowest. I have spent the afternoon going over the MSI forum, and from what I can gather MSI is behind on AGESA ... now whether this is a good thing or a bad thing is up for debate.
AMD actually released buggy code to OEMs (v1.0.0.4) that caused issues, which missed MSI as they didn't expose this version in a stable download.

Also, there was a series of MSI BIOSes that messed around with core voltages:

Quote:
The default/automatic voltage is too high: when I was at BIOS ver 1.2, the default voltage was 1.352 V, but now it is something like 1.38 V; this is very high

So this might lend weight to the regulation-issue theory.

AMD appears to be planning to release another set of BIOS updates to OEMs in mid-May: http://www.tomshardware.com/reviews/amd-ryzen-ama,5018.html
_________________
Quote:
Removed by Chiitoo
drizzt
Guru


Joined: 21 Jul 2002
Posts: 428

Posted: Sun May 07, 2017 5:21 pm

Naib wrote:
Tony0945 wrote:
Been thinking of this https://www.newegg.com/Product/Product.aspx?Item=9SIA2F85F29679&cm_re=b350_tomahawk-_-13-144-028-_-Product and its not-so-glitzy (but rarely available) cousin, https://www.newegg.com/Product/Product.aspx?Item=N82E16813144018

I usually buy Gigabyte, but it seems like MSI is more aggressive for Zen at least in providing timely BIOS updates. Gigabyte emphasizes Windows based tools that are useless on a Linux only system. Also, they seem to emphasize their Intel products.

I have only had one Biostar board in my life but it was surprisingly reliable.

The most aggressive with BIOS updates is ASUS, and it has resulted in booting issues.
MSI are actually the slowest. I have spent the afternoon going over the MSI forum, and from what I can gather MSI is behind on AGESA ... now whether this is a good thing or a bad thing is up for debate.
AMD actually released buggy code to OEMs (v1.0.0.4) that caused issues, which missed MSI as they didn't expose this version in a stable download.

Also, there was a series of MSI BIOSes that messed around with core voltages:

Quote:
The default/automatic voltage is too high: when I was at BIOS ver 1.2, the default voltage was 1.352 V, but now it is something like 1.38 V; this is very high

So this might lend weight to the regulation-issue theory.

AMD appears to be planning to release another set of BIOS updates to OEMs in mid-May: http://www.tomshardware.com/reviews/amd-ryzen-ama,5018.html


Hm, if I compare the German ASUS page with the German MSI page for BIOS updates (for the two mainboards I'm looking at), they look identical in terms of AGESA (1.0.0.4a) and update release dates.

So at least for me no big deal ;)
_________________
People don't have to earn my respect. I offer my respect to them, but be careful to lose my respect...
Naib
Watchman


Joined: 21 May 2004
Posts: 6051
Location: Removed by Neddy

Posted: Wed May 10, 2017 3:31 pm

There is a new wave of MSI BIOS updates:
MSI carbon: 7A32v15
- Improved memory compatibility.
- Fixed PCIe Hot-plug function issue.

Microcode is the same (0x0800111c).
AGESA version? ???


Slight update with respect to compilers:
Code:
GCC 6.3+ has support for the znver1 compiler optimization. For optimal performance, this can be enabled in make.conf.
While GCC 5.4 does not support zen core specific optimization, -march=bdver4 has been shown to be functional and stable:


haswell was previously recommended, but that has changed:
Quote:
After collaborating with another Gentoo developer (Tobias Klausmann (klausman)), it looks like haswell was causing him segfaults. I changed it to the previously supported -march value. --Maffblaster (talk) 23:58, 9 May 2017 (UTC)
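The two rules in the wiki excerpt above (znver1 on GCC 6.3+, bdver4 on GCC 5.4) can be expressed as a small helper for picking CFLAGS. A minimal sketch; `pick_march` is hypothetical, is not taken from the wiki, and simply treats any version older than 6.3 like 5.4.

```shell
#!/bin/sh
# Hypothetical helper: choose a -march value from a GCC version string
# such as "6.3.0", "5.4.0" or "7" (e.g. the output of `gcc -dumpversion`),
# following the znver1 (6.3+) / bdver4 (older) split quoted above.
pick_march() {
    major=${1%%.*}
    rest=${1#"$major"}
    rest=${rest#.}
    minor=${rest%%.*}
    minor=${minor:-0}
    if [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 3 ]; }; then
        printf '%s\n' "-march=znver1"
    else
        printf '%s\n' "-march=bdver4"
    fi
}

# Example (not run here):
#   echo "CFLAGS=\"$(pick_march "$(gcc -dumpversion)") -O2 -pipe\""
```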

_________________
Quote:
Removed by Chiitoo
drizzt
Guru


Joined: 21 Jul 2002
Posts: 428

Posted: Wed May 10, 2017 4:10 pm

Naib wrote:
There is a new wave of MSI BIOS updates:
MSI carbon: 7A32v15
- Improved memory compatibility.
- Fixed PCIe Hot-plug function issue.

Microcode is the same (0x0800111c).
AGESA version? ???


Slight update with respect to compilers:
Code:
GCC 6.3+ has support for the znver1 compiler optimization. For optimal performance, this can be enabled in make.conf.
While GCC 5.4 does not support zen core specific optimization, -march=bdver4 has been shown to be functional and stable:


haswell was previously recommended, but that has changed:
Quote:
After collaborating with another Gentoo developer (Tobias Klausmann (klausman)), it looks like haswell was causing him segfaults. I changed it to the previously supported -march value. --Maffblaster (talk) 23:58, 9 May 2017 (UTC)


Let's check it out. Compiling right now...

MSI Germany shows at least AGESA 1.0.0.4a for the MORTAR ARCTIC. As far as I've read, AGESA 1.0.0.6 is considered beta, and I assume it is not part of a stable BIOS download.
_________________
People don't have to earn my respect. I offer my respect to them, but be careful to lose my respect...
roarinelk
Guru


Joined: 04 Mar 2004
Posts: 520

Posted: Thu May 11, 2017 5:50 pm

drizzt wrote:


MSI Germany shows at least AGESA 1.0.0.4a for the MORTAR ARCTIC. As far as I've read, AGESA 1.0.0.6 is considered beta, and I assume it is not part of a stable BIOS download.


I have the same board, and the current BIOS (v30 and beta v41) does clock the memory a bit too high. I had to reduce the memory clock to the minimum to make it really stable.
mblnx
n00b


Joined: 04 Mar 2008
Posts: 16

Posted: Fri May 12, 2017 8:34 am

daemon32 wrote:
I tried what I had suggested in my previous post and turned off the 'OP Cache' setting in the EFI...
Then I ran `emerge mesa` in a loop that would terminate on a non-zero exit status, and it ran for two and a half hours without interruption.
I then went back and turned 'OP Cache' back on, and the loop failed on the first build.

I really should've learned my lesson from the last time I was an early adopter :P


Where's this option located? I couldn't find it on mine (X370 Taichi).

Btw, I only started getting the segfaults after I started recompiling things here; my old CFLAGS were:

CFLAGS="-march=nocona -mtune=amdfam10 -O2 -pipe"

Running gcc 6.3.0 with -march=znver1 segfaulted a lot; on 5.4.0 with -march=haswell I have fewer problems, but they still happen.

EDIT: nvm, found it buried somewhere in the BIOS... No segfaults since disabling it...
asan
n00b


Joined: 14 May 2017
Posts: 1

Posted: Sun May 14, 2017 12:10 pm

I have the exact same segfaults in sh (especially with mesa when using -j16) on my own system:
CROSSHAIR VI HERO, BIOS 1107 04/28/2017
AMD Ryzen 7 1800X

Everything stock, no overclocking.
Unfortunately I have not found any "OP Cache" option in the BIOS.
All times are GMT
Page 4 of 11

 