Gentoo Forums
AMD Zen/Ryzen thread (part 2)

Tony0945
Advocate


Joined: 25 Jul 2006
Posts: 3191
Location: Illinois, USA

Posted: Wed Sep 27, 2017 12:57 am    Post subject: Re: Doing another system build

pjp wrote:
vaxbrat wrote:
The fab info on the chip is week 9
That sucks, and suggests it will be a while before it will be "safe" to pick one up.


I'm seeing great deals on Bulldozer and would snatch one up if my mobo were AM3+ instead of AM3 and AM2+. Maybe I should check mobo prices and see if they are cheap too. I could reuse my memory then instead of buying DDR4. Screw AMD and their "If it doesn't crash on Win10, we don't care if it crashes on Linux" attitude.
pjp
Administrator


Joined: 16 Apr 2002
Posts: 18084

Posted: Wed Sep 27, 2017 2:11 am

They're replacing CPUs though, aren't they? That would work for me. If I had the spare cash for an entirely new system, I'd buy one.
_________________
Those who know what's best for us must rise and save us from ourselves.
Tony0945
Advocate


Joined: 25 Jul 2006
Posts: 3191
Location: Illinois, USA

Posted: Wed Sep 27, 2017 2:27 am

pjp wrote:
They're replacing CPUs though, aren't they? That would work for me. If I had the spare cash for an entirely new system, I'd buy one.

It sounds like they are making people jump through hoops and taking a long time. I had a new car whose battery died at three months on a holiday. GM picked up the car, towed it to the dealer and replaced the battery THAT DAY.
krinn
Watchman


Joined: 02 May 2003
Posts: 7167

Posted: Wed Sep 27, 2017 7:08 am

Because many batteries are already produced...
You are comparing a battery with the latest CPU. They started mass production a few days before release; now that it is fixed, it will again take days before they have enough good units to supply everyone.

I wonder if AMD plans to keep selling its stock of bad units, because Windows users are not really able to see the problem.
That would kill the second-hand market for Ryzen.

And resellers buying on the grey market are fucked for good and will keep selling bad units.
Tony0945
Advocate


Joined: 25 Jul 2006
Posts: 3191
Location: Illinois, USA

Posted: Wed Sep 27, 2017 1:25 pm

krinn wrote:
Because many batteries are already produced...
You are comparing a battery with the latest CPU. They started mass production a few days before release; now that it is fixed, it will again take days before they have enough good units to supply everyone.

I wonder if AMD plans to keep selling its stock of bad units, because Windows users are not really able to see the problem.
That would kill the second-hand market for Ryzen.

And resellers buying on the grey market are fucked for good and will keep selling bad units.

Very true. However, the RMA'd units could be resold to mass-market PC sellers at a discount. It won't matter to the 96+% who just watch YouTube on Win10.
mir3x
Guru


Joined: 02 Jun 2012
Posts: 431

Posted: Wed Sep 27, 2017 2:16 pm

wrc1944 wrote:


Is this pretty normal for requesting an AMD RMA, or should I complain and/or re-submit a new request? Maybe they are swamped with RMA's now that the word is out?


They replied to me in 23 hours and 30 minutes, as if they have a 24-hour deadline to reply. So just send them another message. And remove the line where you demand an RMA; they might not like people demanding, it should be their idea to give you an RMA :D
I wrote this and attached a screenshot of the failed compilation:
Quote:
> > It's a Ryzen 7 1700X. GCC sometimes fails during compilation: it cannot
> > compile gcc 7.2 itself in 70% of cases; gcc 7.1 mostly works OK, but 6.3
> > and 6.4 fail sometimes, and gcc 7.2 very often. In a few cases it got stuck
> > somewhere, like a zombie process, but it wasn't shown as a zombie.
> > Motherboard is an ASUS Prime B350 Plus (I upgraded the BIOS yesterday to
> > version 0808; it didn't help). RAM is Kingston HyperX HX424C15FBK4/64.
> > The BIOS sets the RAM to 2400 by default. I removed 2 sticks and checked
> > with 2x16GB. Also checked with 1866MHz. Fan is SilentiumPC Fortis 3. K10
> > from kernel 4.13 shows an idle temperature of about 30-35C. Under heavy
> > load max 49C. BIOS shows 41-43C.
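To put a number on a failure rate like the 70% quoted above, one could repeat a build many times and count the failures. A minimal Python sketch, assuming you supply the build command yourself; `failure_rate` is a hypothetical helper, not part of any Gentoo or AMD tool, and a hang is only counted as a failure if you pass a `timeout`.

```python
import subprocess
import sys
from typing import Optional, Sequence


def failure_rate(cmd: Sequence[str], runs: int,
                 timeout: Optional[float] = None) -> float:
    """Run `cmd` `runs` times and return the fraction of failed runs.

    A run counts as failed if it exits non-zero or, when `timeout` is
    given, if it runs longer than `timeout` seconds (subprocess.run
    kills the direct child process in that case).
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL,
                                    timeout=timeout)
            if result.returncode != 0:
                failures += 1
        except subprocess.TimeoutExpired:
            failures += 1
    return failures / runs


if __name__ == "__main__":
    # Harmless self-check: a command that always succeeds.
    print(failure_rate([sys.executable, "-c", "pass"], runs=3))
```

For a real test you would substitute something like `["emerge", "--oneshot", "sys-devel/gcc"]` with a generous timeout; each run takes a long time, so a handful of runs is already informative.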

_________________
Installation aborted to prevent system self-destruction
nasaiya
Tux's lil' helper


Joined: 17 May 2007
Posts: 147

Posted: Wed Sep 27, 2017 3:03 pm    Post subject: Re: possible kernel gotcha

Naib wrote:
Do you have RCU configured, especially CONFIG_RCU_NOCB_CPU?
Also, do you have a BIOS option associated with C-states? Try turning that off.


Would you mind explaining this a little better? I'm having daily hard lockups as well. What are the correct settings for this?

I have also had the segfault issues, but since disabling Cool & Quiet and C9 (or C6, or whatever it was in the BIOS) and rebuilding everything with gcc 7.1, everything has worked perfectly. I even went 4 days without a lockup, but the lockup problem is back :(
_________________
If it ain't broke - fix it till it is!
pjp
Administrator


Joined: 16 Apr 2002
Posts: 18084

Posted: Wed Sep 27, 2017 3:54 pm

Tony0945 wrote:
pjp wrote:
They're replacing CPUs though, aren't they? That would work for me. If I had the spare cash for an entirely new system, I'd buy one.

It sounds like they are making people jump through hoops and taking a long time. I had a new car whose battery died at three months on a holiday. GM picked up the car, towed it to the dealer and replaced the battery THAT DAY.

I'm doubtful that is solely because they are Linux users. Maybe AMD isn't well staffed for handling RMAs (in which case buyers should have gone through the seller). Or maybe they have a limited supply available to them. There are a lot of potential issues. I'm not saying they are handling it as well as they could, but I don't see anything "malicious." Maybe that's just because I don't want Intel to be my only choice.
_________________
Those who know what's best for us must rise and save us from ourselves.
thumper
Guru


Joined: 06 Dec 2002
Posts: 533
Location: Venice FL

Posted: Wed Sep 27, 2017 8:52 pm    Post subject: Re: possible kernel gotcha

nasaiya wrote:
Naib wrote:
Do you have RCU configured, especially CONFIG_RCU_NOCB_CPU?
Also, do you have a BIOS option associated with C-states? Try turning that off.


Would you mind explaining this a little better? I'm having daily hard lockups as well. What are the correct settings for this?

I have also had the segfault issues, but since disabling Cool & Quiet and C9 (or C6, or whatever it was in the BIOS) and rebuilding everything with gcc 7.1, everything has worked perfectly. I even went 4 days without a lockup, but the lockup problem is back :(


Code:
CONFIG_RCU_NOCB_CPU:

 Use this option to reduce OS jitter for aggressive HPC or
 real-time workloads.   It can also be used to offload RCU
 callback invocation to energy-efficient CPUs in battery-powered
 asymmetric multiprocessors.

 This option offloads callback invocation from the set of
 CPUs specified at boot time by the rcu_nocbs parameter.
 For each such CPU, a kthread ("rcuox/N") will be created to
 invoke callbacks, where the "N" is the CPU being offloaded,
 and where the "x" is "b" for RCU-bh, "p" for RCU-preempt, and
 "s" for RCU-sched.  Nothing prevents this kthread from running
 on the specified CPUs, but (1) the kthreads may be preempted
 between each callback, and (2) affinity or cgroups can be used
 to force the kthreads to run on whatever set of CPUs is desired.

 Say Y here if you want to help to debug reduced OS jitter.
 Say N here if you are unsure.

 Symbol: RCU_NOCB_CPU [=y]
 Type  : boolean
 Prompt: Offload RCU callback processing from boot-selected CPUs
   Location:
     -> General setup
       -> RCU Subsystem
   Defined at kernel/rcu/Kconfig:218
   Depends on: (TREE_RCU [=y] || PREEMPT_RCU [=n]) && (RCU_EXPERT [=y] || NO_HZ_FULL [=n])
   Selected by: NO_HZ_FULL [=n] && <choice> && !ARCH_USES_GETTIMEOFFSET [=n] && GENERIC_CLOCKEVENTS [=y]\
   && SMP [=y] && HAVE_CONTEXT_TRACKING [=y] && HAVE_VIRT_CPU_ACCOUNTING_GEN [=y]
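The "Depends on" line above means RCU_NOCB_CPU is only offered when RCU_EXPERT (or NO_HZ_FULL) is enabled. A small Python sketch for checking what a kernel config actually has set; it assumes the Gentoo-style /usr/src/linux/.config location, and `parse_kconfig`/`rcu_report` are illustrative names, not kernel tooling.

```python
from pathlib import Path
from typing import Dict, Iterable

# Options discussed in this thread. Per the thread, CONFIG_RCU_NOCB_CPU_ALL
# went away around 4.13, so newer configs report it as "absent".
RCU_OPTIONS = ("CONFIG_RCU_EXPERT", "CONFIG_RCU_NOCB_CPU",
               "CONFIG_RCU_NOCB_CPU_ALL")


def parse_kconfig(lines: Iterable[str]) -> Dict[str, str]:
    """Parse kernel .config lines into {option: value}.

    Lines of the form "# CONFIG_FOO is not set" are recorded as "n";
    other comments and blank lines are ignored.
    """
    options: Dict[str, str] = {}
    suffix = " is not set"
    for raw in lines:
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def rcu_report(config_path: str = "/usr/src/linux/.config") -> Dict[str, str]:
    """Return the RCU-related options from a .config file."""
    opts = parse_kconfig(Path(config_path).read_text().splitlines())
    return {name: opts.get(name, "absent") for name in RCU_OPTIONS}


if __name__ == "__main__":
    config = Path("/usr/src/linux/.config")
    if config.exists():
        for name, value in rcu_report(str(config)).items():
            print(f"{name}={value}")
    else:
        print(f"{config} not found")
```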


I can confirm that it used to work. I'm not sure exactly when, but I think the problem returned in 4.13, when this option was removed:
Code:
CONFIG_RCU_NOCB_CPU_ALL=y

I hadn't paid close enough attention, but it seems something needs to be set on the kernel command line at boot time.
I'm just digging into this now; this bit seems to be important:
Code:
from the set of CPUs specified at boot time by the rcu_nocbs parameter.


Explained better here:
Code:
The RCU_NOCB_CPU_ALL=y Kconfig option, which causes all CPUs
to be offloaded.  On a 16-CPU system, this is equivalent to
"rcu_nocbs=0-15"

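Without RCU_NOCB_CPU_ALL=y, the equivalent boot parameter has to be written by hand, and its upper bound depends on how many logical CPUs the machine has (16 on an 8-core Ryzen with SMT). A minimal Python sketch that builds the string, assuming CPUs are numbered contiguously from 0; `rcu_nocbs_param` is a hypothetical helper.

```python
import os
from typing import Optional


def rcu_nocbs_param(cpu_count: Optional[int] = None) -> str:
    """Build an rcu_nocbs= boot parameter that offloads every CPU.

    Defaults to os.cpu_count(), i.e. the number of logical CPUs,
    so SMT siblings are included.
    """
    n = cpu_count if cpu_count is not None else os.cpu_count()
    if n is None or n < 1:
        raise ValueError("could not determine the CPU count")
    return "rcu_nocbs=0" if n == 1 else f"rcu_nocbs=0-{n - 1}"


if __name__ == "__main__":
    print(rcu_nocbs_param())
```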

George
Bloot
Tux's lil' helper


Joined: 10 Mar 2006
Posts: 92
Location: Barcelona

Posted: Thu Sep 28, 2017 5:20 pm

I'm done with AMD.

I sent my faulty CPU two weeks ago, and they said they'd send me a replacement as soon as I sent them the parcel tracking number. They wrote yesterday saying they were restocking their warehouse and that it would not be completed until October 2nd. That means 3 weeks, at best, since I sent my faulty processor.

I liked Ryzen very much, but this is unacceptable; I would have contacted Amazon instead if I had known it would take this long. I'll sell it the moment it arrives, if it ever does.

Sorry for the rant, I needed it.
_________________
Portage 2.3.6 (python 3.4.5-final-0, default/linux/amd64/13.0/desktop/plasma, gcc-6.4.0, glibc-2.23-r4, 4.12.5-gentoo x86_64)
pjp
Administrator


Joined: 16 Apr 2002
Posts: 18084

Posted: Thu Sep 28, 2017 7:23 pm

Sorry to hear you're that disappointed by a delay, but everyone has their limits. Mine was the Intel monopoly.
_________________
Those who know what's best for us must rise and save us from ourselves.
Bloot
Tux's lil' helper


Joined: 10 Mar 2006
Posts: 92
Location: Barcelona

Posted: Thu Sep 28, 2017 8:15 pm

I wouldn't exactly call a three-week wait a delay, but never mind.

I hope they can finally address this problem and replace every faulty unit, but I won't be on the boat anymore. It was nice to share Ryzen experiences with you all though :)

Cheers :wink:
_________________
Portage 2.3.6 (python 3.4.5-final-0, default/linux/amd64/13.0/desktop/plasma, gcc-6.4.0, glibc-2.23-r4, 4.12.5-gentoo x86_64)
mir3x
Guru


Joined: 02 Jun 2012
Posts: 431

Posted: Thu Sep 28, 2017 8:29 pm

Bloot wrote:
I'm done with AMD.

I sent my faulty CPU two weeks ago, and they said they'd send me a replacement as soon as I sent them the parcel tracking number. They wrote yesterday saying they were restocking their warehouse and that it would not be completed until October 2nd. That means 3 weeks, at best, since I sent my faulty processor.


It sucks, especially since AMD wrote:
Quote:
You will receive your original processor
approximately 3-5 business days from the ship date from the AMD
Returns Center, depending upon your location.


You can read their forum; one guy got that email and his new CPU a few hours later...
https://community.amd.com/thread/215773?start=1650&tstart=0

Anyway, at least you got such an email; I didn't. It looks like at least one more week of waiting.
15 minutes after I sent them the tracking number, they sent me an email explaining how to pack the CPU, with a new RMA number to write on the package label, but I had already sent it!!!

EDIT: watch this and say fck AMD, fck CPUs, fck computers:
https://www.youtube.com/watch?v=5g9zxduFtSM
_________________
Installation aborted to prevent system self-destruction
nasaiya
Tux's lil' helper


Joined: 17 May 2007
Posts: 147

Posted: Sun Oct 01, 2017 3:31 pm    Post subject: Re: possible kernel gotcha

thumper wrote:

...
George


Thanks, enabling all that seems to have done the trick (so far anyway - no lockups in several days)... now I suppose I'll have to decide whether to start a return or keep gcc 7.1 forever...

I didn't add anything to the kernel command line btw so I don't know if that kernel feature is actually doing anything, but at least it's not locking up anymore.
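Whether the feature is doing anything can be checked on a running system: the parameter should appear in /proc/cmdline, and, per the Kconfig help quoted earlier in the thread, offloaded CPUs get kthreads named "rcuox/N". A Python sketch under those assumptions; the helper names are made up for illustration, and an empty thread list suggests nothing is being offloaded.

```python
from pathlib import Path
from typing import List, Optional


def cmdline_param(cmdline: str, key: str) -> Optional[str]:
    """Return the value of `key=value` in a kernel command line, or None."""
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name == key and sep:
            return value
    return None


def rcu_offload_threads(proc: str = "/proc") -> List[str]:
    """Names of running tasks whose comm starts with "rcuo" (e.g. rcuos/3)."""
    names = []
    for comm in Path(proc).glob("[0-9]*/comm"):
        try:
            name = comm.read_text().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if name.startswith("rcuo"):
            names.append(name)
    return sorted(names)


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        print("rcu_nocbs =", cmdline_param(cmdline_file.read_text(), "rcu_nocbs"))
        print(len(rcu_offload_threads()), "rcuo* kthreads running")
```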
_________________
If it ain't broke - fix it till it is!
vaxbrat
l33t


Joined: 05 Oct 2005
Posts: 731
Location: DC Burbs

Posted: Sun Oct 01, 2017 5:08 pm    Post subject: Seems to be stabilizing here finally

My Taichi-based build on kernel 4.12.12 has been up for over two days now without a lockup. The last thing I tweaked on that one was setting the SoC VCC to 1.18 volts. I did leave Cool & Quiet turned on. I need to get my cluster onto the same version of Ceph Jewel before I put a couple of OSD daemons on it.

My Gaming K4 build was still locking up yesterday even though I made the VCC change. I was about to turn off Cool & Quiet and considered switching the CPU governor from ondemand to performance, but I decided to switch to 4.13.3 on it and took George's advice above about the RCU settings. It's been good so far and even appears able to use all 16 cores to emerge mesa with gcc 6.4.0. I'm going to hold off on 7 until there's better agreement about both stability and Zen support. I've been pleasantly surprised by amdgpu support and by Wine allowing me to play Fallout: New Vegas. The last time I had that working was with a GeForce and nvidia-drivers, before all the insanity started happening with that and the KDE Plasma compositor in OpenGL mode.
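Before switching governors it helps to see what each CPU is currently using; in sysfs the names are spelled "ondemand", "performance", and so on. A read-only Python sketch, assuming the standard /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor layout. Changing the governor means writing one of those names to the same file as root, which this sketch deliberately does not do.

```python
from pathlib import Path
from typing import Dict


def cpu_governors(sysfs: str = "/sys/devices/system/cpu") -> Dict[str, str]:
    """Map each CPU directory name (e.g. "cpu0") to its scaling governor."""
    governors: Dict[str, str] = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in Path(sysfs).glob(pattern):
        cpu = gov_file.parent.parent.name
        governors[cpu] = gov_file.read_text().strip()
    return governors


if __name__ == "__main__":
    # Summarize, e.g. {"ondemand": 16} means every CPU uses ondemand.
    counts: Dict[str, int] = {}
    for gov in cpu_governors().values():
        counts[gov] = counts.get(gov, 0) + 1
    print(counts or "no cpufreq information found")
```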
thumper
Guru


Joined: 06 Dec 2002
Posts: 533
Location: Venice FL

Posted: Sun Oct 01, 2017 8:47 pm    Post subject: Re: possible kernel gotcha

nasaiya wrote:
thumper wrote:

...
George


Thanks, enabling all that seems to have done the trick (so far anyway - no lockups in several days)... now I suppose I'll have to decide whether to start a return or keep gcc 7.1 forever...

I didn't add anything to the kernel command line btw so I don't know if that kernel feature is actually doing anything, but at least it's not locking up anymore.


I am using GCC 7.2 and it's working fantastically for me. Since the latest BIOS update I get no more segfaults, and it seems kernel 4.13.3 had a fix for it too.
But the lockups still occurred randomly until I added this to my kernel command line in /etc/default/grub, in addition to those .config changes:
Code:
rcu_nocbs=0-15

I have the 1800X, and so far so good.
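On a GRUB2 system the parameter typically goes into GRUB_CMDLINE_LINUX_DEFAULT (or GRUB_CMDLINE_LINUX) in /etc/default/grub, after which grub.cfg has to be regenerated, e.g. with `grub-mkconfig -o /boot/grub/grub.cfg`. A Python sketch of just the line edit, assuming a simple double-quoted value without embedded quotes; `add_kernel_param` is a hypothetical helper that returns a new string rather than touching any file.

```python
import re

_LINE = re.compile(r'(GRUB_CMDLINE_LINUX(?:_DEFAULT)?=")([^"]*)(")\s*')


def add_kernel_param(grub_line: str, param: str) -> str:
    """Return `grub_line` with `param` appended to its quoted value.

    An existing parameter with the same key (the text before "=") is
    dropped first, so rerunning with a new range replaces the old one.
    """
    m = _LINE.fullmatch(grub_line)
    if m is None:
        raise ValueError('expected GRUB_CMDLINE_LINUX[_DEFAULT]="..."')
    key = param.split("=", 1)[0]
    kept = [tok for tok in m.group(2).split()
            if tok.split("=", 1)[0] != key]
    kept.append(param)
    return f'{m.group(1)}{" ".join(kept)}{m.group(3)}'


if __name__ == "__main__":
    print(add_kernel_param('GRUB_CMDLINE_LINUX_DEFAULT="quiet"',
                           "rcu_nocbs=0-15"))
```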

I'm not sure if these bits are relevant, but I'm using these versions of the tools:
Code:

sys-devel/libtool-2.4.6-r3
sys-devel/binutils-2.28.1
sys-libs/glibc-2.23-r4
sys-devel/gcc-7.2.0
sys-kernel/linux-headers-4.13


George
Naib
Watchman


Joined: 21 May 2004
Posts: 5689
Location: Removed by Neddy

Posted: Sun Oct 01, 2017 9:43 pm    Post subject: Re: possible kernel gotcha

thumper wrote:
nasaiya wrote:
thumper wrote:

...
George


Thanks, enabling all that seems to have done the trick (so far anyway - no lockups in several days)... now I suppose I'll have to decide whether to start a return or keep gcc 7.1 forever...

I didn't add anything to the kernel command line btw so I don't know if that kernel feature is actually doing anything, but at least it's not locking up anymore.


I am using GCC 7.2 and it's working fantastically for me. Since the latest BIOS update I get no more segfaults, and it seems kernel 4.13.3 had a fix for it too.
But the lockups still occurred randomly until I added this to my kernel command line in /etc/default/grub, in addition to those .config changes:
Code:
rcu_nocbs=0-15

I have the 1800X, and so far so good.

I'm not sure if these bits are relevant, but I'm using these versions of the tools:
Code:

sys-devel/libtool-2.4.6-r3
sys-devel/binutils-2.28.1
sys-libs/glibc-2.23-r4
sys-devel/gcc-7.2.0
sys-kernel/linux-headers-4.13


George

What -march are you using for gcc-7.2?
_________________
The best argument against democracy is a five-minute conversation with the average voter
Great Britain is a republic, with a hereditary president, while the United States is a monarchy with an elective king
thumper
Guru


Joined: 06 Dec 2002
Posts: 533
Location: Venice FL

Posted: Sun Oct 01, 2017 9:54 pm    Post subject: Re: possible kernel gotcha

Naib wrote:

What -march are you using for gcc-7.2?

Code:
-march=native

Using native in the kernel as well.

George
amaroc
Tux's lil' helper


Joined: 13 Nov 2005
Posts: 91

Posted: Mon Oct 02, 2017 2:12 pm    Post subject: Ryzen upgrade

As this is "Gentoo Chat" and it's Ryzen-related, I'm posting it here, even though I don't really contribute to the Ryzen technical discussion. I think it might be interesting to some; otherwise please skip.
Summary about an upgrade to Ryzen CPU
Two weeks ago my good old Phenom II X6 1055T (6x2.80GHz) died all of a sudden. It might have been the motherboard as well, as there was literally nothing: no screen, no beep, no other sign of life. I tried disconnecting and reconnecting cards and cables, replacing the CMOS battery, resetting the RTC and measuring the power supply, but no joy. So either the CPU or the mobo died somehow. OK, this is what may happen after 7 years and more than 10k hours of running.
Time to think about an upgrade or other options.
- Android tablet only? - not much fun, not really
- Laptop? - the tablet allows for this flexibility already
So there will be a desktop upgrade.
- ARM or x86? - I've got still some x86 code around and w/ ARM I would very likely end up with Android someday - no!
- AMD or Intel? - One year ago I would probably have decided for Intel but there is Ryzen now for some time.
OK, so then reading about Ryzen: OMG, even this thread and its predecessor are full of segfault messages, memory incompatibilities, etc.
In addition: do I want to continue with Gentoo? I'm still running GRUB legacy, never thought about UEFI, and overall I want the system up and running ASAP, as at least online banking needs to be restored soon. That will require a proper browser in a graphical environment, so I will need a quick graphics card decision as well.
- AMD or nVidia? - AMD seems to be more on open source - so that was easy
- Old or new chip family? - As my screen is also almost ten years old I should look for something more recent. I was brave and chose the RX550.
So the mobo decision was easy: ASUS Prime B350-PLUS, as there is an (onboard) SPDIF out and the reviews are not bad. For DDR4 I went with single-rank (SR) Crucial; the 2400 speed grade seems to be sufficient, and 2x 8GB is an improvement over the old 12GB. A Ryzen 1700 with the bundled fan should also be sufficient. As my 2TB HDD was already a bit tight, I went for an 8TB upgrade as well. Everything else (1TB SSD, DVD-RW, BD-ROM, case, power supply, etc.) should stay. The mobo requires an 8-pin EATX 12V connector but my power supply has a 4-pin ATX 12V connector; Google said that this is OK.
So I ordered the items above, and after some days I could start the assembly, boot into the BIOS and update to the latest BIOS version. Here we are.
Now, what to boot? Even though I had recent backups on external drives, I wanted to reuse my installation as much as possible, so doing a tar backup of my SSD root partition seemed to be a good idea.
Partly out of curiosity I went with the Ubuntu 17.10 beta, as it should have a recent kernel, graphics stack, etc.
I ended up with acpi=off and a GNOME desktop. Not bad, and OK for a tar backup, but chrooting and compiling a kernel on one core out of eight? That's no fun. So I tarred my SSD root partition just in case and went with a Gentoo installation medium.
What? The CD image does not support UEFI and the DVD is about one year old? As I'm lazy, I decided against UEFI and switched to the legacy boot scheme with the Gentoo CD image.
The genkernel-built kernel booted fine, all 8 cores were there and I could chroot. I took the opportunity to switch to GRUB2 according to the wiki and followed the Ryzen kernel instructions. Playing it safe, I chose Generic-x86-64, gcc 5.4.0 and -march=x86-64.
The next boot showed a nice GRUB2 menu, but the kernel crashed again; with good old 80x25 I couldn't see much. I tried memtest86 from the install medium and it crashed as well. It took me some time to understand that the old memtest86 does not like Ryzen, so I decided to continue the journey.
Why not be brave and use the old Phenom kernel? Surprise: it booted, and even Xorg and KDE started. Wow! OK, I was on six cores rather than eight (+8x HT), as my Phenom customization was still in, but that was acceptable for a kernel update. The old ATI radeon code obviously worked even for Xorg, and I didn't care about software rendering at all.
So I modified the kernel slightly to support eight cores plus HT; that was easy.
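After a change like this, the number of CPUs the kernel actually brought up can be read from /sys/devices/system/cpu/online, which uses the kernel's range-list format (e.g. "0-15"); an 8-core Ryzen with SMT should report 16. A Python sketch assuming that standard sysfs file. The post doesn't say which setting limited the old kernel to six CPUs; CONFIG_NR_CPUS is one option that caps this, but that is an assumption here.

```python
from pathlib import Path
from typing import List


def parse_cpulist(cpulist: str) -> List[int]:
    """Expand a kernel CPU list such as "0-5,8-11" into CPU numbers."""
    cpus: List[int] = []
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def online_cpu_count(path: str = "/sys/devices/system/cpu/online") -> int:
    """Number of CPUs the running kernel has brought online."""
    return len(parse_cpulist(Path(path).read_text()))


if __name__ == "__main__":
    if Path("/sys/devices/system/cpu/online").exists():
        print(online_cpu_count(), "CPUs online")
```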
Later I performed an incremental kernel update according to the Ryzen kernel instructions and found the IOMMU option to be the root cause of the crash. Google told me that this might be an issue on ASUS and B350 motherboards, but I do not really need this feature right now, so it stays out for now.
As I was OK with the current graphics solution and the Xorg log and system log stayed quiet, I decided on some stability tests. Compiling gcc gave me some non-Ryzen issues with tmpfs inodes and docbook, but besides that everything runs rock solid. I did a bunch of parallel emerges of gcc and webkit-gtk utilizing all cores and a lot of memory and didn't have any issues.
Clearly there will be more work to do for gcc flags, AMDGPU, sensors, etc. but that's not urgent at all.

Summary:
I'm very pleased with the update process and after 12 years with Gentoo I have to admit - Gentoo still rocks :-)
Naib
Watchman


Joined: 21 May 2004
Posts: 5689
Location: Removed by Neddy

Posted: Tue Oct 03, 2017 9:08 am    Post subject: Re: possible kernel gotcha

thumper wrote:
Naib wrote:

What -march are you using for gcc-7.2?

Code:
-march=native

Using native in the kernel as well.

George

OK, now this is odd... I have been using -march=znver1 since I switched to GCC-6.x.
GCC-7.2 is the only package I have ever had issues with (I have done pretty much all of the stress testing, as I fear I have a dodgy chip).
I changed my -march to native and it builds oO. I then built the toolchain twice (libtool, binutils, glibc, gcc) with no problems.

I left emerge continually building gcc-7.2 overnight and every build succeeded...

-march=native should imply -march=znver1, so the results should be identical. So either there is a bug in building gcc, or there was an inconsistency in my toolchain that multiple rebuilds over the last few weeks have ironed out.
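One way to test the "should be identical" assumption is to ask GCC which target options each -march setting actually enables; `gcc -Q --help=target` prints them. A Python sketch that diffs the two listings. It assumes `gcc` is on PATH, it only covers target (-m) options and not everything -march=native may pass internally (such as cache-size parameters), and the helper names are illustrative.

```python
import shutil
import subprocess
from typing import Dict, Optional, Tuple


def parse_target_help(text: str) -> Dict[str, str]:
    """Map "-option" -> value from `gcc -Q --help=target` output."""
    opts: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        # Option lines look like "  -mavx2   [enabled]" or "  -march=   znver1".
        if len(fields) == 2 and fields[0].startswith("-"):
            opts[fields[0]] = fields[1]
    return opts


def target_options(march: str, gcc: str = "gcc") -> Dict[str, str]:
    """Run gcc with the given -march and parse its target option listing."""
    out = subprocess.run([gcc, f"-march={march}", "-Q", "--help=target"],
                         capture_output=True, text=True, check=True).stdout
    return parse_target_help(out)


def option_diff(a: Dict[str, str], b: Dict[str, str]
                ) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Options whose values differ between two listings."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


if __name__ == "__main__":
    if shutil.which("gcc"):
        try:
            native = target_options("native")
            znver1 = target_options("znver1")
        except subprocess.CalledProcessError as err:
            print("gcc failed:", err)
        else:
            print("-march=native resolves to", native.get("-march="))
            print(option_diff(native, znver1) or "no target option differences")
```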
_________________
The best argument against democracy is a five-minute conversation with the average voter
Great Britain is a republic, with a hereditary president, while the United States is a monarchy with an elective king
Chewi
Developer


Joined: 01 Sep 2003
Posts: 875
Location: Edinburgh, Scotland

Posted: Tue Oct 03, 2017 9:14 am

Hello all. I'm back after going through the RMA process. I was 2½ weeks without my desktop. :| Could have been worse but I expected better. It only took 4 days for my old CPU to reach the warehouse in the Netherlands but it took them another 13 days to deliver the replacement. At least it gave me time to play with my ARM box.

I can see that the replacement was manufactured in 1730 (July) and the good news is that the segfaults appear to have gone after re-enabling ASLR. The not so good news is that I still got a freeze overnight after disabling the RCU stuff. There's a chance that this may have been down to the amd-staging DC stuff but I have my doubts. Trying without that at the moment. I could turn the RCU stuff back on but not having C6 sucks for power usage and I really want to get to the bottom of it. Now that the segfaults are ruled out, I will try to find out more. Maybe I'll contact Gigabyte.

@nasaiya, what motherboard do you have?
fcl
n00b


Joined: 31 Dec 2016
Posts: 72

Posted: Tue Oct 03, 2017 3:00 pm

To be fair, C6 doesn't lower power usage THAT much. I wouldn't worry about it.
thumper
Guru


Joined: 06 Dec 2002
Posts: 533
Location: Venice FL

Posted: Tue Oct 03, 2017 7:52 pm    Post subject: Re: possible kernel gotcha

Naib wrote:
OK, now this is odd... I have been using -march=znver1 since I switched to GCC 6.x.
GCC 7.2 is the only package I have ever had issues with (I have done pretty much all of the stress testing, as I fear I have a dodgy chip).
I changed my -march to native and it builds oO. I then built the toolchain twice (libtool, binutils, glibc, gcc) with no problems.

I left emerge continually building gcc-7.2 overnight and every one built fine...

-march=native should imply -march=znver1, so the results should be identical. So either there is a bug in the gcc build, or there was an inconsistency in my toolchain that multiple rebuilds over the last few weeks have helped align.


I've been using GCC 7.2 since a few days before it hit Portage proper, and 7.1 prior to that.

For what it's worth, I rebuilt my toolchain plus some other packages half a dozen times or so out of fear when the segfaults were showing up, until I got a clean pass.

Used this:
Code:
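#!/bin/bash
# Rebuild the core toolchain plus boost and LLVM/Clang in one pass; -1 (--oneshot) keeps these packages out of the world file
emerge -1v sys-kernel/linux-headers sys-libs/glibc sys-devel/binutils-config sys-libs/binutils-libs sys-devel/binutils dev-libs/boost sys-devel/gcc-config sys-devel/gcc sys-devel/libtool sys-devel/llvm sys-devel/clang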


George
Naib
Watchman


Joined: 21 May 2004
Posts: 5689
Location: Removed by Neddy

Posted: Tue Oct 03, 2017 8:24 pm    Post subject: Re: possible kernel gotcha

thumper wrote:
Naib wrote:
OK, now this is odd... I have been using -march=znver1 since I switched to GCC 6.x.
GCC 7.2 is the only package I have ever had issues with (I have done pretty much all of the stress testing, as I fear I have a dodgy chip).
I changed my -march to native and it builds oO. I then built the toolchain twice (libtool, binutils, glibc, gcc) with no problems.

I left emerge continually building gcc-7.2 overnight and every one built fine...

-march=native should imply -march=znver1, so the results should be identical. So either there is a bug in the gcc build, or there was an inconsistency in my toolchain that multiple rebuilds over the last few weeks have helped align.


I've been using GCC 7.2 since a few days before it hit Portage proper, and 7.1 prior to that.

For what it's worth, I rebuilt my toolchain plus some other packages half a dozen times or so out of fear when the segfaults were showing up, until I got a clean pass.

Used this:
Code:
#!/bin/bash
emerge -1v sys-kernel/linux-headers sys-libs/glibc sys-devel/binutils-config sys-libs/binutils-libs sys-devel/binutils dev-libs/boost sys-devel/gcc-config  sys-devel/gcc sys-devel/libtool sys-devel/llvm sys-devel/clang


George
Good to know. Funny thing is, I swore I was using 7.2, but on closer inspection it would appear I wasn't, and any attempt to build it failed at exactly the same point. I feared a chip issue and was rigorously applying every stress test that Ryzen 7 users said would trigger problems (none did on my setup). Fearing the gcc-7.2 failure was a chip issue, I did everything recommended for that case (disable SMT, -j1, disable ....) and still had no luck.

There should be zero difference between -march=native and -march=znver1, yet -march=native worked, and repeated rebuilds then worked every single time... Maybe this coincided with a binutils bump, who knows (there was another one today).

When gcc-7.2 did build via gcc-7.1 with -march=native, I rebuilt it 3 times in a row to be sure. Then I switched to gcc-7.2 and rebuilt the toolchain a number of times:
Code:
emerge libtool glibc binutils gcc #toolchain rebuild
Then I left emerge gcc in a loop overnight with no problems...
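An overnight "build it in a loop" test is most useful if it stops at the first failure and tells you which run broke, so the logs from that run are still the latest ones. A minimal sketch, with a hypothetical function name; the exact loop Naib used isn't shown.

```shell
#!/bin/bash
# repeat_until_fail N CMD [ARGS...]  (hypothetical helper)
# Run CMD up to N times in a row and stop at the first failure.
# Prints the number of the failing run (and returns 1), or "ok" if all N passed.
repeat_until_fail() {
    local n=$1; shift
    local i
    for ((i = 1; i <= n; i++)); do
        if ! "$@"; then
            printf '%d\n' "$i"
            return 1
        fi
    done
    printf 'ok\n'
}
```

For example, `repeat_until_fail 10 emerge -1 sys-devel/gcc` would rebuild gcc up to ten times without adding it to the world file.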

I am now doing an emerge -e @world to rebuild everything with gcc-7.2.


VERY VERY ODD... but it is working
Chewi
Developer


Joined: 01 Sep 2003
Posts: 875
Location: Edinburgh, Scotland

Posted: Tue Oct 03, 2017 10:04 pm

So disabling DC didn't help. Now I've disabled "Global C-State Control" in the BIOS, which I gather disables C6. That should hold up but I want to check it makes a difference.
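To check whether a BIOS change like this actually affects idle behaviour, one rough option is to look at the idle states the kernel's cpuidle layer exposes and how often each is entered. The sketch below reads the standard cpuidle sysfs files (`stateN/name` and `stateN/usage`); the function name is hypothetical. Note that the ACPI idle driver (which the trace in this post shows in use) reports ACPI C-state names such as C1 or C2, which are not necessarily labelled "C6", so treat this as an indicator rather than proof.

```shell
#!/bin/bash
# list_cstates [CPUIDLE_DIR]  (hypothetical helper)
# Print each cpuidle state for one CPU as "<name> <usage>", where usage is
# the number of times the state has been entered since boot.
# Defaults to cpu0; pass another cpuidle directory to inspect a different CPU.
list_cstates() {
    local dir=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
    local state
    for state in "$dir"/state*; do
        [ -d "$state" ] || continue
        printf '%s %s\n' "$(cat "$state/name")" "$(cat "$state/usage")"
    done
}
```

Running it before and after toggling the setting and comparing which states disappear or stop accumulating usage gives a quick sanity check.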

When it did freeze, I was able to capture the kernel output using netconsole. To the other people experiencing freezes, it would be great if you could try netconsole so that we can compare notes.
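For anyone trying netconsole for the first time: the module takes a single parameter in the documented form `[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]`. The helper below just assembles that string; its name and all ports, addresses, interface names and MAC addresses in the example are hypothetical placeholders to replace with your own.

```shell
#!/bin/bash
# netconsole_param SRC_PORT SRC_IP DEV TGT_PORT TGT_IP TGT_MAC  (hypothetical helper)
# Build the value for the netconsole module parameter:
#   [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
netconsole_param() {
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# On the machine that freezes (placeholder values):
#   modprobe netconsole netconsole="$(netconsole_param 6665 192.168.0.2 eth0 6666 192.168.0.10 00:11:22:33:44:55)"
#   dmesg -n 8    # raise the console log level so all messages are sent
# On the receiving machine:
#   nc -u -l 6666       # BSD netcat
#   nc -u -l -p 6666    # traditional netcat
```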

Code:
INFO: rcu_preempt detected stalls on CPUs/tasks:
	6-...: (0 ticks this GP) idle=c54/0/0 softirq=72512/72512 fqs=0
	7-...: (10 GPs behind) idle=248/0/0 softirq=55050/55050 fqs=0
	8-...: (1 GPs behind) idle=0a0/0/0 softirq=53085/53086 fqs=0
	9-...: (8 GPs behind) idle=1cc/0/0 softirq=24670/24670 fqs=0
	10-...: (5 GPs behind) idle=a98/0/0 softirq=48138/48138 fqs=0
	11-...: (1 GPs behind) idle=f20/0/0 softirq=25923/25924 fqs=0
	(detected by 2, t=63859 jiffies, g=189965, c=189964, q=171)
Sending NMI from CPU 2 to CPUs 6:
Sending NMI from CPU 2 to CPUs 7:
Sending NMI from CPU 2 to CPUs 8:
Sending NMI from CPU 2 to CPUs 9:
Sending NMI from CPU 2 to CPUs 10:
NETDEV WATCHDOG: wan0 (igb): transmit queue 0 timed out
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at /home/chewi/Projects/linux/net/sched/sch_generic.c:316 dev_watchdog+0x212/0x220
Modules linked in: it87(O) hwmon_vid netconsole ip6table_mangle nf_log_ipv6 nf_conntrack_ipv6
 nf_defrag_ipv6 xt_connmark iptable_mangle xt_helper ipt_REJECT nf_reject_ipv4 nf_log_ipv4
 nf_log_common xt_LOG xt_limit nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_multiport
 xt_conntrack nf_conntrack_ftp nf_conntrack_tftp nf_conntrack_irc nf_conntrack_pptp
 nf_conntrack_proto_gre nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables
 x_tables nfsd auth_rpcgss oid_registry lockd grace bnep cachefiles fscache bluetooth
 ecdh_generic xfs ftdi_sio usbserial kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic
 amdgpu snd_hda_codec_hdmi snd_hda_intel snd_hda_codec irqbypass snd_hwdep crct10dif_pclmul
 crc32_pclmul ghash_clmulni_intel mfd_core drm_kms_helper pcbc cfbfillrect syscopyarea
 cfbimgblt sysfillrect sysimgblt fb_sys_fops aesni_intel cfbcopyarea ttm drm aes_x86_64
 snd_hda_core snd_pcm snd_timer snd tun ccp crypto_simd glue_helper cryptd ppp_generic slhc
 loop crc32c_intel alx igb mdio i2c_algo_bit sunrpc [last unloaded: netconsole]
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O    4.13.4-01268-g72d693146ebf #61
Hardware name: Gigabyte Technology Co., Ltd. AX370-Gaming 5/AX370-Gaming 5, BIOS F9a 09/08/2017
task: ffffffff8180e480 task.stack: ffffffff81800000
RIP: 0010:dev_watchdog+0x212/0x220
RSP: 0018:ffff88041ec03e90 EFLAGS: 00010286
RAX: 0000000000000037 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88041ec0c8f8 RDI: ffff88041ec0c8f8
RBP: ffff88040781441c R08: 0000000000000001 R09: 000000000000047b
R10: 0000000000001000 R11: 000000002b300077 R12: ffff880407814000
R13: 0000000000000000 R14: 0000000000000008 R15: ffff880408a30940
FS:  0000000000000000(0000) GS:ffff88041ec00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff137422000 CR3: 000000040a109000 CR4: 00000000003406f0
Call Trace:
 <IRQ>
 ? qdisc_rcu_free+0x40/0x40
 ? qdisc_rcu_free+0x40/0x40
 ? call_timer_fn.isra.6+0x11/0x70
 ? expire_timers+0x92/0xa0
 ? run_timer_softirq+0x9f/0xd0
 ? tick_sched_timer+0x4c/0x70
 ? timerqueue_add+0x52/0x80
 ? ktime_get+0x36/0x98
 ? __do_softirq+0xc9/0x208
 ? irq_exit+0xa3/0xa8
 ? smp_apic_timer_interrupt+0x5e/0x80
 ? apic_timer_interrupt+0x7f/0x90
 </IRQ>
 ? acpi_idle_do_entry+0x2b/0x40
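When comparing netconsole captures between machines, it may help to pull out just which CPUs an RCU stall report names. The sed expression below is a sketch that assumes the per-CPU line format seen in this report (leading whitespace, the CPU number, a dash, some flag characters, then a colon), optionally preceded by a printk timestamp in square brackets; the flag characters vary between kernel versions, and the function name is hypothetical.

```shell
#!/bin/bash
# stalled_cpus  (hypothetical helper)
# Read kernel log text on stdin and print, one per line, the CPU numbers from
# the per-CPU lines of an "rcu_preempt detected stalls on CPUs/tasks" report,
# e.g. "<tab>6-...: (0 ticks this GP) idle=..." -> 6
stalled_cpus() {
    sed -nE 's/^(\[[^]]*\])?[[:space:]]*([0-9]+)-[^:[:space:]]*:.*/\2/p'
}
```

For example, saving each capture to a file and running `stalled_cpus < capture.log` on each would make it easy to see whether the same cores stall every time.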
All times are GMT
Page 4 of 8