Marlo Veteran
Joined: 26 Jul 2003 Posts: 1591
|
Posted: Mon Jul 23, 2018 6:17 am Post subject: |
|
|
Your kernel has now been running for 6 days. That is what you originally wanted.
Are you really sure that your motherboard has an SP5100 watchdog?
The kernel help text says: "The TCO (Total Cost of Ownership) timer is a watchdog timer that will reboot the machine after its expiration."
You don't get any useful information or actions from this kernel option.
Disable SP5100_TCO. _________________ ------------------------------------------------------------------
http://radio.garden/ |
|
|
|
RayDude Advocate
Joined: 29 May 2004 Posts: 2062 Location: San Jose, CA
|
Posted: Mon Jul 23, 2018 2:00 pm Post subject: |
|
|
Thanks Marlo. I really appreciate the help.
I have removed all the watchdog timer drivers and will see how it goes. _________________ Some day there will only be free software. |
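One way to double-check that no watchdog driver is still loaded is to scan /proc/modules. This is only a minimal sketch: it assumes the driver was built as a module named sp5100_tco (a driver built into the kernel would not show up there), and the helper names and sample text are made up for illustration.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # the first column is the module name
    return names


def watchdog_modules_loaded(proc_modules_text, suspects=("sp5100_tco",)):
    """Return the suspect watchdog modules that are currently loaded, sorted."""
    return sorted(loaded_modules(proc_modules_text) & set(suspects))


# Hypothetical sample in the /proc/modules column layout.
SAMPLE = (
    "sp5100_tco 16384 0 - Live 0x0000000000000000\n"
    "k10temp 16384 0 - Live 0x0000000000000000\n"
)

if __name__ == "__main__":
    print(watchdog_modules_loaded(SAMPLE))
```

On a live system the text would come from `open("/proc/modules").read()`; an empty result means none of the listed suspects is loaded as a module.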
|
|
|
RayDude Advocate
Joined: 29 May 2004 Posts: 2062 Location: San Jose, CA
|
Posted: Sat Aug 04, 2018 4:44 pm Post subject: |
|
|
It looks like that was indeed the problem.
I removed all watchdog modules and the system ran for 10 days without issue.
I have removed all the debugging I turned on and am running a cleaner kernel to see what happens. It's been up for a day so far with kernel gentoo-4.17.11. _________________ Some day there will only be free software. |
|
|
|
Marlo Veteran
Joined: 26 Jul 2003 Posts: 1591
|
Posted: Sat Aug 04, 2018 7:32 pm Post subject: |
|
|
In the meantime, I think that the bug at the beginning of this thread is not a CPU error.
It was definitely in the kernel. Maybe in Power Management and ACPI options.
RayDude, I can safely confirm what you are saying: 4.17.11 is stable for AMD Ryzen 5 1600.
I think you can mark this thread as solved.
Thanks for this message.
Best regards
Ma _________________ ------------------------------------------------------------------
http://radio.garden/ |
|
|
|
RayDude Advocate
Joined: 29 May 2004 Posts: 2062 Location: San Jose, CA
|
Posted: Sun Aug 05, 2018 3:46 pm Post subject: |
|
|
Hi Marlo,
It hung overnight.
There was nothing in the remote logs to indicate why. In fact there was no data at all.
Can you or someone else please share your .config so I can compare it to mine?
I really want this to be stable. _________________ Some day there will only be free software. |
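For comparing two kernel .config files, a small script can list only the options that differ. The sketch below is an assumption-laden illustration, not anyone's actual tooling: it treats a symbol that is missing from a file the same as "is not set", and the function names and sample strings are invented. The kernel source tree also ships scripts/diffconfig, which serves the same purpose.

```python
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map each CONFIG_ symbol to its value; unset symbols map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_kconfig(mine, theirs):
    """Return {symbol: (my_value, their_value)} for every differing symbol."""
    a, b = parse_kconfig(mine), parse_kconfig(theirs)
    return {
        key: (a.get(key, "n"), b.get(key, "n"))  # assumption: missing == 'n'
        for key in sorted(set(a) | set(b))
        if a.get(key, "n") != b.get(key, "n")
    }


# Hypothetical fragments of two configs.
MINE = (
    "CONFIG_SP5100_TCO=m\n"
    "CONFIG_RCU_NOCB_CPU=y\n"
    "# CONFIG_SOFT_WATCHDOG is not set\n"
)
THEIRS = (
    "# CONFIG_SP5100_TCO is not set\n"
    "CONFIG_RCU_NOCB_CPU=y\n"
)

if __name__ == "__main__":
    for symbol, (my_value, their_value) in diff_kconfig(MINE, THEIRS).items():
        print(f"{symbol}: mine={my_value} theirs={their_value}")
```

With real files, read both with `open(path).read()` and pass the two strings in; only the differing symbols are reported, which keeps the output short enough to read through.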
|
|
|
Marlo Veteran
Joined: 26 Jul 2003 Posts: 1591
|
Posted: Sun Aug 05, 2018 4:02 pm Post subject: |
|
|
Here is my .config.
Note that I have since extended this .config for qemu, kvm, xen, iptables and so on.
Remember: I have systemd and an AMD Polaris GPU.
Good luck!
If you have any questions, please post them here.
Ma _________________ ------------------------------------------------------------------
http://radio.garden/ |
|
|
|
RayDude Advocate
Joined: 29 May 2004 Posts: 2062 Location: San Jose, CA
|
Posted: Sat Sep 22, 2018 11:01 pm Post subject: |
|
|
It took me weeks to strip my kernel back but I did it. It's tiny and only has 50 megs of driver modules...
But it still crashes once in a while.
I think it's related to X or KDE...
It is a bit more stable though. It goes for around a week before a deadlock. _________________ Some day there will only be free software. |
|
|
|
Marlo Veteran
Joined: 26 Jul 2003 Posts: 1591
|
Posted: Sun Sep 23, 2018 11:02 am Post subject: |
|
|
Which kernel? _________________ ------------------------------------------------------------------
http://radio.garden/ |
|
|
|
RayDude Advocate
Joined: 29 May 2004 Posts: 2062 Location: San Jose, CA
|
Posted: Sun Sep 23, 2018 3:25 pm Post subject: |
|
|
gentoo-sources-4.18.5 crashed twice in a couple of weeks.
I just updated to gentoo-sources-4.18.9 _________________ Some day there will only be free software. |
|
|
|
Marlo Veteran
Joined: 26 Jul 2003 Posts: 1591
|
Posted: Sun Sep 23, 2018 5:32 pm Post subject: |
|
|
Purely from the perspective of our topic, I have tried these kernels (without BIOS changes and without kernel command-line parameters):
Code: | kernel-config-x86_64-4.14.39-gentoo kernel-config-x86_64-4.17.13-gentoo kernel-config-x86_64-4.17.4-gentoo kernel-config-x86_64-4.18.6-gentoo
kernel-config-x86_64-4.14.65-gentoo-wren kernel-config-x86_64-4.17.14-gentoo kernel-config-x86_64-4.17.5-gentoo kernel-config-x86_64-4.18.7-gentoo
kernel-config-x86_64-4.16.11-gentoo kernel-config-x86_64-4.17.15-gentoo-r1 kernel-config-x86_64-4.17.6-gentoo kernel-config-x86_64-4.18.8-gentoo
kernel-config-x86_64-4.16.12-gentoo kernel-config-x86_64-4.17.17-gentoo kernel-config-x86_64-4.17.8-gentoo kernel-config-x86_64-4.18.9-gentoo
kernel-config-x86_64-4.16.13-gentoo kernel-config-x86_64-4.17.17-gentoo-polaris12 kernel-config-x86_64-4.17.9-gentoo kernel-config-x86_64-4.19.0-rc2
kernel-config-x86_64-4.16.8-gentoo kernel-config-x86_64-4.17.19-gentoo kernel-config-x86_64-4.18.0-gentoo kernel-config-x86_64-4.19.0-rc3
kernel-config-x86_64-4.16.9-gentoo kernel-config-x86_64-4.17.1-gentoo kernel-config-x86_64-4.18.1-gentoo-r1 kernel-config-x86_64-4.19.0-rc4
kernel-config-x86_64-4.17.0-gentoo kernel-config-x86_64-4.17.2-gentoo kernel-config-x86_64-4.18.2-gentoo kernel-config-x86_64-4.9.95-gentoo
kernel-config-x86_64-4.17.10-gentoo kernel-config-x86_64-4.17.3-gentoo kernel-config-x86_64-4.18.3-gentoo
kernel-config-x86_64-4.17.11-gentoo kernel-config-x86_64-4.17.3-gentook kernel-config-x86_64-4.18.4-gentoo
kernel-config-x86_64-4.17.12-gentoo kernel-config-x86_64-4.17.3-gentoosu kernel-config-x86_64-4.18.5-gentoo |
In my experience, Linux 4.17.14 Gentoo is good and stable.
The entire 4.18.xx Gentoo series is so far completely useless. Maybe a patch is missing? Look here for this article.
What really excites me is the git version of 4.19-rcX. These release candidates seem to be stable. In other words, when the Gentoo kernel 4.19.X-gentoo is released, I can compare it to the git -rcX version.
As I said, only from the point of view of our topic. _________________ ------------------------------------------------------------------
http://radio.garden/ |
|
|
|
RayDude Advocate
Joined: 29 May 2004 Posts: 2062 Location: San Jose, CA
|
Posted: Mon Sep 24, 2018 5:05 pm Post subject: |
|
|
Marlo wrote: |
The entire 4.18.xx Gentoo series is so far completely useless. Maybe a patch is missing? Look here for this article.
What really excites me is the git version of 4.19-rcX. These release candidates seem to be stable. In other words, when the Gentoo kernel 4.19.X-gentoo is released, I can compare it to the git -rcX version.
As I said, only from the point of view of our topic. |
What do you mean, the 4.18.xx Gentoo series is useless? I followed the link, but that's just about power savings. I don't really care about that on my desktop. _________________ Some day there will only be free software. |
|
|
|
bammbamm808 Guru
Joined: 08 Dec 2002 Posts: 548 Location: Hawaii
|
Posted: Tue Sep 25, 2018 5:01 am Post subject: |
|
|
My hardware is somewhat similar, and I didn't see you mention one of the things that helped get my system rock solid. I'm not sure what it's called in your BIOS, or if it's even there, but under whatever passes for 'power supply idle' control, change it from 'low' to 'typical', 'common', 'normal' or whatever seems equivalent. I went through weeks of segfaults, reboots and hard locks, both under load and at idle, before all my fiddling paid off.
On gentoo-sources-4.17.6 here, with the experimental USE flag enabled. '-znver1' is my arch. _________________ MSI MAG B550 Tomahawk
Ryzen 3900x
32Gb Samsung B-die (16GB dual rank x2) DDR4 @ 3200MHz, cl14
Geforce RTX 2070S 8GB
Samsung m.2 NVME pcie-3.0
Etc.... |
|
|
|
RayDude Advocate
Joined: 29 May 2004 Posts: 2062 Location: San Jose, CA
|
Posted: Fri Oct 12, 2018 5:09 pm Post subject: |
|
|
bammbamm808 wrote: | My hardware is somewhat similar, and I didn't see you mention one of the things that helped get my system rock solid. I'm not sure what it's called in your BIOS, or if it's even there, but under whatever passes for 'power supply idle' control, change it from 'low' to 'typical', 'common', 'normal' or whatever seems equivalent. I went through weeks of segfaults, reboots and hard locks, both under load and at idle, before all my fiddling paid off.
On gentoo-sources-4.17.6 here, with the experimental USE flag enabled. '-znver1' is my arch. |
My cheap-ass Gigabyte mobo BIOS is terrible. I don't think I have a setting like that, but I'll look for it. Thanks for the tip. _________________ Some day there will only be free software. |
|
|
|
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
|
Posted: Fri Oct 12, 2018 11:14 pm Post subject: |
|
|
RayDude wrote: | My cheap-ass Gigabyte mobo BIOS is terrible. I don't think I have a setting like that, but I'll look for it. Thanks for the tip. |
Post your mobo model. Odds are that someone here is using it and can point you right to it. Include your BIOS version too. My AM4 board is MSI and each BIOS version gets better. |
|
|
|
RayDude Advocate
Joined: 29 May 2004 Posts: 2062 Location: San Jose, CA
|
Posted: Sat Oct 13, 2018 4:41 pm Post subject: |
|
|
I have a Gigabyte AB350M-D3H, BIOS version F23d.
I looked for a setting like the one you described and didn't find anything.
If someone knows where it is, I'd appreciate the help.
Thanks. _________________ Some day there will only be free software. |
|
|
|
bammbamm808 Guru
Joined: 08 Dec 2002 Posts: 548 Location: Hawaii
|
Posted: Sat Oct 13, 2018 11:29 pm Post subject: |
|
|
RayDude wrote: | I have a Gigabyte AB350M-D3H, BIOS version F23d.
I looked for a setting like the one you described and didn't find anything.
If someone knows where it is, I'd appreciate the help.
Thanks. |
Historically half-baked UEFI implementations are THE reason I avoid Gigabyte and choose Asus. Otherwise they both seem to offer comparable boards. Asus seems to include more options. _________________ MSI MAG B550 Tomahawk
Ryzen 3900x
32Gb Samsung B-die (16GB dual rank x2) DDR4 @ 3200MHz, cl14
Geforce RTX 2070S 8GB
Samsung m.2 NVME pcie-3.0
Etc.... |
|
|
|
Marlo Veteran
Joined: 26 Jul 2003 Posts: 1591
|
Posted: Sun Oct 14, 2018 9:50 am Post subject: |
|
|
Maybe there is an alternative to the BIOS setting.
There is a little helper tool, zenstates.py, for dynamically editing AMD Ryzen processor P-states. _________________ ------------------------------------------------------------------
http://radio.garden/ |
|
|
|
RayDude Advocate
Joined: 29 May 2004 Posts: 2062 Location: San Jose, CA
|
Posted: Mon Oct 15, 2018 5:24 pm Post subject: |
|
|
Thanks, I'll check it out tonight. _________________ Some day there will only be free software. |
|
|
|
krinn Watchman
Joined: 02 May 2003 Posts: 7470
|
Posted: Mon Oct 15, 2018 5:55 pm Post subject: |
|
|
Dunno if it has been asked before, but are you sure your CPU is not a faulty one?
There is a "huge" thread about Ryzen where users explain clearly how to identify your CPU and which CPU series are faulty; they even explain the RMA process to get a good CPU.
Because if your Ryzen is a bad one, you could spend your life trying kernel tricks without result.
Look at this, for example (RMA because of crash/lockup): https://forums.gentoo.org/viewtopic-p-8127190.html#8127190 |
|
|
|
RayDude Advocate
Joined: 29 May 2004 Posts: 2062 Location: San Jose, CA
|
Posted: Tue Oct 16, 2018 5:18 pm Post subject: |
|
|
krinn wrote: | Dunno if it has been asked before, but are you sure your CPU is not a faulty one?
There is a "huge" thread about Ryzen where users explain clearly how to identify your CPU and which CPU series are faulty; they even explain the RMA process to get a good CPU.
Because if your Ryzen is a bad one, you could spend your life trying kernel tricks without result.
Look at this, for example (RMA because of crash/lockup): https://forums.gentoo.org/viewtopic-p-8127190.html#8127190 |
Mine was a faulty one. AMD replaced it last year. It worked fine (24/7 for months and months) for a long while, then this started happening. It may have been because of a BIOS update (I was trying to get my memory to go faster at one point). It might have been because of a kernel update... There is really no way to tell. It could be that the mobo overvolted it and partially killed it (I did not install the infamous Gigabyte overvoltage BIOS, but that doesn't mean there weren't other BIOS versions with the same bug...), but that seems unlikely.
It could be that the mobo itself is bad... I put in a different PSU (to be able to plug in another SSD), but it should be more than capable of driving the Ryzen and a GTX 1080... I think.
It's funny just how tolerable it is to crash every 9 to 10 days... It just doesn't feel like Linux...
I'll try the zenstates.py file when I get a chance... Been busy these days....
Thanks for the suggestions, I really appreciate everyone taking the time to post. _________________ Some day there will only be free software. |
|
|
|
RayDude Advocate
Joined: 29 May 2004 Posts: 2062 Location: San Jose, CA
|
Posted: Sun Oct 21, 2018 5:33 pm Post subject: |
|
|
Just to confirm, I'm disabling C6 with zenstates, is that the right thing to do?
I'm doing a two week test now, and if it makes it to two weeks it would be the first time in almost a year to do so. _________________ Some day there will only be free software. |
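For anyone who wants to verify that C6 really is off after running the script, here is a rough sketch of decoding the relevant registers. It assumes the MSR addresses and bit positions that zenstates.py uses for first-generation Ryzen, as far as I recall them (package C6: bit 32 of MSR 0xC0010292; core C6: bits 22, 14 and 6 of MSR 0xC0010296), so check them against the script itself before relying on this. Reading MSRs needs root and the msr kernel module; the function names are my own.

```python
import os
import struct

PKG_C6_MSR = 0xC0010292   # assumed: bit 32 enables package C6
CORE_C6_MSR = 0xC0010296  # assumed: bits 22, 14 and 6 enable core C6
PKG_C6_BIT = 1 << 32
CORE_C6_BITS = (1 << 22) | (1 << 14) | (1 << 6)


def read_msr(address, cpu=0):
    """Read one 64-bit MSR through /dev/cpu/<cpu>/msr (root, msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The file offset selects the register; each register is 8 bytes.
        return struct.unpack("<Q", os.pread(fd, 8, address))[0]
    finally:
        os.close(fd)


def c6_status(pkg_value, core_value):
    """Decode raw MSR values into (package_c6_enabled, core_c6_enabled)."""
    package = bool(pkg_value & PKG_C6_BIT)
    core = (core_value & CORE_C6_BITS) == CORE_C6_BITS
    return package, core


if __name__ == "__main__":
    print(c6_status(read_msr(PKG_C6_MSR), read_msr(CORE_C6_MSR)))
```

Keeping the decoding separate from the register read means the bit logic can be checked with plain integers on any machine. As far as I know the setting is not persistent, so it has to be reapplied after every boot.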
|
|
|
dewhite Tux's lil' helper
Joined: 16 Mar 2003 Posts: 106 Location: Houston, Texas, USA
|
Posted: Sat Jan 12, 2019 2:58 am Post subject: |
|
|
RayDude wrote: | Just to confirm, I'm disabling C6 with zenstates, is that the right thing to do?
I'm doing a two week test now, and if it makes it to two weeks it would be the first time in almost a year to do so. |
RayDude:
Did you ever get this under control? I'm chasing similar issues with a 1700x and I've tried most of the things that you made reference to in this thread. Did you ever find a combination of tweaks that successfully managed your issues? _________________ Work FS: R7-5700g | 2x16Gb DDR4 | 500Gb NVMe LUKS root | 2x 8TB RAID1
Home FS: R7-1700x | 2x8Gb DDR4 | 275Gb M.2 SATA LUKS root | 2x 14TB RAID1 |
|
|
|
RayDude Advocate
Joined: 29 May 2004 Posts: 2062 Location: San Jose, CA
|
Posted: Sat May 25, 2019 3:30 pm Post subject: |
|
|
Sorry I didn't reply.
This fell off my most recent posts screen.
The zenstates.py script fixed my problem when I disabled C6.
Now, I'm doing a fresh install and my kernel (using the same .config from the old system) hangs when called from grub. No output.
I'm in the process of adding rcu_nocbs to the kernel command line. I'm hoping that fixes this new install. _________________ Some day there will only be free software. |
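A small sketch of building that parameter, assuming "nocbs" here refers to the rcu_nocbs= boot parameter and that every logical CPU should be listed. The kernel also needs CONFIG_RCU_NOCB_CPU=y for the parameter to have any effect. The function name is made up for illustration.

```python
import os


def rcu_nocbs_param(cpu_count):
    """Build an rcu_nocbs= argument covering logical CPUs 0..cpu_count-1."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    if cpu_count == 1:
        return "rcu_nocbs=0"
    return f"rcu_nocbs=0-{cpu_count - 1}"


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs (threads), which is what the list needs.
    print(rcu_nocbs_param(os.cpu_count() or 1))
```

The resulting string would then be appended to the kernel line in the GRUB configuration, for example through GRUB_CMDLINE_LINUX in /etc/default/grub.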
|
|
|
dewhite Tux's lil' helper
Joined: 16 Mar 2003 Posts: 106 Location: Houston, Texas, USA
|
Posted: Thu May 30, 2019 3:23 pm Post subject: |
|
|
RayDude wrote: | Sorry I didn't reply.
This fell off my most recent posts screen.
The zenstates.py script fixed my problem when I disabled C6.
Now, I'm doing a fresh install and my kernel (using the same .config from the old system) hangs when called from grub. No output.
I'm in the process of adding rcu_nocbs to the kernel command line. I'm hoping that fixes this new install. |
Thanks for your reply. I ended up determining that I had an early-production (week 7) part and RMA'd it to AMD. I got a week 41 part back from AMD after a 10-day turnaround and have not had any issues since then. _________________ Work FS: R7-5700g | 2x16Gb DDR4 | 500Gb NVMe LUKS root | 2x 8TB RAID1
Home FS: R7-1700x | 2x8Gb DDR4 | 275Gb M.2 SATA LUKS root | 2x 14TB RAID1 |
|
|
|
|