Gentoo Forums
frequent hard-locks with Xorg, no suspicious items in logs
stahlsau
Guru
Joined: 09 Jan 2004
Posts: 582
Location: WildWestwoods

Posted: Mon Sep 23, 2019 9:47 am    Post subject: frequent hard-locks with Xorg, no suspicious items in logs

*edit: see my last posts, issue caused by nouveau - now running proprietary nvidia-drivers, works for the moment
_________________________________________________

Hi all,
as the title says, I get frequent crashes / hard lockups with Xorg. The mouse freezes, the console is not available, nothing works (magic SysRq etc.). I have to power off the machine.

Running a fresh install of Gentoo (17.1 amd64 profile), OpenRC init, ext4, on an AMD Ryzen 2600X (no OC), 16GB RAM, Nvidia RTX 2060 with nouveau (no acceleration, since that is not supported), fluxbox + standard Xorg server. Recent ~arch kernel 5.3.0, since the older ones don't support the graphics card (nouveau).

Memtest and filesystem tests show no problems. The box has been running for weeks under Win10 without problems (prime95 etc. OK for hours). So I'd rule out heat or memory problems.

Hardlocks happen while surfing with firefox, while emerging packages, or... dunno, sometimes while doing stuff without much load on the system. I emerged the complete system in the framebuffer console without a single lockup. Running X, lockups can come anywhere from minute 1 to minute 300 (estimated). Nothing special in the logs; mostly they end with messages from the wireless card adjusting the frequency or something..

Some things that came to my mind:
- could it be that I'm running out of memory (I don't have a swap drive)? Would that give the same symptoms?
- could some userspace program lock the whole system up? Though I dunno which one... maybe firefox?


AFAIK, my Ryzen is not affected by the famous bug, additionally I've enabled microcode loading, so this should not be an issue(?)



Hope someone can give me a tip, I'm a bit clueless...


Last edited by stahlsau on Fri Sep 27, 2019 4:08 pm; edited 1 time in total

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 45805
Location: 56N 3W

Posted: Mon Sep 23, 2019 10:18 am

stahlsau,

It's not running out of memory. Usually the out-of-memory (OOM) killer kicks in and kills something; the system does not lock up.
Random lockups sound like hardware somewhere. Software problems are usually deterministic and affect lots of users.
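
To rule the out-of-memory theory in or out from the logs, one can search them for the messages the kernel prints when the OOM killer runs. A minimal sketch in Python; the sample log lines and the process name in them are made up for illustration, and the log path on a given system (for example /var/log/messages) is an assumption:

```python
import re

# Substrings the Linux kernel logs when the OOM killer is invoked.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory: Kill")


def find_oom_lines(log_text):
    """Return the log lines that mention the OOM killer."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


# Hypothetical log excerpt, only to show the function in use.
sample = "\n".join([
    "kernel: wlan0: adjusting frequency",
    "kernel: firefox invoked oom-killer: order=0",
    "kernel: Out of memory: Killed process 1234 (firefox)",
])

for hit in find_oom_lines(sample):
    print(hit)
```

If a real log saved before a lockup contains no such lines, running out of memory is an unlikely cause.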

Booting into memtest86 and running a few cycles is good.
memtest86 can point to other things than faulty RAM.
If you repeatedly get the same error at the same address, it might be RAM.
Any other problems reported by memtest86 are probably not RAM.

prime95 is a good CPU stress test.

Does your Nvidia RTX2060 card have its own power connector? If so, is it fitted?

-- edit --

Enabling microcode loading is required but not sufficient.
You need to provide the microcode every time you build the kernel.
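
One way to check whether a microcode update actually reached the CPU is to compare the revision the kernel reports before and after the change. A small sketch, assuming an x86 Linux system where /proc/cpuinfo carries a "microcode" field per logical CPU; the function name is only illustrative:

```python
def microcode_revisions(cpuinfo_text):
    """Collect the distinct values of the 'microcode' field in cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            # Normally a single revision is shared by all logical CPUs.
            print(sorted(microcode_revisions(fh.read())))
    except OSError as exc:
        print(f"could not read /proc/cpuinfo: {exc}")
```

If the printed revision is the same with and without the microcode files in place, the update is probably not being loaded.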
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

stahlsau
Guru
Joined: 09 Jan 2004
Posts: 582
Location: WildWestwoods

Posted: Mon Sep 23, 2019 11:13 am

thanks for the fast answer.
I'll check if I really enabled the microcode, I'm not sure.
The graphics card has a separate power plug, which I'm sure is connected. But the mainboard has an additional power plug too, and I'm not sure if I connected it, because I was missing the correct plug during installation. I'll check this this evening, too.

Regarding hardware vs. software - I understand what you mean. But well, the system is running fine under heavy load on windows (I do 3D-CAD with sometimes huge assemblies, and, of course, gaming ;-)), and is running fine compiling lotsa stuff in the terminal without X. The lockups appear to be random, with and without much load (and without any 3D-load, as I wrote, the 3D-part of the card is not supported by the driver).


I'll check these things and report back.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 45805
Location: 56N 3W

Posted: Mon Sep 23, 2019 2:18 pm

stahlsau,

Windows, binary-only software and Gentoo do things differently.

Windows and binary distributed programs are either built to run on the lowest common denominator hardware, or for a very small number of different sets of hardware.
e.g. Windows had separate binaries for AMD and Intel CPUs.

Gentoo optimises everything for your particular CPU, if you let it.
You could try to replace -march=native with -mtune=generic if you want a lowest common denominator build.

It's possible that Gentoo is using features of your system that Windows does not know how to reach.
That still makes it a hardware problem, but it's one that Windows and binary-only software will not trigger.

stahlsau
Guru
Joined: 09 Jan 2004
Posts: 582
Location: WildWestwoods

Posted: Tue Sep 24, 2019 5:36 am

thanks for your answers, that provides some insight.

Checked yesterday: graphics card has additional power plug installed, mainboard doesn't (only 20-pin). I'll get an adapter and install the additional 4-pin-plug from a different rail, the power source is new and strong enough (by far).

Had two lockups yesterday, one while browsing with firefox, one with some other graphical browser. So firefox isn't the problem. The other browser doesn't support JS, Flash etc., which I had suspected. ATM it _seems_ to happen while browsing the net, loading some new website, sometimes. Not sure, though.

Logs still empty, system seems to go to a full stop without writing a single bit to the logs.

Maybe the nouveau driver could be the culprit - the RTX cards are barely supported, and not many people use them afaik. Is there a way to get X working with a decent resolution without the nvidia binary driver or nouveau?

tomtom69
Apprentice
Joined: 09 Nov 2010
Posts: 207
Location: Bavaria

Posted: Tue Sep 24, 2019 5:46 am

Maybe you are hit by this:
https://bugzilla.kernel.org/show_bug.cgi?id=196683
Short form: I had to set the bios option "typical current idle" on my 2 Ryzen systems to get rid of freezes.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 45805
Location: 56N 3W

Posted: Tue Sep 24, 2019 12:44 pm

stahlsau,

You can try the fbdev or the modesetting driver.
You won't like either of them.

The fbdev driver uses your graphics card as a chunk of RAM. That's all.
The GPU does nothing. It's just like the very first Hercules graphics card for the PC.

The modesetting driver tries to do some software acceleration but it needs the right support in mesa.
Again, the GPU does little or nothing.

Nothing in the logs means the lockup is very fast. There is no opportunity to flush buffers to the HDD.

If you want something less radical and easier to test than the fbdev or the modesetting drivers, try adding pci=nomsi to the end of the kernel command line.
It's a small performance hit.

You can do that at boot, for just this boot, by pressing the 'e' key at the grub menu and editing the command line in RAM.
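
After editing the command line at the grub menu, it is worth confirming that the running kernel really received the option (its documented spelling is pci=nomsi). A minimal sketch, assuming a Linux system that exposes the boot command line in /proc/cmdline; the helper name is made up for illustration:

```python
def has_kernel_option(cmdline, key, option):
    """True if 'key=...' on the command line lists 'option' (comma separated)."""
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and name == key and option in value.split(","):
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as fh:
            cmdline = fh.read()
    except OSError as exc:
        print(f"could not read /proc/cmdline: {exc}")
    else:
        print("pci=nomsi active:", has_kernel_option(cmdline, "pci", "nomsi"))
```

The same check works for any other option appended for a single boot, since edits made at the grub menu are not saved to disk.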

krinn
Watchman
Joined: 02 May 2003
Posts: 7427

Posted: Tue Sep 24, 2019 5:01 pm

https://forums.gentoo.org/viewtopic-t-1083508-highlight-ryzen.html
has been helpful, could be helpful again

stahlsau
Guru
Joined: 09 Jan 2004
Posts: 582
Location: WildWestwoods

Posted: Wed Sep 25, 2019 8:36 am

thanks @ all for the help, I'll test the suggestions one after the other.
Slow progress, real life takes its toll atm.

I did a BIOS update, since there have been 4 iterations since the one that was installed. No change.

Tried a different window manager and a different browser, didn't help. As I see it now, it has to be a low-level problem, either hardware or maybe a driver, not userspace software.

In the UEFI setup I changed the suggested power-saving option to the suggested value ("typical current idle"), so the CPU voltage isn't dropped too far at idle; apparently this helped in some cases. Not here.

What I will do when I get time:
- disable C6 powersaving mode (probably disable C-states in the BIOS completely for testing)
- plug in the additional power cord (4-pin) to the mainboard, which should arrive today
- if that doesn't help, I'll try fbdev or the other driver you mentioned, to rule out a faulty nouveau driver
- try the kernel parameters at boot: idle=nomwait rcu_nocbs=0-15 pci=nomsi (gonna double check that... rcu_... is cpu-specific, afaik)
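
On the CPU-specific part: the range given to rcu_nocbs is a list of logical CPU numbers, so 0-15 fits a 16-thread CPU, while a Ryzen 5 2600X (6 cores, 12 threads) would need 0-11. A small sketch that derives the value from the number of logical CPUs the running system reports; the function name is made up for illustration, and as far as I know the option only has an effect on kernels built with CONFIG_RCU_NOCB_CPU:

```python
import os


def rcu_nocbs_arg(logical_cpus):
    """Build an rcu_nocbs= argument covering CPUs 0..logical_cpus-1."""
    if logical_cpus < 1:
        raise ValueError("need at least one logical CPU")
    if logical_cpus == 1:
        return "rcu_nocbs=0"
    return f"rcu_nocbs=0-{logical_cpus - 1}"


if __name__ == "__main__":
    # os.cpu_count() counts logical CPUs (threads), not physical cores.
    print(rcu_nocbs_arg(os.cpu_count() or 1))
```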


I'll report back!

Chiitoo
Administrator
Joined: 28 Feb 2010
Posts: 2087
Location: Here and Away Again

Posted: Wed Sep 25, 2019 9:49 am

tomtom69 wrote:
Maybe you are hit by this:
https://bugzilla.kernel.org/show_bug.cgi?id=196683
Short form: I had to set the bios option "typical current idle" on my 2 Ryzen systems to get rid of freezes.

I may have had hangs that possibly got cured by the same setting, though I forget what I have it set to right now. The default might be 'auto', and indeed, 'typical current idle' might be what I set it to, and have not had hangs with. Will need to check the next time I boot.

This is with a Ryzen 7 1700.

The hang actually is before POST for me a lot of times, as if someone pressed the reset button, and then it sits there, doing nothing until I reset it (again). Sometimes it is a hang without the reset though.

As a side-note, another issue I have had, that also resets the machine (but no hang as far as I can remember), had to do with a PCI sound card. It would happen when I touched the volume wheel on my keyboard, but not always. Pretty annoying, I must say. I removed the card and it has not happened while using the integrated chip.

The card (that I had for years and had no such issues with before with my Phenom machine) is an ASUS Xonar DG, using the Oxygen driver.
_________________
Kind regards,
Chiitoo.

You might remember me from Gentoo projects such as Forums, LXQt, Qt, and Wine.

stahlsau
Guru
Joined: 09 Jan 2004
Posts: 582
Location: WildWestwoods

Posted: Wed Sep 25, 2019 5:10 pm

well, trying to answer before the hardlock occurs:

- disabled C-states in the BIOS completely
still hardlocks
- plugged in the additional power cord (4-pin) to the mainboard
still hardlocks

- tried the kernel parameters at boot: idle=nomwait rcu_nocbs=0-11
dunno if they were applied, since atm I use genkernel with the initrd... I simply appended them to the kernel line in grub?
still hardlocks



new:
output of zenstates:
Code:

inux /home/xxx # ./zenstates.py -l
P0 - Enabled - FID = 90 - DID = 8 - VID = 3A - Ratio = 36.00 - vCore = 1.18750
P1 - Enabled - FID = 80 - DID = 8 - VID = 55 - Ratio = 32.00 - vCore = 1.01875
P2 - Enabled - FID = 84 - DID = C - VID = 72 - Ratio = 22.00 - vCore = 0.83750
P3 - Disabled
P4 - Disabled
P5 - Disabled
P6 - Disabled
P7 - Disabled
C6 State - Package - Disabled
C6 State - Core - Disabled
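
The Ratio and vCore columns in this listing can be recomputed from the raw hex fields. A sketch, assuming the usual Zen P-state conversion (ratio = 2 * FID / DID, vCore = 1.55 V minus 6.25 mV per VID step) and a 100 MHz reference clock; these formulas are not stated in the thread, but they are consistent with all three enabled lines:

```python
def decode_pstate(fid_hex, did_hex, vid_hex):
    """Convert raw hex P-state fields to (ratio, vcore_volts)."""
    fid = int(fid_hex, 16)
    did = int(did_hex, 16)
    vid = int(vid_hex, 16)
    ratio = 2.0 * fid / did        # assumed: multiplier = 2 * FID / DID
    vcore = 1.55 - 0.00625 * vid   # assumed: 1.55 V minus 6.25 mV per VID step
    return ratio, vcore


# Raw fields copied from the three enabled P-states in the listing.
PSTATES = {
    "P0": ("90", "8", "3A"),
    "P1": ("80", "8", "55"),
    "P2": ("84", "C", "72"),
}

for name, fields in PSTATES.items():
    ratio, vcore = decode_pstate(*fields)
    # The MHz figure assumes a 100 MHz reference clock.
    print(f"{name}: ratio {ratio:.2f}, vCore {vcore:.5f} V, about {ratio * 100:.0f} MHz")
```

Read this way, P0 corresponds to the stock 3.6 GHz base clock, so the listing shows no overclock, and both C6 entries are already disabled.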


waiting for the crash....
last idea is running without nouveau.

stahlsau
Guru
Joined: 09 Jan 2004
Posts: 582
Location: WildWestwoods

Posted: Thu Sep 26, 2019 8:01 am

zenstates.py didn't help either. So I guess I've tried everything except ditching nouveau - which I did yesterday evening.

Running X on fbdev works ok, a bit 80's-style but whatever...
Tried for about 1 1/2 hours, emerging stuff, idling, browsing...no lockup so far. First time i powered the box off normally (running linux - no probs with win 10 so far) ;)

Will do some further testing today; if it doesn't lock up, I'll contact the guys at #nouveau and ask what they think.
Tried the proprietary nvidia drivers but didn't get them to work; I remember it was always a hassle when I tried years ago, too. Sticking with fbdev, then, for the time being.

I'll report back if this solved my issues.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 45805
Location: 56N 3W

Posted: Thu Sep 26, 2019 10:10 pm

stahlsau,

The modesetting driver will be a step up from fbdev.

stahlsau
Guru
Joined: 09 Jan 2004
Posts: 582
Location: WildWestwoods

Posted: Fri Sep 27, 2019 9:07 am

so, well, it seems that running without nouveau stops the lock-ups.
Issue 1 solved - thanks for all the help guys!

I'll check modesetting, thanks for the heads up.
I'll also get in touch with the nouveau guys, seems like this bug isn't known.

Btw., I get the feeling that there isn't much going on on the forums anymore, might that be? IIRC there were some hundred new posts every day some years ago, now there are 30..guess there are some more newbie-friendly distros which get most of the attention now - sad enough. I always found gentoo very nice to learn and very good to fine-adjust what i want and what not. But well, it ain't dead ;-)

krinn
Watchman
Joined: 02 May 2003
Posts: 7427

Posted: Fri Sep 27, 2019 10:10 am

stahlsau wrote:
Btw., I get the feeling that there isn't much going on on the forums anymore, might that be? IIRC there were some hundred new posts every day some years ago, now there are 30..guess there are some more newbie-friendly distros which get most of the attention now - sad enough. I always found gentoo very nice to learn and very good to fine-adjust what i want and what not. But well, it ain't dead ;-)

or the users from those years ago are now experienced users and have less need of help ;)

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 45805
Location: 56N 3W

Posted: Sat Sep 28, 2019 8:40 am

stahlsau,

I think the Gentoo documentation has improved, so there are fewer early install errors.

All times are GMT