Gentoo crashes when trying to do a 'heavy' task

32 posts
dragonfire2003
n00b
n00b
Posts: 53
Joined: Mon Mar 14, 2022 3:21 pm

Gentoo crashes when trying to do a 'heavy' task

  • Quote

Post by dragonfire2003 » Sun Apr 10, 2022 8:57 pm

Good morning, people of the Gentoo Forums! I recently noticed some very odd behavior with my Gentoo install: it crashes whenever I try to do a heavy task.
(And yes, the same tasks work on other operating systems, even Arch.)
Specs:
24GB of ram
RTX 2060
AMD Ryzen 5
SSD
I'm using the Binary Kernel
Tasks that make Gentoo crash:
Trying to render a video in DaVinci Resolve
Trying to play a game on Wine
There are probably more that I haven't found yet, but those are the ones I need a fix for ASAP
Two things I'm sure aren't related:
- Cooling Issues (Temperature is normal)
- Ram issues
Top
alamahant
Advocate
Advocate
Posts: 4034
Joined: Sat Mar 23, 2019 12:12 pm

  • Quote

Post by alamahant » Sun Apr 10, 2022 9:01 pm

I noticed with my Gentoo install is that it crashes whenever I try to do a heavy task
what happens?
Does it become unresponsive or poweroff?
What does dmesg say?
:)
Top
dragonfire2003
n00b
n00b
Posts: 53
Joined: Mon Mar 14, 2022 3:21 pm

  • Quote

Post by dragonfire2003 » Sun Apr 10, 2022 9:14 pm

alamahant wrote:
I noticed with my Gentoo install is that it crashes whenever I try to do a heavy task
what happens?
Does it become unresponsive or poweroff?
What does dmesg say?
poweroff
dmesg output: https://pastebin.com/NnyMqpZp
the last line is intriguing

Code: Select all

[   84.301576] xhci_hcd 0000:01:00.0: WARN: buffer overrun event for slot 3 ep 4 on endpoint

Top
alamahant
Advocate
Advocate
Posts: 4034
Joined: Sat Mar 23, 2019 12:12 pm

  • Quote

Post by alamahant » Sun Apr 10, 2022 11:10 pm

I don't think it's the culprit, but what do you have connected via USB?
xhci_hcd
seems to be usb related.
:)
Top
dragonfire2003
n00b
n00b
Posts: 53
Joined: Mon Mar 14, 2022 3:21 pm

  • Quote

Post by dragonfire2003 » Sun Apr 10, 2022 11:39 pm

alamahant wrote:I don't think it's the culprit, but what do you have connected via USB?
xhci_hcd
seems to be usb related.
things i have connected:
a fan
my mouse
my keyboard
my microphone
thats it
Top
alamahant
Advocate
Advocate
Posts: 4034
Joined: Sat Mar 23, 2019 12:12 pm

  • Quote

Post by alamahant » Sun Apr 10, 2022 11:52 pm

Plz install
linux-firmware
https://wiki.gentoo.org/wiki/AMD_microcode#Emerge
and maybe check if the fan is to blame...
:)
Top
dragonfire2003
n00b
n00b
Posts: 53
Joined: Mon Mar 14, 2022 3:21 pm

  • Quote

Post by dragonfire2003 » Mon Apr 11, 2022 12:33 am

alamahant wrote:Plz install
linux-firmware
https://wiki.gentoo.org/wiki/AMD_microcode#Emerge
and maybe check if the fan is to blame...
so I reached this part of the linux-firmware install:

Code: Select all

Regenerate the grub config using following command:
root #grub-mkconfig -o /boot/grub/grub.cfg
when I run

Code: Select all

grub-mkconfig -o /boot/grub/grub.cfg
this shows up:

Code: Select all

/usr/sbin/grub-mkconfig: line 260: /boot/grub/grub.cfg.new: No such file or directory
I also tried to render a video in DaVinci without the fan and without the microphone connected; same results.
Edit: I also cannot edit any kernel settings because I'm using the binary kernel.
Top
alamahant
Advocate
Advocate
Posts: 4034
Joined: Sat Mar 23, 2019 12:12 pm

  • Quote

Post by alamahant » Mon Apr 11, 2022 12:38 am

Try plz

Code: Select all

ls  /boot/grub
mountpoint /boot
mount /boot
ls  /boot/grub
:)
Top
dragonfire2003
n00b
n00b
Posts: 53
Joined: Mon Mar 14, 2022 3:21 pm

  • Quote

Post by dragonfire2003 » Mon Apr 11, 2022 12:41 am

alamahant wrote:Try plz

Code: Select all

ls  /boot/grub
mountpoint /boot
mount /boot
ls  /boot/grub
outputs in order

Code: Select all

ls: cannot access '/boot/grub': No such file or directory
/boot is a mountpoint
mount: /boot: /dev/sda1 already mounted on /boot.
ls: cannot access '/boot/grub': No such file or directory
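The error above says grub-mkconfig could not create /boot/grub/grub.cfg.new, and the ls output shows that /boot/grub itself is missing even though /boot is mounted. A minimal sketch of a pre-flight check follows. check_grub_dir is a hypothetical helper name, not part of GRUB, and it only diagnoses; it does not create or install anything.

```shell
# Hypothetical helper: explain why "<dir>/grub.cfg.new" could not be written.
# Returns 0 if the directory exists and is writable, 1 otherwise.
check_grub_dir() {
    dir=${1:-/boot/grub}
    if [ ! -d "$dir" ]; then
        echo "missing: $dir does not exist (was grub-install ever run against this /boot?)"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "read-only: $dir exists but is not writable by this user"
        return 1
    fi
    echo "ok: $dir exists and is writable"
}
```

Run it as root with `check_grub_dir /boot/grub` before calling grub-mkconfig. If it reports the directory as missing, the open question is whether GRUB was ever installed to this /boot, so creating the directory by hand may only hide the real problem.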
Top
alamahant
Advocate
Advocate
Posts: 4034
Joined: Sat Mar 23, 2019 12:12 pm

  • Quote

Post by alamahant » Mon Apr 11, 2022 4:20 pm

Then plz do

Code: Select all

umount /boot
ls /boot
Have you actually installed grub (grub-install..........)?
:)
Top
NeddySeagoon
Administrator
Administrator
Posts: 56105
Joined: Sat Jul 05, 2003 9:37 am
Location: 56N 3W

  • Quote

Post by NeddySeagoon » Mon Apr 11, 2022 5:35 pm

dragonfire2003,

As it's only you having this problem, it's something unique to you.
That usually means hardware, as we all share the same software.

Poweroff points to overheating, and the system shutting down to save itself from damage.

Being old and cynical, tell us how you know the temperatures and the RAM are good?

I've just had two faulty RAM sticks. The first one was easy to diagnose. Uncorrectable ECC errors at boot, so booting was not possible.
The second was harder. It too gave uncorrectable ECC errors eventually but it took over a week to pinpoint it to the RAM.
Note that this is ECC RAM too. Ordinary RAM is much harder to diagnose.

If you overclock, that includes XMP, turn it all off.
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Top
dragonfire2003
n00b
n00b
Posts: 53
Joined: Mon Mar 14, 2022 3:21 pm

  • Quote

Post by dragonfire2003 » Mon Apr 11, 2022 7:05 pm

NeddySeagoon wrote:dragonfire2003,

As it's only you having this problem, it's something unique to you.
That usually means hardware, as we all share the same software.

Poweroff points to overheating, and the system shutting down to save itself from damage.

Being old and cynical, tell us how you know the temperatures and the RAM are good?

I've just had two faulty RAM sticks. The first one was easy to diagnose. Uncorrectable ECC errors at boot, so booting was not possible.
The second was harder. It too gave uncorrectable ECC errors eventually but it took over a week to pinpoint it to the RAM.
Note that this is ECC RAM too. Ordinary RAM is much harder to diagnose.

If you overclock, that includes XMP, turn it all off.
I saw some other people having the same problem in the past but whatever.
Poweroff points to overheating, and the system shutting down to save itself from damage.
Indeed, that should be what's causing my system to shut down, but it doesn't make sense! I have 6 fans along with an external one, and I live in the 9th coldest city in Brazil. I also checked the temperature and it seems fine!
Being old and cynical, tell us how you know the temperatures and the RAM are good?
Temperature seems fine (50°, which is the usual) and I've checked my RAM sticks and they also seem fine.
(I've run a lot of diagnostics; nothing seems wrong with them and there are no weird errors at startup.)
Top
eccerr0r
Watchman
Watchman
Posts: 10239
Joined: Thu Jul 01, 2004 6:51 pm
Location: almost Mile High in the USA

  • Quote

Post by eccerr0r » Mon Apr 11, 2022 7:07 pm

Don't forget bad motherboards; I've had that happen too. When the parts (CPU, RAM) were tested in another board, they worked fine.

And about XMP ... I have one computer that if I disable XMP, the machine won't boot Linux. Running memtest86+ I get tons of errors. With it enabled, machine boots and runs fine, and memtest86+ comes clean. *shrug* not sure what's up with this.
Intel Core i7 2700K/Radeon Firepro W2100/24GB DDR3/800GB SSD
What am I supposed watching?
Top
NeddySeagoon
Administrator
Administrator
Posts: 56105
Joined: Sat Jul 05, 2003 9:37 am
Location: 56N 3W

  • Quote

Post by NeddySeagoon » Mon Apr 11, 2022 7:13 pm

dragonfire2003,

Tell us how you measure the temperature
Tell us how you tested the RAM.

Did you assemble the system yourself?
If so tell us how the heatsink is fitted to the CPU. Thermal paste and so on.
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Top
CooSee
Veteran
Veteran
Posts: 1617
Joined: Sat Nov 20, 2004 10:38 pm
Location: right here !

  • Quote

Post by CooSee » Mon Apr 11, 2022 10:26 pm

you should try another kernel.

tried long-term kernel once, but system behaved weird, therefore i stayed with current gentoo-sources.

regarding binary kernel - (no offence) never liked it, because there are too many things activated which i never need.

or maybe, try another distro, e.g. Garuda via usb, to see if the behaviour of your system is the same.

good luck
" Die Realität ist eine Illusion, die durch Mangel an ehrlicher Kommunikation entsteht "
---
" Der Mensch ist von Natur aus neugierig, was am Ende übrig bleibt ist die Gier "
Top
mike155
Advocate
Advocate
Posts: 4438
Joined: Fri Sep 17, 2010 11:33 pm
Location: Frankfurt, Germany

  • Quote

Post by mike155 » Mon Apr 11, 2022 11:36 pm

What about the USB error messages in dmesg?

Code: Select all

[   20.387963] usb 1-9: Not enough bandwidth for new device state.
[   20.387968] usb 1-9: Not enough bandwidth for altsetting 1
[   20.387969] usb 1-9: 1:1: usb_set_interface failed (-28)
[   20.393089] usb 1-9: Not enough bandwidth for new device state.
[   20.393090] usb 1-9: Not enough bandwidth for altsetting 1
[   20.393091] usb 1-9: 1:1: usb_set_interface failed (-28)
....
I would definitely try to fix the issue.
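To see how often this happens and on which port, the messages can be counted per device. This is a rough sketch that reads kernel log text on stdin; usb_bandwidth_errors is a made-up helper name, and it assumes the log lines have the same "usb <port>: Not enough bandwidth" shape as the excerpt above. The -28 in the excerpt is the kernel's ENOSPC error code.

```shell
# Hypothetical helper: count "Not enough bandwidth" messages per USB device.
# Reads kernel log lines on stdin and prints "<device> <count>" per device.
usb_bandwidth_errors() {
    awk '/Not enough bandwidth/ {
             for (i = 1; i < NF; i++)
                 if ($i == "usb") {
                     dev = $(i + 1)        # e.g. "1-9:"
                     sub(/:$/, "", dev)    # strip the trailing colon
                     count[dev]++
                     break
                 }
         }
         END { for (d in count) print d, count[d] }'
}
```

Typical use would be `dmesg | usb_bandwidth_errors`. The reported port (1-9 in the excerpt) can then be matched against the output of lsusb -t to find out which of the connected devices is asking for more bandwidth than the controller can give.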
Top
dragonfire2003
n00b
n00b
Posts: 53
Joined: Mon Mar 14, 2022 3:21 pm

  • Quote

Post by dragonfire2003 » Tue Apr 12, 2022 1:43 am

NeddySeagoon wrote:dragonfire2003,

Tell us how you measure the temperature
Tell us how you tested the RAM.

Did you assemble the system yourself?
If so tell us how the heatsink is fitted to the CPU. Thermal paste and so on.
Tell us how you measure the temperature
I used my cousin's thermal camera
Tell us how you tested the RAM.
I used a few diagnostic tools and I opened it myself to check if there was anything wrong (I know what I'm doing and I have the tools for opening it)
Top
dragonfire2003
n00b
n00b
Posts: 53
Joined: Mon Mar 14, 2022 3:21 pm

  • Quote

Post by dragonfire2003 » Tue Apr 12, 2022 1:45 am

CooSee wrote:you should try another kernel.

tried long-term kernel once, but system behaved weird, therefore i stayed with current gentoo-sources.

regarding binary kernel - (no offence) never liked it, because there are too many things activated which i never need.

or maybe, try another distro, e.g. Garuda via usb, to see if the behaviour of your system is the same.

good luck
I wish I could try another kernel, but because of nvidia's bullsh*t I can't.
And I tried other systems using a USB stick and even dual boot; everything works fine
(Including exporting videos in DaVinci and such)
Top
dragonfire2003
n00b
n00b
Posts: 53
Joined: Mon Mar 14, 2022 3:21 pm

  • Quote

Post by dragonfire2003 » Tue Apr 12, 2022 1:46 am

eccerr0r wrote:Don't forget bad motherboards; I've had that happen too. When the parts (CPU, RAM) were tested in another board, they worked fine.

And about XMP ... I have one computer that if I disable XMP, the machine won't boot Linux. Running memtest86+ I get tons of errors. With it enabled, machine boots and runs fine, and memtest86+ comes clean. *shrug* not sure what's up with this.
Every piece of hardware is working fine and I'm 100% sure about that, I have tested everything on my PC and it all works fine.

I get no errors with memtest86
Top
eccerr0r
Watchman
Watchman
Posts: 10239
Joined: Thu Jul 01, 2004 6:51 pm
Location: almost Mile High in the USA

  • Quote

Post by eccerr0r » Tue Apr 12, 2022 4:11 am

So, video card problems? Video card drivers forcing you to use specific kernels ... use older video card drivers?

You're ruling out everything but it's your computer that's different than the rest of us who are not having problems with the same system software...
Intel Core i7 2700K/Radeon Firepro W2100/24GB DDR3/800GB SSD
What am I supposed watching?
Top
Goverp
Advocate
Advocate
Posts: 2404
Joined: Wed Mar 07, 2007 6:41 pm

  • Quote

Post by Goverp » Tue Apr 12, 2022 7:32 am

A thought, probably irrelevant, but AMD CPUs are notoriously sensitive to heatsink paste. If you fit your own fan and don't get the paste right, in the past at least, you'd get thermal problems or, in the worst case, damage.
Greybeard
Top
NeddySeagoon
Administrator
Administrator
Posts: 56105
Joined: Sat Jul 05, 2003 9:37 am
Location: 56N 3W

  • Quote

Post by NeddySeagoon » Tue Apr 12, 2022 8:49 am

dragonfire2003,

A thermal camera will not tell you about your CPU transistor junction temperature, which is one of the ones that matters.

Install lm-sensors and configure it for your motherboard. That will require kernel support if you don't have it.
There may be some kernel-provided temperatures in /sys/class/thermal/... The output is in millidegrees Celsius.

Code: Select all

$ cat /sys/class/thermal/thermal_zone0/temp
42842
That's 42.842C.
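To read every zone at once instead of one file at a time, something like the following should work. It is a minimal sketch: millic_to_c and show_thermal_zones are made-up helper names, and which zones exist, and whether any of them is the CPU, depends on the kernel and the board.

```shell
# Convert an integer millidegree reading to degrees Celsius.
millic_to_c() {
    awk -v t="$1" 'BEGIN { printf "%.3f\n", t / 1000 }'
}

# Print every thermal zone the kernel exposes, with its type label if present.
show_thermal_zones() {
    for zone in /sys/class/thermal/thermal_zone*; do
        [ -r "$zone/temp" ] || continue                    # no zones, or not readable
        temp=$(cat "$zone/temp" 2>/dev/null) || continue   # some zones refuse reads
        label=$(cat "$zone/type" 2>/dev/null) || label=unknown
        printf '%s (%s): %s C\n' "${zone##*/}" "$label" "$(millic_to_c "$temp")"
    done
}

show_thermal_zones
```

If nothing is printed, the running kernel exposes no thermal zones and lm-sensors is the way to go.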

Boot into memtest86 or memtest86+ and run a few cycles.

Run prime95, which is a good CPU stress test.
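Because the machine powers off, whatever was on screen is lost. One way to keep the last readings is to log them to disk while the stress test runs. A rough sketch, assuming the kernel exposes /sys/class/thermal zones; log_temps is a hypothetical helper name and the file name, interval and count are arbitrary choices:

```shell
# Hypothetical helper: append <count> timestamped lines of raw thermal-zone
# readings (millidegrees Celsius) to <logfile>, one line every <interval> seconds.
log_temps() {
    logfile=$1
    interval=${2:-2}
    count=${3:-30}
    i=0
    while [ "$i" -lt "$count" ]; do
        line=$(date '+%H:%M:%S')
        for f in /sys/class/thermal/thermal_zone*/temp; do
            [ -r "$f" ] || continue
            t=$(cat "$f" 2>/dev/null) || continue
            line="$line $t"
        done
        echo "$line" >> "$logfile"
        sync                      # flush to disk so the line survives a hard poweroff
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Start it in a second terminal, for example `log_temps ~/temps.log 2 900`, then launch the stress test. After the next poweroff, the last lines of the log show what the sensors reported just before the machine went down.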

You need to tell us what you did and provide results.
Your assertion that it's not the hardware, when it's only you that is having problems, is unlikely to be correct.
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Top
pjp
Administrator
Administrator
Posts: 20668
Joined: Tue Apr 16, 2002 10:35 pm

  • Quote

Post by pjp » Tue Apr 12, 2022 3:50 pm

dragonfire2003 wrote:I saw some other people having the same problem in the past but whatever.
I had a similar problem in the past, and it turned out to be a hardware problem.

The reason people ask what you've done isn't to question your knowledge or abilities. Even very experienced people make mistakes or miss things. The questions are to help others gain a level of confidence that they agree with your analysis. Providing details helps get past that more quickly.

My problem wasn't discovered by running memtest for ~5 hours. I had to run it for >24 hours before I found what turned out to be a motherboard memory slot problem. Only one specific "heavy" task caused the reboot.
Quis separabit? Quo animo?
Top
Chiitoo
Ninja Apprentice
Ninja Apprentice
Posts: 3079
Joined: Sun Feb 28, 2010 5:36 pm
Location: Sore wa sore, kore wa kore... nanoda.

  • Quote

Post by Chiitoo » Wed Apr 13, 2022 12:42 pm

I'll throw in a power supply unit gone bad, mostly just because I've had that happen way too often... and it can be easy to test if one happens to have more than one lying about.

Speaking of MemTest86 (not +), the free version from PassMark, I used it to confirm bad RAM late last year, but didn't want to RMA it right under Christmas.

Now I wanted to finally go through it, but wanted to test it once more again and... got no errors. :V

Turns out there was a regression introduced in release 9.3, which affects the currently most recent release, 9.4.1000, as well.

I got some debug builds from PassMark, and we were able to confirm the issue and it should be fixed in the next release (curiously the paid version is supposedly unaffected). Specifically, errors during the test number 13, hammer test, were not being triggered due to a "single-sided" version of the test being used instead of a "double-sided" version.
Kindest of regardses.
Top
eccerr0r
Watchman
Watchman
Posts: 10239
Joined: Thu Jul 01, 2004 6:51 pm
Location: almost Mile High in the USA

  • Quote

Post by eccerr0r » Wed Apr 13, 2022 3:55 pm

Goverp wrote:A thought, probably irrelevant, but AMD CPUs are notoriously sensitive to heatsink paste. If you fit your own fan and don't get the paste right, in the past at least, you'd get thermal problems or, in the worst case, damage.
TBH all CPUs with high power dissipation and "low" temperature tolerance are subject to heatsink paste issues... at least causing thermal throttling events. I knew of the old Athlon XPs that would literally immolate if you did not have a heatsink on (and probably similar if your heatsink paste was not up to snuff) but are the newer ones as sensitive? Haven't gotten a new CPU in ages...
Intel Core i7 2700K/Radeon Firepro W2100/24GB DDR3/800GB SSD
What am I supposed watching?
Top