

alamahant wrote:
"I noticed with my Gentoo install that it crashes whenever I try to do a heavy task"
What happens?
Does it become unresponsive or power off?
What does dmesg say?

Poweroff.
Code: Select all
[ 84.301576] xhci_hcd 0000:01:00.0: WARN: buffer overrun event for slot 3 ep 4 on endpoint


alamahant wrote:
Plz install
linux-firmware
https://wiki.gentoo.org/wiki/AMD_microcode#Emerge
and maybe check if the fan is to blame...

So I reached this part of the linux-firmware install:
Code: Select all
Regenerate the grub config using following command:
root #grub-mkconfig -o /boot/grub/grub.cfg
Code: Select all
grub-mkconfig -o /boot/grub/grub.cfg
Code: Select all
/usr/sbin/grub-mkconfig: line 260: /boot/grub/grub.cfg.new: No such file or directory

alamahant wrote:
Try plz
Code: Select all
ls /boot/grub
mountpoint /boot
mount /boot
ls /boot/grub

Outputs, in order:
Code: Select all
ls: cannot access '/boot/grub': No such file or directory
/boot is a mountpoint
mount: /boot: /dev/sda1 already mounted on /boot.
ls: cannot access '/boot/grub': No such file or directory
Code: Select all
umount /boot
ls /boot
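
The grub-mkconfig error above is what you get when it tries to write /boot/grub/grub.cfg.new and the /boot/grub directory is not there. Here is a minimal sketch of a check to run first, assuming GRUB's files are meant to live under /boot/grub. The function name check_grub_dir is made up for illustration, and whether the missing directory should then be created by hand or by re-running grub-install depends on the setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of GRUB): report whether a "grub"
# directory exists under the given boot path.
# Prints "present", "missing", or "no-boot-dir".
check_grub_dir() {
    boot_dir="$1"
    if [ ! -d "$boot_dir" ]; then
        echo "no-boot-dir"
        return 2
    fi
    if [ -d "$boot_dir/grub" ]; then
        echo "present"
        return 0
    fi
    echo "missing"
    return 1
}
```

Something like `check_grub_dir /boot` could be run once with the partition mounted and once after `umount /boot`, to see whether a grub directory exists on the partition, on the underlying root filesystem, or on neither.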


NeddySeagoon wrote:
dragonfire2003,
As it's only you having this problem, it's something unique to you.
That usually means hardware, as we all share the same software.
Poweroff points to overheating, and the system shutting down to save itself from damage.
Being old and cynical, tell us how you know the temperatures and the RAM are good?
I've just had two faulty RAM sticks. The first one was easy to diagnose. Uncorrectable ECC errors at boot, so booting was not possible.
The second was harder. It too gave uncorrectable ECC errors eventually, but it took over a week to pinpoint it to the RAM.
Note that this is ECC RAM too. Ordinary RAM is much harder to diagnose.
If you overclock, and that includes XMP, turn it all off.

I saw some other people having the same problem in the past but whatever.

NeddySeagoon wrote:
Poweroff points to overheating, and the system shutting down to save itself from damage.

Indeed that should be the thing that's causing my system to shut down, but it doesn't make sense! I have 6 fans along with an external one and I live in the 9th coldest city in Brazil. I also checked the temperature and it seems fine!

NeddySeagoon wrote:
Being old and cynical, tell us how you know the temperatures and the RAM are good?

Temperature seems fine (50° which is the usual) and I've checked my RAM sticks and they also seem fine.


Code: Select all
[ 20.387963] usb 1-9: Not enough bandwidth for new device state.
[ 20.387968] usb 1-9: Not enough bandwidth for altsetting 1
[ 20.387969] usb 1-9: 1:1: usb_set_interface failed (-28)
[ 20.393089] usb 1-9: Not enough bandwidth for new device state.
[ 20.393090] usb 1-9: Not enough bandwidth for altsetting 1
[ 20.393091] usb 1-9: 1:1: usb_set_interface failed (-28)
....
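
The -28 in these lines is the kernel's ENOSPC error code, reported here because the USB controller could not reserve bandwidth for the device. To see how often this happens and for which port, you could filter the kernel log. This is a rough sketch: the function name count_usb_bandwidth_errors is hypothetical, and nothing in this thread shows that these messages are related to the poweroffs.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and count
# "Not enough bandwidth" messages per USB device (e.g. "1-9").
count_usb_bandwidth_errors() {
    awk '
        /usb [0-9.-]+: Not enough bandwidth/ {
            # Find the field "usb"; the next field is the device id plus ":".
            for (i = 1; i <= NF; i++) {
                if ($i == "usb") {
                    dev = $(i + 1)
                    sub(/:$/, "", dev)
                    count[dev]++
                    break
                }
            }
        }
        END {
            # One line per device: "<device> <count>"
            for (dev in count) {
                print dev, count[dev]
            }
        }
    '
}
```

Typical use would be `dmesg | count_usb_bandwidth_errors`. A port that shows up many times is a candidate for moving the device to another port or controller.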

NeddySeagoon wrote:
dragonfire2003,
Tell us how you measure the temperature.
Tell us how you tested the RAM.
Did you assemble the system yourself?
If so, tell us how the heatsink is fitted to the CPU. Thermal paste and so on.

NeddySeagoon wrote:
Tell us how you measure the temperature.

I used my cousin's thermal camera.

NeddySeagoon wrote:
Tell us how you tested the RAM.

I used a few diagnostic tools and I opened it myself to check if there was anything wrong (I know what I'm doing and I have the tools for opening it).

CooSee wrote:
you should try another kernel.
tried long-term kernel once, but system behaved weird, therefore i stayed with current gentoo-sources.
regarding binary kernel - (no offence) never liked it, because there are too many things activated which i never need.
or maybe, try another distro, e.g. Garuda via usb, to see if the behaviour of your system is the same.
good luck

I wish I could try another kernel but because of nvidia's bullsh*t I can't.

eccerr0r wrote:
Don't forget bad motherboards, had that happen too - when parts (cpu, ram) tested in another board, it works fine.
And about XMP ... I have one computer that if I disable XMP, the machine won't boot Linux. Running memtest86+ I get tons of errors. With it enabled, machine boots and runs fine, and memtest86+ comes clean. *shrug* not sure what's up with this.

Every piece of hardware is working fine and I'm 100% sure about that. I have tested everything on my PC and it all works fine.


Code: Select all
$ cat /sys/class/thermal/thermal_zone0/temp
42842

dragonfire2003 wrote:
I saw some other people having the same problem in the past but whatever.

I had a similar problem in the past, and it turned out to be a hardware problem.
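
The sysfs thermal temp file reports millidegrees Celsius, so the 42842 above is about 42.8 °C. A small sketch of the conversion, in case it helps when comparing against other readings. The function name millideg_to_celsius is made up for illustration. Which sensor thermal_zone0 belongs to varies by machine; the `type` file in the same directory says what it is.

```shell
#!/bin/sh
# Hypothetical helper: convert a millidegree-Celsius integer (as printed
# by a sysfs thermal "temp" file) to degrees Celsius with three decimals.
millideg_to_celsius() {
    raw="$1"
    sign=""
    # Handle negative readings separately so the remainder stays positive.
    if [ "$raw" -lt 0 ]; then
        sign="-"
        raw=$(( 0 - raw ))
    fi
    printf '%s%d.%03d\n' "$sign" $(( raw / 1000 )) $(( raw % 1000 ))
}
```

For example, `millideg_to_celsius "$(cat /sys/class/thermal/thermal_zone0/temp)"` prints the current reading. To see what the temperature does under a heavy task, it would have to be sampled repeatedly while the task runs, not just once at idle.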


Goverp wrote:
A thought, probably irrelevant, but AMD cpus are notoriously sensitive to heatsink paste. If you fit your own fan and don't get the paste right, in the past at least, you'd get thermal problems or in the worst case damage.

TBH all CPUs with high power dissipation and "low" temperature tolerance are subject to heatsink paste issues... at least causing thermal throttling events. I knew of the old Athlon XPs that would literally immolate if you did not have a heatsink on (and probably similar if your heatsink paste was not up to snuff) but are the newer ones as sensitive? Haven't gotten a new CPU in ages...