Author |
Message |
Amity88 Apprentice
Joined: 03 Jul 2010 Posts: 260 Location: Third planet from the Sun
|
Posted: Sun Feb 11, 2018 4:29 pm Post subject: System hangs randomly but only when using amdgpu [solved] |
|
|
This is a fresh Linux system. The screen randomly fills with a single color (blue, white, yellow, etc.) and stays unusable until a restart. I'm not even sure this is a hang, because sometimes when it happens I can still hear the audio from a YouTube video.
It's weird, as it never happens in Windows 8.1 but randomly hits me when I use Gentoo, SysRescueCD, SUSE, Mint or FreeBSD. In short, it happens on any non-Windows OS.
I don't think it's a video issue because it happens even in pure CLI. Neither the kern log nor dmesg shows any error at the time of the hang. Do you guys have any suggestions on what else I could check to fix this?
Here's the output of lspci; this is an ASUS H81M-CS motherboard:
Code: |
00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:02.0 Display controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05)
00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5)
00:1c.1 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #2 (rev d5)
00:1c.2 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 (rev d5)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation C220 Series Chipset Family H81 Express LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05)
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland XT [Radeon HD 8670 / R7 250/350]
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
03:00.0 Network controller: Qualcomm Atheros AR9485 Wireless Network Adapter (rev 01)
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 11)
|
uname -an
Code: |
Linux vivalarev 4.9.76-gentoo-r1 #3 SMP Sun Feb 11 18:08:13 IST 2018 x86_64 Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz GenuineIntel GNU/Linux
|
_________________
Ant P. wrote: | The enterprise distros sell their binaries. Canonical sells their users. |
Also... Be ignorant... Be happy!
Last edited by Amity88 on Sat Oct 09, 2021 5:35 pm; edited 2 times in total |
|
|
|
Jaglover Watchman
Joined: 29 May 2005 Posts: 8291 Location: Saint Amant, Acadiana
|
|
|
|
Amity88 Apprentice
Joined: 03 Jul 2010 Posts: 260 Location: Third planet from the Sun
|
Posted: Tue Feb 13, 2018 6:29 am Post subject: |
|
|
I ran memtest86 over the past 24 hours and didn't get any errors or blank screens.
Currently, I suspect that it's the AMD GPU driver (amdgpu; the card is an R7 250, Southern Islands, GCN 1.0) that is causing the issue. For debugging, I'm going to try the following:
1. Try using the older radeon driver and see if the issue persists.
2. If that doesn't work, I'll try using the onboard Intel GPU. _________________
Ant P. wrote: | The enterprise distros sell their binaries. Canonical sells their users. |
Also... Be ignorant... Be happy! |
|
|
|
Amity88 Apprentice
Joined: 03 Jul 2010 Posts: 260 Location: Third planet from the Sun
|
Posted: Thu Feb 15, 2018 5:06 pm Post subject: System hangs randomly but only when using amdgpu/radeon |
|
|
(changing the subject to better reflect the actual issue)
So, I was able to narrow the issue down to the graphics driver.
1. The screen blanks out randomly when I use the amdgpu driver.
2. It's a lot worse when I use the radeon driver.
3. The only thing that worked in the past was the old fglrx driver, a year ago. I can't use it anymore though, because they've dropped support.
4. The onboard Intel GPU driver is stable. This is what I'm using now.
Not sure how I can fix this. If you guys get the AMD R7 250 (Southern Islands) working without random hangs, please let me know. _________________
Ant P. wrote: | The enterprise distros sell their binaries. Canonical sells their users. |
Also... Be ignorant... Be happy! |
|
|
|
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3339 Location: Rasi, Finland
|
Posted: Thu Feb 15, 2018 5:17 pm Post subject: |
|
|
Do you get anything in dmesg/logs?
Have you tried other kernel versions?
I have VGA: | VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] | on my server. So I could start poking around. I just need to attach a monitor to it. _________________ ..: Zucca :..
Gentoo IRC channels reside on Libera.Chat.
--
Quote: | I am NaN! I am a man! |
Last edited by Zucca on Thu Feb 15, 2018 7:55 pm; edited 1 time in total |
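For anyone checking logs after a lockup, a small filter makes GPU-related lines easier to spot. This is only a sketch: the function name `gpu_log_lines` and the match pattern are illustrative choices, not something taken from the thread.

```sh
#!/bin/sh
# Print only the log lines that mention the GPU drivers or the DRM subsystem.
# $1: optional log file to read; without it, stdin is read.
gpu_log_lines() {
    if [ -n "$1" ]; then
        grep -i -E 'amdgpu|radeon|\[drm' "$1"
    else
        grep -i -E 'amdgpu|radeon|\[drm'
    fi
}
```

`dmesg | gpu_log_lines` covers the running kernel. After a hard lock, the dmesg buffer starts fresh on the next boot, so pointing the function at whatever file your syslog daemon writes kernel messages to (the path depends on the setup) is usually the more useful check.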
|
|
|
Amity88 Apprentice
Joined: 03 Jul 2010 Posts: 260 Location: Third planet from the Sun
|
Posted: Thu Feb 15, 2018 5:44 pm Post subject: |
|
|
I didn't find anything in the logs/dmesg when I booted into SysRescueCD after an incident.
About other kernel versions: this system used to work fine with the fglrx driver. Things just got messy after AMD moved over to the amdgpu driver.
Also, it's good to know that you actually have something close in design. I think mine is GCN 1.1 and yours is probably GCN 1.2.
I haven't really started using this build, so I'm willing to experiment if you have anything you want me to try. _________________
Ant P. wrote: | The enterprise distros sell their binaries. Canonical sells their users. |
Also... Be ignorant... Be happy! |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54220 Location: 56N 3W
|
Posted: Thu Feb 15, 2018 6:19 pm Post subject: |
|
|
Amity88,
I have a similar issue with an R450.
I've tried different motherboard slots, the old and new amdgpu drivers, and turning off Message Signalled IRQs (it's a command line option).
Memtest finds nothing and there is nothing in the kernel logs.
The incident halts the CPU, as it won't even respond to the reset button, which is probably why there is nothing in the logs.
After a power cycle, the system often restarts with one core missing.
A restart can take a couple of hours too. It's left my raid5 'dirty' a few times, so it does a resync.
So far, I've only tried the two amdgpu drivers, but a few other non-accelerated drivers should work.
That may help determine if it's a hardware or a software problem. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
Amity88 Apprentice
Joined: 03 Jul 2010 Posts: 260 Location: Third planet from the Sun
|
Posted: Thu Feb 15, 2018 7:06 pm Post subject: |
|
|
Hey there Neddy
I've ruled out any hardware issues because I dual boot with Windows 8.1 and it runs pretty stable.
The symptoms on Linux are very similar to what you experience: it doesn't respond to reset key combinations, and at times even the actual reset button doesn't work. Restarts don't take much time though.
The non-accelerated drivers would do software rendering, right? As crappy as it is, I figured that the Intel GPU is better than software rendering. Maybe I should try pulling in fglrx or amdgpu-pro. _________________
Ant P. wrote: | The enterprise distros sell their binaries. Canonical sells their users. |
Also... Be ignorant... Be happy! |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54220 Location: 56N 3W
|
Posted: Thu Feb 15, 2018 7:27 pm Post subject: |
|
|
Amity88,
Yes - there would be no acceleration at all. I had in mind vesa or fbdev.
The GPU does nothing and the CPU does all the drawing. Performance will be terrible.
I've gone back to my 9 year old nVidia card meanwhile, as I need to address Meltdown/Spectre everywhere and having random lock ups doesn't help. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
thumper Guru
Joined: 06 Dec 2002 Posts: 552 Location: Venice FL
|
Posted: Tue Feb 20, 2018 2:03 am Post subject: |
|
|
Have you checked your logs for kernel crash dumps?
I had these:
Code: | amdgpu 0000:24:00.0: swiotlb buffer is full (sz: 2097152 bytes)
swiotlb: coherent allocation failed for device 0000:24:00.0 size=2097152
CPU: 0 PID: 5149 Comm: Compositor Tainted: G OE 4.15.3-gentoo #1 |
And it would eventually hard lock the machine.
After some research I added this to my kernel command line:
I did that last week and have not crashed since; still, time will tell. It could be a coincidence.
George |
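The parameter itself did not survive in the post above, so the sketch below uses `example.param` as a placeholder. It only shows one way to confirm that a boot parameter actually reached the running kernel, by looking for it in /proc/cmdline; `has_kernel_param` is a made-up helper name.

```sh
#!/bin/sh
# Succeed if the kernel command line contains the given parameter.
# $1: parameter to look for, either "name" or "name=value"
# $2: file to read (defaults to /proc/cmdline; overridable for testing)
# Note: quoted values containing spaces are not handled by this sketch.
has_kernel_param() {
    tr -s ' \t' '\n\n' < "${2:-/proc/cmdline}" | (
        while IFS= read -r word; do
            case $word in
                "$1"|"$1"=*) exit 0 ;;   # exact match, or name followed by =value
            esac
        done
        exit 1
    )
}
```

For example, `has_kernel_param example.param && echo "parameter is active"` prints the message only when the bootloader really passed it on, which rules out a typo in the bootloader config before drawing conclusions about stability.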
|
|
|
PrSo Tux's lil' helper
Joined: 01 Jun 2017 Posts: 136
|
Posted: Tue Feb 20, 2018 9:20 am Post subject: |
|
|
thumper,
those messages in the log are totally harmless and shouldn't be the cause of the hard lock; please see this bug report and this patch on LKML. So this _is_ a coincidence, although it could be a symptom.
Amity88,
is there any special reason that you are on 4.9.76-gentoo-r1 kernel? |
|
|
|
Amity88 Apprentice
Joined: 03 Jul 2010 Posts: 260 Location: Third planet from the Sun
|
Posted: Tue Feb 20, 2018 2:40 pm Post subject: |
|
|
PrSo wrote: |
Amity88,
is there any special reason that you are on 4.9.76-gentoo-r1 kernel? |
I just use this version because it was the latest stable kernel. Do you feel that a newer kernel would fix the problem? _________________
Ant P. wrote: | The enterprise distros sell their binaries. Canonical sells their users. |
Also... Be ignorant... Be happy! |
|
|
|
Mimamau Apprentice
Joined: 11 Jun 2002 Posts: 160 Location: Germany
|
Posted: Tue Feb 20, 2018 2:48 pm Post subject: |
|
|
As in my other thread, there seem to be problems with Southern Islands GPUs.
I only get a slow 2d desktop, everything else gives me a blank screen or crashes the system completely.
Even the amdgpu-pro drivers don't work on supported distributions. AMD support wrote:
"I apologize for the delay. I was waiting for feedback from the subject matter experts.
Unfortunately, it appears the HD7870 series has not been qualified with our latest drivers.
The recommendation is to use the inbox drivers or an open source driver, available here: https://www.x.org/wiki/RadeonFeature/#index10h2
If you experience issues with the open source drivers, please file a report at the link above. I have been informed that our engineers monitor and investigate reports listed there.
In order to update this service request, please respond, leaving the service request reference intact.
Best regards,
AMD Global Customer Care" |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54220 Location: 56N 3W
|
Posted: Tue Feb 20, 2018 2:58 pm Post subject: |
|
|
Amity88,
There is a new amdgpu driver in the 4.15 kernel.
It's worth a try. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
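A quick way to see whether a machine is already on 4.15 or newer is to compare version strings. The sketch below assumes `sort -V` is available (GNU coreutils has it); `kernel_at_least` is a hypothetical helper name, and the comparison only looks at the leading numeric part of the version.

```sh
#!/bin/sh
# Succeed if the kernel version is at least the wanted one.
# $1: minimum version, e.g. 4.15
# $2: version to test (defaults to the running kernel from uname -r)
kernel_at_least() {
    want=$1
    have=${2:-$(uname -r)}
    # Drop suffixes such as "-gentoo-r1", keeping only digits and dots
    have_num=$(printf '%s\n' "$have" | sed 's/^\([0-9.]*\).*/\1/')
    # Version-sort both; if the wanted version sorts first, the kernel is new enough
    lowest=$(printf '%s\n%s\n' "$want" "$have_num" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}
```

`kernel_at_least 4.15 || echo "still on an older kernel"` would flag the 4.9.76-gentoo-r1 kernel from the first post.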
|
|
|
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
|
Posted: Tue Feb 20, 2018 3:22 pm Post subject: |
|
|
Amity88 wrote: | I just use this version because it was the latest stable kernel. Do you feel that a newer kernel would fix the problem? |
4.9.82 is in the tree.
I have problems with 4.4.x and 4.9.x: the motherboard module nct6775 fails to load. No problem with 4.14.x. Running 'meld' on the relevant kernel source, I see that 4.4 and 4.9 are identical but 4.14 has tables with an extra entry. Undoubtedly that entry supports my mobo, which is a new AM4 board.
NeddySeagoon wrote: | Amity88,
There is a new amdgpu driver in the 4.15 kernel.
It's worth a try. | Based on Neddy's input, I would try 4.14 or 4.15 (has some Spectre mitigation) or, depending on your comfort level, try backporting the driver to 4.9.
I think I'll try that, just for fun.
EDIT: Backporting the driver worked fine. I couldn't find where on kernel.org to file a bug, so I may just file one against gentoo-sources.
Last edited by Tony0945 on Tue Aug 14, 2018 12:52 am; edited 3 times in total |
|
|
|
PrSo Tux's lil' helper
Joined: 01 Jun 2017 Posts: 136
|
Posted: Tue Feb 20, 2018 4:12 pm Post subject: |
|
|
Amity88 wrote: |
Do you feel that a newer kernel would fix the problem? |
Just like Neddy said, you should try 4.15.4 (change to ~amd64 or unmask gentoo-sources).
There is a big improvement in the amdgpu driver, and a new AMD DC (but I am not sure if your card family, Oland, is supported; BTW SI = GCN 1.0).
I have one machine with GCN 1.1 (R4 APU, CIK) and this is the first mainline kernel (4.15) where things work quite well on the amdgpu driver (it is still _experimental_ for SI and CIK though).
One more thing: do you have dual GPU enabled, Intel and AMD?
Code: | 00:02.0 Display controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) |
|
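One way to answer the dual-GPU question is to ask sysfs which kernel driver is bound to each DRM card. This is a sketch under the assumption of the usual /sys/class/drm layout; `list_gpu_drivers` is an illustrative name, not an existing tool.

```sh
#!/bin/sh
# Print "<card> <driver>" for every DRM card that has a kernel driver bound.
# $1: directory to scan (defaults to /sys/class/drm; overridable for testing)
list_gpu_drivers() {
    drm_dir=${1:-/sys/class/drm}
    for card in "$drm_dir"/card[0-9]*; do
        # Skip connector entries such as card0-HDMI-A-1
        case ${card##*/} in
            *-*) continue ;;
        esac
        # The driver entry is a symlink whose last path component is the driver name
        [ -L "$card/device/driver" ] || continue
        driver=$(basename "$(readlink "$card/device/driver")")
        printf '%s %s\n' "${card##*/}" "$driver"
    done
}
```

Two lines of output with different driver names would mean both the integrated and the discrete GPU are active at the same time.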
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54220 Location: 56N 3W
|
Posted: Tue Feb 20, 2018 4:57 pm Post subject: |
|
|
Having done the Spectre updates, I've gone back to my RX450 card.
As the 4.15 kernel didn't fix my lockups, I'm trying 4.16-rc1
Watch this space. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3339 Location: Rasi, Finland
|
Posted: Tue Feb 20, 2018 5:38 pm Post subject: |
|
|
NeddySeagoon wrote: | Watch this space. | I have stalled all kernel and amdgpu updates. Now waiting eagerly.
I really don't want my server to lock up. I have exactly one spare GPU and it is AMD HD 7850. I think it's affected too. And I think the current one on my server is too: Cape Verde PRO R7 250E
My desktop has Fiji Based R9 Nano... I think I'm safe there... _________________ ..: Zucca :..
Gentoo IRC channels reside on Libera.Chat.
--
Quote: | I am NaN! I am a man! |
|
|
|
|
gcyoung Apprentice
Joined: 04 Jul 2007 Posts: 170 Location: England
|
Posted: Mon Aug 13, 2018 9:32 pm Post subject: Amdgpu Radeon R7-240 card and ryzen3 processor |
|
|
I am also getting intermittent screen and wireless keyboard freezes. While it works, the amdgpu module seems better than the radeonsi. I don't know if it is connected, but my login dmesg output contains the message [[Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW].
I note that the Arch web site also contains references to problems with the combination of amdgpu and Ryzen processors.
I have ssh'd (without X) into the computer from another machine, and find it is still responding normally to commands.
It's a pity, since I like the result before it freezes!
PS: I am using kernel-4.17.6 which is not listed as stable, but I found the same problem with an earlier stable kernel |
|
|
|
Goverp Veteran
Joined: 07 Mar 2007 Posts: 1998
|
Posted: Tue Aug 14, 2018 8:39 am Post subject: |
|
|
This is probably no help, but I'm running an HP laptop with /proc/cpuinfo model name: "AMD A9-9420 RADEON R5, 5 COMPUTE CORES 2C+3G". It's a STONEY graphics thingy.
It also has an rtl8723de modem, which meant I needed a very recent kernel (and an external module), so I ran kernel 4.16 originally and run 4.17.1 now.
Never had any problems like described in this thread, nor any issues from using a late kernel.
AFAIK (I read Phoronix summaries) AMDGPU support features regularly in the kernel change logs.
I currently have:
Code: | /etc/portage/make.conf
VIDEO_CARDS="amdgpu radeonsi" | and, to reduce kernel churn:
Code: | /etc/portage/package.keywords
<=sys-kernel/gentoo-sources-4.17.1 ~amd64
|
I read today that 4.18 has more AMDGPU stuff. _________________ Greybeard |
|
|
|
gcyoung Apprentice
Joined: 04 Jul 2007 Posts: 170 Location: England
|
Posted: Wed Aug 15, 2018 8:13 pm Post subject: Amdgpu Freezes |
|
|
It may be of interest to others with this problem to know that since my last posting I have followed a suggestion given under the heading 'dpm' on https://wiki.gentoo.org/wiki/AMDGPU#Hardware_detection.
[echo performance > /sys/class/drm/card0/device/power_dpm_state]
and:-
[echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level]
Since making these settings I have had no further "freezes", except when I made only the first setting. Since then I have used the computer for about twenty hours, including one five-hour MythTV frontend session. Previously, I regularly failed to get through a fairly standard film viewing, say about two hours, without needing a reboot.
Unfortunately the settings disappear when I log out, although I suppose I can write a small script to apply them on login. If there is any way to pass the settings as options to the module, I'd be glad to hear of it; possibly there is also a kernel setting which would do the trick.
If I don't return with a message that I've had another "freeze-up", then it can be assumed that these settings have solved my difficulty at least, although they might not work in other cases.
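The "small script" mentioned above could look roughly like this. It is a minimal sketch that wraps the two echo commands from the post in a function; the name `force_high_performance` and the optional directory argument are additions for illustration, and writing to the real sysfs files needs root.

```sh
#!/bin/sh
# Apply the two DPM settings from the post to one card.
# $1: device directory (defaults to /sys/class/drm/card0/device)
force_high_performance() {
    dev=${1:-/sys/class/drm/card0/device}
    # Refuse to continue if either attribute is missing or not writable
    for f in power_dpm_state power_dpm_force_performance_level; do
        [ -w "$dev/$f" ] || { echo "cannot write $dev/$f" >&2; return 1; }
    done
    # Both settings are needed; the post reports freezes with only the first one
    echo performance > "$dev/power_dpm_state" &&
    echo high > "$dev/power_dpm_force_performance_level"
}
```

Calling `force_high_performance` from something that runs as root at boot would reapply the values each time. On an OpenRC system, an executable /etc/local.d/*.start script might be one place for that call; this is an assumption about the setup, not something from the thread.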
|
|
|
Amity88 Apprentice
Joined: 03 Jul 2010 Posts: 260 Location: Third planet from the Sun
|
Posted: Sat Oct 09, 2021 5:34 pm Post subject: |
|
|
@gcyoung,
You have solved it, I think! This is the same solution that worked for me as well; I came back here to update the thread. Basically, we need to disable the dynamic power management (dpm) of this GPU. _________________
Ant P. wrote: | The enterprise distros sell their binaries. Canonical sells their users. |
Also... Be ignorant... Be happy! |
|
|
|
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3339 Location: Rasi, Finland
|
Posted: Sun Oct 10, 2021 8:00 pm Post subject: |
|
|
Thanks. Gotta poke those settings too.
Sadly, it looks like power consumption will increase. :| _________________ ..: Zucca :..
Gentoo IRC channels reside on Libera.Chat.
--
Quote: | I am NaN! I am a man! |
|
|
|
|
|