Gentoo Forums
Resume failed after loading hibernation image successfully
Forum: Kernel & Hardware
GarryElec
n00b
Joined: 26 Sep 2020
Posts: 8

Posted: Fri Dec 30, 2022 10:38 am    Post subject: Resume failed after loading hibernation image successfully

Hello,

Hibernate (suspend to disk) and resume worked fine on my desktop box up to and including kernel 5.15.41.
With the later kernels 5.15.52 and 5.15.59, after some days (roughly 3 to 15 hibernate cycles), the box booted from scratch instead of resuming from the image.
The same now happens with kernel 5.15.80, once after 7 hibernates and once after 3. I didn't change the kernel config manually between these kernel upgrades.
After the failed resumes, dmesg always shows that the image was loaded:

Code:
PM: Image signature found, resuming
[...]
PM: Image successfully loaded
[...]
PM: hibernation: Failed to load image, recovering.


Code:
[    0.000000] Linux version 5.15.52-gentoo-x86_64 [...]
[...]
[    4.361113] PM: Image signature found, resuming
[...]
[   12.128598] PM: Image successfully loaded
[   12.601521] serial 00:05: disabled
[   12.740636] PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x20 [usbcore] returns -16
[   12.740653] PM: dpm_run_callback(): pci_pm_freeze+0x0/0xb0 returns -16
[   12.740661] xhci_hcd 0000:28:00.3: PM: failed to quiesce async: error -16
[   12.861496] xhci_hcd 0000:03:00.0: xHC error in resume, USBSTS 0x401, Reinit
[   12.861507] usb usb1: root hub lost power or was reset
[   12.861509] usb usb2: root hub lost power or was reset
[   12.862311] serial 00:05: activated
[   12.937106] nvme nvme0: 16/0/0 default/read/poll queues
[   13.195984] ata6: SATA link down (SStatus 0 SControl 330)
[   13.196016] ata1: SATA link down (SStatus 0 SControl 300)
[   13.196046] ata5: SATA link down (SStatus 0 SControl 330)
[   13.361075] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   13.373719] ata2.00: configured for UDMA/100
[   13.401879] usb 1-7: reset low-speed USB device number 2 using xhci_hcd
[   13.868660] PM: hibernation: Failed to load image, recovering.
[   13.940093] PM: hibernation: Basic memory bitmaps freed
[   13.940096] OOM killer enabled.
[   13.940096] Restarting tasks ... done.
[   13.940230] PM: hibernation: resume failed (-16)


Code:
[    0.000000] Linux version 5.15.59-gentoo-x86_64 [...]
[...]
[    4.461255] PM: Image signature found, resuming
[...]
[   21.589748] PM: Image successfully loaded
[   22.112286] serial 00:05: disabled
[   22.271263] PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x20 [usbcore] returns -16
[   22.271275] PM: dpm_run_callback(): pci_pm_freeze+0x0/0xb0 returns -16
[   22.271282] xhci_hcd 0000:28:00.3: PM: failed to quiesce async: error -16
[   22.392289] xhci_hcd 0000:03:00.0: xHC error in resume, USBSTS 0x401, Reinit
[   22.392296] usb usb1: root hub lost power or was reset
[   22.392299] usb usb2: root hub lost power or was reset
[   22.393350] serial 00:05: activated
[   22.467883] nvme nvme0: 16/0/0 default/read/poll queues
[   22.725823] ata6: SATA link down (SStatus 0 SControl 330)
[   22.725857] ata1: SATA link down (SStatus 0 SControl 300)
[   22.725891] ata5: SATA link down (SStatus 0 SControl 330)
[   22.891923] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   22.905880] ata2.00: configured for UDMA/100
[   22.932727] usb 1-7: reset low-speed USB device number 2 using xhci_hcd
[   23.400333] PM: hibernation: Failed to load image, recovering.
[   23.575557] PM: hibernation: Basic memory bitmaps freed
[   23.575560] OOM killer enabled.
[   23.575561] Restarting tasks ... done.
[   23.575683] PM: hibernation: resume failed (-16)


Code:
[    0.000000] Linux version 5.15.80-gentoo-x86_64 [...]
[...]
[    4.261362] PM: Image signature found, resuming
[...]
[   18.773681] PM: Image successfully loaded
[   19.391908] serial 00:05: disabled
[   19.431359] PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x20 [usbcore] returns -16
[   19.431377] PM: dpm_run_callback(): pci_pm_freeze+0x0/0xb0 returns -16
[   19.431389] xhci_hcd 0000:28:00.3: PM: failed to quiesce async: error -16
[   19.651895] xhci_hcd 0000:03:00.0: xHC error in resume, USBSTS 0x401, Reinit
[   19.651903] usb usb1: root hub lost power or was reset
[   19.651905] usb usb2: root hub lost power or was reset
[   19.652864] serial 00:05: activated
[   19.728240] nvme nvme0: 16/0/0 default/read/poll queues
[   19.985995] ata6: SATA link down (SStatus 0 SControl 330)
[   19.986030] ata5: SATA link down (SStatus 0 SControl 330)
[   19.986063] ata1: SATA link down (SStatus 0 SControl 300)
[   20.192209] usb 1-10: reset low-speed USB device number 2 using xhci_hcd
[   20.658805] PM: hibernation: Failed to load image, recovering.
[   20.839565] PM: hibernation: Basic memory bitmaps freed
[   20.839567] OOM killer enabled.
[   20.839568] Restarting tasks ... done.
[   20.839677] PM: hibernation: resume failed (-16)


My hibernate config:
Code:
$ egrep -v '^(#|$)' /etc/hibernate/hibernate.conf
TryMethod disk.conf

$ egrep -v '^(#|$)' /etc/hibernate/disk.conf
TryMethod sysfs-disk.conf

$ egrep -v '^(#|$)' /etc/hibernate/sysfs-disk.conf
UseSysfsPowerState disk
Include common.conf

$ egrep -v '^(#|$)' /etc/hibernate/common.conf
Verbosity 1
LogFile /var/log/hibernate.log
LogVerbosity 4
LogTimestamp yes
Distribution gentoo
XDisplay :0
SaveClock yes
FullSpeedCPU yes
LockXScreenSaver yes
UnloadModules xhci_pci xhci_hcd usbhid usb_storage usbcore
UnloadBlacklistedModules yes
LoadModules auto
DownInterfaces auto
UpInterfaces auto
RestartServices apache2
SwitchToTextMode yes


Additionally, I switched off the two USB controllers 0000:28:00.3 and 0000:03:00.0 before calling hibernate:
Code:
egrep -q '^XHC0.*disabled' /proc/acpi/wakeup && echo XHC0 > /proc/acpi/wakeup
egrep -q '^PTXH.*disabled' /proc/acpi/wakeup && echo PTXH > /proc/acpi/wakeup


Why does the hibernation image sometimes fail to load with kernels newer than 5.15.41?

What can I do to avoid it, so I can run a newer kernel and keep using hibernate/resume?
Hu
Moderator
Joined: 06 Mar 2007
Posts: 21635

Posted: Fri Dec 30, 2022 5:51 pm

I spent some time reading the relevant kernel source, starting from the strings you showed. I used v5.15.85, so the line numbers may differ from your version, but the logic is likely the same. Your string appears in one place:
kernel/power/hibernate.c:698:
   pr_err("Failed to load image, recovering.\n");
Hibernation is special in that partway through a successful restore, the newly booted kernel will stop executing and transfer control to the restored kernel. As a result, hibernation_restore returns only on failure.
kernel/power/hibernate.c:526:
 * This routine must be called with system_transition_mutex held.  If it is
 * successful, control reappears in the restored target kernel in
 * hibernation_snapshot().
Therefore, in the failure case, hibernation_restore must either have been skipped or have failed outright. Assuming it was executed and failed, you should get a message from one of its helper routines, such as this line in resume_target_kernel:
kernel/power/hibernate.c:461:
      pr_err("Some devices failed to power down, aborting resume\n");
You could try using a newer kernel to see if the problem is fixed upstream. You could try bisecting in the v5.15.x line to find the specific commit that makes this unreliable, then report upstream that it causes problems. You could examine the helper routines to try to find why hcd_pci_suspend is returning EBUSY (16). Does resume still fail if you do not disable these devices before hibernating?

A read of the git logs for the relevant kernel versions turns up many references to usb / hcd, so I doubt we could readily guess the guilty commit.
GarryElec
n00b
Joined: 26 Sep 2020
Posts: 8

Posted: Sat Dec 31, 2022 5:55 am

Quote:
Therefore, we know that it must have been skipped or failed outright in the failure case. Assuming it was executed and failed, you should get a message from one of its helper routines, such as this line in resume_target_kernel:
kernel/power/hibernate.c:461:
Code:
pr_err("Some devices failed to power down, aborting resume\n");


I didn't get this message in dmesg or /var/log/*.

Quote:
You could try using a newer kernel to see if the problem is fixed upstream.

I'm giving 5.15.85 a try now.

Quote:
Does resume still fail if you do not disable these devices before hibernating?


With 5.15.80 I haven't done enough hibernates yet to tell. But I added

Code:
/etc/hibernate/common.conf:
UnloadModules xhci_pci xhci_hcd usbhid usb_storage usbcore

and
Code:
egrep -q '^XHC0.*disabled' /proc/acpi/wakeup && echo XHC0 > /proc/acpi/wakeup
egrep -q '^PTXH.*disabled' /proc/acpi/wakeup && echo PTXH > /proc/acpi/wakeup

after 5.15.52 failed to resume. So I have tested 5.15.52 both with and without disabling these devices.

I have now commented out these lines and will see whether 5.15.85 resumes reliably.
GarryElec
n00b
Joined: 26 Sep 2020
Posts: 8

Posted: Tue Jan 03, 2023 12:01 am

5.15.85 doesn't help. It failed to resume with the same messages in dmesg.

Quote:
You could try bisecting in the v5.15.x line to find the specific commit that makes this unreliable, then report upstream that it causes problems.

How can I find the specific commit? I'm not familiar with git, but from 5.15.41 to 5.15.52 there is a change which persists in 5.15.85:

Code:
/usr/src $ diff -u linux-5.15.41-gentoo/drivers/usb/core/hcd-pci.c linux-5.15.85-gentoo/drivers/usb/core/hcd-pci.c
--- linux-5.15.41-gentoo/drivers/usb/core/hcd-pci.c     2021-10-31 21:53:10.000000000 +0100
+++ linux-5.15.85-gentoo/drivers/usb/core/hcd-pci.c     2022-12-31 06:39:29.062891247 +0100
@@ -616,10 +616,10 @@
        .suspend_noirq  = hcd_pci_suspend_noirq,
        .resume_noirq   = hcd_pci_resume_noirq,
        .resume         = hcd_pci_resume,
-       .freeze         = check_root_hub_suspended,
+       .freeze         = hcd_pci_suspend,
        .freeze_noirq   = check_root_hub_suspended,
        .thaw_noirq     = NULL,
-       .thaw           = NULL,
+       .thaw           = hcd_pci_resume,
        .poweroff       = hcd_pci_suspend,
        .poweroff_noirq = hcd_pci_suspend_noirq,
        .restore_noirq  = hcd_pci_resume_noirq,
/usr/src $ diff -u linux-5.15.41-gentoo/drivers/usb/core/hcd-pci.c linux-5.15.52-gentoo/drivers/usb/core/hcd-pci.c
--- linux-5.15.41-gentoo/drivers/usb/core/hcd-pci.c     2021-10-31 21:53:10.000000000 +0100
+++ linux-5.15.52-gentoo/drivers/usb/core/hcd-pci.c     2022-07-10 20:54:50.413343065 +0200
@@ -616,10 +616,10 @@
        .suspend_noirq  = hcd_pci_suspend_noirq,
        .resume_noirq   = hcd_pci_resume_noirq,
        .resume         = hcd_pci_resume,
-       .freeze         = check_root_hub_suspended,
+       .freeze         = hcd_pci_suspend,
        .freeze_noirq   = check_root_hub_suspended,
        .thaw_noirq     = NULL,
-       .thaw           = NULL,
+       .thaw           = hcd_pci_resume,
        .poweroff       = hcd_pci_suspend,
        .poweroff_noirq = hcd_pci_suspend_noirq,
        .restore_noirq  = hcd_pci_resume_noirq,


Last edited by GarryElec on Tue Jan 03, 2023 1:10 am; edited 1 time in total
Hu
Moderator
Joined: 06 Mar 2007
Posts: 21635

Posted: Tue Jan 03, 2023 12:44 am

Please use unified diffs when showing changes. They provide context, which makes them easier to read.

When I suggested something newer, I was thinking a new major line, such as v6.1.x.

To find the specific commit, you would need to build multiple test kernels drawn from the range starting with the last known good and ending with the first known bad. You would test intervening kernels to see whether they are good or bad, and from that declare the commits present in them to be good or bad. After sufficient iterations, only one commit remains to be considered. The test on that commit will determine whether it is the newest good or the oldest bad.

git bisect assists you in this process, allowing you to test roughly half the uncategorized commits with every step, so you only need log2(Ncommits) tests instead of Ncommits individual tests. It also keeps some history to help you track your results in a structured form, which is helpful if you cannot complete all the tests in one short sitting.

The actual test process is still manual. You try a kernel, then tell git whether the result was good or bad. From that, it picks another commit halfway between the one just tested and the nearest one with the opposite state. You test that one, report its results, and so on.
GarryElec
n00b
Joined: 26 Sep 2020
Posts: 8

Posted: Tue Jan 03, 2023 1:55 am

Hu wrote:
When I suggested something newer, I was thinking a new major line, such as v6.1.x.

I'm giving 6.1.2 a try.
consumed_king
n00b
Joined: 05 Mar 2023
Posts: 1

Posted: Sun Mar 05, 2023 9:53 pm

Did you fix this? I'm facing this kind of error now.
GarryElec
n00b
Joined: 26 Sep 2020
Posts: 8

Posted: Mon Mar 06, 2023 6:09 am

consumed_king wrote:
Did you fix this? I'm facing this kind of error now.


With 6.1.2 it always resumed normally. Now I'm running 6.1.12, which also resumes normally. So Hu's hint to try a new major line (v6.1.x) seems to have fixed it.
dari46
n00b
Joined: 08 Dec 2023
Posts: 1

Posted: Fri Dec 08, 2023 11:36 am    Post subject: It's also because of the NVIDIA Driver

consumed_king wrote:
Did you fix this? I'm facing this kind of error now.


In addition to moving to the 6.1 kernel, I also had to downgrade the NVIDIA driver from 545 to 525; otherwise not even suspend to RAM would work. After downgrading to 525.147.05, both hibernation and suspend to RAM work fine again.
All times are GMT