Gentoo Forums
[SOLVED] Is there a way to get debug messages from the kernel?
Shodan
n00b

Joined: 18 Apr 2003
Posts: 24
Location: Milan, Italy

Posted: Thu May 16, 2024 3:17 pm    Post subject: [SOLVED] Is there a way to get debug messages from the kernel?

Hi
I'm trying to upgrade my kernel from 6.1.74 to 6.6.30 on my XPS 15 9530, but it just hangs.
I tried adding earlycon=efifb to my boot parameters, but all I see is the kernel reaching the step where it disables the bootconsole and enables the standard one. At this point the machine is completely frozen and I have to do a hard reset.

A kernel built by genkernel works, but not the one I configured from my previous config.

Since I don't have a serial port, just 3 USB-C ports and no converter handy, how can I get the kernel to dump some info about where it hangs?

Thank you.


Last edited by Shodan on Thu May 23, 2024 2:10 pm; edited 1 time in total

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54413
Location: 56N 3W

Posted: Thu May 16, 2024 4:04 pm

Shodan,

The behaviour you describe is often caused by the kernel switching from a working console driver to a broken one.
It gives the appearance of hanging, but everything may be working except the console.

It's worth trying to log in over ssh.

It's also possible to do some analysis.

Please post the output of
Code:
lspci -nnk
so we can see your hardware and put your broken kernel .config file onto a pastebin site.
wgetpaste can help there.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

logrusx
Veteran

Joined: 22 Feb 2018
Posts: 1610

Posted: Thu May 16, 2024 4:23 pm    Post subject: Re: Is there a way to get debug messages from the kernel?

Try pressing Alt+PrtScr+R before trying to switch to a virtual console. If that doesn't work, check whether Caps Lock and Num Lock toggle their LEDs. If they respond, the system is alive and you can ssh into it.

Aside from that, how did you produce the new kernel config?

Share both the old and the new one.

Best Regards,
Georgi

szatox
Advocate

Joined: 27 Aug 2013
Posts: 3180

Posted: Thu May 16, 2024 6:21 pm

You can try netconsole to send output over UDP
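
For reference, a minimal sketch of what a netconsole setup might look like, following the parameter format in the kernel's netconsole documentation. The helper name netconsole_param is hypothetical, and every address, port, interface name and MAC below is a made-up placeholder. The function only assembles the string to append to the kernel command line.

```shell
# Build a netconsole= kernel parameter (hypothetical helper).
# Format: netconsole=<src-port>@<src-ip>/<dev>,<tgt-port>@<tgt-ip>/<tgt-mac>
netconsole_param() {
    # $1 local port, $2 local IP, $3 local interface,
    # $4 remote port, $5 remote IP, $6 remote MAC address
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Placeholder values only; substitute your own network details.
netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55
```

On the receiving machine something like `nc -u -l -p 6666` (or `nc -u -l 6666`, depending on the netcat variant) should show the messages. Note that netconsole needs CONFIG_NETCONSOLE and the network driver built in, and it can only report from the point where the network interface is up.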
_________________
Make Computing Fun Again

Shodan
n00b

Joined: 18 Apr 2003
Posts: 24
Location: Milan, Italy

Posted: Fri May 17, 2024 8:21 am

Hello,
thank you guys for your help.

Here's the lspci output:
Code:
0000:00:00.0 Host bridge [0600]: Intel Corporation Device [8086:a706]
   Subsystem: Dell Device [1028:0beb]
0000:00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:a70d]
   Subsystem: Dell Device [1028:0beb]
   Kernel driver in use: pcieport
0000:00:02.0 VGA compatible controller [0300]: Intel Corporation Raptor Lake-P [Iris Xe Graphics] [8086:a7a0] (rev 04)
   Subsystem: Dell Raptor Lake-P [Iris Xe Graphics] [1028:0beb]
   Kernel driver in use: i915
0000:00:04.0 Signal processing controller [1180]: Intel Corporation Raptor Lake Dynamic Platform and Thermal Framework Processor Participant [8086:a71d]
   Subsystem: Dell Raptor Lake Dynamic Platform and Thermal Framework Processor Participant [1028:0beb]
   Kernel driver in use: proc_thermal_pci
   Kernel modules: processor_thermal_device_pci
0000:00:06.0 System peripheral [0880]: Intel Corporation RST VMD Managed Controller [8086:09ab]
0000:00:07.0 PCI bridge [0604]: Intel Corporation Raptor Lake-P Thunderbolt 4 PCI Express Root Port #0 [8086:a76e]
   Subsystem: Dell Raptor Lake-P Thunderbolt 4 PCI Express Root Port [1028:0beb]
   Kernel driver in use: pcieport
0000:00:07.1 PCI bridge [0604]: Intel Corporation Device [8086:a73f]
   Subsystem: Dell Device [1028:0beb]
   Kernel driver in use: pcieport
0000:00:08.0 System peripheral [0880]: Intel Corporation GNA Scoring Accelerator module [8086:a74f]
   Subsystem: Dell GNA Scoring Accelerator module [1028:0beb]
0000:00:0a.0 Signal processing controller [1180]: Intel Corporation Raptor Lake Crashlog and Telemetry [8086:a77d] (rev 01)
   Subsystem: Dell Raptor Lake Crashlog and Telemetry [1028:0beb]
0000:00:0d.0 USB controller [0c03]: Intel Corporation Raptor Lake-P Thunderbolt 4 USB Controller [8086:a71e]
   Subsystem: Dell Raptor Lake-P Thunderbolt 4 USB Controller [1028:0beb]
   Kernel driver in use: xhci_hcd
   Kernel modules: xhci_pci
0000:00:0d.2 USB controller [0c03]: Intel Corporation Raptor Lake-P Thunderbolt 4 NHI #0 [8086:a73e]
   Subsystem: Dell Raptor Lake-P Thunderbolt 4 NHI [1028:0beb]
0000:00:0e.0 RAID bus controller [0104]: Intel Corporation Volume Management Device NVMe RAID Controller Intel Corporation [8086:a77f]
   Subsystem: Dell Volume Management Device NVMe RAID Controller Intel Corporation [1028:0beb]
   Kernel driver in use: vmd
   Kernel modules: ahci
0000:00:12.0 Serial controller [0700]: Intel Corporation Alder Lake-P Integrated Sensor Hub [8086:51fc] (rev 01)
   Subsystem: Dell Alder Lake-P Integrated Sensor Hub [1028:0beb]
   Kernel driver in use: intel_ish_ipc
   Kernel modules: intel_ish_ipc
0000:00:12.6 Serial bus controller [0c80]: Intel Corporation Device [8086:51fb] (rev 01)
   Subsystem: Dell Device [1028:0beb]
   Kernel driver in use: intel-lpss
   Kernel modules: intel_lpss_pci
0000:00:14.0 USB controller [0c03]: Intel Corporation Alder Lake PCH USB 3.2 xHCI Host Controller [8086:51ed] (rev 01)
   Subsystem: Dell Alder Lake PCH USB 3.2 xHCI Host Controller [1028:0beb]
   Kernel driver in use: xhci_hcd
   Kernel modules: xhci_pci
0000:00:14.2 RAM memory [0500]: Intel Corporation Alder Lake PCH Shared SRAM [8086:51ef] (rev 01)
   Subsystem: Dell Alder Lake PCH Shared SRAM [1028:0beb]
0000:00:14.3 Network controller [0280]: Intel Corporation Raptor Lake PCH CNVi WiFi [8086:51f1] (rev 01)
   Subsystem: Intel Corporation Raptor Lake PCH CNVi WiFi [8086:4090]
   Kernel driver in use: iwlwifi
   Kernel modules: iwlwifi
0000:00:15.0 Serial bus controller [0c80]: Intel Corporation Alder Lake PCH Serial IO I2C Controller #0 [8086:51e8] (rev 01)
   Subsystem: Dell Alder Lake PCH Serial IO I2C Controller [1028:0beb]
   Kernel driver in use: intel-lpss
   Kernel modules: intel_lpss_pci
0000:00:15.1 Serial bus controller [0c80]: Intel Corporation Alder Lake PCH Serial IO I2C Controller #1 [8086:51e9] (rev 01)
   Subsystem: Dell Alder Lake PCH Serial IO I2C Controller [1028:0beb]
   Kernel driver in use: intel-lpss
   Kernel modules: intel_lpss_pci
0000:00:16.0 Communication controller [0780]: Intel Corporation Alder Lake PCH HECI Controller [8086:51e0] (rev 01)
   Subsystem: Dell Alder Lake PCH HECI Controller [1028:0beb]
   Kernel driver in use: mei_me
0000:00:1c.0 PCI bridge [0604]: Intel Corporation Alder Lake-P PCH PCIe Root Port #4 [8086:51bb] (rev 01)
   Subsystem: Dell Alder Lake-P PCH PCIe Root Port [1028:0beb]
   Kernel driver in use: pcieport
0000:00:1f.0 ISA bridge [0601]: Intel Corporation Raptor Lake LPC/eSPI Controller [8086:519d] (rev 01)
   Subsystem: Dell Raptor Lake LPC/eSPI Controller [1028:0beb]
0000:00:1f.3 Multimedia audio controller [0401]: Intel Corporation Raptor Lake-P/U/H cAVS [8086:51ca] (rev 01)
   Subsystem: Dell Raptor Lake-P/U/H cAVS [1028:0beb]
   Kernel driver in use: sof-audio-pci-intel-tgl
   Kernel modules: snd_hda_intel, snd_sof_pci_intel_tgl
0000:00:1f.4 SMBus [0c05]: Intel Corporation Alder Lake PCH-P SMBus Host Controller [8086:51a3] (rev 01)
   Subsystem: Dell Alder Lake PCH-P SMBus Host Controller [1028:0beb]
   Kernel driver in use: i801_smbus
   Kernel modules: i2c_i801
0000:00:1f.5 Serial bus controller [0c80]: Intel Corporation Alder Lake-P PCH SPI Controller [8086:51a4] (rev 01)
   Subsystem: Dell Alder Lake-P PCH SPI Controller [1028:0beb]
0000:01:00.0 3D controller [0302]: NVIDIA Corporation AD106M [GeForce RTX 4070 Max-Q / Mobile] [10de:2820] (rev a1)
   Subsystem: Dell AD106M [GeForce RTX 4070 Max-Q / Mobile] [1028:0beb]
   Kernel driver in use: nvidia
   Kernel modules: nvidia_drm, nvidia
0000:a4:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5260 PCI Express Card Reader [10ec:5260] (rev 01)
   Subsystem: Dell RTS5260 PCI Express Card Reader [1028:0beb]
   Kernel driver in use: rtsx_pci
   Kernel modules: rtsx_pci
10000:e0:06.0 PCI bridge [0604]: Intel Corporation Raptor Lake PCIe 4.0 Graphics Port [8086:a74d]
   Subsystem: Dell Raptor Lake PCIe 4.0 Graphics Port [1028:0beb]
   Kernel driver in use: pcieport
10000:e1:00.0 Non-Volatile memory controller [0108]: KIOXIA Corporation NVMe SSD Controller XG8 [1e0f:0010] (rev 01)
   Subsystem: KIOXIA Corporation NVMe SSD Controller XG8 [1e0f:0001]
   Kernel driver in use: nvme


This is my 6.6.30 .config:
https://paste.gentoo.zip/98anEnzu
And this is the 6.1.74 one, which works:
https://paste.gentoo.zip/RNKm31MJ

I just used menuconfig as always and compiled it.
To try to solve this problem, this time I didn't copy the previous .config and edit it; I created a new one from scratch, replicating the options I need.

I tried reaching the host with ssh but it looks just dead. Shouldn't X start anyways?

I can try to compile in support for a USB network card or a serial adapter and use it to send debug data.

Thank you!

logrusx
Veteran

Joined: 22 Feb 2018
Posts: 1610

Posted: Fri May 17, 2024 9:31 am

Shodan wrote:

I just used menuconfig as always and compiled it.
To try to solve this problem, this time, I didn't copy the previous .config and edited, but created a new one replicating the options I need.

I tried reaching the host with ssh but it looks just dead. Shouldn't X start anyways?

I can try to compile in the support for an usb network card or a serial adapter and try to use it to send debug data.

Thank you!


This is not the way to go. As you've found out, it leads to a non-working kernel or a compilation failure.

Copy the working .config from the old kernel into the new kernel directory and run make oldconfig. Old options will be migrated, and you'll be asked only about the new ones. Don't forget you can always type ? to get help. If you want the defaults for the new options, use make olddefconfig. It will not ask you anything and will produce a new config based on the old one, with the new options set to their default values. I don't recommend it, as it can include options and drivers you don't need or don't want, but it's a viable option.

Don't forget to rebuild out-of-tree kernel modules like nvidia-drivers, acpi_call, ryzen_smu, etc.; you know which ones you use. You should either supply KERNEL_DIR on the command line when doing that or eselect the new kernel, so they are compiled against it and not the old one.
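
The steps above can be sketched as a small shell helper. The function name carry_config_forward is made up for illustration, and the paths in the usage note are assumptions about where the sources live on a typical Gentoo install.

```shell
# Hypothetical helper: copy a known-good .config into a new kernel tree
# and let "make oldconfig" ask only about options that are new.
carry_config_forward() {
    old_config=$1   # path to the working .config
    new_tree=$2     # path to the new kernel source directory
    cp "$old_config" "$new_tree/.config" || return 1
    make -C "$new_tree" oldconfig
}
```

It might be called as `carry_config_forward /usr/src/linux-6.1.74-gentoo/.config /usr/src/linux-6.6.30-gentoo` (assumed paths). Afterwards, `eselect kernel list` and `eselect kernel set <number>` point /usr/src/linux at the new tree, and `emerge --ask @module-rebuild` rebuilds the out-of-tree modules against it.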

Best Regards,
Georgi

Shodan
n00b

Joined: 18 Apr 2003
Posts: 24
Location: Milan, Italy

Posted: Fri May 17, 2024 12:24 pm

Hi logrusx

I forgot to add that I tried that too, both olddefconfig and oldconfig selecting stuff I thought could be useful, but no luck there.

Apparently the system hangs before the network card comes up, as there's no output there.

Thank you

logrusx
Veteran

Joined: 22 Feb 2018
Posts: 1610

Posted: Fri May 17, 2024 1:20 pm

Try enabling CONFIG_SYSFB_SIMPLEFB. It's also a good idea to have a fallback in case efifb fails, so enable CONFIG_FB_VESA as well.

Depending on your hardware CONFIG_FB_SIMPLE might be necessary. Try different combinations.

You can also use the config from gentoo-kernel and strip it down. I think if you install gentoo-kernel-bin you'll get that config as well in /boot.

Best Regards,
Georgi

pietinger
Moderator

Joined: 17 Oct 2006
Posts: 4392
Location: Bavaria

Posted: Fri May 17, 2024 3:42 pm

Shodan,

you have an (invisible) kernel panic ... because you have:
Code:
0000:00:0e.0 RAID bus controller [0104]: Intel Corporation Volume Management Device NVMe RAID Controller Intel Corporation [8086:a77f]
   Subsystem: Dell Volume Management Device NVMe RAID Controller Intel Corporation [1028:0beb]
   Kernel driver in use: vmd
   Kernel modules: ahci

=> Kernel driver in use: vmd
and you have in your new .config:
Code:
#
# PCI controller drivers
#
# CONFIG_VMD is not set

(I also have a Raptor Lake system with this module. Another option is to disable VMD in your UEFI BIOS ... IF you don't use the RAID functions; but the best fix is to enable the necessary driver in the kernel.)

This is also described here: https://wiki.gentoo.org/wiki/User:Pietinger/Tutorials/Manual_Configuring_Kernel_Version_6.6#Part_3_-_Must_Haves

Read this chapter twice, because you also have: CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y

(Maybe read the whole article ... and the beginning is here: https://wiki.gentoo.org/wiki/User:Pietinger/Tutorials/Manual_kernel_configuration )
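
The comparison above (driver named by lspci versus option in the .config) can be partly mechanised. This is only a sketch: the function name drivers_in_use is made up, and it simply extracts the driver names from lspci -k style output on stdin. Mapping a driver name to its config option is still manual (vmd is CONFIG_VMD, but nvme is CONFIG_BLK_DEV_NVME, for example).

```shell
# Hypothetical helper: list the unique driver names found on
# "Kernel driver in use:" lines of lspci -k output read from stdin.
drivers_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p' | LC_ALL=C sort -u
}
```

Run it on a working kernel as `lspci -k | drivers_in_use`, then check that each listed driver is enabled in the new .config.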


P.S.: I never recommend "make olddefconfig" because you can miss important new options ... always use "make oldconfig" and check every new option yourself ;-)
_________________
https://wiki.gentoo.org/wiki/User:Pietinger

pietinger
Moderator

Joined: 17 Oct 2006
Posts: 4392
Location: Bavaria

Posted: Fri May 17, 2024 4:02 pm

P.P.S.:

Oh no ... a second look at your .config showed me that you don't use an initramfs:
Code:
# CONFIG_BLK_DEV_INITRD is not set

This is no problem, I don't use an initramfs either ... BUT ... in this case you must build every driver the kernel needs to access the hard disk and root partition STATICALLY into the kernel and NOT as a module:
Code:
CONFIG_BLK_DEV_SD=m
CONFIG_SCSI=m
CONFIG_SATA_AHCI=m
...and maybe more

Maybe start a new kernel configuration: https://wiki.gentoo.org/wiki/User:Pietinger/Tutorials/Manual_kernel_configuration#Starting_with_a_clean_environment
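
A quick way to check the built-in requirement described above, as a sketch only. The function name not_builtin is made up, and the list of options to check is up to you; the usage note borrows the ones named in this thread.

```shell
# Hypothetical helper: print every listed option that is NOT built in (=y)
# in the given .config, i.e. it is either =m or not set at all.
not_builtin() {
    config=$1
    shift
    for opt in "$@"; do
        grep -q "^CONFIG_${opt}=y\$" "$config" || echo "CONFIG_${opt}"
    done
}
```

For example, `not_builtin /usr/src/linux/.config VMD BLK_DEV_SD SCSI SATA_AHCI` prints the options from that list that a kernel without an initramfs could not use at boot. Which options really matter depends on your disk controller and root filesystem.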
_________________
https://wiki.gentoo.org/wiki/User:Pietinger

Shodan
n00b

Joined: 18 Apr 2003
Posts: 24
Location: Milan, Italy

Posted: Mon May 20, 2024 9:54 am

Hi
I generally don't use an initramfs and compile all the drivers I need at boot directly into the kernel.

I tried adding anything I thought might help, but no luck.

Genkernel makes a bootable kernel, but it lacks support for some things I need, which I had working before.
I'm trying to figure out what support it pulls in to make it work.
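
One way to approach that comparison, as a rough sketch: list the options that are enabled in one config but not in the other. The function names are made up, =y and =m are treated alike, and string or numeric options are ignored; the kernel tree's own scripts/diffconfig gives a fuller comparison.

```shell
# Hypothetical helper: names of all options set to =y or =m in a .config.
enabled_options() {
    grep -E '^CONFIG_[A-Za-z0-9_]+=[ym]$' "$1" | sed 's/=.*//' | LC_ALL=C sort -u
}

# Hypothetical helper: options enabled in the first config
# but not enabled in the second.
enabled_only_in() {
    a=$(mktemp) && b=$(mktemp) || return 1
    enabled_options "$1" > "$a"
    enabled_options "$2" > "$b"
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```

Calling `enabled_only_in working.config broken.config` (placeholder file names) prints what the working kernel has that the hand-made one lacks.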

Thank you!

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54413
Location: 56N 3W

Posted: Mon May 20, 2024 10:14 am

Shodan,

What pietinger said.

Your Video set up looks OK.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Shodan
n00b

Joined: 18 Apr 2003
Posts: 24
Location: Milan, Italy

Posted: Thu May 23, 2024 2:08 pm

It was the graphics card: https://postimg.cc/7fwT1K1x

After I recompiled i915 as a module, the system booted normally.

To answer the original question about debugging the kernel boot process, I added this to the kernel command line:
Code:
initcall_debug ignore_loglevel efi=debug earlycon=efifb keep_bootcon


Maybe it's overkill, since it took a few minutes to reach the kernel panic, but I was able to see the issue.
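
For anyone wanting to reuse these parameters, here is a sketch of how they might be set persistently. It assumes GRUB is the bootloader, which this thread doesn't state; with another bootloader the same string goes wherever its kernel command line is configured. The per-parameter comments paraphrase the kernel's kernel-parameters documentation.

```shell
# /etc/default/grub (assumption: GRUB; adapt for other bootloaders)
#
# initcall_debug   log each initcall as it runs, with return value and duration
# ignore_loglevel  print all kernel messages to the console, whatever the loglevel
# efi=debug        extra EFI debug output during boot
# earlycon=efifb   early console on the EFI framebuffer
# keep_bootcon     do not unregister the boot console when the real one starts
GRUB_CMDLINE_LINUX="initcall_debug ignore_loglevel efi=debug earlycon=efifb keep_bootcon"
```

After editing, regenerate the GRUB config with `grub-mkconfig -o /boot/grub/grub.cfg` and confirm after boot with `cat /proc/cmdline`. For the symptom in this thread keep_bootcon is the important one, since it keeps the early console printing past the point where the boot console would normally be disabled.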

Now it's working fine

Thank you all!

pietinger
Moderator

Joined: 17 Oct 2006
Posts: 4392
Location: Bavaria

Posted: Thu May 23, 2024 2:40 pm

Shodan wrote:
Now it's working fine

Thank you all!

Thank you for your report, too! :D

I have just added a new chapter in my wiki article:
https://wiki.gentoo.org/wiki/User:Pietinger/Tutorials/Kernel_Commandline_Parameter#Parameter:_earlycon.3Defifb_and_others
;-)
_________________
https://wiki.gentoo.org/wiki/User:Pietinger

All times are GMT