Gentoo Forums
Gentoo Forums Forum Index Kernel & Hardware
Kernel Hangs at Loading

jyoung
Guru

Joined: 20 Mar 2007
Posts: 436

Posted: Sat Dec 23, 2017 1:48 pm    Post subject: Kernel Hangs at Loading

Hi Folks,

After upgrading from a 4.12.12 kernel, the new kernel (4.13.15, 4.14.8-r1) hangs while grub is loading it. All I get is this message:
Loading Linux 4.13.15-gentoo ...

Grub is detecting it, since if I move or delete the kernel, grub complains that it can't find it. I've tried numerous kernel options (make oldconfig from the working kernel's config, genkernel, etc.), but the result is the same. This problem is particularly hard to debug since the result is a non-working system. Some web searching turns up folks with similar problems, but usually grub complains with some kind of error message (can't find the kernel, can't load the kernel, etc.). Here, it's just silent.

Any ideas?

Hu
Moderator

Joined: 06 Mar 2007
Posts: 21618

Posted: Sat Dec 23, 2017 5:23 pm

First, check that the kernel is supposed to print output: no command line options silenced it, that it is writing to the correct output device, etc.
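
For that first check, here is a rough shell sketch of what to look for on the kernel command line. The helper name find_silencers and the sample command line are made up for illustration; quiet, loglevel= and console= are the kernel parameters that most commonly hide or redirect early messages.

```shell
# Print every kernel command-line option that can hide or redirect boot messages.
# find_silencers is a hypothetical helper name, not an existing tool.
find_silencers() {
    # $1 is deliberately unquoted so the command line splits into words.
    for opt in $1; do
        case "$opt" in
            quiet|loglevel=*|console=*) printf '%s\n' "$opt" ;;
        esac
    done
}

# Sample command line (illustrative, not taken from this thread).
find_silencers "root=/dev/nvme0n1p4 ro quiet console=ttyS0,115200"
```

On the working kernel, find_silencers "$(cat /proc/cmdline)" shows what is in effect for that boot; the linux line of the failing grub.cfg entry can be pasted in the same way.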

Second, try to find the origin of the failure. Set aside your known-good 4.12.12 kernel. Clean your kernel build area and build a new 4.12.12 kernel from the same configuration, to rule out toolchain changes that may have broken something. If the newly built 4.12.12 also works, then we can assume a kernel source code change is your problem. If the old 4.12.12 works and the new one fails, we can assume there is a toolchain problem.

The rest of this post is written on the assumption it is a source change, not a toolchain change. If the toolchain is implicated, stop here and post back with your findings. Otherwise, read on.

Kernel 4.12.x went up to 4.12.14 before being retired. You could test later 4.12.x kernels in the hope that one of them is broken. If so, it will be comparatively easy to find the bad patch since only a few hundred commits are in question. If all 4.12.x work, and no 4.13.x work, then finding the bad commit is much more tedious. In either case, you will use git bisect to test intermediate kernels to find the first that fails to boot. If 4.12.14 is bad, you can probably find it in ~log2(78) steps. If 4.12.14 is good and 4.13 is bad, you may need ~log2(14150) steps.
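
To put numbers on those log2 estimates, here is a small sketch. The helper bisect_steps is made up for illustration and simply computes the ceiling of log2 of the commit count; the commit counts are the ones quoted above. The git bisect commands in the trailing comment are the standard ones, and the tag names assume a git clone of the kernel tree.

```shell
# Worst-case number of test builds a bisection needs: ceil(log2(commits)).
# bisect_steps is a hypothetical helper using integer arithmetic only.
bisect_steps() {
    steps=0
    span=1
    while [ "$span" -lt "$1" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

echo "within 4.12.x (~78 commits):      $(bisect_steps 78) builds"
echo "4.12 to 4.13 (~14150 commits):    $(bisect_steps 14150) builds"

# The bisection itself, run inside the kernel git checkout:
#   git bisect start
#   git bisect bad  v4.13
#   git bisect good v4.12
#   (build, install, reboot, then report "git bisect good" or "git bisect bad")
#   git bisect reset
```

So even the tedious case is on the order of 14 build-and-reboot cycles, not thousands.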

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54232
Location: 56N 3W

Posted: Sat Dec 23, 2017 6:37 pm

jyoung,

Code:
Loading Linux 4.13.15-gentoo ..
is the last message from grub. Look at your grub.cfg.

The first output from the kernel is
Code:
Decompressing Linux...

The kernel needs to have the right decompressor built in, or you don't get any messages and the kernel can't decompress itself.

The file /usr/src/linux/vmlinux is the uncompressed kernel.
Booting that may work but I never tried an uncompressed kernel on amd64.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

jyoung
Guru

Joined: 20 Mar 2007
Posts: 436

Posted: Sun Dec 24, 2017 4:43 am

The clean 4.12.12 doesn't have this issue, and neither does the 4.12.14 kernel. I think this rules out a toolchain problem, and points toward a kernel source change between 4.12 and 4.13. To confirm, I'll attempt to compile an earlier 4.13.

I'm also interested in the idea of booting off an uncompressed kernel. When I simply copy vmlinux to /boot (along with the other kernels), grub-mkconfig doesn't seem to detect it. Should I modify grub.cfg for this experiment (despite the warning)?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54232
Location: 56N 3W

Posted: Sun Dec 24, 2017 10:07 am

jyoung,

Read the bottom of grub.cfg.

The warning is because manual edits will be removed when grub.cfg is regenerated.
Don't break any of your working grub.cfg entries.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

jyoung
Guru

Joined: 20 Mar 2007
Posts: 436

Posted: Sun Dec 24, 2017 8:35 pm

I guess I'd never looked at the bottom of grub.cfg! I've set up a manual config file in custom.cfg; grub now detects the uncompressed kernel, but I'm getting
"error: invalid magic number"

Below are the contents of custom.cfg. I've mostly copied the setup from the menu entries in grub.cfg.


menuentry 'uncompressed' {
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root 39ec98c1-c234-4ee3-bb13-6d0d5f84b1be
else
search --no-floppy --fs-uuid --set=root 39ec98c1-c234-4ee3-bb13-6d0d5f84b1be
fi
echo "Loading linux..."
linux /boot/vmlinux root=/dev/nvme0n1p4 ro rootfstype=ext4
}
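
A possible reading of the "invalid magic number" error, offered as an assumption rather than a confirmed diagnosis for this machine: on x86, GRUB's linux command expects a bzImage (arch/x86/boot/bzImage in the build tree), which carries the Linux boot-protocol header, whereas vmlinux is a plain ELF file. The sketch below checks a file for the two markers described in the kernel's x86 boot protocol document, the 0xAA55 boot flag at offset 0x1FE and the "HdrS" signature at offset 0x202. The helper name has_boot_header and the throwaway demo file are made up for illustration.

```shell
# Succeed if FILE carries the x86 Linux boot-protocol header markers.
# has_boot_header is a hypothetical helper name.
has_boot_header() {
    # Bytes 510-511 (offset 0x1FE): boot flag 0xAA55, stored on disk as 55 aa.
    flag=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    # Bytes 514-517 (offset 0x202): ASCII "HdrS" = 48 64 72 53.
    sig=$(dd if="$1" bs=1 skip=514 count=4 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$flag" = "55aa" ] && [ "$sig" = "48647253" ]
}

# Demonstration on a throwaway file that only mimics the header layout.
demo=$(mktemp)
{
    dd if=/dev/zero bs=510 count=1 2>/dev/null   # padding up to offset 0x1FE
    printf '\125\252'                            # 0x55 0xAA as octal escapes
    printf '\353\146'                            # two bytes standing in for the jump slot
    printf 'HdrS'
} > "$demo"

if has_boot_header "$demo"; then
    echo "boot header present"
else
    echo "no boot header"
fi
rm -f "$demo"
```

Run against the real files, a check like this would be expected to pass for the bzImage and fail for vmlinux, which would fit the error GRUB prints for the uncompressed file.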

On the other front, the 4.13.5 kernel (which doesn't work) is the lowest 4.13 kernel available.

jyoung
Guru

Joined: 20 Mar 2007
Posts: 436

Posted: Sun Dec 24, 2017 9:22 pm

I also get the magic number error if I try to boot off the compressed kernel using the custom menu option.

Jaglover
Watchman

Joined: 29 May 2005
Posts: 8291
Location: Saint Amant, Acadiana

Posted: Sun Dec 24, 2017 9:52 pm

Did you run make clean before you ran make? Maybe you should.
_________________
My Gentoo installation notes.
Please learn how to denote units correctly!

ipic
Guru

Joined: 29 Dec 2003
Posts: 377
Location: UK

Posted: Sun Dec 24, 2017 11:59 pm

Code:
Loading Linux 4.13.15-gentoo ..    then nothing


I had this a while back, and the cause was that I hadn't spotted that I had filled up my boot partition. The copy of the kernel images to the boot partition failed silently, truncating the image. Grub found it OK (since the file existed), but it just went nowhere on load.
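
A quick way to rule this out is to compare the image in the build tree with the copy that grub actually loads. A minimal sketch, with a made-up helper name (image_intact) and throwaway files standing in for the two images:

```shell
# Succeed only if the installed image is byte-for-byte identical to the built one.
# image_intact is a hypothetical helper name.
image_intact() {
    cmp -s "$1" "$2"
}

# Demonstration with throwaway files: a "built" image and a truncated copy.
built=$(mktemp)
copied=$(mktemp)
printf 'pretend this is a kernel image' > "$built"
printf 'pretend this is' > "$copied"

if image_intact "$built" "$copied"; then
    echo "copy matches the build"
else
    echo "copy differs from the build (truncated or stale)"
fi
rm -f "$built" "$copied"
```

On a real system the two arguments would be arch/x86/boot/bzImage in the kernel tree and whatever file under /boot the grub.cfg entry points at, and df -h /boot shows how much space is left on that filesystem.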

Probably not your problem but I thought I'd mention it just in case...

Regards
Ian

jyoung
Guru

Joined: 20 Mar 2007
Posts: 436

Posted: Wed Dec 27, 2017 7:31 pm

Just to be sure, I ran make clean in 4.13.5 and rebuilt it. The results were unchanged.

My system is set up with /boot/grub/efi on a separate partition, but /boot/grub on the root partition. So, I'm definitely not running out of space for the new kernels.

xpxp2002
n00b

Joined: 29 Dec 2017
Posts: 3

Posted: Fri Dec 29, 2017 2:20 am

I'm also experiencing this issue. Upgrading from 4.12.12 to 4.14.8-r1. Used oldconfig to bring .config current. Booting off of 4.14.8-r1 freezes at the loading line out of Grub. 4.12.12 works just fine.

I think this new 4.14 branch still has some issues.

xpxp2002
n00b

Joined: 29 Dec 2017
Posts: 3

Posted: Fri Dec 29, 2017 2:57 am

Been working on this for hours since Monday. Just figured it out...for my system, at least.

Try setting CONFIG_PGTABLE_LEVELS=4 if it is set to 5.
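
To see what a given build actually ended up with, the value can be read out of the .config without editing it. A small sketch: config_value is a made-up helper name, and the sample file only stands in for a real .config. As far as I know, on x86_64 CONFIG_PGTABLE_LEVELS has no prompt of its own and follows CONFIG_X86_5LEVEL, so that is the option to look for in menuconfig; treat that as an assumption to verify against your own tree.

```shell
# Print the value of one symbol from a kernel .config, or "unset" if it is absent.
# config_value is a hypothetical helper name.
config_value() {
    val=$(sed -n "s/^$2=//p" "$1")
    echo "${val:-unset}"
}

# Stand-in for a real .config (contents are illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
CONFIG_X86_64=y
# CONFIG_X86_5LEVEL is not set
CONFIG_PGTABLE_LEVELS=4
EOF

config_value "$sample" CONFIG_PGTABLE_LEVELS
config_value "$sample" CONFIG_X86_5LEVEL
rm -f "$sample"
```

Pointing it at /usr/src/linux/.config for both the working and the failing kernel makes the two easy to compare.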

jyoung
Guru

Joined: 20 Mar 2007
Posts: 436

Posted: Fri Dec 29, 2017 6:10 pm

xpxp2002, CONFIG_PGTABLE_LEVELS=4 in my .config.

It looks like it's set by arch, without a prompt in menuconfig. I tried setting it to 5 manually (just to see what would happen), but 'make' immediately rewrote .config with 4.

Jaglover
Watchman

Joined: 29 May 2005
Posts: 8291
Location: Saint Amant, Acadiana

Posted: Fri Dec 29, 2017 6:21 pm

This may be at least partially a gcc-6 problem. Have you tried with gcc-7?
_________________
My Gentoo installation notes.
Please learn how to denote units correctly!

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54232
Location: 56N 3W

Posted: Fri Dec 29, 2017 6:33 pm

jyoung,

A word of advice on editing the kernel .config by hand. Don't.

It's very easy to end up with an illegal .config that produces a horribly broken kernel.
Then it's difficult to diagnose because nobody has seen anything like it before.
The problem stems from a single menuconfig entry flipping lots of .config file flags.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

jyoung
Guru

Joined: 20 Mar 2007
Posts: 436

Posted: Fri Dec 29, 2017 6:49 pm

Yeah, I would never try this on a kernel that I actually needed ... I'm still running off 4.12.12 while I work on 4.14.8-r1.

I'm using gcc-7.2.0; in fact, that's the only one I'm seeing in gcc-config -l.

xpxp2002
n00b

Joined: 29 Dec 2017
Posts: 3

Posted: Fri Dec 29, 2017 7:04 pm

jyoung wrote:
xpxp2002, CONFIG_PGTABLE_LEVELS=4 in my .config.

It looks like it's set by arch, without a prompt in menuconfig. I tried setting it to 5 manually (just to see what would happen), but 'make' immediately rewrote .config with 4.

Hmm. Sorry that didn't work for you. I've been trying to get mine to work for four days.

I’m on amd64. What arch is this?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54232
Location: 56N 3W

Posted: Fri Dec 29, 2017 8:00 pm

jyoung,

It looks like 4.14 is more trouble than it's worth.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

jyoung
Guru

Joined: 20 Mar 2007
Posts: 436

Posted: Sat Dec 30, 2017 1:19 pm

I'm also on amd64.

This problem also occurred with 4.13.5, but I see that that's already been removed.

Shall we mark this thread as solved? It's not really solved ... I'm certainly willing to try again and report back once a 4.15 kernel is released.

Hu
Moderator

Joined: 06 Mar 2007
Posts: 21618

Posted: Sat Dec 30, 2017 6:21 pm

Why wait? 4.15 is already up to -rc5. Linus usually releases around -rc7 or -rc8, depending on how he feels about overall quality. You might not want to stay on a -rcN kernel for daily work, but for a quick test, it's probably safe.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54232
Location: 56N 3W

Posted: Sat Dec 30, 2017 9:56 pm

Hu,

I've been using it since 4.15.0-rc1 as it has a new amdgpu driver.
I'm on rc-4 now and I can say that it seems to work for me.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

jyoung
Guru

Joined: 20 Mar 2007
Posts: 436

Posted: Mon Jan 01, 2018 7:23 pm

I've just tried 4.15_rc5, and the result is the same. The boot sequence gets stuck at the "Loading Linux" bit.

You folks are getting your kernels from git-sources, yes?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54232
Location: 56N 3W

Posted: Mon Jan 01, 2018 8:24 pm

jyoung,

I fetched the kernel from kernel.org but git-sources is the Gentoo way to do the same thing.

Let's start from the very beginning. Post your
Code:
lspci -nnk
output.
Pastebin your non-working 4.15.0-rc5 .config. Then I can drop it into the kernel and look at it.
Explain your filesystems in use. Particularly root, where it is, what it is and any hoops you need to jump through to mount it.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

jyoung
Guru

Joined: 20 Mar 2007
Posts: 436

Posted: Wed Jan 03, 2018 4:25 am

Okay, here's a link to my .config

https://pastebin.com/4d4T1pms

This is a fresh .config, generated by running make menuconfig and then exiting without making any changes. I've also attempted using make oldconfig off the 4.12.12 kernel.

My drive is NVME, so it uses EFI. Also, instead of the partitions being named /dev/sda#, they're /dev/nvme0n1p#

/dev/nvme0n1p1 boot bios partition
/dev/nvme0n1p2 boot partition, mounted at /boot/grub/efi
/dev/nvme0n1p3 swap
/dev/nvme0n1p4 root partition
/dev/nvme0n1p5 home partition

Curiously, running df tells me that the boot and home partitions are mounted as expected, but the root path (/) corresponds to /dev/root instead of /dev/nvme0n1p4.

One significant issue is that I had to enable the NVME items in the kernel in order to properly load the drive. I doubt that is the problem here for two reasons: 1) I've also tried enabling them in 4.14, and the problem remained, and make oldconfig from 4.12.12 would have enabled them too, and 2) before I enabled the proper NVME drivers in 4.12.12, the kernel loaded itself and started the boot process, failing at a later step, while here it's not even loading. All that said, I'd be happy to mess around with the NVME stuff some more if you folks think it's likely.

Running
Code:
lspci -nnk
produces

Code:

00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:1904] (rev 08)
   Subsystem: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:2015]
   Kernel driver in use: skl_uncore
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 520 [8086:1916] (rev 07)
   Subsystem: Microsoft Corporation HD Graphics 520 [1414:0015]
   Kernel driver in use: i915
   Kernel modules: i915
00:05.0 Multimedia controller [0480]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Imaging Unit [8086:1919] (rev 01)
   Subsystem: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Imaging Unit [8086:2015]
00:08.0 System peripheral [0880]: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th Gen Core Processor Gaussian Mixture Model [8086:1911]
   Subsystem: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th Gen Core Processor Gaussian Mixture Model [8086:2015]
00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:9d2f] (rev 21)
   Subsystem: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:7270]
   Kernel driver in use: xhci_hcd
   Kernel modules: xhci_pci
00:14.2 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Thermal subsystem [8086:9d31] (rev 21)
   Subsystem: Intel Corporation Sunrise Point-LP Thermal subsystem [8086:7270]
   Kernel driver in use: intel_pch_thermal
   Kernel modules: intel_pch_thermal
00:14.3 Multimedia controller [0480]: Intel Corporation Device [8086:9d32] (rev 01)
   Subsystem: Intel Corporation Device [8086:7270]
00:15.0 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 [8086:9d60] (rev 21)
   Subsystem: Intel Corporation Sunrise Point-LP Serial IO I2C Controller [8086:7270]
   Kernel driver in use: intel-lpss
   Kernel modules: intel_lpss_pci
00:15.1 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 [8086:9d61] (rev 21)
   Subsystem: Intel Corporation Sunrise Point-LP Serial IO I2C Controller [8086:7270]
   Kernel driver in use: intel-lpss
   Kernel modules: intel_lpss_pci
00:15.2 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #2 [8086:9d62] (rev 21)
   Subsystem: Intel Corporation Sunrise Point-LP Serial IO I2C Controller [8086:7270]
   Kernel driver in use: intel-lpss
   Kernel modules: intel_lpss_pci
00:15.3 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #3 [8086:9d63] (rev 21)
   Subsystem: Intel Corporation Sunrise Point-LP Serial IO I2C Controller [8086:7270]
   Kernel driver in use: intel-lpss
   Kernel modules: intel_lpss_pci
00:16.0 Communication controller [0780]: Intel Corporation Sunrise Point-LP CSME HECI #1 [8086:9d3a] (rev 21)
   Subsystem: Intel Corporation Sunrise Point-LP CSME HECI [8086:7270]
   Kernel driver in use: mei_me
   Kernel modules: mei_me
00:16.4 Communication controller [0780]: Intel Corporation Device [8086:9d3e] (rev 21)
   Kernel driver in use: mei_me
   Kernel modules: mei_me
00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 [8086:9d14] (rev f1)
   Kernel driver in use: pcieport
   Kernel modules: shpchp
00:1d.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 [8086:9d18] (rev f1)
   Kernel driver in use: pcieport
   Kernel modules: shpchp
00:1f.0 ISA bridge [0601]: Intel Corporation Sunrise Point-LP LPC Controller [8086:9d48] (rev 21)
   Subsystem: Intel Corporation Sunrise Point-LP LPC Controller [8086:7270]
00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-LP PMC [8086:9d21] (rev 21)
   Subsystem: Intel Corporation Sunrise Point-LP PMC [8086:7270]
00:1f.3 Audio device [0403]: Intel Corporation Sunrise Point-LP HD Audio [8086:9d70] (rev 21)
   Subsystem: Intel Corporation Sunrise Point-LP HD Audio [8086:7270]
   Kernel driver in use: snd_hda_intel
   Kernel modules: snd_hda_intel, snd_soc_skl
01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951 [144d:a802] (rev 01)
   Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951 [144d:a801]
   Kernel driver in use: nvme
02:00.0 Ethernet controller [0200]: Marvell Technology Group Ltd. 88W8897 [AVASTAR] 802.11ac Wireless [11ab:2b38]
   Subsystem: Device [0003:045e]
   Kernel driver in use: mwifiex_pcie
   Kernel modules: mwifiex_pcie

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54232
Location: 56N 3W

Posted: Wed Jan 03, 2018 10:33 am

jyoung,

Here's how booting works. It solves the problem of loading an operating system from the block device without being able to read the filesystem on the block device.
This is for BIOS, not UEFI, but the problems are the same. UEFI can read exactly one filesystem, vfat, so everything needed to get started has to be there.

The BIOS can read exactly one disk block. That's LBA 0. When it starts, it does all the POST checks, sets up the hardware, loads LBA 0 into RAM and jumps to its start address.
LBA 0 contains at most 446 bytes of code. All it can do is make BIOS calls to load some more disk blocks into RAM ... and jump to the start address.
So we have a chain of loaders, each more capable than the last. Eventually, grub gets loaded, by reading the filesystem it's on, and shows you a menu.
When you make your choice, Grub shows the message about loading the kernel and, if you have an initrd, about loading the initrd.
You don't report the initrd message, so let's assume you don't have one.
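
For reference, the 446-byte figure falls out of the classic MBR layout of the 512-byte LBA 0 sector. The arithmetic, as a trivial shell sketch (the variable names are only illustrative):

```shell
# Classic MBR layout of the 512-byte sector at LBA 0.
boot_code=446                  # bootstrap code area
partition_table=$((4 * 16))    # four 16-byte partition entries
boot_signature=2               # the bytes 0x55 0xAA that mark the sector as bootable
mbr_total=$((boot_code + partition_table + boot_signature))
echo "LBA 0 holds $mbr_total bytes, $boot_code of them code"
```

That is why the first stage can do little more than ask the BIOS for the next few blocks.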

Grub exits by jumping to the kernel start address. The kernel binary is all alone at this time; it can't load any modules until root is mounted, as they are in /lib/modules.

The kernel decompresses itself and, as it starts, it puts a message on the display that you don't see. It's a good idea to have all the modules used in lspci configured as <*> in the kernel.

Code:
CONFIG_DRM_I915=m
is for your Intel framebuffer driver. It will start after root is mounted but is otherwise OK.
Under
Code:
 # Frame buffer hardware drivers
only
Code:
CONFIG_FB_EFI=y
# CONFIG_FB_SIMPLE is not set
may be enabled. The others all fight over the hardware and no display driver works.

Code:
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
Your display is in portrait mode?

Code:
xhci_hcd. CONFIG_USB_XHCI_HCD=m
No USB 3 until root is mounted. It may not matter but if you had an initrd, this may prevent you interacting with the rescue shell.

Ahhh.
Code:
Sunrise Point-LP
That's in the middle of everything on your system. Everything Sunrise Point related must be built into the kernel.
Code:
CONFIG_MFD_INTEL_LPSS=m
CONFIG_MFD_INTEL_LPSS_PCI=m
CONFIG_INTEL_MEI_ME=m

All need to be built in.

Change
Code:
CONFIG_HOTPLUG_PCI_SHPC=m
to built in too.

I don't expect it to boot after those changes but we might get some debug information.

As a rule of thumb, everything needed to get the root filesystem mounted needs to be built in. Other things can be left as modules.
The drivers for your Sunrise Point chipset come under the heading needed to get the root filesystem mounted.

-- edit --

Some of those settings will only be available as off or <M>. You will need to go back up the menu system and change the menu(s) from <M> to <*> to be able to select built in.
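
After a pass through menuconfig, a quick check along these lines shows which of the symbols named above are still modules. It is only a sketch: still_modular is a made-up helper name, and the sample file stands in for the real .config with illustrative values.

```shell
# List which of the given symbols are built as modules (=m) in a .config.
# still_modular is a hypothetical helper name.
still_modular() {
    cfg=$1
    shift
    for sym in "$@"; do
        grep -q "^$sym=m\$" "$cfg" && echo "$sym"
    done
    return 0
}

# Stand-in .config using the symbols named in this post (values illustrative).
sample=$(mktemp)
cat > "$sample" <<'EOF'
CONFIG_MFD_INTEL_LPSS=m
CONFIG_MFD_INTEL_LPSS_PCI=y
CONFIG_INTEL_MEI_ME=m
CONFIG_HOTPLUG_PCI_SHPC=y
EOF

still_modular "$sample" CONFIG_MFD_INTEL_LPSS CONFIG_MFD_INTEL_LPSS_PCI \
    CONFIG_INTEL_MEI_ME CONFIG_HOTPLUG_PCI_SHPC
rm -f "$sample"
```

Pointed at the real .config, an empty result means none of the listed symbols is left as a module. The check only reads the file; the changes themselves still belong in menuconfig.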
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

All times are GMT