Gentoo Forums
Debugging kernel boot?

uberDoward
n00b

Joined: 09 Jun 2011
Posts: 35

Posted: Sun Nov 19, 2017 3:16 am    Post subject: Debugging kernel boot?

I don't think I'm getting as far as OpenRC - I get the Linux penguins along the top, with [timestamp] messages displayed.

Anyway, the kernel appears to hang without actually going into a kernel panic. Is there a way to debug the actual kernel boot process? I find a lot about debugging OpenRC, but little about the kernel's boot...

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 6339
Location: almost Mile High in the USA

Posted: Sun Nov 19, 2017 6:09 am

Knowing the last things it printed before it hung would be useful. A picture would help if it's difficult to type out what it wrote.

Yes, it would be helpful to know whether it started OpenRC or did not even get that far. Did it free unused kernel memory (usually one of the last things the kernel does before handing off control to init)?
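
If you can capture the messages to a file (or type up what the screen shows), a rough check for that hand-off could look like the sketch below. The helper name is made up for illustration, and it assumes the kernel prints a line containing "Freeing unused kernel" shortly before it starts init.

```shell
# Hypothetical helper: succeed if the given log shows the kernel freeing
# its init memory, which normally happens just before init is started.
reached_init_handoff() {
    # $1 = file holding kernel messages (e.g. saved dmesg output)
    grep -q 'Freeing unused kernel' "$1"
}

# Example use against a saved log (the path is an assumption):
# dmesg > /tmp/boot.log
# reached_init_handoff /tmp/boot.log && echo "kernel got as far as init"
```

If the line is missing, the hang is more likely inside the kernel itself than in init or OpenRC.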
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 39261
Location: 56N 3W

Posted: Sun Nov 19, 2017 9:55 am

uberDoward,

It's possible to configure the kernel with no console. Everything still works but there is nothing on the screen.
From memory, you don't even get the tux icons.

If openrc is being started, you may be able to log in via ssh, if that's set up.

Then there is a serial console, if your system has a real serial port and you have a way to connect to it.
There is also console over network.
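
For the serial route, here is a minimal sketch of the kernel command line arguments. It assumes the first serial port (ttyS0) at 115200 baud; both values are guesses about your hardware, and the function name is invented for illustration. The kernel also needs serial console support built in.

```shell
# Hypothetical helper: print console= arguments that keep messages on the
# screen (tty0) and also send them to a serial port.
serial_console_args() {
    # $1 = serial device name (e.g. ttyS0), $2 = baud rate
    printf 'console=tty0 console=%s,%sn8\n' "$1" "$2"
}

serial_console_args ttyS0 115200
```

Append the printed string to the linux line in grub.cfg and watch the port from a second machine. The last console= listed is the one that becomes /dev/console.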
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

bunder
Bodhisattva

Joined: 10 Apr 2004
Posts: 5312

Posted: Sun Nov 19, 2017 11:41 am

Is it possible to get early printk, then have it switch to a non-functional framebuffer? If that's possible, that could in theory give the appearance of a hung boot with penguins.

With no framebuffer, you'd just get a black screen and no penguins.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 39261
Location: 56N 3W

Posted: Sun Nov 19, 2017 11:55 am

bunder,

That's possible with the text console built in, then switching to a broken framebuffer built as a module.
However - no penguins.

It's also possible to have several framebuffers configured, and the preferred one could be broken.
Being lazy, I have vesafb and amdgpudrmfb.

They both work and the switch is clearly visible, both during the boot process and in dmesg.
Code:
[    1.527743] vesafb: mode is 1024x768x16, linelength=2048, pages=29
[    1.527745] vesafb: scrolling: redraw
[    1.527748] vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0
[    1.527760] vesafb: framebuffer at 0xd0000000, mapped to 0xffffc90000400000, using 3072k, total 49152k
[    1.530011] fb0: VESA VGA frame buffer device
[    1.553452] fb: switching to amdgpudrmfb from VESA VGA
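
To pull just the hand-over out of a long dmesg, something like the sketch below may help. The function name is made up, and it assumes the message keeps the "fb: switching to X from Y" wording shown above.

```shell
# Hypothetical filter: read kernel messages on stdin and print each
# framebuffer hand-over as "old driver -> new driver".
fb_switches() {
    sed -n 's/.*fb: switching to \(.*\) from \(.*\)$/\2 -> \1/p'
}

# Example use on a running system:
# dmesg | fb_switches
```

No output means no hand-over was logged, so whichever framebuffer came up first is still the one driving the console.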


uberDoward
n00b

Joined: 09 Jun 2011
Posts: 35

Posted: Sun Nov 19, 2017 4:56 pm

Using ASPEED, no modules (going for a very lean monolithic kernel). Framebuffer appears to switch over - now you've got me wondering if I forgot to put in the console LOL, let me check that.

uberDoward
n00b

Joined: 09 Jun 2011
Posts: 35

Posted: Sun Nov 19, 2017 5:42 pm

I'll get a picture shortly. TTY is compiled into the kernel, let me check the framebuffers enabled.... are there specific options I need to have enabled?

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 6339
Location: almost Mile High in the USA

Posted: Sun Nov 19, 2017 5:48 pm

I'm confused, you did say the penguins showed up as well as the [0.2512512] blahblah timestamped kernel messages?

I wasn't sure if you were asking a general question or trying to fix your specific box... If you did see the penguins and kernel messages, at least there are some clues to figure out what's going on.

On the other hand, black screen boots with no penguins are most annoying to debug.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 39261
Location: 56N 3W

Posted: Sun Nov 19, 2017 6:00 pm

uberDoward,

Make friends with wgetpaste.

Put your lspci output in a post, together with a link to your grub.cfg (or whatever your boot loader config file is) and a link to your kernel .config file.

uberDoward
n00b

Joined: 09 Jun 2011
Posts: 35

Posted: Mon Nov 20, 2017 12:35 am

NeddySeagoon, that's freaking awesome!

config: https://paste.pound-python.org/show/7eCSAtiIBY3XYcLrhLUY/
grub.cfg: https://paste.pound-python.org/show/P69GrwzUOs0R54dWhAOY/

Picture of boot: https://imagebin.ca/v/3ht5U779oAg3

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 39261
Location: 56N 3W

Posted: Mon Nov 20, 2017 11:22 am

uberDoward,

What we know so far ...
Grub loads the kernel and the kernel starts with a framebuffer console but we don't know which one.
UVESA is no longer complete, so it's not that one.

The help text on
Code:
CONFIG_DRM_AST

Say yes for experimental AST GPU driver. Do not enable this driver without having a working -modesetting, and a version of AST that knows to fail if KMS is bound to the driver. These GPUs are commonly found in server chipsets.

Some/all DRM kernel drivers provide a free framebuffer, so at a guess, the kernel starts with the VESA framebuffer then switches to a broken AST framebuffer. Hence the warning about having a working -modesetting.

As a test, add nomodeset to your kernel command line(s) in grub.cfg. $EDITOR will be fine meanwhile.
You won't like the result but we should get more information.

Your lspci output will still be useful.
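
Once a test boot does get as far as a shell, it is worth confirming that the edit actually reached the kernel. A small sketch, with a made-up function name, that checks a command line string for a parameter:

```shell
# Hypothetical helper: succeed if a kernel command line contains the given
# parameter, either as a bare flag (nomodeset) or as name=value (root=...).
has_kernel_param() {
    # $1 = parameter name, $2 = full command line string
    case " $2 " in
        *" $1 "*|*" $1="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on the running system:
# has_kernel_param nomodeset "$(cat /proc/cmdline)" && echo "nomodeset was passed"
```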

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 6339
Location: almost Mile High in the USA

Posted: Mon Nov 20, 2017 5:53 pm

Since the minimal installer CD booted (what boot media did you use?), using that kernel's .config as a starting point would be helpful. Agreed, your lspci info would be useful.

Other key things that are "interesting": It took almost 20 seconds for things to settle down. A lot of these messages may have shown up asynchronously.

Some things I'd try to help debug (but not "fix"): removing the USB drivers, so we don't see all the USB async stuff. That costs you keyboard/mouse, but at least fewer things will scroll off. Does shift-pageup scrollback show anything interesting? (That won't work if your USB/keyboard support isn't compiled in, but, hey...)

Does /dev/sde4 sound like the proper root disk?

(Incidentally, I hate the penguins. I never compile that in because it hides 3-4 lines of screen real estate for kernel boot debug :-))

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 39261
Location: 56N 3W

Posted: Mon Nov 20, 2017 6:12 pm

eccerr0r,

There is also framebuffer rotate to get more lines on the screen.

The kernel normally mounts root before USB is initialised, so I think that root is mounted but it might still be read only.

-- edit --

We can try Interactive mode for openrc too.
Edit /etc/rc.conf
Find the line that says
Code:
#rc_interactive="YES"
and remove the # at the start. Save the change.

Reboot normally. As soon as you see the Penguins, press and hold the 'I' key.
Openrc will stop and ask about each service that it wants to start.
This happens before your keymap has been set, so it's the 'I' key on the USA QWERTY keyboard layout.
Depending on your keymap, that might matter. I use dvorak-uk. It matters to me.

This will tell if openrc gets started or not.
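
The rc.conf edit can also be done from a rescue shell without an editor. A sketch, assuming GNU sed (for -i) and that the line is commented out exactly as shown above; the function name is invented here.

```shell
# Hypothetical helper: uncomment the rc_interactive line in an rc.conf file.
enable_rc_interactive() {
    # $1 = path to rc.conf
    sed -i 's/^#rc_interactive="YES"/rc_interactive="YES"/' "$1"
}

# Example use, as root (from a LiveCD, point it at the mounted root instead):
# enable_rc_interactive /etc/rc.conf
```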

uberDoward
n00b

Joined: 09 Jun 2011
Posts: 35

Posted: Mon Nov 20, 2017 11:33 pm

lspci -nnk output:
https://paste.pound-python.org/show/OzuhINTSLRkDa5SYIt4z/

Let me recompile without the penguins, lol - I've tried interactive mode, but to no avail.

/dev/sde4 is rootfs, /dev/sde2 is boot. /dev/sde is the OS drive (32GB ssd)

I'll pull out the USB stuff, see if anything else helpful comes up :)

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 39261
Location: 56N 3W

Posted: Mon Nov 20, 2017 11:55 pm

uberDoward,

Code:
01:09.0 VGA compatible controller [0300]: ASPEED Technology, Inc. ASPEED Graphics Family [1a03:2000] (rev 10)
   Subsystem: ASPEED Technology, Inc. ASPEED Graphics Family [1a03:2000]


That's interesting for what it doesn't say. Look, no kernel module.

Try turning off
Code:
CONFIG_DRM_AST
Whatever was driving the console when you posted lspci, it wasn't that.
What does dmesg have to say about the console driver?

Code:
$ dmesg | grep -B2 Console
[    0.000000]    Tasks RCU enabled.
[    0.000000] NR_IRQS: 4352, nr_irqs: 472, preallocated irqs: 16
[    0.000000] Console: colour dummy device 80x25
--
[    1.560247] vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0
[    1.560259] vesafb: framebuffer at 0xd0000000, mapped to 0xffffc90000400000, using 3072k, total 49152k
[    1.561408] Console: switching to colour frame buffer device 128x48
--
[    1.585905] checking generic (d0000000 3000000) vs hw (d0000000 10000000)
[    1.585905] fb: switching to amdgpudrmfb from VESA VGA
[    1.585929] Console: switching to colour dummy device 80x25
--
[    2.611092] [drm]    pitch is 10240
[    2.611125] fbcon: amdgpudrmfb (fb0) is primary device
[    2.828946] Console: switching to colour frame buffer device 320x90


uberDoward
n00b

Joined: 09 Jun 2011
Posts: 35

Posted: Tue Nov 21, 2017 1:08 am

Console grep from LiveCD: https://paste.pound-python.org/show/wiZ4B8lXGzUipTJnRiud/

Very interesting - let me remove the AST module, and see if there's a vesafb I don't have set up in the kernel...

*edit*

Latest kernel .config : https://paste.pound-python.org/show/Vb7OLaGwv2LfxgsAxbtc/

uberDoward
n00b

Joined: 09 Jun 2011
Posts: 35

Posted: Tue Nov 21, 2017 2:13 am

Ok, weird. No more high res VESA, but got a kernel panic (VFS: unable to mount root fs on unknown block(8,68))

Looking into it, I noticed from grub's command line that everything was pointing at hd4,gpt2. ls hd0,gpt2, however, showed the correct one. Now, why would the BIOS decide to change hd0? No idea, I didn't change anything in there (I have first HDD boot as my /dev/sde 32GB SSD).

So manually editing grub via 'e' to point to (hd0,gpt4) and /dev/sda4 for the kernel command line yields: https://imagebin.ca/v/3i0X9x50yDLx
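
For reference, the (8,68) in that panic is a major,minor pair and can be decoded by hand. A rough sketch, assuming the conventional SCSI disk numbering (major 8, 16 minors per disk, which only covers sda through sdp); the function name is made up for illustration.

```shell
# Hypothetical helper: turn a major,minor pair from an "unknown-block(M,N)"
# message into the conventional sdXN name. Only handles major 8.
sd_name_from_devnum() {
    major=$1
    minor=$2
    if [ "$major" -ne 8 ] || [ "$minor" -lt 0 ] || [ "$minor" -gt 255 ]; then
        echo "unsupported device number $major,$minor" >&2
        return 1
    fi
    disk=$((minor / 16))   # 0 = sda, 1 = sdb, ...
    part=$((minor % 16))   # 0 = whole disk, otherwise partition number
    letter=$(printf '%s' abcdefghijklmnop | cut -c $((disk + 1)))
    if [ "$part" -eq 0 ]; then
        printf 'sd%s\n' "$letter"
    else
        printf 'sd%s%s\n' "$letter" "$part"
    fi
}

sd_name_from_devnum 8 68
```

This prints sde4, which matches the root=/dev/sde4 that was on the kernel command line at the time.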

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 6339
Location: almost Mile High in the USA

Posted: Tue Nov 21, 2017 2:47 am

One thing that trips people up frequently is that the (hdX,Y) in grub has no relationship to /dev/sdX in Linux. Ideally they would map directly, but they don't. Also, the (hdX,Y) does nothing for the kernel; it's only for grub to locate boot images (kernel, initramfs).

Anyway, interesting, so if you do have root=/dev/sde4 it behaves differently than if you have root=/dev/sda4? That would imply the kernel did end up switching over to init?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 39261
Location: 56N 3W

Posted: Tue Nov 21, 2017 10:54 am

uberDoward,

Grub sees block devices as enumerated by the motherboard firmware.

The kernel sees them in PCI bus order as it scans the PCI bus but it's not that simple either.
'Built in' are always enumerated before modules.
It gets worse. The kernel may use several threads for PCI enumeration, so there is a possibility of a race.
The race can change the HDD order from kernel build to kernel build or if you are really unlucky, from boot to boot.

The point is that there is no deliberate correlation between what grub sees and the kernel sees.

Use root=PARTUUID=<your_root_PARTUUID> in place of root=/dev/sd..

blkid will show all the PARTUUIDs. Google for the exact syntax.
It won't matter what device root is on, the kernel will find it.

The same thing will upset /etc/fstab. You can use UUID or PARTUUID there, so that dynamic device renaming is harmless.

uberDoward
n00b

Joined: 09 Jun 2011
Posts: 35

Posted: Wed Nov 22, 2017 12:52 am

So how do I make the UUID stick to a grub-mkconfig?

Note also, no initramfs here on startup. I've just tried modifying /boot/grub/grub.cfg by hand, though, so let's see what happens, lol

uberDoward
n00b

Joined: 09 Jun 2011
Posts: 35

Posted: Wed Nov 22, 2017 2:15 am

So I got it to boot by setting /dev/sdg4 after editing the grub menu ('e' @ grub entry).

Something somewhere is happily screwing up my boot device names. I couldn't even boot with root=PARTUUID=<partuuid> so I'm really at a loss.

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 6339
Location: almost Mile High in the USA

Posted: Wed Nov 22, 2017 6:36 am

When you ran grub-mkconfig, it generated /dev/sdXX and not PARTUUIDs? Did you change /etc/default/grub?

I suppose if you have a lot of hard drives and they may change around randomly, it might be worth it to make sure PARTUUID works, or use initramfs that supports UUID.
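
One approach that is often suggested, sketched here with a placeholder: put the root= argument into GRUB_CMDLINE_LINUX in /etc/default/grub, so that grub-mkconfig appends it to every linux line it generates. This assumes the kernel honours the last root= it sees on the command line, and the PARTUUID below is a dummy value, not one from this thread.

```shell
# /etc/default/grub (fragment)
# Placeholder PARTUUID: replace it with the value blkid reports for your
# root partition.
GRUB_CMDLINE_LINUX="root=PARTUUID=00000000-0000-0000-0000-000000000000"
```

Then regenerate with grub-mkconfig -o /boot/grub/grub.cfg and check the linux lines in the result before rebooting.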

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 39261
Location: 56N 3W

Posted: Wed Nov 22, 2017 11:46 am

uberDoward,

The kernel understands PARTUUID - it's a property of the partition.
To use UUID, which is a property of a filesystem, you must use an initrd that contains the userspace mount binary.

Code:
/sbin/blkid
/dev/sda1: UUID="9392926d-6408-6e7a-8663-82834138a597" TYPE="linux_raid_member" PARTUUID="0553caf4-01"
/dev/sde1: UUID="c400b18c-0210-4338-a0fd-f437ecbaaf99" TYPE="ext4" PARTLABEL="ext4" PARTUUID="150e6ef1-7ba8-409c-9c3f-dbdecdc9f18b"


sda is MSDOS and sde is GPT.
Notice that for GPT, the UUID and PARTUUID look similar but you need to use the right one.
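
To avoid copying the wrong one by hand, the PARTUUID can be picked out of a blkid line mechanically. A small sketch; the function name is invented, and it assumes blkid output in the format shown above.

```shell
# Hypothetical filter: read blkid output lines on stdin and print a
# root=PARTUUID=... argument for each line that carries a PARTUUID.
root_arg_from_blkid() {
    sed -n 's/.* PARTUUID="\([^"]*\)".*/root=PARTUUID=\1/p'
}

# Example use (the device name is only an example):
# blkid /dev/sde4 | root_arg_from_blkid
```

The leading space in the pattern stops it from matching the tail of some other tag name, and the filesystem UUID is never used.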

uberDoward
n00b

Joined: 09 Jun 2011
Posts: 35

Posted: Thu Nov 23, 2017 7:15 pm

Yeah, it refused to work via UUID.

I'm not using an initramfs - I thought the kernel should boot anyway.

For now, though, it's fixed. I re-did the grub-mkconfig -o /boot/grub/grub.cfg after booting the manually altered grub to get into my Gentoo system, and all is good now.

Very odd behavior - wish I had the time to figure out what went wrong and fix it, though.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 39261
Location: 56N 3W

Posted: Thu Nov 23, 2017 8:09 pm

uberDoward,

You wrote UUID - which cannot work - instead of PARTUUID.
Did you test with UUID or PARTUUID in grub.cfg?

All times are GMT