jesnow l33t
Joined: 26 Apr 2006 Posts: 856
Posted: Tue Feb 26, 2013 12:57 am Post subject: kernel upgrade disaster: panic on root mount [solved]
|
|
I just upgraded from 3.5.7 to 3.7.9 when it went stable. The 3.7.9 kernel panics when it tries to mount root (ext4). Strange, but OK, I'll just use the old kernel until I figure out what's going on. But now the old kernel panics too. A still older kernel mounts root but can't cope with the new udev, so it never makes it to a login prompt.
I'm mystified. At first I thought LILO might have overwritten the partition table (that was before the older 3.2.5 kernel succeeded in mounting root). I booted from a Sabayon USB stick and can see everything, so I haven't lost data, but now I don't know what to fix.
Help me out here. I don't even know where to start!
Cheers,
jon.
Last edited by jesnow on Wed Feb 27, 2013 6:49 pm; edited 2 times in total
|
|
Hu Moderator
Joined: 06 Mar 2007 Posts: 21624
Posted: Tue Feb 26, 2013 3:10 am
|
|
Pick a problem to solve first, and provide enough information that we can help you solve it. Maybe you want to abandon the old kernels, make 3.7.9 work, and never look back. Maybe you want to make 3.5.7 work again, and fix 3.7.9 afterward. Regardless, please provide at least the last 25 lines of output before the panic, and specify which problem you have selected.
|
|
jesnow l33t
Joined: 26 Apr 2006 Posts: 856
Posted: Tue Feb 26, 2013 12:58 pm
|
|
If I had a lot of troubleshooting info, I would probably have solved it myself. I need to develop some hypotheses and build a decision tree, and without going to the extra step of capturing the output from a panicking kernel (set the console to the serial port I haven't got?), I'm kind of stuck.
I typed in a few lines from the panic output -- here they are.
The top line on the screen starts out:
Code: |
Call Trace:
[<c1434e79>] ? panic+0x7b/0x159
[<c1601c50>] ? mount_block_root+0x1df/0x1ef
[<c1601dba>] ? mount_root+0x7d/0xd3
|
It goes on like that for 6 more lines, then
Code: |
panic occurred, switching back to text console
----------------[ cut here ]--------------------
WARNING: at arch/x86/kernel/apic/ipi.c:113 default_send_IPI_mask_logical+0x55/0xad
Hardware name: VGN-TZ90S
Modules linked in:
Pid: 1, comm: swapper/0 Not tainted 3.7.9-gentoo #1
Call Trace:
|
Then more addresses. It looks to me like it panicked *again* while trying to get to a text console.
So far I have verified that both the partition table and the filesystem are sound and supported by both kernels. That leaves a bad entry in LILO's internal table for the location of / as the only possibility I can think of to test.
I'm not sure how to test or fix that, or whether there's another possibility.
|
|
baaann Guru
Joined: 23 Jan 2006 Posts: 558 Location: uk
Posted: Tue Feb 26, 2013 7:56 pm
|
|
No answers, just some ideas (they may be well wide of the mark).
I think Hu asked which kernel you want to go with because some config options moved in 3.7 and later (certainly for video, maybe for other devices too).
Have you enabled devtmpfs (CONFIG_DEVTMPFS) in your kernel config? udev requires it.
Given that your system was booting before the kernel upgrade, maybe post your config and device details (you should be able to get these from a live CD) so that the experts (which doesn't include me) can check them out.
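A quick way to check the devtmpfs suggestion is to read the kernel's .config directly. The following is a minimal Python sketch, assuming the usual .config text format written by the kernel's config tools and a hypothetical source path of /usr/src/linux; it only tests whether CONFIG_DEVTMPFS is built in and does not check anything else udev may want.

```python
def config_options(text: str) -> dict[str, str]:
    """Parse kernel .config text into {option: value}.

    Lines of the form '# CONFIG_FOO is not set' are recorded as 'n'.
    """
    opts: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            opts[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            opts[line[2 : -len(" is not set")]] = "n"
    return opts


def devtmpfs_enabled(text: str) -> bool:
    """True only if devtmpfs is built in (CONFIG_DEVTMPFS=y)."""
    return config_options(text).get("CONFIG_DEVTMPFS") == "y"


# Example (the path is an assumption; point it at the .config you actually built):
#   from pathlib import Path
#   print(devtmpfs_enabled(Path("/usr/src/linux/.config").read_text()))
```

The same parser could be reused to look up other options (CONFIG_IDE, for instance) when comparing two kernel configs.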
|
|
|
Hu Moderator
Joined: 06 Mar 2007 Posts: 21624
Posted: Tue Feb 26, 2013 11:00 pm
|
|
jesnow wrote: "I need to develop some hypotheses and build a decision tree, and without going to the extra step of capturing the output from a panicking kernel (set the console to the serial port I haven't got?), I'm kind of stuck."
Do you have no way at all to capture the contents of the screen? No camera, either in a phone or freestanding? No netconsole support?
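If a second machine on the same wired network is available, netconsole is one way to get the panic text off the screen. Below is a minimal Python sketch of the receiving end only. It assumes the panicking kernel has netconsole and its network driver built in (without an initramfs, modules cannot load before root is mounted) and is booted with a parameter along the lines of netconsole=@/,6666@192.168.0.2/, where 192.168.0.2 is a hypothetical address for the listening machine. Port 6666 is netconsole's default target port; the buffer size and log file name are assumptions.

```python
import socket
from typing import TextIO


def open_listener(port: int = 6666, host: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket; 6666 is netconsole's default target port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def capture(sock: socket.socket, max_packets: int, out: TextIO) -> int:
    """Copy up to max_packets datagrams from sock to out; return how many arrived."""
    received = 0
    while received < max_packets:
        data, _sender = sock.recvfrom(4096)
        # Kernel messages are mostly ASCII; never crash on a stray byte.
        out.write(data.decode("utf-8", errors="replace"))
        out.flush()  # keep the log on disk even if this script is interrupted
        received += 1
    return received


# Example on the second machine (the file name is arbitrary):
#   with open("panic.log", "a") as log:
#       capture(open_listener(), 100_000, log)
```

Any plain UDP listener would do the same job; the point is only that the panic output ends up in a file instead of scrolling off the laptop's screen.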
baaann is correct. You need to pick which kernel you want to fix first. The 3.5 kernel is an easy choice, because we know it worked before an unspecified change to the system. The 3.7 kernel is a harder choice since we do not know if it would start working when you fix 3.5 or if there is something else wrong with it.
|
|
jesnow l33t
Joined: 26 Apr 2006 Posts: 856
Posted: Wed Feb 27, 2013 1:16 am
|
|
Thanks for explaining. I think fixing the 3.5 kernel makes sense, as that one worked. I'm getting it chrooted right now.
UPDATE: I chrooted into the installed system and cleaned the ~20 old kernels out of lilo.conf, on the off chance that an overstuffed MBR was responsible.
The Sabayon distro I used has kernel 3.7.0, and what used to be /dev/hda (the original SSD on the TZ90) is now /dev/sda. I hope that change won't cause trouble down the road. Anyway, LILO ran fine, but that didn't solve the problem.
|
|
jburns Veteran
Joined: 18 Jan 2007 Posts: 1214 Location: Massachusetts USA
Posted: Wed Feb 27, 2013 6:06 am
|
|
Is your 3.5.7 kernel trying to mount the correct disk?
|
|
jesnow l33t
Joined: 26 Apr 2006 Posts: 856
Posted: Wed Feb 27, 2013 11:45 am
|
|
jburns wrote: "Is your 3.5.7 kernel trying to mount the correct disk?"
That's a very good question.
All the symptoms seem to point that way, don't they? I did notice that the disk names had changed when I edited lilo.conf just now. It used to be that all disks were /dev/hdX, except SCSI ones, which were /dev/sdX. Now (since the new udev?) all disks are /dev/sdX, and the *IDE* SSD built into the TZ-90 is no longer /dev/hda but /dev/sda, meaning that the SATA SSD I'm booting from was /dev/sda and is now /dev/sdb.
That's a hypothesis to test when I get to work. But then how did LILO find the kernel image when I ran it just now? Still, that's definitely a line to follow. I don't think kernel configuration is the issue.
|
|
jesnow l33t
Joined: 26 Apr 2006 Posts: 856
Posted: Wed Feb 27, 2013 6:48 pm
|
|
Success.
The problem was in fact that the device naming had changed.
I seem to recall disabling CONFIG_IDE on somebody's advice. I presume the SATA (libata) drivers now handle the IDE disk as well and give it a /dev/sdX device file, which is what changed the device naming. Duh. It was so mysterious before and so obvious now.
The internal IDE SSD became /dev/sda and the SATA SSD became /dev/sdb. I had always pointed root at /dev/sda1, and changing it to /dev/sdb1 was all it took for both kernels to work.
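For anyone with many image stanzas to fix, the change above can be scripted. This is a minimal Python sketch, assuming lilo.conf uses the plain root=/dev/... form; quoted values and root= inside an append="..." string are deliberately not touched. The image paths and labels in the sample are hypothetical.

```python
import re


def retarget_root(conf_text: str, old_dev: str, new_dev: str) -> tuple[str, int]:
    """Point plain `root=<old_dev>` lines of lilo.conf text at new_dev.

    Only the simple `root=/dev/...` form is rewritten; quoted values and
    root= inside an append="..." string are left untouched.
    Returns (new_text, number_of_lines_changed).
    """
    pattern = re.compile(
        r"^([ \t]*root[ \t]*=[ \t]*)" + re.escape(old_dev) + r"([ \t]*)$",
        re.MULTILINE,
    )
    # Keep the original indentation and spacing, swap only the device name.
    return pattern.subn(lambda m: m.group(1) + new_dev + m.group(2), conf_text)


# Hypothetical stanzas: the image paths and labels are made up for illustration.
SAMPLE = """\
image=/boot/kernel-3.7.9-gentoo
    label=379
    root=/dev/sda1
image=/boot/kernel-3.5.7-gentoo
    label=357
    root=/dev/sda1
"""

new_text, changed = retarget_root(SAMPLE, "/dev/sda1", "/dev/sdb1")
print(changed)  # 2
```

The anchored pattern means /dev/sda10 is not mistaken for /dev/sda1. After writing the result back to /etc/lilo.conf, LILO still has to be rerun (/sbin/lilo) for the new root= setting to reach the boot map.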
Thanks jburns for coming up with the key question to ask!
I'm back in business!
Jon.
|
|
|