Gentoo Forums
[bug + workaround]Kernel Panic during boot (VBox 4.3.10 VM)
thurnax
n00b


Joined: 17 Apr 2014
Posts: 4

Posted: Thu Apr 17, 2014 2:19 pm    Post subject: [bug + workaround]Kernel Panic during boot (VBox 4.3.10 VM)

I'm trying to install Gentoo from a stage3 tarball as a guest in VirtualBox 4.3.10 on a Win7-x64 host, but no matter what I try, the installed system fails to boot. I tried bootstrapping and compiling the kernel manually, but each boot ends with "Kernel panic - not syncing: Attempted to kill init!". When I build the kernel with the genkernel tool instead, the boot gets stuck at "Loading initial ramdisk ...". I cannot get the system to generate rc.log, so it must fail before rc starts.

The kernel to be installed is the x86_64 variant of 3.12.13. I added the '-march=native' flag to CFLAGS in /etc/portage/make.conf. In VirtualBox I assigned the VM 3 CPU cores, bridged networking, 2GB RAM, and an 8GB hard drive image file. I conducted all installs from the latest SystemrescueCD; otherwise I have followed the handbook to the letter. I have also tried DasGregor's 5-part "Gentoo Install Guide" on YouTube and Simba7's Stage1 from SysrescueCD.



Some things to be noted during install:
1: I use fdisk (I have also used gdisk) and make primary partitions based on MBR (i.e. a non-GPT install, so the extra 2MB BIOS boot partition is not needed). The following partitions are used:

Device     Filesystem  Size   Mount point  Flags  Type
/dev/sda1  ext2        128MB  /boot        boot   8300 Linux
/dev/sda2  swap        512MB  <swap>              8200 Linux Swap
/dev/sda3  ext4        7+GB   /                   8300 Linux

Sometimes I forgot to set the bootflag, and sometimes /dev/sda3 was mistakenly listed as ext3 in /etc/fstab while it is formatted as ext4. I have since corrected both mistakes, but that did not solve the boot failure.

2: 'emerge grub:0' fails to install, but 'emerge sys-boot/grub' succeeds. However, after the emerge, '/dev/sda1' is mounted on '/boot' read-only, which makes 'grub2-install /dev/sda' fail. Remounting /boot as rw fixes this.
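
Both slips noted above (an fstab type that disagrees with the real filesystem, and /boot ending up read-only) can be checked before rebooting. The following is only a sketch: the helper names `fstab_fstype` and `mount_is_ro` are made up for illustration, and the device names are the ones from the partition list above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any Gentoo tool.

# Print the filesystem type that an fstab-format file declares for a mount point.
# $1 = mount point, $2 = fstab path (defaults to /etc/fstab)
fstab_fstype() {
    awk -v mp="$1" '$1 !~ /^#/ && $2 == mp { print $3; exit }' "${2:-/etc/fstab}"
}

# Succeed if a mounts table lists the mount point with the "ro" option.
# If the mount point appears more than once, the last entry decides.
# $1 = mount point, $2 = mounts table (defaults to /proc/mounts)
mount_is_ro() {
    awk -v mp="$1" '
        $2 == mp { n = split($4, opt, ","); ro = 0
                   for (i = 1; i <= n; i++) if (opt[i] == "ro") ro = 1 }
        END { exit !ro }' "${2:-/proc/mounts}"
}

# Typical use inside the chroot (run as root):
#   [ "$(fstab_fstype /)" = "$(blkid -o value -s TYPE /dev/sda3)" ] || echo "fstab type mismatch for /"
#   mount_is_ro /boot && mount -o remount,rw /boot
```

Both functions only read text files, so they can be tried on a copy of /etc/fstab or /proc/mounts without touching the running system.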


Last edited by thurnax on Mon Apr 21, 2014 10:56 pm; edited 4 times in total
thurnax
n00b


Joined: 17 Apr 2014
Posts: 4

Posted: Thu Apr 17, 2014 4:40 pm

I tried to rebuild the kernel and initramfs using one of the hardened profiles instead. The genkernel tool fails with the error message that "/sbin/mount.zfs" cannot be found. Trying to emerge zfs fails due to what appears to be a keyword masking issue: it says "The following keyword changes are necessary to proceed:" followed by a number of "required by" lines. Strange. Edit: After reading the Gentoo Wiki page on zfs, I managed to set up the accept_keywords entries and complete the install. I find it a little strange, however, that this isn't handled automatically when changing the profile to the hardened kernel that uses zfs by default.

The main issue is that the install fails to boot so the intention here is to try a different build to see if that fails as well...

After following the thread:

http://forums.gentoo.org/viewtopic-t-912622-start-0.html

where the issue remains unsolved, I started to suspect that the problem might be with GRUB. But I cannot get GRUB legacy to compile ('emerge sys-boot/grub:0'). It returns the following errors in the configure.log:

Quote:
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/libgcc.a when searching for -lgcc
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find -lgcc
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/libgcc_s.so when searching for -lgcc_s
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find -lgcc_s
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../lib64/libc.so when searching for -lc
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../lib64/libc.a when searching for -lc
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/../lib64/libc.so when searching for -lc
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/../lib64/libc.a when searching for -lc
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../libc.so when searching for -lc
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../libc.a when searching for -lc
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/libc.so when searching for -lc
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/libc.a when searching for -lc
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find -lc
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/libgcc.a when searching for -lgcc
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find -lgcc
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/libgcc_s.so when searching for -lgcc_s
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find -lgcc_s
collect2: error: ld returned 1 exit status


I tried running 'perl-cleaner --all' as suggested in another thread when grub-legacy fails to compile but it still fails to emerge. I tried installing 'grub-static' instead. I noted the yellow box under the GRUB legacy section of the Gentoo Handbook but I'm not using a non-multilib profile.




Update:

After resolving the bootloader issues and installing GRUB 0.97-r12 (grub-static) instead of GRUB2, I get a kernel panic during boot. How do I proceed? The Handbook doesn't say anything about a kernel panic at this stage. It should be noted that I installed Archlinux on another VM with the same settings and that VM runs just fine.


Here's a screen dump (png) showing the whole boot procedure:

http://s000.tinyupload.com/?file_id=23738542452707218690
thurnax
n00b


Joined: 17 Apr 2014
Posts: 4

Posted: Fri Apr 18, 2014 12:43 pm

As per §KC13 section 4) in the FAQ regarding Kernel Compilation I hereby supply the logs as suggested when the kernel panics:

/usr/src/linux/.config - I cannot identify relevant sections of this file (such as 'Block Devices') so I supply the entire thing:

http://pastebin.com/Ff9T1RY2

/usr/sbin/lspci output: http://pastebin.com/a40QLEgK
My /etc/fstab config: http://pastebin.com/f89vuQ2j
My partition table: http://pastebin.com/aGT2mdbA
/boot/grub/grub.conf: http://pastebin.com/yYrDhrbR

Once again, the screen capture of the boot procedure after GRUB. Note that the output of the capture is limited by the 30fps update rate of the display:

http://s000.tinyupload.com/?file_id=23738542452707218690
TomWij
Developer


Joined: 04 Jul 2012
Posts: 1551

Posted: Sat Apr 19, 2014 8:54 am

The failing instruction suggests that the problem lies with Intel uncore PMU support; see if VirtualBox lets you change anything about this in its settings. Specifically, it bails out when initializing the MSRs (Model Specific Registers), which sounds rather specific and low-level to me. You might want to capture the full trace and report it upstream at https://bugzilla.kernel.org. To do that, add something like boot_delay=100 to the kernel parameters (adjust as you see fit; lower is faster). That value delays each message by 100 ms, i.e. about 10 messages per second, which should let you capture the top part of the kernel BUG output.
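
One thing worth checking before relying on boot_delay: the kernel only honours that parameter when it was built with CONFIG_BOOT_PRINTK_DELAY=y, which is not enabled by default. A minimal sketch of such a check, assuming the kernel sources live in /usr/src/linux; the helper name `kconfig_enabled` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel .config sets the option to "y".
# $1 = option name, $2 = path to .config (defaults to /usr/src/linux/.config)
kconfig_enabled() {
    grep -q "^$1=y\$" "${2:-/usr/src/linux/.config}"
}

# Typical use:
#   kconfig_enabled CONFIG_BOOT_PRINTK_DELAY || echo "boot_delay= will be ignored by this kernel"
```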
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 32018
Location: 56N 3W

Posted: Sat Apr 19, 2014 9:40 am

thurnax,

To test if it's your kernel or VBox settings, attempt to boot a System Rescue CD ISO image in your VBox.
If it fails, it's your VBox settings. The test is not conclusive though: if it boots, it may be that System Rescue CD is more tolerant of your VBox setup.
Test with the System Rescue CD 64 bit kernel.

System Rescue CD is Gentoo based.

So far, grub has done its stuff and loaded the kernel and the initrd. As boot failed before attempting to mount root, only VBox, the kernel and initrd can be involved,
so that narrows it down a little.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
thurnax
n00b


Joined: 17 Apr 2014
Posts: 4

Posted: Sat Apr 19, 2014 12:19 pm

The SystemrescueCD works just fine and I've been using it all the time. It is 64-bit and it is using kernel version 3.10.32. The Archlinux install is also 64-bit and it also works fine. The kernel version of the Archlinux install is 3.14.1.

As a side note, I think I know what the problem with GRUB2 is. I think there is a bug that prevents it from properly setting up MBR installs, whereas it works for GPT installs. But that's just my 2 cents.

The uname and /proc output for the working installs:

SysrescueCD:
Linux sysresccd 3.10.32-std410-amd64 #2 SMP Tue Mar 11 21:05:08 UTC 2014 x86_64 Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz GenuineIntel GNU/Linux

Linux version 3.10.32-std410-amd64 (root@catalyst) (gcc version 4.4.7 (Gentoo 4.4.7 p1.2, pie-0.4.5) ) #2 SMP Tue Mar 11 21:05:08 UTC 2014

Archlinux:
Linux thurnax 3.14.1-1-ARCH #1 SMP PREEMPT Mon Apr 14 20:40:47 CEST 2014 x86_64 GNU/Linux

Linux version 3.14.1-1-ARCH (nobody@var-lib-archbuild-testing-x86_64-tobias) (gcc version 4.8.2 20140206 (prerelease) (GCC) ) #1 SMP PREEMPT Mon Apr 14 20:40:47 CEST 2014


Regarding VirtualBox, I cannot find any parameters in the GUI that allow you to modify Intel uncore PMU support. Admittedly, there is more under the hood than the GUI shows, but the documentation for the VBoxManage command doesn't list any parameters for toggling the PMU. The only CPU parameters you can modify in the GUI are PAE/NX, Nested Paging, and VT-x/AMD-V. There are also options (not shown in the GUI) for toggling long mode, tagged TLB/VPID (when VT-x is enabled) and 'unrestricted guest mode'. But no PMU.

Adding the "boot_delay=100" parameter to the kernel line in GRUB (by using 'e' in the boot menu) doesn't slow down the printk messages. I tried 10, 100, and 1000 to no avail. Perhaps some of what is on this page

https://wiki.archlinux.org/index.php/Boot_debugging

is supported by Gentoo kernels? I'll try and recompile with the ordinary desktop profile, right now I'm using a kernel compiled with the hardened profile. Edit: I switched back using 'eselect profile set 3' and recompiled the kernel. The problem remains but now the kernel panic ends with the message:
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
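
The exitcode in that message can be decoded by hand: the low 7 bits hold the number of the signal that killed init, if any, and the next byte holds its exit status. A small sketch (the function name `panic_exitcode` is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: split the exitcode printed by the panic message into
# the terminating signal (low 7 bits) and the exit status (next 8 bits).
panic_exitcode() {
    code=$(( $1 ))
    echo "signal=$(( code & 0x7f )) status=$(( (code >> 8) & 0xff ))"
}

panic_exitcode 0x0000000b   # prints "signal=11 status=0"
```

Signal 11 is SIGSEGV on x86 Linux, so this reading suggests that init itself crashed with a segmentation fault rather than exiting on its own.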

I dug into the changelogs of the 3.12.x kernel on the kernel.org website. In the changelog for kernel 3.12.14 I found the following entry:

Quote:
commit 04f4e59ba8d8d7db9fad2c1cee1d4ac6dc8bd7c5
Author: Peter Zijlstra <peterz@infradead.org>
Date: Fri Feb 21 16:03:12 2014 +0100

perf/x86: Fix event scheduling

commit 26e61e8939b1fe8729572dabe9a9e97d930dd4f6 upstream.

Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures,
with perf WARN_ON()s triggering. He also provided traces of the failures.

This is I think the relevant bit:

> pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_disable: x86_pmu_disable
> pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_state: Events: {
> pec_1076_warn-2804 [000] d... 147.926156: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null))
> pec_1076_warn-2804 [000] d... 147.926158: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926159: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1
> pec_1076_warn-2804 [000] d... 147.926161: x86_pmu_state: Assignment: {
> pec_1076_warn-2804 [000] d... 147.926162: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926163: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926166: collect_events: Adding event: 1 (ffff880119ec8800)

So we add the insn:p event (fd[23]).

At this point we should have:

n_events = 2, n_added = 1, n_txn = 1

> pec_1076_warn-2804 [000] d... 147.926170: collect_events: Adding event: 0 (ffff8800c9e01800)
> pec_1076_warn-2804 [000] d... 147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00)

We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]).
These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so
that's not visible.

group_sched_in()
pmu->start_txn() /* nop - BP pmu */
event_sched_in()
event->pmu->add()

So here we should end up with:

0: n_events = 3, n_added = 2, n_txn = 2
4: n_events = 4, n_added = 3, n_txn = 3

But seeing the below state on x86_pmu_enable(), the must have failed,
because the 0 and 4 events aren't there anymore.

Looking at group_sched_in(), since the BP is the leader, its
event_sched_in() must have succeeded, for otherwise we would not have
seen the sibling adds.

But since neither 0 or 4 are in the below state; their event_sched_in()
must have failed; but I don't see why, the complete state: 0,0,1:p,4
fits perfectly fine on a core2.

However, since we try and schedule 4 it means the 0 event must have
succeeded! Therefore the 4 event must have failed, its failure will
have put group_sched_in() into the fail path, which will call:

event_sched_out()
event->pmu->del()

on 0 and the BP event.

Now x86_pmu_del() will reduce n_events; but it will not reduce n_added;
giving what we see below:

n_event = 2, n_added = 2, n_txn = 2

> pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_enable: x86_pmu_enable
> pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_state: Events: {
> pec_1076_warn-2804 [000] d... 147.926179: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null))
> pec_1076_warn-2804 [000] d... 147.926181: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926182: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2
> pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: Assignment: {
> pec_1076_warn-2804 [000] d... 147.926186: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: 1->0 tag: 1 config: 1 (ffff880119ec8800)
> pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0

So the problem is that x86_pmu_del(), when called from a
group_sched_in() that fails (for whatever reason), and without x86_pmu
TXN support (because the leader is !x86_pmu), will corrupt the n_added
state.

Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>

Could this be the culprit that is now fixed in kernel 3.12.14? The latest release in this kernel series is 3.12.17, and I could see more PMU-related fixes further down in the changelogs.

Edit: After some random googling on the error messages in the trace of the kernel panic I found the following thread:

http://forums.gentoo.org/viewtopic-p-7518846.html

The command line:

Code:
"D:\Program Files\Oracle\VirtualBox\VBoxManage.exe" setextradata "Gentoo Stage 1" VBoxInternal/CPUM/EnableHVP 1

appeared to fix this issue.
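
For hosts where VBoxManage is run from a Unix-like shell, the same setting can be wrapped as below. This is only a sketch: the function name `vbox_enable_hvp` and the `VBOXMANAGE` override variable are made up for illustration, and the VM name is the one from the command above. The setting is presumably picked up the next time the VM starts.

```shell
#!/bin/sh
# Hypothetical wrapper around the command quoted above.
# $1 = VM name; set VBOXMANAGE to the full path if VBoxManage is not in PATH.
vbox_enable_hvp() {
    "${VBOXMANAGE:-VBoxManage}" setextradata "$1" VBoxInternal/CPUM/EnableHVP 1
}

# Typical use (with the VM powered off):
#   vbox_enable_hvp "Gentoo Stage 1"
#   VBoxManage getextradata "Gentoo Stage 1" VBoxInternal/CPUM/EnableHVP
```

The `getextradata` call reads the value back, which is a quick way to confirm that the key was stored for the intended VM.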