1clue Advocate
Joined: 05 Feb 2006 Posts: 2569
|
Posted: Tue Jan 02, 2018 8:50 pm Post subject: 4.12 to 4.14, kernel panic. Solved. |
|
|
Hi,
I did an update, got the new linux-4.14.8-gentoo-r1 sources and decided to upgrade from 4.12.12-gentoo.
I got a kernel panic early in the boot, no logs are written. I would appreciate some help finding what I messed up.
I have an Atom C2758 board using profile default/linux/amd64/17.1/no-multilib/hardened. This is the latest testing profile; it's a non-production system at the moment.
Clearly something changed from 4.12 to 4.14 and I don't know what it is.
My previous (working) config: https://paste.pound-python.org/show/bSYRIlJ8yJR9WTHj2Gjl/
My next (non-working) config: https://paste.pound-python.org/show/BNIb6Cza1u5WEtKFhqnP/
The difference between them: https://paste.pound-python.org/show/LlvYMXgHmAwU2XkjO9gm/
I recorded the console on startup, and while I can't paste the video I can type a few lines in:
Code: |
# Does some IPMI detection
ACPI: Power Button [PWRF]
Bug: unable to handle kernel NULL pointer dereference at 0000000000000000064
IP: __kmalloc+0xce/0x1d0
PGD 0 P4D 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.8-gentoo-r1-k1 #2
Hardware name: Supermicro A1SRM-LN7/LN5F/A1SRM-LN7F-2758, BIOS 1.0 09/17/2014
task: ffffa102ed8e0000 task.stack: ffffab6d40010000
RIP: 0010:__kmalloc+0xce/0x1d0
RSP: 0000:ffffab6d40013c90 EFLAGS: 00010202
RAX: 00000000000000 RBX: 000000000000064 RCX: 00000000000001af
RDX: 000000000001ae RSI: 000000000000000 RDI: 000000000001d660
RBP: ffffab6d40013cc0 R08: ffffa102ecc98c00 R09: ffffffffa27a8ec5
R10: ffffd721d1af5740 R11: ffffa102ed4c935f R12: 00000000014000c0
R13: 00000000000140 R14: fffa102ed803080 R15: ffffa102ed803080
FS: 00000000000000(0000) GS:ffffa102ffc00000(0000) knlGS: 000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000064 CR3: 00000001b020a000 CR4: 00000000001006f0
Call Trace:
acpi_processor_get_throttling_info+0x445/0x630
__acpi_processor_start+0x83/0x1d0
acpi_processor_start+0x4d/0x60
driver_probe_device+0x25a/0x2f0
__driver_attach+0xaf/0xc0
? driver_probe_device+0x2f0/0x2f0
bus_for_each_dev+0x6d/0xa0
driver_attach+0x2e/0x30
bus_add_driver+0x12f/0x230
? do_early_param+0xa2/0xa2
driver_register+0x70/0xf0
? acpi_video_init+0x9a/0x9a
acpi_processor_driver_init+0x34/0xa8
? acpi_video_init+0x9a/0x9a
do_one_initcall+0x5e/0x1a0
? do_early_param+0xa2/0xa2
kernel_init_freeable+0x179/0x1fc
? rest_init+0xc0/0xc0
kernel_init+0x1e/0x110
ret_from_fork+0x25/0x30
Code: e7 00 00 00 49 63 46 20 49 8b 3e 48 8d 4a 01 49 8b 1c 00 49 8d 00 65 48 0f c7 0f 0f 94 c0 84 c0 74 c6 48 85
db 74 0b 49 63 46 20 <48> 8b 04 03 0f 18 08 41 f7 c4 00 80 00 00 49 8d 18 0f 85 d1 00
RIP: __kmalloc+0x1d0 RSP: fffab6d40013c90
CR2: 000000000000064
---[ end trace 223394f177cfe3e2 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
Kernel Offset: 0x21000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
sched: Unexpected reschedule of offline CPU#4!
-----------------[ cut here ]-------------------
WARNING: CPU: 0 PID: 1 at /usr/src/linux-4.14.8.gentoo-r1/arch/x86/kernel/smp.c: 128 native_smp_send_reschedule+0x47/0x50
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G D 4.14.8-gentoo-r1-k1 #2
blah blah blah.
... |
There is more in that
That exitcode seems to reference missing modules, but I built the modules and installed them before the kernel. My command was:
Code: | mount /boot; make && make modules && make modules_install && make install |
Thanks.
Last edited by 1clue on Fri Feb 09, 2018 6:27 pm; edited 1 time in total |
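A note on the exitcode in the panic line above: assuming the value follows the usual wait-status layout (terminating signal in the low 7 bits, exit status in the next byte), 0x00000009 reads as init being killed by signal 9 after the oops, not as anything about modules. The sketch below shows that decoding; decode_exitcode is a hypothetical helper name, not a kernel or distribution tool.

```shell
# Hypothetical helper: split a wait-status style exit code into its
# terminating signal (low 7 bits) and its exit status (next 8 bits).
decode_exitcode() {
    code=$(( $1 ))
    echo "signal=$(( code & 0x7f )) status=$(( (code >> 8) & 0xff ))"
}

decode_exitcode 0x00000009   # prints: signal=9 status=0
```

If that reading is right, the lines worth chasing are the NULL pointer dereference and the call trace, not the module install order.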
fedeliallalinea Administrator
Joined: 08 Mar 2003 Posts: 30842 Location: here
|
1clue Advocate
Joined: 05 Feb 2006 Posts: 2569
|
Posted: Tue Jan 02, 2018 9:03 pm Post subject: |
|
|
That's odd. I don't have any keywords and this kernel just came down this morning. I did an emerge-webrsync too.
I wonder why it came down?
Thanks. |
1clue Advocate
Joined: 05 Feb 2006 Posts: 2569
|
Posted: Tue Jan 02, 2018 9:06 pm Post subject: |
|
|
And more importantly, why would I want to go all the way back to 4.9? |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54119 Location: 56N 3W
|
Posted: Tue Jan 02, 2018 9:12 pm Post subject: |
|
|
1clue,
4.12 no longer gets security patches, so it's masked.
4.14 and gentoo gcc-6.4 don't play nicely (Linus is not amused).
That was masked while the investigation was underway.
In turn, that makes 4.9 the current stable. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
1clue Advocate
Joined: 05 Feb 2006 Posts: 2569
|
Posted: Tue Jan 02, 2018 9:22 pm Post subject: |
|
|
My 4.12 was compiled before the switch. I think I'll stick it out for a while rather than go back to the dark ages.
Thanks. |
Fluxie n00b
Joined: 20 Jul 2004 Posts: 6
|
Posted: Thu Jan 04, 2018 7:22 pm Post subject: |
|
|
NeddySeagoon wrote: | 1clue,
4.14 and gentoo gcc-6.4 don't play nicely (Linus is not amused).
That was masked while the investigation was underway.
|
Could you point me where you found this bit of information?
I'm curious because I'm currently running "4.14.10-gentoo-r1" compiled with GCC-6.4.0. This combination does seem stable to me but I would like to be sure. Also I would rather not switch to 4.9 because I have a new AMD processor which doesn't play nicely with 4.9, afaik...
Exact version: "Linux version 4.14.10-gentoo-r1 (root@<masked>i) (gcc version 6.4.0 (Gentoo 6.4.0 p1.1)) #1 SMP Mon Jan 1 12:03:20 EET 2018"
Thanks:) |
asturm Developer
Joined: 05 Apr 2007 Posts: 8933
|
Posted: Thu Jan 04, 2018 7:27 pm Post subject: |
|
|
The issues were caused by the hardened patchset. So if your kernel image works, just stick with it. |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54119 Location: 56N 3W
|
Posted: Thu Jan 04, 2018 7:48 pm Post subject: |
|
|
Fluxie,
It's on the LKML. |
1clue Advocate
Joined: 05 Feb 2006 Posts: 2569
|
Posted: Thu Jan 04, 2018 8:19 pm Post subject: |
|
|
NeddySeagoon wrote: | Fluxie,
It's on the LKML. |
So evidently it's a gentoo-specific issue in the compiler? When can we expect a new compiler? With this spectre/meltdown BS I would like to get on a newer kernel.
And as far as that goes, has anyone found what kernel options we should disable or enable in light of these bugs? I know it's not a complete fix, but I don't want to shoot myself in the foot here. All I can find are some general-public 'we're working on it' crap. I have a shitload of boxes to fix. Pardon my French. |
asturm Developer
Joined: 05 Apr 2007 Posts: 8933
|
Posted: Thu Jan 04, 2018 8:58 pm Post subject: |
|
|
Do you use hardened profile? |
1clue Advocate
Joined: 05 Feb 2006 Posts: 2569
|
Posted: Thu Jan 04, 2018 9:14 pm Post subject: |
|
|
default/linux/amd64/17.1/no-multilib/hardened |
toralf Developer
Joined: 01 Feb 2004 Posts: 3922 Location: Hamburg
|
Posted: Thu Jan 04, 2018 10:23 pm Post subject: |
|
|
4.14.11 contains the -fno-stack-check quirk, so a stable hardened Gentoo Linux compiles and boots the kernel fine (tested on my hardened server and my hardened client) |
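One way to see whether a given source tree already carries that quirk is to look for the flag in its top-level Makefile. This is only a sketch: has_stack_check_quirk is a hypothetical helper, and both the /usr/src/linux path and the idea that the flag appears literally in the Makefile are assumptions to check against your own tree.

```shell
# Hypothetical helper: succeed if the top-level Makefile of the given
# kernel source tree mentions -fno-stack-check (assumed location of the quirk).
has_stack_check_quirk() {
    grep -q -F -e '-fno-stack-check' "$1/Makefile"
}

# Check the tree that /usr/src/linux points at, if one is present.
src=/usr/src/linux
if [ -r "$src/Makefile" ]; then
    if has_stack_check_quirk "$src"; then
        echo "$src: quirk present"
    else
        echo "$src: quirk missing, the flag has to be passed by hand"
    fi
fi
```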
ct85711 Veteran
Joined: 27 Sep 2005 Posts: 1791
|
Posted: Thu Jan 04, 2018 10:51 pm Post subject: |
|
|
From what I recall, towards the end of the message thread about this issue, it sounded like they intend to simply strip stack-check and/or force no-stack-check in the compiler flags so that this issue won't be a factor. Though who knows which versions will get this change. |
Hu Moderator
Joined: 06 Mar 2007 Posts: 21518
|
Posted: Fri Jan 05, 2018 2:52 am Post subject: |
|
|
1clue wrote: | So evidently it's a gentoo-specific issue in the compiler? |
No. It's a negative interaction among:
- Upstream gcc implements -fstack-check in a way that the kernel developers think is ugly and questionable (but runs correctly for user code).
- -fstack-check will, for certain kernel functions, generate code that breaks the kernel. For other functions, it's suboptimal and possibly wrong in a subtle way, but is not immediately system-breaking. Unfortunately, for the functions that it breaks outright, almost everybody needs those functions to work, so you cannot avoid the problem by being lucky or disabling optional kernel features.
- Hardened Gentoo (not Gentoo in general, but only the hardened profiles) default-enable this feature written by upstream.
1clue wrote: | When can we expect a new compiler? |
You don't need a new compiler. You need not to generate user-mode-specific stack probes when compiling the kernel. This can be done by not using a hardened gcc or by passing -fno-stack-check. Per toralf's post two up from mine (and three down from yours), the latest 4.14.x will do this for you. To quote Greg KH, "all users must upgrade."
1clue wrote: | With this spectre/meltdown BS I would like to get on a newer kernel.
And as far as that goes, has anyone found what kernel options we should disable or enable in light of these bugs? I know it's not a complete fix, but I don't want to shoot myself in the foot here. |
As for Meltdown and Spectre, you may or may not be in a position to need the KPTI patches. If you have a large number of machines you manage, then you probably have at least some where you allow untrusted users to run unprivileged code. Those machines may need KPTI, depending on exactly how little you trust the users. For a complete fix, you could switch to using unaffected CPUs. Per the reporting I've read, pre-1995 CPUs are unaffected, as are old in-order-only Intel Atom chips. |
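On the question of which kernel option to look at: in trees that carry the KPTI patches the switch is, as far as I know, CONFIG_PAGE_TABLE_ISOLATION. Treat that name as an assumption and confirm it in the Kconfig of the version you build. A minimal sketch for checking a config file, with kpti_enabled as a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the given kernel config file enables KPTI.
# The option name CONFIG_PAGE_TABLE_ISOLATION is an assumption; verify it
# against the Kconfig of the kernel version you actually build.
kpti_enabled() {
    grep -q '^CONFIG_PAGE_TABLE_ISOLATION=y$' "$1"
}

# Report on the config in the current directory, if there is one.
if [ -r .config ]; then
    if kpti_enabled .config; then
        echo "KPTI is enabled in .config"
    else
        echo "KPTI is not enabled in .config"
    fi
fi
```

Run from the kernel source directory, this only tells you what the config requests, not whether the running kernel has the mitigation active.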
1clue Advocate
Joined: 05 Feb 2006 Posts: 2569
|
Posted: Fri Jan 05, 2018 4:01 am Post subject: |
|
|
@Hu,
Thanks much, that gives me a path. So -fno-stack-check can be done just on the kernel, does not need to be changed in make.conf? I'll re-read the stuff above just in case it's mentioned there and I missed it.
- I have zero hardware which is unaffected. You'd think I would luck out once maybe, statistically speaking.
- I have no Gentoo systems with a gui or which are used by an untrusted user logging into a shell. They're all servers and security appliances and KVM/QEMU which run server VMs.
- I have many more boxes and VMs which are some sort of binary distro. So I'm a bit frazzled right now. Not your problem. I'm waiting on those to see what the distro does.
- My test Gentoo box has QuickAssist. It's an atom c2758. I'm scared to find out what that means with respect to Spectre and Meltdown. Fortunately enough it's been overkill for everything I've configured it for, so a loss of performance is unlikely to matter much.
|
Hu Moderator
Joined: 06 Mar 2007 Posts: 21518
|
Posted: Fri Jan 05, 2018 4:56 am Post subject: |
|
|
The kernel build does not respect make.conf, so changing it there will not help you. The user packages that respect it do not need it changed, so changing it would be counterproductive. If you want to hand-apply the change for the kernel build, I believe placing the value in $KBUILD_CFLAGS will suffice (but this is from old memory, so it might be wrong; check before relying on it).
For 2, if you don't let untrusted users run arbitrary unprivileged code, your risk is lower. I can't say it's impossible for an untrusted user to leverage the existing programs, but if they have no shell access, no permission to upload programs to run, and no permission to upload scripts to run, they will likely have a very difficult time running code that can exploit these problems due to the need to run specific sequences with tight timing tolerances. The VM hosts could be a problem, if you have untrusted users running in the guests (including, but not necessarily limited to, untrusted users who are authorized to be root on their respective guests). If the VMs are intended for isolation/management/redundancy purposes, rather than security enforcement against untrusted users, then they are probably fine.
I can't usefully comment on the other points. |
1clue Advocate
Joined: 05 Feb 2006 Posts: 2569
|
Posted: Fri Jan 05, 2018 5:29 am Post subject: |
|
|
Reading this from my phone, I think you've given me what I need.
The vm guests are all servers, mostly non-gui unless it's something like oracle, which wants centos.
At any rate the only people who have interesting access to any of this, either host or guest, are financially driven by the need for these systems to work correctly and have worked with me for 5 years or better. That's still no guarantee but I'll take those odds.
Thanks for your time. I'll post here later with status one way or the other. |
1clue Advocate
Joined: 05 Feb 2006 Posts: 2569
|
Posted: Mon Jan 08, 2018 6:33 pm Post subject: |
|
|
I'm pretty sure I have a non-working kernel based on some other config option, but I'm not sure if I'm doing no-stack-check right.
Code: |
export KBUILD_CFLAGS='-fno-stack-check'
make clean
mount /boot
make && make modules && make modules_install && make install
grub-mkconfig > /boot/grub/grub.cfg
|
|
Hu Moderator
Joined: 06 Mar 2007 Posts: 21518
|
Posted: Tue Jan 09, 2018 12:15 am Post subject: |
|
|
That looks right, except that I was wrong about the variable to use, and you did not catch me in it. Looking now at the kernel sources, I think the right variable is $KCFLAGS. Variable KBUILD_CFLAGS is used internally, and may ignore your environment setting.
The simplest option is to use a kernel that sets -fno-stack-check automatically through its build system. |
1clue Advocate
Joined: 05 Feb 2006 Posts: 2569
|
Posted: Tue Jan 09, 2018 4:44 pm Post subject: |
|
|
Hu wrote: | That looks right, except that I was wrong about the variable to use, and you did not catch me in it. Looking now at the kernel sources, I think the right variable is $KCFLAGS. Variable KBUILD_CFLAGS is used internally, and may ignore your environment setting.
The simplest option is to use a kernel that sets -fno-stack-check automatically through its build system. |
I'm feeling kinda stupid right now. I've been running Linux since the 90s, compiling my own kernels since about 98, and never once passed a kernel build option. My Google kung-fu is broken, getting no results. |
1clue Advocate
Joined: 05 Feb 2006 Posts: 2569
|
Posted: Tue Jan 09, 2018 6:53 pm Post subject: |
|
|
Still no joy.
Using kernel linux-4.14.8-gentoo-r1, hardened 17.1 profile. I can't tell looking at the 'make' output whether it took the setting or not.
Going to dig deeper into the diff between the new config and the old config, and see if there's some other reason why I bricked my kernel. |
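For digging through the difference between the two configs, a plain diff is noisy because options move around between versions. A rough sketch that sorts the CONFIG_ lines first so only real changes show up; config_delta and the file names in the usage comment are hypothetical. The kernel tree also ships scripts/diffconfig, which does a similar job if you prefer it.

```shell
# Hypothetical helper: print CONFIG_ lines that differ between two kernel
# config files. "<" marks lines only in the old file, ">" only in the new.
config_delta() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    grep -E '^(# )?CONFIG_' "$1" | LC_ALL=C sort > "$old_sorted"
    grep -E '^(# )?CONFIG_' "$2" | LC_ALL=C sort > "$new_sorted"
    # diff exits non-zero when the files differ, so do not treat that as failure.
    diff "$old_sorted" "$new_sorted" | grep -E '^[<>]' || true
    rm -f "$old_sorted" "$new_sorted"
}

# Intended use (file names are placeholders for the two saved configs):
#   config_delta config-4.12.12-gentoo config-4.14.8-gentoo-r1
```

Given the call trace in the first post, the ACPI processor and throttling options would be a reasonable place to start reading the output.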
Hu Moderator
Joined: 06 Mar 2007 Posts: 21518
|
Posted: Wed Jan 10, 2018 3:07 am Post subject: |
|
|
Your technique looked correct, aside from using the variable name I picked without adequate checking. You should export KCFLAGS=-fno-stack-check, rather than export KBUILD_CFLAGS=.... You can check the make output by switching the kernel build to verbose mode. If I recall correctly, that is make V=1 make-target. |
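The check described above can be sketched as follows: save the verbose build output to a file, then count how many lines carry the flag. count_flag and the log path are hypothetical names chosen for illustration; KCFLAGS and V=1 are the ones named in the post.

```shell
# Hypothetical helper: count the lines of a saved verbose build log that
# contain the given compiler flag.
count_flag() {
    grep -c -F -e "$1" "$2"
}

# Intended use from the kernel source directory (not run here):
#   export KCFLAGS=-fno-stack-check
#   make V=1 2>&1 | tee /tmp/kbuild.log
#   count_flag -fno-stack-check /tmp/kbuild.log
```

A count of zero means the flag never reached the compiler, which is what you would expect to see if the setting was exported as KBUILD_CFLAGS and ignored.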
1clue Advocate
Joined: 05 Feb 2006 Posts: 2569
|
Posted: Sat Jan 13, 2018 8:03 pm Post subject: |
|
|
I've verified that the -fno-stack-check is being applied. The kernel still panics during the first second or so of boot. So it's a problem with my config, I did something stupid with the new options when I switched to 4.14.8. |
mimosinnet l33t
Joined: 10 Aug 2006 Posts: 713 Location: Barcelona, Spain
|
Posted: Fri Jan 19, 2018 1:21 pm Post subject: |
|
|
Have you been able to solve it?
I am having a similar issue when moving to 4.14.8 kernel in a hardened box. This is the screenshot of the error. I can boot with SystemRescueCd and this is the kernel created with oldconfig and this is a new kernel config from scratch.
Cheers! _________________ Please add [solved] to the initial post's subject line if you feel your problem is resolved.
Take care of the community answering unanswered posts. |