Gentoo Forums
4.12 to 4.14, kernel panic. Solved.
1clue
Advocate
Joined: 05 Feb 2006
Posts: 2247

Posted: Tue Jan 02, 2018 8:50 pm    Post subject: 4.12 to 4.14, kernel panic. Solved.

Hi,

I did an update, got the new linux-4.14.8-gentoo-r1 sources and decided to upgrade from 4.12.12-gentoo.

I got a kernel panic early in the boot, and no logs are written. I would appreciate some help finding what I messed up.

I have an Atom C2758 board using profile default/linux/amd64/17.1/no-multilib/hardened. This is the latest testing profile; it's a non-production system at the moment.

Clearly something changed from 4.12 to 4.14 and I don't know what it is.

My previous (working) config: https://paste.pound-python.org/show/bSYRIlJ8yJR9WTHj2Gjl/

My next (non-working) config: https://paste.pound-python.org/show/BNIb6Cza1u5WEtKFhqnP/

The difference between them: https://paste.pound-python.org/show/LlvYMXgHmAwU2XkjO9gm/

I recorded the console on startup, and while I can't paste the video I can type a few lines in:
Code:

# Does some IPMI detection
ACPI: Power Button [PWRF]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000064
IP: __kmalloc+0xce/0x1d0
PGD 0 P4D 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.8-gentoo-r1-k1 #2
Hardware name: Supermicro A1SRM-LN7/LN5F/A1SRM-LN7F-2758, BIOS 1.0 09/17/2014
task: ffffa102ed8e0000 task.stack: ffffab6d40010000
RIP: 0010:__kmalloc+0xce/0x1d0
RSP: 0000:ffffab6d40013c90 EFLAGS: 00010202
RAX: 00000000000000 RBX: 000000000000064 RCX: 00000000000001af
RDX: 000000000001ae RSI: 000000000000000 RDI: 000000000001d660
RBP: ffffab6d40013cc0 R08: ffffa102ecc98c00 R09: ffffffffa27a8ec5
R10: ffffd721d1af5740 R11: ffffa102ed4c935f R12: 00000000014000c0
R13: 00000000000140 R14: fffa102ed803080 R15: ffffa102ed803080
FS:  00000000000000(0000) GS:ffffa102ffc00000(0000) knlGS: 000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000064 CR3: 00000001b020a000 CR4: 00000000001006f0
Call Trace:
 acpi_processor_get_throttling_info+0x445/0x630
 __acpi_processor_start+0x83/0x1d0
 acpi_processor_start+0x4d/0x60
 driver_probe_device+0x25a/0x2f0
 __driver_attach+0xaf/0xc0
 ? driver_probe_device+0x2f0/0x2f0
 bus_for_each_dev+0x6d/0xa0
 driver_attach+0x2e/0x30
 bus_add_driver+0x12f/0x230
 ? do_early_param+0xa2/0xa2
 driver_register+0x70/0xf0
 ? acpi_video_init+0x9a/0x9a
 acpi_processor_driver_init+0x34/0xa8
 ? acpi_video_init+0x9a/0x9a
 do_one_initcall+0x5e/0x1a0
 ? do_early_param+0xa2/0xa2
 kernel_init_freeable+0x179/0x1fc
 ? rest_init+0xc0/0xc0
 kernel_init+0x1e/0x110
 ret_from_fork+0x25/0x30
Code: e7 00 00 00 49 63 46 20 49 8b 3e 48 8d 4a 01 49 8b 1c 00 49 8d 00 65 48 0f c7 0f 0f 94 c0 84 c0 74 c6 48 85
db 74 0b 49 63 46 20 <48> 8b 04 03 0f 18 08 41 f7 c4 00 80 00 00 49 8d 18 0f 85 d1 00
RIP: __kmalloc+0xce/0x1d0 RSP: ffffab6d40013c90
CR2: 0000000000000064
---[ end trace 223394f177cfe3e2 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

Kernel Offset: 0x21000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

sched: Unexpected reschedule of offline CPU#4!
-----------------[ cut here ]-------------------
WARNING: CPU: 0 PID: 1 at /usr/src/linux-4.14.8-gentoo-r1/arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x47/0x50
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G       D             4.14.8-gentoo-r1-k1 #2
blah blah blah.
...


There is more in that recording.
That exitcode seems to reference missing modules, but I built the modules and installed them before the kernel. My command was:

Code:
mount /boot; make && make modules && make modules_install && make install
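
For reference, the exitcode in the panic line appears to use the same layout as a wait() status: the low 7 bits hold the number of the signal that killed the process, and the next byte holds a normal exit status. Under that assumption, 0x00000009 means init was killed by signal 9 (SIGKILL), which fits an oops in PID 1 rather than anything about modules. A minimal sketch to decode such a value (the function name decode_exitcode is made up for illustration):

```shell
#!/bin/sh
# decode_exitcode VALUE: interpret VALUE (decimal or 0x-prefixed hex)
# as a wait()-style status. Assumption: the kernel's "exitcode=" field
# follows that layout.
decode_exitcode() {
    status=$(( $1 ))                  # shell arithmetic accepts 0x... constants
    sig=$(( status & 0x7f ))          # low 7 bits: terminating signal, 0 if none
    code=$(( (status >> 8) & 0xff ))  # next byte: exit status of a normal exit
    if [ "$sig" -ne 0 ]; then
        echo "killed by signal $sig"
    else
        echo "exited with status $code"
    fi
}

decode_exitcode 0x00000009   # prints: killed by signal 9
```

On Linux, `kill -l 9` prints the name of that signal (KILL).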


Thanks.


Last edited by 1clue on Fri Feb 09, 2018 6:27 pm; edited 1 time in total
fedeliallalinea
Bodhisattva
Joined: 08 Mar 2003
Posts: 18261
Location: here

Posted: Tue Jan 02, 2018 8:57 pm

Could it be related to this?

Reference:
https://forums.gentoo.org/viewtopic-t-1074646.html
_________________
Questions are guaranteed in life; Answers aren't.
1clue
Advocate
Joined: 05 Feb 2006
Posts: 2247

Posted: Tue Jan 02, 2018 9:03 pm

That's odd. I don't have any keywords and this kernel just came down this morning. I did an emerge-webrsync too.

I wonder why it came down?

Thanks.
1clue
Advocate
Joined: 05 Feb 2006
Posts: 2247

Posted: Tue Jan 02, 2018 9:06 pm

And more importantly, why would I want to go all the way back to 4.9?
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 40479
Location: 56N 3W

Posted: Tue Jan 02, 2018 9:12 pm

1clue,

4.12 no longer gets security patches, so it's masked.
4.14 and gentoo gcc-6.4 don't play nicely (Linus is not amused).
That was masked while the investigation was underway.

In turn, that makes 4.9 the current stable.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
1clue
Advocate
Joined: 05 Feb 2006
Posts: 2247

Posted: Tue Jan 02, 2018 9:22 pm

My 4.12 was compiled before the switch. I think I'll stick it out for a while rather than go back to the dark ages.

Thanks.
Fluxie
n00b
Joined: 20 Jul 2004
Posts: 6

Posted: Thu Jan 04, 2018 7:22 pm

NeddySeagoon wrote:
1clue,
4.14 and gentoo gcc-6.4 don't play nicely (Linus is not amused).
That was masked while the investigation was underway.


Could you point me where you found this bit of information?

I'm curious because I'm currently running "4.14.10-gentoo-r1" compiled with GCC-6.4.0. This combination does seem stable to me but I would like to be sure. Also I would rather not switch to 4.9 because I have a new AMD processor which doesn't play nicely with 4.9, afaik...

Exact version: "Linux version 4.14.10-gentoo-r1 (root@<masked>i) (gcc version 6.4.0 (Gentoo 6.4.0 p1.1)) #1 SMP Mon Jan 1 12:03:20 EET 2018"

Thanks:)
asturm
Developer
Joined: 05 Apr 2007
Posts: 6135
Location: Austria

Posted: Thu Jan 04, 2018 7:27 pm

The issues were caused by the hardened patchset. So if your kernel image works, just stick with it.
_________________
backend.cpp:92:2: warning: #warning TODO - this error message is about as useful as a cooling unit in the arctic
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 40479
Location: 56N 3W

Posted: Thu Jan 04, 2018 7:48 pm

Fluxie,

It's on the LKML.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
1clue
Advocate
Joined: 05 Feb 2006
Posts: 2247

Posted: Thu Jan 04, 2018 8:19 pm

NeddySeagoon wrote:
Fluxie,

It's on the LKML.


So evidently it's a gentoo-specific issue in the compiler? When can we expect a new compiler? With this spectre/meltdown BS I would like to get on a newer kernel.

And as far as that goes, has anyone found what kernel options we should disable or enable in light of these bugs? I know it's not a complete fix, but I don't want to shoot myself in the foot here. All I can find are some general-public 'we're working on it' crap. I have a shitload of boxes to fix. Pardon my French.
asturm
Developer
Joined: 05 Apr 2007
Posts: 6135
Location: Austria

Posted: Thu Jan 04, 2018 8:58 pm

Do you use a hardened profile?
_________________
backend.cpp:92:2: warning: #warning TODO - this error message is about as useful as a cooling unit in the arctic
1clue
Advocate
Joined: 05 Feb 2006
Posts: 2247

Posted: Thu Jan 04, 2018 9:14 pm

default/linux/amd64/17.1/no-multilib/hardened
toralf
Developer
Joined: 01 Feb 2004
Posts: 3537
Location: Hamburg

Posted: Thu Jan 04, 2018 10:23 pm

4.14.11 contains the -fno-stack-check quirk, so a stable hardened Gentoo Linux compiles and boots the kernel fine (tested on my hardened server and my hardened client).
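
A quick way to see whether a given source tree already carries this quirk is to look for the flag in its top-level Makefile. A minimal sketch, assuming the sources are reachable as /usr/src/linux (the usual Gentoo symlink) and that the quirk shows up there as a literal -fno-stack-check; the helper name has_stack_check_quirk is made up for illustration:

```shell
#!/bin/sh
# has_stack_check_quirk MAKEFILE: succeed if MAKEFILE mentions
# -fno-stack-check anywhere. This only checks for the literal string,
# so treat a hit as a hint, not as proof the flag reaches the compiler.
has_stack_check_quirk() {
    grep -q -F -e '-fno-stack-check' "$1"
}

# Usage (path is an assumption, adjust to your tree):
#   if has_stack_check_quirk /usr/src/linux/Makefile; then
#       echo "build system passes -fno-stack-check"
#   else
#       echo "no quirk found, set the flag by hand or upgrade"
#   fi
```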
ct85711
Veteran
Joined: 27 Sep 2005
Posts: 1526

Posted: Thu Jan 04, 2018 10:51 pm

From what I recall, towards the end of the message thread about this issue, it sounded like they intend to strip stack-check outright and/or force no-stack-check in the compiler flags so that this issue won't be a factor. Though who knows which versions this change will land in.
Hu
Moderator
Joined: 06 Mar 2007
Posts: 12121

Posted: Fri Jan 05, 2018 2:52 am

1clue wrote:
So evidently it's a gentoo-specific issue in the compiler?
No. It's a negative interaction among:
  • Upstream gcc implements -fstack-check in a way that the kernel developers think is ugly and questionable (but runs correctly for user code).
  • -fstack-check will, for certain kernel functions, generate code that breaks the kernel. For other functions, it's suboptimal and possibly wrong in a subtle way, but is not immediately system-breaking. Unfortunately, for the functions that it breaks outright, almost everybody needs those functions to work, so you cannot avoid the problem by being lucky or disabling optional kernel features.
  • Hardened Gentoo (not Gentoo in general, but only the hardened profiles) default-enable this feature written by upstream.
1clue wrote:
When can we expect a new compiler?
You don't need a new compiler. You need not to generate user-mode-specific stack probes when compiling the kernel. This can be done by not using a hardened gcc or by passing -fno-stack-check. Per toralf's post two up from mine (and three down from yours), the latest 4.14.x will do this for you. To quote Greg KH, "all users must upgrade." ;)
1clue wrote:
With this spectre/meltdown BS I would like to get on a newer kernel.

And as far as that goes, has anyone found what kernel options we should disable or enable in light of these bugs? I know it's not a complete fix, but I don't want to shoot myself in the foot here.
As for Meltdown and Spectre, you may or may not be in a position to need the KPTI patches. If you have a large number of machines you manage, then you probably have at least some where you allow untrusted users to run unprivileged code. Those machines may need KPTI, depending on exactly how little you trust the users. For a complete fix, you could switch to using unaffected CPUs. Per the reporting I've read, pre-1995 CPUs are unaffected, as are old in-order-only Intel Atom chips. ;)
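
On the question of which kernel option is involved: to my knowledge, the KPTI patches in 4.14-era kernels are controlled by CONFIG_PAGE_TABLE_ISOLATION, so one way to see whether a build has them selected is to look for that line in the config. A minimal sketch under that assumption (the option name may differ in other kernel versions, and the helper name kpti_in_config is made up for illustration):

```shell
#!/bin/sh
# kpti_in_config FILE: succeed if FILE has page table isolation built in.
# Assumption: the option is spelled CONFIG_PAGE_TABLE_ISOLATION, as in
# the 4.14-era KPTI patches.
kpti_in_config() {
    grep -q -x -F -e 'CONFIG_PAGE_TABLE_ISOLATION=y' "$1"
}

# Usage (paths are assumptions, adjust to your system):
#   kpti_in_config /usr/src/linux/.config && echo "KPTI selected"
# Kernels that expose the vulnerabilities directory in sysfs also
# report the running state:
#   cat /sys/devices/system/cpu/vulnerabilities/meltdown
```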
1clue
Advocate
Joined: 05 Feb 2006
Posts: 2247

Posted: Fri Jan 05, 2018 4:01 am

@Hu,

Thanks much, that gives me a path. So -fno-stack-check can be set just for the kernel and does not need to be changed in make.conf? I'll re-read the stuff above just in case it's mentioned there and I missed it.

  1. I have zero hardware which is unaffected. You'd think I would luck out once maybe, statistically speaking.
  2. I have no Gentoo systems with a gui or which are used by an untrusted user logging into a shell. They're all servers and security appliances and KVM/QEMU which run server VMs.
  3. I have many more boxes and VMs which are some sort of binary distro. So I'm a bit frazzled right now. Not your problem. I'm waiting on those to see what the distro does.
  4. My test Gentoo box has QuickAssist. It's an atom c2758. I'm scared to find out what that means with respect to Spectre and Meltdown. Fortunately enough it's been overkill for everything I've configured it for, so a loss of performance is unlikely to matter much.
Hu
Moderator
Joined: 06 Mar 2007
Posts: 12121

Posted: Fri Jan 05, 2018 4:56 am

The kernel build does not respect make.conf, so changing it there will not help you. The user packages that respect it do not need it changed, so changing it would be counterproductive. If you want to hand-apply the change for the kernel build, I believe placing the value in $KBUILD_CFLAGS will suffice (but this is from old memory, so it might be wrong; check before relying on it).

For 2, if you don't let untrusted users run arbitrary unprivileged code, your risk is lower. I can't say it's impossible for an untrusted user to leverage the existing programs, but if they have no shell access, no permission to upload programs to run, and no permission to upload scripts to run, they will likely have a very difficult time running code that can exploit these problems due to the need to run specific sequences with tight timing tolerances. The VM hosts could be a problem, if you have untrusted users running in the guests (including, but not necessarily limited to, untrusted users who are authorized to be root on their respective guests). If the VMs are intended for isolation/management/redundancy purposes, rather than security enforcement against untrusted users, then they are probably fine.

I can't usefully comment on the other points.
1clue
Advocate
Joined: 05 Feb 2006
Posts: 2247

Posted: Fri Jan 05, 2018 5:29 am

Reading this from my phone, I think you've given me what I need.

The VM guests are all servers, mostly non-GUI unless it's something like Oracle, which wants CentOS.

At any rate the only people who have interesting access to any of this, either host or guest, are financially driven by the need for these systems to work correctly and have worked with me for 5 years or better. That's still no guarantee but I'll take those odds.

Thanks for your time. I'll post here later with status one way or the other.
1clue
Advocate
Joined: 05 Feb 2006
Posts: 2247

Posted: Mon Jan 08, 2018 6:33 pm

I'm pretty sure I have a non-working kernel based on some other config option, but I'm not sure if I'm doing no-stack-check right.

Code:

export KBUILD_CFLAGS='-fno-stack-check'
make clean
mount /boot
make && make modules && make modules_install && make install
grub-mkconfig > /boot/grub/grub.cfg
Hu
Moderator
Joined: 06 Mar 2007
Posts: 12121

Posted: Tue Jan 09, 2018 12:15 am

That looks right, except that I was wrong about the variable to use, and you did not catch me in it. Looking now at the kernel sources, I think the right variable is $KCFLAGS. Variable KBUILD_CFLAGS is used internally, and may ignore your environment setting.

The simplest option is to use a kernel that sets -fno-stack-check automatically through its build system.
1clue
Advocate
Joined: 05 Feb 2006
Posts: 2247

Posted: Tue Jan 09, 2018 4:44 pm

Hu wrote:
That looks right, except that I was wrong about the variable to use, and you did not catch me in it. Looking now at the kernel sources, I think the right variable is $KCFLAGS. Variable KBUILD_CFLAGS is used internally, and may ignore your environment setting.

The simplest option is to use a kernel that sets -fno-stack-check automatically through its build system.


I'm feeling kinda stupid right now. I've been running Linux since the 90s, compiling my own kernels since about 98, and never once passed a kernel build option. My Google kung-fu is broken, getting no results.
1clue
Advocate
Joined: 05 Feb 2006
Posts: 2247

Posted: Tue Jan 09, 2018 6:53 pm

Still no joy.

Using kernel linux-4.14.8-gentoo-r1, hardened 17.1 profile. I can't tell looking at the 'make' output whether it took the setting or not.

Going to dig deeper into the diff between the new config and the old config, and see if there's some other reason why I bricked my kernel.
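
For digging through the differences between two configs, the kernel tree ships scripts/diffconfig, which prints only the options that changed. Where that is not handy, a plain-shell approximation is possible. A minimal sketch (the function name config_changes and the file paths in the usage note are made up for illustration):

```shell
#!/bin/sh
# config_changes OLD NEW: print option lines that differ between two
# kernel .config files. Lines starting with "<" exist only in OLD,
# lines starting with ">" only in NEW. Ordinary comments and blank
# lines are ignored so that only real option changes show up.
config_changes() {
    pattern='^(CONFIG_[A-Za-z0-9_]+=|# CONFIG_[A-Za-z0-9_]+ is not set)'
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || return 1
    grep -E "$pattern" "$1" | LC_ALL=C sort > "$old_sorted"
    grep -E "$pattern" "$2" | LC_ALL=C sort > "$new_sorted"
    diff "$old_sorted" "$new_sorted" | grep '^[<>]'
    rm -f "$old_sorted" "$new_sorted"
}

# Usage (paths are assumptions):
#   config_changes /usr/src/linux-4.12.12-gentoo/.config \
#                  /usr/src/linux-4.14.8-gentoo-r1/.config
```

Sorting first means a reordered config does not produce noise, at the cost of losing the menu grouping that the original file order gives.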
Hu
Moderator
Joined: 06 Mar 2007
Posts: 12121

Posted: Wed Jan 10, 2018 3:07 am

Your technique looked correct, aside from using the variable name I picked without adequate checking. You should export KCFLAGS=-fno-stack-check, rather than export KBUILD_CFLAGS=.... You can check the make output by switching the kernel build to verbose mode. If I recall correctly, that is make V=1 make-target.
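
Put together, the build and the check could look like the sketch below. KCFLAGS and V=1 come from the post above; the log file name build.log and the helper count_flag_lines are made up for illustration, and the build commands are left as comments because they only make sense inside a kernel source tree:

```shell
#!/bin/sh
# count_flag_lines FLAG: read a verbose build log on stdin and print how
# many lines contain FLAG as a literal string.
count_flag_lines() {
    # -F: fixed string, -e: allow a pattern that starts with "-".
    # grep -c exits non-zero when the count is 0, hence "|| true".
    grep -c -F -e "$1" || true
}

# Usage, from the kernel source directory (not executed here):
#   export KCFLAGS=-fno-stack-check
#   make clean
#   make V=1 2>&1 | tee build.log
#   count_flag_lines -fno-stack-check < build.log
# A count of 0 suggests the flag never reached the compiler.
```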
1clue
Advocate
Joined: 05 Feb 2006
Posts: 2247

Posted: Sat Jan 13, 2018 8:03 pm

I've verified that -fno-stack-check is being applied. The kernel still panics during the first second or so of boot. So it's a problem with my config; I did something stupid with the new options when I switched to 4.14.8.
mimosinnet
l33t
Joined: 10 Aug 2006
Posts: 640
Location: Barcelona, Spain

Posted: Fri Jan 19, 2018 1:21 pm

Have you been able to solve it?

I am having a similar issue when moving to 4.14.8 kernel in a hardened box. This is the screenshot of the error. I can boot with SystemRescueCd and this is the kernel created with oldconfig and this is a new kernel config from scratch.

Cheers!
_________________
Please add [solved] to the initial post's subject line if you feel your problem is resolved.
Take care of the community answering unanswered posts.
All times are GMT
Page 1 of 2

 