What would cause kernel taints

albright · Posted: Sun Sep 07, 2014 12:56 pm Post subject: What would cause kernel taints

An emerge failed this morning with these log messages (see below):

Is this failing hardware? Memory? Motherboard?

EDIT: system would not reboot save via sys-rq-b

It's a bit worrying

eccerr0r · Posted: Sun Sep 07, 2014 1:22 pm Post subject:

Most common cause of tainted dumps: P: Proprietary kernel modules inserted (Nvidia, ati-drivers, compaq raid array, etc.)

However, you have other, more serious problems. D means the kernel oopsed (or hit a BUG) earlier and tried to carry on, so there was another oops before this one. W means there was an earlier warning. O means an out-of-tree module was built and loaded (insmodded) into the kernel. Kernel debuggers don't like the P and O flags because they don't know what they may be dealing with, and the D and W flags could mean secondary corruption.
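For reference, a minimal sketch of how the letters in a "Tainted:" string, or the number in /proc/sys/kernel/tainted, map to meanings. The bit positions and descriptions follow my reading of the kernel's tainted-kernels documentation and cover only the flags mentioned in this thread; the names TAINT_FLAGS, decode_letters and decode_mask are made up for illustration.

```python
# Letter -> (bit position, meaning). Subset only; taken from the kernel's
# tainted-kernels documentation as I understand it, so verify against your kernel.
TAINT_FLAGS = {
    "P": (0, "proprietary module was loaded"),
    "F": (1, "module was force loaded"),
    "S": (2, "SMP with CPUs not designed for SMP"),
    "D": (7, "kernel died recently (oops or BUG)"),
    "W": (9, "kernel issued a warning"),
    "O": (12, "externally built (out-of-tree) module was loaded"),
}


def decode_letters(tainted):
    """Explain each known letter in a 'Tainted:' string such as 'P D W O'."""
    return [f"{ch}: {TAINT_FLAGS[ch][1]}" for ch in tainted if ch in TAINT_FLAGS]


def decode_mask(mask):
    """Return the letters whose bits are set in a /proc/sys/kernel/tainted value."""
    return "".join(ch for ch, (bit, _) in TAINT_FLAGS.items() if mask & (1 << bit))


if __name__ == "__main__":
    for entry in decode_letters("P D W O"):
        print(entry)
    # 4737 = 1 + 128 + 512 + 4096, i.e. bits 0, 7, 9 and 12
    print(decode_mask(4737))  # PDWO
```

On a live system the number could be read with open("/proc/sys/kernel/tainted").read() and passed to decode_mask as an int.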

You'll need to find the first oops and debug that first. Debugging secondary corruption tends to be fruitless, as it may have been caused by the first problem.
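A rough sketch of hunting for that first event in a saved log. The marker strings are an assumption about how oopses, BUGs and warnings usually show up in kernel logs (the wording varies between kernel versions), and the sample lines are made up, not taken from this system.

```python
import re

# Markers that commonly appear in kernel log lines for these events.
# Assumption: adjust the list to match what your kernel actually prints.
EVENT_RE = re.compile(r"(Oops:|BUG:|WARNING:|Kernel panic)")


def first_kernel_event(lines):
    """Return (line_number, marker, line) for the earliest matching line, or None."""
    for number, line in enumerate(lines, start=1):
        match = EVENT_RE.search(line)
        if match:
            return number, match.group(1), line.rstrip("\n")
    return None


# Made-up sample log for illustration only.
sample = [
    "Sep  6 10:00:01 host kernel: usb 1-1: new high-speed USB device\n",
    "Sep  6 10:00:05 host kernel: WARNING: at fs/example.c:123 example_fn()\n",
    "Sep  7 09:12:44 host kernel: BUG: unable to handle kernel paging request\n",
]

if __name__ == "__main__":
    print(first_kernel_event(sample))
```

Because log files are written in time order, the first matching line is the earliest event. For a real file, pass the open file object instead of the sample list, starting with the oldest rotated log.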

Judging by this oops, you badly need a reboot; your kernel is in very bad shape right now.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

N8Fear · Posted: Sun Sep 07, 2014 1:24 pm Post subject:

"Tainted" in connection with the kernel means that non-GPL modules are loaded, e.g. nvidia, zfs, or the VirtualBox drivers.
This (likely) has nothing to do with the call trace you get.
Can you reproduce this issue (by e.g. loading a certain module or by running a certain program)?

albright · Posted: Sun Sep 07, 2014 1:37 pm Post subject:

thanks for the replies

If I look through /var/log/messages-2014* I see these from
yesterday:

Hu · Moderator Joined: 06 Mar 2007 Posts: 21635

N8Fear is inaccurate. Refer to eccerr0r's post instead, since there are ways to get a tainted kernel even without the ability to load modules.

OP: what kernel modules do you load on this system? We should start with accounting for why you have P+O, then deal with the warnings if those are independent of using the out-of-tree modules.
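One way to account for P+O is to see which loaded modules carry taint letters themselves. As far as I know, /proc/modules appends a parenthesised field such as (PO) to the line of any module that taints the kernel; the sketch below assumes that format, and the sample lines (sizes, addresses) are invented for illustration.

```python
import re

# Assumed format: a trailing "(...)" field holds the module's taint letters,
# optionally followed by "+" or "-" while the module is loading or unloading.
FLAGS_RE = re.compile(r"\(([A-Z]+)[+-]?\)\s*$")


def tainting_modules(proc_modules_text):
    """Map module name -> taint letters for modules that carry any."""
    result = {}
    for line in proc_modules_text.splitlines():
        match = FLAGS_RE.search(line)
        if match:
            result[line.split()[0]] = match.group(1)
    return result


# Invented sample in the style of /proc/modules.
sample = """\
nvidia 10675249 40 - Live 0xffffffffa0000000 (PO)
usb_storage 48761 0 - Live 0xffffffffa1000000
"""

if __name__ == "__main__":
    print(tainting_modules(sample))  # {'nvidia': 'PO'}
```

On the affected machine, the text would come from open("/proc/modules").read() rather than the sample string.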

albright · Posted: Sun Sep 07, 2014 5:14 pm Post subject:

I think the only proprietary module is nvidia

Here's the full list:

Hu · Moderator Joined: 06 Mar 2007 Posts: 21635

If you blacklist the nVidia module and reboot, can you reproduce any of the failures?

eccerr0r · Posted: Mon Sep 08, 2014 8:41 pm Post subject:

I highly doubt nvidia is causing the problem, but as stated above, yes, it would be much better to take this variable out of the equation, so removing it is a good test. The reason is that if nvidia-drivers had a function call "wipe_out_random_memory_location(x)", we wouldn't see it because the source is closed, and that would be the real problem rather than whatever the oops indicates.

As stated earlier, a WARNING could cause the taint. Do you see WARNING (in all caps) show up in your kernel logfiles?

albright · Posted: Mon Sep 08, 2014 9:35 pm Post subject:

No errors have occurred in the last 30 hours or so.

I have a suspicion that the last error was caused when I plugged
in a bad usb drive (hence the khubd error). It was the same drive
that started the problem in the first place, which I had put in a
usb case to see if I could recover anything. The drive was unreadable ...

Since then the system has been running perfectly.

If the problem recurs, I'll try with the nvidia module blacklisted.
_________________
.... there is nothing - absolutely nothing - half so much worth
doing as simply messing about with Linux ...
(apologies to Kenneth Grahame)

eccerr0r · Posted: Mon Sep 08, 2014 10:46 pm Post subject:

Something really bad must have happened to get the "D" taint. Perhaps that first W caused the death; I don't know.

Kind of funny: these taints are all just there to help the LKML and debuggers decide whether to start looking at a problem. It looks like many of the flags were added after the proprietary-module one. Though you don't have it, I'm curious about the "S" taint, where the kernel detects SMP-incompatible CPUs installed. I used to run a dual Celeron machine that should have qualified for the "S" taint.

In recent times I've never had hard drives cause system freeze-ups. For me they cause slowdowns from retries, and I/O errors when they get offlined. You may have to look into other hardware issues; most of the system freezes I've had were due to bad motherboard devices.

albright · Posted: Tue Sep 09, 2014 1:07 am Post subject: