Gentoo Forums
BUG message during boot - Kernel or Nvidia?
Holysword
l33t
Joined: 19 Nov 2006
Posts: 946
Location: Greece

Posted: Sun Apr 20, 2014 5:51 pm    Post subject: BUG message during boot - Kernel or Nvidia?

Hello there,

Recently I've been experiencing weird issues during the boot process. OpenRC proceeds normally until it reaches the "initializing uevents" step, and then it glitches horribly: the screen goes black and comes back without colors, in the same format as dmesg rather than the usual OpenRC startup output. After a few seconds the OpenRC output resumes and it finishes initializing everything.

Then it gives me the username/password prompt (I don't have a login manager), but while I am trying to type the username and password, it keeps overwriting the screen with messages from the iwlwifi module! After a few seconds it stops, and then I can log in. So I log in and try to start X with only twm, nothing else. It shows a black screen with a static cursor (just the "_") in the top left corner and that's it: it crashes, I cannot switch TTY, and I cannot do anything other than a hard reboot. When I check Xorg.0.log, it is completely empty. Now, the worst part: all of this is random. It happens only sometimes, and the only way to get a working computer is to keep restarting until it does not happen. Everything works only when the first glitch (after the "initializing uevents" step) does not happen. I found this in my dmesg log; I don't know if it gives any clues:

Code:
[    9.377864] systemd-udevd[1118]: renamed network interface wlan0 to wlo1
[    9.377866] BUG: unable to handle kernel NULL pointer dereference at           (null)
[    9.377870] IP: [<ffffffff8152aa19>] __down+0x3c/0x93
[    9.377871] PGD 446694067 PUD 446770067 PMD 0
[    9.377873] Oops: 0002 [#1] PREEMPT SMP
[    9.377886] Modules linked in: nvidia(PO+) iwldvm mac80211 i915(+) uvcvideo btusb hp_accel videobuf2_vmalloc lis3lv02d videobuf2_memops intel_agp hid_ortek videobuf2_core videodev media input_polldev video snd_hda_codec_idt snd_hda_intel bluetooth iwlwifi psmouse r8169 mii intel_gtt snd_hda_codec thermal ac fan cfg80211 rfkill x86_pkg_temp_thermal battery coretemp processor snd_hwdep snd_pcm wmi snd_page_alloc snd_timer snd soundcore i2c_i801 drm_kms_helper button lpc_ich mfd_core thermal_sys hwmon efivarfs
[    9.377888] CPU: 0 PID: 1169 Comm: nvidia-smi Tainted: P          IO 3.13.6-gentoo #1
[    9.377888] Hardware name: Hewlett-Packard HP ENVY TS m7 Notebook PC/1966, BIOS F.1C 06/07/2013
[    9.377889] task: ffff88044a5140d0 ti: ffff8804474ce000 task.ti: ffff8804474ce000
[    9.377891] RIP: 0010:[<ffffffff8152aa19>]  [<ffffffff8152aa19>] __down+0x3c/0x93
[    9.377892] RSP: 0018:ffff8804474cfbd0  EFLAGS: 00010082
[    9.377893] RAX: 0000000000000000 RBX: 7fffffffffffffff RCX: 0000000000000000
[    9.377893] RDX: ffffffffa0fab6a0 RSI: ffffffffa0d35cb5 RDI: ffffffffa0fab698
[    9.377894] RBP: ffff8804474cfc10 R08: 0000000000000000 R09: 0000000000000000
[    9.377894] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffffa0fab698
[    9.377894] R13: ffff88044a5140d0 R14: ffff880445fab2f0 R15: 00000000000000ff
[    9.377895] FS:  00007f76e2b63700(0000) GS:ffff88045f200000(0000) knlGS:0000000000000000
[    9.377896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.377897] CR2: 0000000000000000 CR3: 0000000445fad000 CR4: 00000000001407f0
[    9.377897] Stack:
[    9.377898]  ffffffffa0fab6a0 0000000000000000 00000000000000d0 0000000000000246
[    9.377900]  ffff880449dc6a80 ffffffffa0fab698 ffff88044640c088 ffff880449fef480
[    9.377901]  ffff880447469980 ffffffff81073ad6 0000000000000282 ffff880447469980
[    9.377901] Call Trace:
[    9.377904]  [<ffffffff81073ad6>] ? down+0x36/0x40
[    9.377957]  [<ffffffffa0b70c53>] ? nvidia_open+0x453/0x920 [nvidia]
[    9.377959]  [<ffffffff8139ed8a>] ? kobj_lookup+0x10a/0x170
[    9.378004]  [<ffffffffa0b7ac4f>] ? nvidia_frontend_open+0x3f/0x90 [nvidia]
[    9.378006]  [<ffffffff8110f196>] ? chrdev_open+0x96/0x1c0
[    9.378008]  [<ffffffff8110f100>] ? cdev_put+0x30/0x30
[    9.378010]  [<ffffffff81109092>] ? do_dentry_open+0x1a2/0x2a0
[    9.378011]  [<ffffffff811095e8>] ? finish_open+0x28/0x40
[    9.378013]  [<ffffffff81117f33>] ? do_last.isra.52+0x4a3/0xc70
[    9.378014]  [<ffffffff811187b2>] ? path_openat+0xb2/0x660
[    9.378016]  [<ffffffff810cb8bf>] ? shmem_xattr_validate+0x8f/0xd0
[    9.378017]  [<ffffffff81119d15>] ? do_filp_open+0x35/0x80
[    9.378019]  [<ffffffff81109553>] ? chown_common.isra.15+0x83/0xf0
[    9.378021]  [<ffffffff8152b9be>] ? _raw_spin_lock+0xe/0x40
[    9.378022]  [<ffffffff8152ba8e>] ? _raw_spin_unlock+0xe/0x30
[    9.378024]  [<ffffffff81125847>] ? __alloc_fd+0x97/0x120
[    9.378026]  [<ffffffff8110a606>] ? do_sys_open+0x126/0x210
[    9.378027]  [<ffffffff8152c926>] ? system_call_fastpath+0x1a/0x1f
[    9.378038] Code: bb ff ff ff ff ff ff ff 7f 48 83 e4 f0 48 83 ec 20 48 8b 47 10 48 89 14 24 65 4c 8b 2c 25 80 b8 00 00 48 89 67 10 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10 c6 44 24 18 00 4c 89 e7 49 c7 45 00 02
[    9.378039] RIP  [<ffffffff8152aa19>] __down+0x3c/0x93
[    9.378040]  RSP <ffff8804474cfbd0>
[    9.378040] CR2: 0000000000000000
[    9.378041] ---[ end trace 8422aa58f6fd8a32 ]---
[    9.378043] note: nvidia-smi[1169] exited with preempt_count 1
[    9.767193] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[    9.793133] Console: switching to colour frame buffer device 240x67


What could that be? It smells like a problem with my nvidia driver or perhaps some options in the kernel. I have been messing around with some options to try to get my bluetooth headset to work properly, so I might have screwed things up there...?

My full dmesg log
My .config file

Thank you in advance
_________________
"Nolite arbitrari quia venerim mittere pacem in terram non veni pacem mittere sed gladium" (Yeshua Ha Mashiach)

Hu
Moderator
Joined: 06 Mar 2007
Posts: 21490

Posted: Sun Apr 20, 2014 6:22 pm

Based on the callstack, that looks like a bug in the nVidia driver. Can you reproduce the problem on an untainted kernel?

Holysword
l33t
Joined: 19 Nov 2006
Posts: 946
Location: Greece

Posted: Mon Apr 21, 2014 2:38 am

Hu wrote:
Based on the callstack, that looks like a bug in the nVidia driver. Can you reproduce the problem on an untainted kernel?

What do you mean by an untainted kernel? I can try it if you give me a link to instructions.
Thanks.

Hu
Moderator
Joined: 06 Mar 2007
Posts: 21490

Posted: Mon Apr 21, 2014 2:53 am

A kernel becomes tainted when it loads any out-of-tree or proprietary modules, such as nvidia.ko. To use an untainted kernel, reboot and do not load any modules which would taint it.
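
To check whether a running kernel is currently tainted, the kernel exposes the taint state as a bitmask in /proc/sys/kernel/tainted (0 means untainted). Below is a minimal Python sketch that decodes only the three flags seen in this thread's oops lines (P, I, O). The helper names are made up for illustration, and the table is deliberately incomplete; the kernel defines more flags than these.

```python
# Decode the kernel taint bitmask. Only the flags that appear in this
# thread's oops output are listed; the kernel defines more.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    11: ("I", "workaround for a platform firmware bug was applied"),
    12: ("O", "out-of-tree module was loaded"),
}


def decode_taint(value):
    """Return the known flag letters that are set in a taint bitmask."""
    return [letter for bit, (letter, _) in sorted(TAINT_FLAGS.items())
            if value & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint bitmask (0 means untainted)."""
    with open(path) as fh:
        return int(fh.read().strip())


# Bits 0, 11 and 12 set, matching the "Tainted: P IO" line in the first oops.
print(decode_taint(6145))
```

On a live system, `decode_taint(read_taint())` gives the same view for the running kernel. After a boot where nvidia.ko was never loaded, the P and O flags should be absent.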

mir3x
Guru
Joined: 02 Jun 2012
Posts: 455

Posted: Mon Apr 21, 2014 4:52 pm

U wrote OpenRC N-times ... and then systemd jumps on logs ?? Maybe thats a problem

([ 9.377864] systemd-udevd[1118]: renamed network interface wlan0 to wlo1)
_________________
Sent from Windows

Ant P.
Watchman
Joined: 18 Apr 2009
Posts: 6920

Posted: Mon Apr 21, 2014 4:57 pm

mir3x wrote:
U wrote OpenRC N-times ... and then systemd jumps on logs ?? Maybe thats a problem

([ 9.377864] systemd-udevd[1118]: renamed network interface wlan0 to wlo1)

Your read buffer seems to have been truncated 6 chars early... try again.

Hu
Moderator
Joined: 06 Mar 2007
Posts: 21490

Posted: Tue Apr 22, 2014 1:44 am

mir3x wrote:
U wrote OpenRC N-times ... and then systemd jumps on logs ?? Maybe thats a problem
Although systemd has and causes many problems, it is irrelevant here. OP's bug stack clearly shows that an nVidia related configuration utility ran, attempted to access a character device managed by the nVidia proprietary driver, and the handler for that device then triggered a BUG event. The only way I can see to blame this on systemd is if the nVidia driver reacts badly to acquiring resources while systemd is busy renaming network interfaces. This seems unlikely.
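
The reading of the call stack above can be reproduced mechanically: frames that belong to a loadable module carry a `[module]` suffix, so listing them shows which module's code was on the stack when the oops fired. The following is a rough Python sketch; the regular expression is an assumption based on the trace format quoted in this thread, not a general oops parser, and `module_frames` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   [<ffffffffa0b70c53>] ? nvidia_open+0x453/0x920 [nvidia]
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?"                  # address, optional '?'
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"   # symbol+offset/size
    r"(?:\s+\[(?P<module>\w+)\])?"                   # optional [module] tag
)


def module_frames(log_text):
    """Return (symbol, module) pairs for frames that live in a module."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group("module"):
            frames.append((match.group("symbol"), match.group("module")))
    return frames


# A few frames copied from the first oops in this thread.
SAMPLE = """\
[    9.377904]  [<ffffffff81073ad6>] ? down+0x36/0x40
[    9.377957]  [<ffffffffa0b70c53>] ? nvidia_open+0x453/0x920 [nvidia]
[    9.378004]  [<ffffffffa0b7ac4f>] ? nvidia_frontend_open+0x3f/0x90 [nvidia]
[    9.378006]  [<ffffffff8110f196>] ? chrdev_open+0x96/0x1c0
"""

print(module_frames(SAMPLE))
```

For the sample, only `nvidia_open` and `nvidia_frontend_open` are reported, both tagged `[nvidia]`, which is consistent with the fault happening while the nvidia character device was being opened.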

Holysword
l33t
Joined: 19 Nov 2006
Posts: 946
Location: Greece

Posted: Wed Apr 23, 2014 1:54 am

Hu wrote:
A kernel becomes tainted when it loads any out-of-tree or proprietary modules, such as nvidia.ko. To use an untainted kernel, reboot and do not load any modules which would taint it.

It is a bit hard to tell since it was random; it would happen very often, but not always. Anyway, I uninstalled nvidia-drivers-337.12 and rebooted several times, and it did not occur once. Then I installed version 334.21-r3 and rebooted a few times, and again it did not occur once. So I do believe it is a problem with the 337 driver...

wvmmhxkh
n00b
Joined: 26 Feb 2013
Posts: 5

Posted: Wed May 07, 2014 1:18 pm

Same thing here, on both 337.12 and 337.19.

Code:

[    1.344957] nvidia: module license 'NVIDIA' taints kernel.
[    1.344958] Disabling lock debugging due to kernel taint
[    1.346458] hub 2-1:1.0: port 5 not reset yet, waiting 10ms
[    1.357727] BUG: unable to handle kernel NULL pointer dereference at           (null)
[    1.362264] IP: [<ffffffff814478e7>] __down_common+0x4e/0xe9
[    1.366632] PGD 2140e5067 PUD 2140e4067 PMD 0
[    1.371013] Oops: 0002 [#1] SMP
[    1.375360] Modules linked in: nvidia(PO+) snd_hda_intel(+) snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd
[    1.380027] CPU: 0 PID: 1096 Comm: nvidia-smi Tainted: P           O 3.12.13-gentoo #6
[    1.384733] Hardware name: System manufacturer System Product Name/P8Z77-V LX, BIOS 2303 12/05/2013
[    1.389509] task: ffff880215b42ae0 ti: ffff88021406e000 task.ti: ffff88021406e000
[    1.394282] RIP: 0010:[<ffffffff814478e7>]  [<ffffffff814478e7>] __down_common+0x4e/0xe9
[    1.399122] RSP: 0018:ffff88021406fb88  EFLAGS: 00010096
[    1.403916] RAX: 0000000000000000 RBX: ffffffffa0a54fc0 RCX: 0000000000000000
[    1.408714] RDX: ffff88021406fb88 RSI: 0000000000000002 RDI: ffffffffa0a54fc0
[    1.409526] usb 2-1.5: new low-speed USB device number 3 using ehci-pci
[    1.418360] RBP: 7fffffffffffffff R08: 0000000000018860 R09: 0000000000000000
[    1.420537] hub 2-1:1.0: port 5 not reset yet, waiting 10ms
[    1.428250] R10: 000000000000000b R11: ffffffffffffffd6 R12: ffff880215b42ae0
[    1.433248] R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
[    1.438204] FS:  00007fee5297c700(0000) GS:ffff88021ec00000(0000) knlGS:0000000000000000
[    1.443216] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.448174] CR2: 0000000000000000 CR3: 000000021591c000 CR4: 00000000001407f0
[    1.453133] Stack:
[    1.457968]  ffffffffa0a54fc8 0000000000000000 ffff88021ec14560 0000000000000001
[    1.463030]  0000000000000000 ffffffffa0a54fc0 ffff8800d9423100 ffff880213908000
[    1.468099]  ffff8800d97237a0 ffff8800d9423300 00000000000000ff ffffffff81063337
[    1.473133] Call Trace:
[    1.478125]  [<ffffffff81063337>] ? down+0x37/0x40
[    1.483138]  [<ffffffffa0661b73>] ? nvidia_open+0x563/0x8e0 [nvidia]
[    1.488118]  [<ffffffff810ef1dc>] ? exact_lock+0xc/0x20
[    1.493057]  [<ffffffff81296f82>] ? kobj_lookup+0x102/0x150
[    1.496766] usb 2-1.5: skipped 1 descriptor after interface
[    1.497136] usb 2-1.5: default language 0x0409
[    1.498616] usb 2-1.5: udev 3, busnum 2, minor = 130
[    1.498617] usb 2-1.5: New USB device found, idVendor=0458, idProduct=003a
[    1.498617] usb 2-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    1.498618] usb 2-1.5: Product: Optical Mouse
[    1.498618] usb 2-1.5: Manufacturer: Genius
[    1.498661] usb 2-1.5: usb_probe_device
[    1.498662] usb 2-1.5: configuration #1 chosen from 1 choice
[    1.499997] usb 2-1.5: adding 2-1.5:1.0 (config #1, interface 0)
[    1.500010] usbhid 2-1.5:1.0: usb_probe_interface
[    1.500011] usbhid 2-1.5:1.0: usb_probe_interface - got id
[    1.501958] input: Genius Optical Mouse as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.5/2-1.5:1.0/input/input1
[    1.502005] hid-generic 0003:0458:003A.0001: input,hidraw0: USB HID v1.10 Mouse [Genius Optical Mouse] on usb-0000:00:1d.0-1.5/input0
[    1.502018] hub 2-1:1.0: state 7 ports 8 chg 0000 evt 0020
[    1.574407]  [<ffffffffa066bc7d>] ? nvidia_frontend_open+0x4d/0xa0 [nvidia]
[    1.579905]  [<ffffffff810efa25>] ? chrdev_open+0x95/0x1a0
[    1.585390]  [<ffffffff810ef990>] ? cdev_put+0x30/0x30
[    1.590825]  [<ffffffff810e92fe>] ? do_dentry_open.isra.16+0x1ee/0x280
[    1.596276]  [<ffffffff810e93a5>] ? finish_open+0x15/0x20
[    1.601661]  [<ffffffff810f9ae1>] ? do_last.isra.72+0x7c1/0xd40
[    1.607002]  [<ffffffff810f6878>] ? link_path_walk+0x68/0x830
[    1.612284]  [<ffffffff810fa12c>] ? path_openat+0xcc/0x5f0
[    1.617524]  [<ffffffff81104e3c>] ? inode_change_ok+0x8c/0x180
[    1.622773]  [<ffffffff810faae5>] ? do_filp_open+0x45/0xb0
[    1.627882]  [<ffffffff811060f2>] ? __alloc_fd+0x42/0x110
[    1.632845]  [<ffffffff810ea700>] ? do_sys_open+0x140/0x230
[    1.637712]  [<ffffffff8144a762>] ? system_call_fastpath+0x16/0x1b
[    1.642496] Code: fb 48 83 ec 28 48 8b 47 10 48 8d 14 24 48 89 57 10 48 8d 57 08 48 89 14 24 48 8d 14 24 65 4c 8b 24 25 40 b8 00 00 48 89 44 24 08 <48> 89 10 4c 89 64 24 10 c6 44 24 18 00 4d 85 f6 74 5e 49 8b 44
[    1.652769] RIP  [<ffffffff814478e7>] __down_common+0x4e/0xe9
[    1.657732]  RSP <ffff88021406fb88>
[    1.662633] CR2: 0000000000000000
[    1.667443] ---[ end trace a5b398f35f50896a ]---
[    1.916166] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=none
[    2.143544] Switched to clocksource tsc
[   30.794085] EXT4-fs (sda2): re-mounted. Opts: discard
[   30.893166] EXT4-fs (sda1): mounted filesystem without journal. Opts: discard
[   30.931112] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
[   30.960667] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)

glv
n00b
Joined: 08 May 2014
Posts: 1

Posted: Thu May 08, 2014 3:45 pm

I have the same problem with version 337.12 and 337.19 of nvidia-drivers (kernel 3.14.1).

It seems to be caused by the nvidia-smi program called by the udev rule 99-nvidia.rules.

There's a workaround for this bug (I found it here: https://bugs.gentoo.org/show_bug.cgi?id=504326).
The idea is to override the udev rule so that nvidia-smi is not called.

- Copy /lib/udev/rules.d/99-nvidia.rules to /etc/udev/rules.d/99-nvidia.rules
- Edit /etc/udev/rules.d/99-nvidia.rules and comment out the first line:
#ACTION=="add", DEVPATH=="/module/nvidia", SUBSYSTEM=="module", RUN+="nvidia-udev.sh $env{ACTION}"
- Reboot and the kernel OOPS should not appear anymore
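
The copy-and-edit steps above can also be scripted. Here is a minimal Python sketch, assuming the two paths from this post; it must be run as root to write under /etc. The function names are hypothetical, and matching the line by the string "nvidia-udev.sh" (rather than blindly taking line 1) is my own assumption, since the full contents of the packaged rule file are not shown in this thread.

```python
from pathlib import Path

# Paths as given in the steps above.
SRC = Path("/lib/udev/rules.d/99-nvidia.rules")
DST = Path("/etc/udev/rules.d/99-nvidia.rules")


def comment_out_first(text, needle="nvidia-udev.sh"):
    """Comment out the first active (not yet commented) line containing needle."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if needle in line and not line.lstrip().startswith("#"):
            lines[i] = "#" + line
            break
    return "\n".join(lines) + "\n"


def write_override(src=SRC, dst=DST):
    """Write a copy of the packaged rule with that line disabled."""
    dst.write_text(comment_out_first(src.read_text()))
```

Calling `write_override()` creates the file in /etc/udev/rules.d, which udev prefers over the file of the same name in /lib/udev/rules.d. Deleting the override file restores the packaged behaviour.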

Holysword
l33t
Joined: 19 Nov 2006
Posts: 946
Location: Greece

Posted: Fri May 09, 2014 5:43 pm

glv wrote:
I have the same problem with version 337.12 and 337.19 of nvidia-drivers (kernel 3.14.1).

It seems to be caused by the nvidia-smi program called by the udev rule 99-nvidia.rules.

There's a workaround for this bug (I found it here: https://bugs.gentoo.org/show_bug.cgi?id=504326).
The idea is to override the udev rule so that nvidia-smi is not called.

- Copy /lib/udev/rules.d/99-nvidia.rules to /etc/udev/rules.d/99-nvidia.rules
- Edit /etc/udev/rules.d/99-nvidia.rules and comment out the first line:
#ACTION=="add", DEVPATH=="/module/nvidia", SUBSYSTEM=="module", RUN+="nvidia-udev.sh $env{ACTION}"
- Reboot and the kernel OOPS should not appear anymore

Thank you for letting us know! I will test as soon as possible, but for the moment I simply downgraded the nvidia driver. Is there anything notable in the newest version?

StevePER
n00b
Joined: 28 Jan 2004
Posts: 50
Location: Perth, Australia

Posted: Sat Jun 21, 2014 1:19 pm

I just started getting the same problem after upgrading to kernel 3.12.21-gentoo-r1 and nvidia-drivers 337.25. However, the workaround isn't working for me. Downgrading to 334.21-r3 fixes it.

poolshrk
n00b
Joined: 26 Apr 2007
Posts: 21

Posted: Wed Jun 25, 2014 9:01 pm

glv wrote:
I have the same problem with version 337.12 and 337.19 of nvidia-drivers (kernel 3.14.1).

It seems to be caused by the nvidia-smi program called by the udev rule 99-nvidia.rules.

There's a workaround for this bug (I found it here: https://bugs.gentoo.org/show_bug.cgi?id=504326).
The idea is to override the udev rule so that nvidia-smi is not called.

- Copy /lib/udev/rules.d/99-nvidia.rules to /etc/udev/rules.d/99-nvidia.rules
- Edit /etc/udev/rules.d/99-nvidia.rules and comment out the first line:
#ACTION=="add", DEVPATH=="/module/nvidia", SUBSYSTEM=="module", RUN+="nvidia-udev.sh $env{ACTION}"
- Reboot and the kernel OOPS should not appear anymore


This works for me, thanks!

kernel 3.12.22
nvidia-drivers 340.17

hampelratte
Apprentice
Joined: 29 Jul 2005
Posts: 155

Posted: Wed Sep 24, 2014 11:33 am

I ran into the same problem with kernel 3.12.21 and nvidia-drivers versions above 334.21-r3. I'm wondering whether this is caused by a certain setup (hardware / software), since only a few users seem to be affected. I'm not willing to use the workaround from the bug report, because I tend to forget about such things and then they may cause problems later. So, for now I downgraded to 334 and masked all newer versions of nvidia-drivers.

If any of you successfully tests a newer version, let us know, so that we can get rid of the workaround / package mask.

BR
Henrik

All times are GMT