Gentoo Forums
Nvidia drivers hanging udev, resulting in one core at 100%
rafaelzigx
n00b


Joined: 09 Apr 2015
Posts: 11

Posted: Wed Nov 07, 2018 11:07 am    Post subject: Nvidia drivers hanging udev, resulting in one core at 100%

Hello guys,

I'm having an issue with the proprietary nvidia drivers.
Every time I install them, udev hangs and keeps one of the cores at 100% all the time.

The process consuming the processor:
/sbin/udev --daemon
I can't kill it.


I've already tried kernel versions from 4.9.x up to 4.19. That didn't solve it.
I ran the same tests with different nvidia driver versions.
I also upgraded eudev (even though I wasn't sure it was related). Nothing changed.

If I use nouveau, I don't have this problem. But as soon as I install nvidia, it starts.

My Kernel log:

Code:
[   65.832518] udevd[2723]: slow: 'lmt-udev auto' [2786]
[   66.873927] udevd[2690]: worker [2724] /module/nvidia is taking a long time
[   66.873931] udevd[2690]: worker [2754] /devices/pci0000:00/0000:00:01.0/0000:01:00.0 is taking a long time
[   66.873933] udevd[2690]: worker [2723] /devices/system/machinecheck/machinecheck3 is taking a long time
[  185.917368] udevd[2724]: timeout 'nvidia-udev.sh add'
[  185.917378] udevd[2724]: slow: 'nvidia-udev.sh add' [2868]
[  186.918427] udevd[2724]: timeout: killing 'nvidia-udev.sh add' [2868]
[  186.918443] udevd[2724]: slow: 'nvidia-udev.sh add' [2868]
[  186.918626] udevd[2724]: 'nvidia-udev.sh add' [2868] terminated by signal 9 (Killed)
[  186.928568] udevd[2723]: timeout: killing 'lmt-udev auto' [2786]
[  186.928577] udevd[2723]: slow: 'lmt-udev auto' [2786]
[  186.928714] udevd[2723]: 'lmt-udev auto' [2786] terminated by signal 9 (Killed)
[  189.931852] udevd[2690]: worker [2754] /devices/pci0000:00/0000:00:01.0/0000:01:00.0 timeout; kill it
[  189.931861] udevd[2690]: seq 1837 '/devices/pci0000:00/0000:00:01.0/0000:01:00.0' killed
[  246.983258] INFO: task laptop_mode:5703 blocked for more than 120 seconds.
[  246.983260]       Tainted: P           OE     4.19.1-gentoo-vulkan #1
[  246.983261] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.983262] laptop_mode     D    0  5703   3755 0x00000000
[  246.983264] Call Trace:
[  246.983269]  ? __schedule+0x250/0x800
[  246.983271]  schedule+0x28/0x80
[  246.983272]  schedule_preempt_disabled+0xa/0x10
[  246.983274]  __mutex_lock.isra.1+0x24d/0x490
[  246.983276]  ? wp_page_copy+0x318/0x640
[  246.983279]  ? control_store+0x20/0x80
[  246.983280]  control_store+0x20/0x80
[  246.983283]  kernfs_fop_write+0x105/0x180
[  246.983286]  __vfs_write+0x36/0x180
[  246.983288]  ? selinux_file_permission+0x11f/0x130
[  246.983289]  ? security_file_permission+0x2c/0xb0
[  246.983291]  vfs_write+0xb0/0x190
[  246.983293]  ksys_write+0x52/0xc0
[  246.983295]  do_syscall_64+0x5a/0x110
[  246.983297]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  246.983299] RIP: 0033:0x7f819f211da8
[  246.983303] Code: Bad RIP value.
[  246.983304] RSP: 002b:00007ffd95ab9370 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  246.983305] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f819f211da8
[  246.983306] RDX: 0000000000000003 RSI: 0000563fd80fbab0 RDI: 0000000000000001
[  246.983307] RBP: 0000563fd80fbab0 R08: 000000000000000a R09: 0000563fd8130270
[  246.983308] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f819f4e1760
[  246.983309] R13: 0000000000000003 R14: 00007f819f4dc760 R15: 0000000000000003
[  369.863272] INFO: task laptop_mode:5703 blocked for more than 120 seconds.
[  369.863274]       Tainted: P           OE     4.19.1-gentoo-vulkan #1
[  369.863274] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.863275] laptop_mode     D    0  5703   3755 0x00000000
[  369.863277] Call Trace:
[  369.863281]  ? __schedule+0x250/0x800
[  369.863282]  schedule+0x28/0x80
[  369.863283]  schedule_preempt_disabled+0xa/0x10
[  369.863284]  __mutex_lock.isra.1+0x24d/0x490
[  369.863287]  ? wp_page_copy+0x318/0x640
[  369.863289]  ? control_store+0x20/0x80
[  369.863290]  control_store+0x20/0x80
[  369.863292]  kernfs_fop_write+0x105/0x180
[  369.863294]  __vfs_write+0x36/0x180
[  369.863296]  ? selinux_file_permission+0x11f/0x130
[  369.863297]  ? security_file_permission+0x2c/0xb0
[  369.863299]  vfs_write+0xb0/0x190
[  369.863300]  ksys_write+0x52/0xc0
[  369.863302]  do_syscall_64+0x5a/0x110
[  369.863303]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  369.863305] RIP: 0033:0x7f819f211da8
[  369.863308] Code: Bad RIP value.
[  369.863309] RSP: 002b:00007ffd95ab9370 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  369.863310] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f819f211da8
[  369.863310] RDX: 0000000000000003 RSI: 0000563fd80fbab0 RDI: 0000000000000001
[  369.863311] RBP: 0000563fd80fbab0 R08: 000000000000000a R09: 0000563fd8130270
[  369.863311] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f819f4e1760
[  369.863312] R13: 0000000000000003 R14: 00007f819f4dc760 R15: 0000000000000003
[  492.743282] INFO: task laptop_mode:5703 blocked for more than 120 seconds.
[  492.743283]       Tainted: P           OE     4.19.1-gentoo-vulkan #1
[  492.743284] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  492.743285] laptop_mode     D    0  5703   3755 0x00000000
[  492.743286] Call Trace:
[  492.743291]  ? __schedule+0x250/0x800
[  492.743292]  schedule+0x28/0x80
[  492.743293]  schedule_preempt_disabled+0xa/0x10
[  492.743294]  __mutex_lock.isra.1+0x24d/0x490
[  492.743297]  ? wp_page_copy+0x318/0x640
[  492.743299]  ? control_store+0x20/0x80
[  492.743300]  control_store+0x20/0x80
[  492.743302]  kernfs_fop_write+0x105/0x180
[  492.743304]  __vfs_write+0x36/0x180
[  492.743306]  ? selinux_file_permission+0x11f/0x130
[  492.743307]  ? security_file_permission+0x2c/0xb0
[  492.743309]  vfs_write+0xb0/0x190
[  492.743310]  ksys_write+0x52/0xc0
[  492.743312]  do_syscall_64+0x5a/0x110
[  492.743326]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  492.743327] RIP: 0033:0x7f819f211da8
[  492.743330] Code: Bad RIP value.
[  492.743331] RSP: 002b:00007ffd95ab9370 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  492.743332] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f819f211da8
[  492.743333] RDX: 0000000000000003 RSI: 0000563fd80fbab0 RDI: 0000000000000001
[  492.743333] RBP: 0000563fd80fbab0 R08: 000000000000000a R09: 0000563fd8130270
[  492.743334] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f819f4e1760
[  492.743335] R13: 0000000000000003 R14: 00007f819f4dc760 R15: 0000000000000003
[  615.623274] INFO: task laptop_mode:5703 blocked for more than 120 seconds.
[  615.623276]       Tainted: P           OE     4.19.1-gentoo-vulkan #1
[  615.623276] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  615.623277] laptop_mode     D    0  5703   3755 0x00000000
[  615.623278] Call Trace:
[  615.623282]  ? __schedule+0x250/0x800
[  615.623283]  schedule+0x28/0x80
[  615.623284]  schedule_preempt_disabled+0xa/0x10
[  615.623286]  __mutex_lock.isra.1+0x24d/0x490
[  615.623288]  ? wp_page_copy+0x318/0x640
[  615.623290]  ? control_store+0x20/0x80
[  615.623291]  control_store+0x20/0x80
[  615.623293]  kernfs_fop_write+0x105/0x180
[  615.623295]  __vfs_write+0x36/0x180
[  615.623297]  ? selinux_file_permission+0x11f/0x130
[  615.623298]  ? security_file_permission+0x2c/0xb0
[  615.623300]  vfs_write+0xb0/0x190
[  615.623301]  ksys_write+0x52/0xc0
[  615.623303]  do_syscall_64+0x5a/0x110
[  615.623304]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  615.623306] RIP: 0033:0x7f819f211da8
[  615.623309] Code: Bad RIP value.
[  615.623309] RSP: 002b:00007ffd95ab9370 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  615.623311] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f819f211da8
[  615.623311] RDX: 0000000000000003 RSI: 0000563fd80fbab0 RDI: 0000000000000001
[  615.623312] RBP: 0000563fd80fbab0 R08: 000000000000000a R09: 0000563fd8130270
[  615.623312] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f819f4e1760
[  615.623313] R13: 0000000000000003 R14: 00007f819f4dc760 R15: 0000000000000003
[  738.503276] INFO: task laptop_mode:5703 blocked for more than 120 seconds.
[  738.503278]       Tainted: P           OE     4.19.1-gentoo-vulkan #1
[  738.503278] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
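
The hung-task messages above report laptop_mode in state D (uninterruptible sleep). As a rough sketch, assuming procps ps is available, this lists every task currently in that state. The helper name list_blocked_tasks is made up for illustration, and it only filters ps-style "PID STAT COMMAND" lines; it does not explain why a task is blocked.

```shell
#!/bin/sh
# Print the input lines whose second field (process state) starts with "D",
# i.e. tasks in uninterruptible sleep.
# Expects "PID STAT COMMAND" lines on stdin.
list_blocked_tasks() {
    awk '$2 ~ /^D/'
}

# Usage on a live system (left as a comment so the sketch has no side effects):
#   ps -eo pid=,stat=,comm= | list_blocked_tasks
```

A task that keeps showing up here across several runs is a reasonable candidate for the one holding things up.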

My hardware:

Code:
┌─[rafael][vulkan][~]
└─▪ inxi -v 2
System:    Host: vulkan Kernel: 4.19.1-gentoo-vulkan x86_64 bits: 64 Desktop: Xfce 4.12.4
           Distro: Gentoo Base System release 2.4.1
Machine:   Device: laptop System: Dell product: XPS 15 9560 serial: N/A
           Mobo: Dell model: 05FFDN v: A00 serial: N/A UEFI: Dell v: 1.12.1 date: 10/02/2018
Battery    BAT0: charge: 36.4 Wh 74.2% condition: 49.0/56.0 Wh (88%)
CPU:       Quad core Intel Core i7-7700HQ (-MT-MCP-) speed/max: 3510/3800 MHz
Graphics:  Card-1: Intel Device 591b
           Card-2: NVIDIA GP107M [GeForce GTX 1050 Mobile]
           Display Server: X.Org 1.20.3 driver: modesetting Resolution: 1920x1080@59.93hz
           OpenGL: renderer: Mesa DRI Intel HD Graphics 630 (Kaby Lake GT2) version: 4.5 Mesa 18.2.4
Network:   Card-1: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter driver: ath10k_pci
           Card-2: Qualcomm Atheros
Drives:    HDD Total Size: 3024.6GB (2.5% used)
           ID-1: model: Samsung_SSD_960_PRO_1TB
           ID-2: model: Ultra_Slim_PL
Info:      Processes: 217 Uptime: 33 min Memory: 1097.2/15930.2MB Client: Shell (bash) inxi: 2.3.56


Thanks in advance.

[Moderator edit: added [code] tags to preserve output layout. -Hu]
javeree
Guru


Joined: 29 Jan 2006
Posts: 453

Posted: Wed Nov 21, 2018 12:29 pm

I have had the same issue for a few days.
Bug https://bugs.gentoo.org/show_bug.cgi?id=454740 describes the cause, but the solution there is a workaround, and I think what happens is a race condition in the script's workaround, since the error does not happen on every boot. Between the time the script checks for the existence of the nvidia module and the time it executes nvidia-smi, the module is unloaded again.
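
I have not verified how nvidia-udev.sh actually retries, so take this only as a sketch of the general idea: when a check and the action that depends on it can race, bounding the number of attempts at least keeps a worker from spinning forever. The function name retry_bounded and its arguments are my own, not something from the ebuild or the bug report.

```shell
#!/bin/sh
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt failed.
retry_bounded() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # No point sleeping after the last failed attempt
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Hypothetical usage: give up after 5 tries, 1 second apart
#   retry_bounded 5 1 modprobe nvidia-drm
```

This does not remove the race itself; it only limits how long a failed attempt can keep going.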

I see that sometimes, after say one hour, it suddenly succeeds and gets the module loaded. I also see that manually loading nvidia-drm can break the loop.
So for now, as a poor workaround, I have added:

Code:
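# Create a local.d start script that loads nvidia-drm at boot
cat > /etc/local.d/nvidia-break-udevd-lock.start <<'EOF'
#!/bin/sh
modprobe nvidia-drm
EOF

# Make it executable and make sure the local service runs at boot
chmod +x /etc/local.d/nvidia-break-udevd-lock.start
rc-update add local default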


The next thing I will check: at one point I saw that the 'nvidia' module was loaded, but not nvidia-drm. So maybe the check in nvidia-udev.sh should not be lsmod | grep -iq nvidia, but rather lsmod | grep -iq nvidia_drm (lsmod prints module names with underscores, so a pattern with a hyphen would never match).
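
A sketch of what a stricter check could look like, assuming the goal is to test for one exact module rather than any name containing "nvidia". It reads /proc/modules (the same data lsmod shows) and compares the whole first field. The function name module_loaded and the optional file argument are my own additions for illustration; I have not tested this inside nvidia-udev.sh.

```shell
#!/bin/sh
# Return 0 if the named module is listed as loaded, 1 otherwise.
# $1: module name, written with "-" or "_" (the kernel lists names with "_")
# $2: modules file; defaults to /proc/modules, overridable for testing
module_loaded() {
    name=$(printf '%s' "$1" | sed 's/-/_/g')
    file=${2:-/proc/modules}
    # Compare the entire first field so "nvidia" does not match "nvidia_drm"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}

# Hypothetical usage:
#   if module_loaded nvidia-drm; then echo "nvidia-drm is loaded"; fi
```

With a substring grep, a loaded nvidia_drm or nvidia_modeset also satisfies a check for "nvidia", and the reverse check cannot tell the modules apart either; the exact comparison avoids both.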
javeree
Guru


Joined: 29 Jan 2006
Posts: 453

Posted: Wed Nov 21, 2018 12:37 pm

There is also an additional bug with a proposed solution:
https://bugs.gentoo.org/504326
krinn
Watchman


Joined: 02 May 2003
Posts: 7470

Posted: Thu Nov 22, 2018 3:47 am

Another one to have a look at: https://forums.gentoo.org/viewtopic-p-8280144.html#8280144
All times are GMT