| Author |
Message |
gabrielg Tux's lil' helper

Joined: 16 Nov 2012 Posts: 134
|
Posted: Wed Jan 12, 2022 3:07 pm Post subject: gentoo-sources-5.15.11 (LTS/stable) shows some nfs issues |
|
|
Hi, all,
Since I started running the kernel named in the subject of this post, I have been seeing odd kernel errors like this one:
| Code: |
Jan 12 08:54:10 nana kernel: BUG: kernel NULL pointer dereference, address: 0000000000000110
Jan 12 08:54:10 nana kernel: #PF: supervisor read access in kernel mode
Jan 12 08:54:10 nana kernel: #PF: error_code(0x0000) - not-present page
Jan 12 08:54:10 nana kernel: PGD 0 P4D 0
Jan 12 08:54:10 nana kernel: Oops: 0000 [#1] SMP NOPTI
Jan 12 08:54:10 nana kernel: CPU: 1 PID: 2864 Comm: lockd Not tainted 5.15.11-gentoo-x86_64 #2
Jan 12 08:54:10 nana kernel: Hardware name: HP ProLiant MicroServer, BIOS O41 07/29/2011
Jan 12 08:54:10 nana kernel: RIP: 0010:vfs_lock_file+0x5/0x30
Jan 12 08:54:10 nana kernel: Code: a3 fe ff ff 4d 89 e1 e9 a4 fd ff ff 66 0f 1f 84 00 00 00 00 00 e8 2b 0d d7 ff 48 8b 7f 20 e9 f2 f5 ff ff 66 90 e8 1b 0d d7 ff <48> 8b 47 28 49 89 d0 48 8b 80 98 00 00 00 48 85 c0 74 05 e9 43 b8
Jan 12 08:54:10 nana kernel: RSP: 0018:ffff9d3640997c80 EFLAGS: 00010246
Jan 12 08:54:10 nana kernel: RAX: 7fffffffffffffff RBX: 00000000000000e8 RCX: 0000000000000000
Jan 12 08:54:10 nana kernel: RDX: ffff9d3640997c88 RSI: 0000000000000006 RDI: 00000000000000e8
Jan 12 08:54:10 nana kernel: RBP: ffff8b754767b400 R08: ffff8b7549dcf000 R09: ffff8b754bef1a00
Jan 12 08:54:10 nana kernel: R10: 0000000000000000 R11: 000000000000f000 R12: ffffffff9c34bfd0
Jan 12 08:54:10 nana kernel: R13: ffff8b76a518e7a8 R14: ffff8b7549d60c10 R15: ffff8b754767b400
Jan 12 08:54:10 nana kernel: FS: 0000000000000000(0000) GS:ffff8b7860500000(0000) knlGS:0000000000000000
Jan 12 08:54:10 nana kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 12 08:54:10 nana kernel: CR2: 0000000000000110 CR3: 000000010ffd4000 CR4: 00000000000006e0
Jan 12 08:54:10 nana kernel: Call Trace:
Jan 12 08:54:10 nana kernel: <TASK>
Jan 12 08:54:10 nana kernel: nlm_unlock_files+0x6e/0xb0
Jan 12 08:54:10 nana kernel: ? _raw_spin_lock+0x5/0x20
Jan 12 08:54:10 nana kernel: ? trace_hardirqs_on+0x35/0xd0
Jan 12 08:54:10 nana kernel: ? __local_bh_enable_ip+0x44/0x80
Jan 12 08:54:10 nana kernel: ? trace_hardirqs_on+0x35/0xd0
Jan 12 08:54:10 nana kernel: ? mutex_lock+0x5/0x20
Jan 12 08:54:10 nana kernel: ? nlmsvc_traverse_blocks+0x36/0x120
Jan 12 08:54:10 nana kernel: nlm_traverse_files+0x14d/0x280
Jan 12 08:54:10 nana kernel: nlmsvc_free_host_resources+0x17/0x30
Jan 12 08:54:10 nana kernel: nlm_host_rebooted+0x23/0x90
Jan 12 08:54:10 nana kernel: nlmsvc_proc_sm_notify+0xa1/0x110
Jan 12 08:54:10 nana kernel: ? trace_hardirqs_on+0x35/0xd0
Jan 12 08:54:10 nana kernel: ? nlmsvc_decode_reboot+0x95/0xc0
Jan 12 08:54:10 nana kernel: nlmsvc_dispatch+0x89/0x180
Jan 12 08:54:10 nana kernel: svc_process_common+0x399/0x640
Jan 12 08:54:10 nana kernel: ? lockd_inet6addr_event+0xf0/0xf0
Jan 12 08:54:10 nana kernel: ? set_grace_period+0xb0/0xb0
Jan 12 08:54:10 nana kernel: svc_process+0xca/0xe0
Jan 12 08:54:10 nana kernel: lockd+0x8f/0x130
Jan 12 08:54:10 nana kernel: ? set_grace_period+0xb0/0xb0
Jan 12 08:54:10 nana kernel: kthread+0x10e/0x130
Jan 12 08:54:10 nana kernel: ? set_kthread_struct+0x40/0x40
Jan 12 08:54:10 nana kernel: ret_from_fork+0x22/0x30
Jan 12 08:54:10 nana kernel: </TASK>
Jan 12 08:54:10 nana kernel: Modules linked in: ecb xts dm_crypt dm_mod tun bridge stp llc ipt_REJECT nf_reject_ipv4 xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter iptable_mangle iptable_raw ip_tables radeon i2c_algo_bit drm_ttm_helper ttm kvm_amd drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt w83795 kvm fb_sys_fops uas cfbcopyarea drm usb_storage irqbypass tg3 drm_panel_orientation_quirks pata_atiixp libphy pcspkr i2c_piix4
Jan 12 08:54:10 nana kernel: CR2: 0000000000000110
Jan 12 08:54:10 nana kernel: ---[ end trace 6ac413c9433d0bd8 ]---
Jan 12 08:54:10 nana kernel: RIP: 0010:vfs_lock_file+0x5/0x30
Jan 12 08:54:10 nana kernel: Code: a3 fe ff ff 4d 89 e1 e9 a4 fd ff ff 66 0f 1f 84 00 00 00 00 00 e8 2b 0d d7 ff 48 8b 7f 20 e9 f2 f5 ff ff 66 90 e8 1b 0d d7 ff <48> 8b 47 28 49 89 d0 48 8b 80 98 00 00 00 48 85 c0 74 05 e9 43 b8
Jan 12 08:54:10 nana kernel: RSP: 0018:ffff9d3640997c80 EFLAGS: 00010246
Jan 12 08:54:10 nana kernel: RAX: 7fffffffffffffff RBX: 00000000000000e8 RCX: 0000000000000000
Jan 12 08:54:10 nana kernel: RDX: ffff9d3640997c88 RSI: 0000000000000006 RDI: 00000000000000e8
Jan 12 08:54:10 nana kernel: RBP: ffff8b754767b400 R08: ffff8b7549dcf000 R09: ffff8b754bef1a00
Jan 12 08:54:10 nana kernel: R10: 0000000000000000 R11: 000000000000f000 R12: ffffffff9c34bfd0
Jan 12 08:54:10 nana kernel: R13: ffff8b76a518e7a8 R14: ffff8b7549d60c10 R15: ffff8b754767b400
Jan 12 08:54:10 nana kernel: FS: 0000000000000000(0000) GS:ffff8b7860500000(0000) knlGS:0000000000000000
Jan 12 08:54:10 nana kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 12 08:54:10 nana kernel: CR2: 0000000000000110 CR3: 000000010ffd4000 CR4: 00000000000006e0
|
The system doesn't crash, but the NFS server certainly becomes unstable. The reboot process works until it times out stopping nfs/rpc processes and I have to power off the system completely.
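For reference, the faulting address in the trace above can be cross-checked against the register dump. This is a minimal sketch: the register values are copied from the log, but the decoding of the marked bytes `<48> 8b 47 28` as `mov rax, [rdi + 0x28]`, and the reading of RDI as the first argument of `vfs_lock_file` (x86-64 calling convention), are assumptions that the log itself does not state.

```python
# Cross-check the faulting address (CR2) against the register dump.
# Register values are copied from the oops; the instruction decoding
# below is an interpretation of the "Code:" line, not part of the log.

RDI = 0x00000000000000E8  # assumed to hold the first argument at the fault
CR2 = 0x0000000000000110  # address the CPU failed to read

# Bytes marked with <..> in the "Code:" line: 48 8b 47 28
# Assumed decoding: mov rax, [rdi + 0x28]
FAULTING_BYTES = bytes.fromhex("488b4728")
displacement = FAULTING_BYTES[3]  # 8-bit displacement, 0x28


def fault_address(base: int, disp: int) -> int:
    """Return the effective address of a base-plus-displacement load."""
    return base + disp


computed = fault_address(RDI, displacement)
print(hex(computed), computed == CR2)
```

If the decoding is right, the computed address matches CR2, which would mean the pointer passed in RDI (0xe8) was itself a small offset from NULL rather than a valid kernel address.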
Today I booted on the previous kernel (5.10.x) and it seems to be working well.
It's worth mentioning that I use the NFS server for a number of things, including a Time Machine HFS+ sparsebundle for a Mac (I know, I know, but it's for work, so I don't have many options). Triggering Time Machine operations seems to cause this breakage, while other Gentoo Linux clients play nicely.
Obligatory `emerge --info`: https://cloud.gagv.org.uk/s/TnfytcBARAifdkx
Obligatory kernel config: https://cloud.gagv.org.uk/s/FDT9gYfP5zbADb9
A brief internet search didn't turn up many results. I admit it was very brief; I chose to spend the time posting here instead, in case somebody knows something I don't or can guide me a bit.
Thanks!
Gabriel |
|
mike155 Advocate

Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
|
Posted: Wed Jan 12, 2022 3:32 pm Post subject: |
|
|
It seems you use a Gentoo kernel. You could open a bug at https://bugs.gentoo.org. Maybe one of the Gentoo kernel developers will be able to help you.
At least for testing purposes, I would switch to the latest vanilla kernel 5.15.14. If you see the error there, you could ask one of the kernel maintainers for help. They are interested in bugs like the one you reported and they will be able to help you.
Last edited by mike155 on Wed Jan 12, 2022 3:37 pm; edited 1 time in total |
|
alamahant Advocate

Joined: 23 Mar 2019 Posts: 3882
|
Posted: Wed Jan 12, 2022 3:34 pm Post subject: |
|
|
Do you see something similar in your config
| Code: |
CONFIG_NFS_FS=m
CONFIG_NFS_V2=m
CONFIG_NFS_V3=m
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=m
CONFIG_NFS_SWAP=y
CONFIG_NFS_V4_1=y
CONFIG_NFS_V4_2=y
CONFIG_PNFS_FILE_LAYOUT=m
CONFIG_PNFS_BLOCK=m
CONFIG_PNFS_FLEXFILE_LAYOUT=m
CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="kernel.org"
CONFIG_NFS_V4_1_MIGRATION=y
CONFIG_NFS_V4_SECURITY_LABEL=y
CONFIG_NFS_FSCACHE=y
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_NFSD_PNFS=y
CONFIG_NFSD_BLOCKLAYOUT=y
CONFIG_NFSD_SCSILAYOUT=y
CONFIG_NFSD_V4_2_INTER_SSC=y
CONFIG_NFSD_V4_SECURITY_LABEL=y
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
CONFIG_NFS_V4_2_SSC_HELPER=y
|
?
What does `grep -i nfs /var/log/messages` show?
|
gabrielg Tux's lil' helper

Joined: 16 Nov 2012 Posts: 134
|
Posted: Sun Jan 30, 2022 11:38 am Post subject: |
|
|
mike155: thanks for the advice - I tried that, but it did not yield good results. In fact, I ran into another issue that makes this one harder to debug: https://bugs.gentoo.org/show_bug.cgi?id=832367
alamahant: my config is very similar:
| Code: |
CONFIG_KERNFS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V2=m
CONFIG_NFS_V3=m
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=m
CONFIG_NFS_V4_1=y
CONFIG_NFS_V4_2=y
CONFIG_PNFS_FILE_LAYOUT=m
CONFIG_PNFS_BLOCK=m
CONFIG_PNFS_FLEXFILE_LAYOUT=m
CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="kernel.org"
CONFIG_NFS_V4_SECURITY_LABEL=y
CONFIG_NFS_FSCACHE=y
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFSD=y
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_NFSD_PNFS=y
CONFIG_NFSD_BLOCKLAYOUT=y
CONFIG_NFSD_SCSILAYOUT=y
CONFIG_NFSD_FLEXFILELAYOUT=y
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y
|
I would prefer to keep things like migration and SELinux off, unless you really believe they could be the issue. I am trying to compile the daemon as a module now; you never know...
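The comparison between the two option lists can also be done mechanically. The following is a rough sketch, not a tool anyone in this thread used: `parse_config` and `diff_configs` are hypothetical helper names, and the two strings are short excerpts of the lists posted above rather than the full configs.

```python
from typing import Dict, List, Tuple


def parse_config(text: str) -> Dict[str, str]:
    """Parse NAME=value lines, skipping blanks and '#' comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        options[name] = value
    return options


def diff_configs(
    a: Dict[str, str], b: Dict[str, str]
) -> Tuple[List[str], List[str], Dict[str, Tuple[str, str]]]:
    """Return (only in a, only in b, options whose values differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = {k: (a[k], b[k]) for k in sorted(set(a) & set(b)) if a[k] != b[k]}
    return only_a, only_b, changed


# Excerpts only; the full lists are in the posts above.
SUGGESTED = """
CONFIG_NFS_SWAP=y
CONFIG_NFS_V4_1_MIGRATION=y
CONFIG_NFSD=m
CONFIG_NFSD_V4_SECURITY_LABEL=y
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
"""

MINE = """
CONFIG_NFSD=y
CONFIG_NFSD_FLEXFILELAYOUT=y
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y
"""

only_suggested, only_mine, changed = diff_configs(
    parse_config(SUGGESTED), parse_config(MINE)
)
print("only in suggested config:", only_suggested)
print("only in my config:", only_mine)
print("different values:", changed)
```

On these excerpts, the differences come down to the swap, migration and security-label options, plus `CONFIG_NFSD` and `CONFIG_NFS_ACL_SUPPORT` being modules in one config and built-in in the other.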
| Quote: | | What does grep -i nfs /var/log/messages show? |
I will get that next time I try this, which might be a while (see bug above). |
|
gabrielg Tux's lil' helper

Joined: 16 Nov 2012 Posts: 134
|
Posted: Sun Feb 13, 2022 9:32 am Post subject: [SOLVED] gentoo-sources-5.15.11 (LTS/stable) shows some nfs |
|
|
Kernel 5.15.19 doesn't present this issue, so I'll mark this as solved.
As a side note, building NFSD as a module caused me a few problems: it ignored the lockd ports set via sysctl until I reloaded the module, so I went back to compiling it into the kernel for simplicity. |
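One way to check whether lockd has actually picked up the intended ports is to read the values back from `/proc/sys/fs/nfs`. This is a sketch under assumptions: it takes "sysctl's ports for lockd" to mean `fs.nfs.nlm_tcpport` and `fs.nfs.nlm_udpport`, the port number 4001 is a made-up example, and `read_lockd_ports` and `mismatches` are hypothetical helper names.

```python
from pathlib import Path

# Assumed location of the lockd port settings (fs.nfs.nlm_tcpport / nlm_udpport).
SYSCTL_DIR = Path("/proc/sys/fs/nfs")

# Hypothetical desired ports; replace with the values from your sysctl.conf.
WANTED = {"nlm_tcpport": 4001, "nlm_udpport": 4001}


def read_lockd_ports(base=SYSCTL_DIR):
    """Read the current lockd ports; None means the entry is missing or unreadable."""
    ports = {}
    for name in ("nlm_tcpport", "nlm_udpport"):
        try:
            ports[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            ports[name] = None
    return ports


def mismatches(current, wanted):
    """Return {name: (current value, wanted value)} for every setting that differs."""
    return {
        name: (current.get(name), port)
        for name, port in wanted.items()
        if current.get(name) != port
    }


if __name__ == "__main__":
    for name, (have, want) in mismatches(read_lockd_ports(), WANTED).items():
        print(f"{name}: current={have} wanted={want}")
```

A missing entry (reported as `None`) would suggest the settings are not available at that moment, for example because the module providing them is not loaded yet; that interpretation is a guess, not something confirmed in this thread.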
|