mrbassie l33t
Joined: 31 May 2013 Posts: 772 Location: over here
Posted: Fri Aug 29, 2014 6:41 pm Post subject: |
Have you tried deleting /home before mounting the dataset rather than just unmounting it?
I don't have anything else to suggest; I don't experience this on my setup. I have /home and /home/myusername as separate datasets. If you don't, then I guess that could be the problem, in that your /home is finding files within the subdirectory that it doesn't expect to be there, whereas with the user subdirectory as a separate dataset it wouldn't care (...I think, I'm no expert). |
Aonoa Guru
Joined: 23 May 2002 Posts: 589
Posted: Fri Aug 29, 2014 11:56 pm Post subject: |
mrbassie wrote: | Have you tried deleting /home before mounting the dataset rather than just unmounting it?
I don't have anything else to suggest; I don't experience this on my setup. I have /home and /home/myusername as separate datasets. If you don't, then I guess that could be the problem, in that your /home is finding files within the subdirectory that it doesn't expect to be there, whereas with the user subdirectory as a separate dataset it wouldn't care (...I think, I'm no expert). |
I will look into deleting /home entirely prior to mounting, but it affects /root too (also its own dataset) if root is the active user when I reboot. I can add that I didn't always have this issue; I think it may have begun once I put pulseaudio on my system. Do you have pulseaudio installed and in use?
I'm also wondering if there is any point in an L2ARC device if I have fast SSDs and a lot of RAM? |
mrbassie l33t
Joined: 31 May 2013 Posts: 772 Location: over here
Posted: Sat Aug 30, 2014 10:30 am Post subject: |
I don't use pulseaudio, no. I don't see why that would cause this, though; wouldn't that mean that pulseaudio tries to write data to an unmounted filesystem during shutdown, and so the system creates a new directory for it to write to?
Is /home/username also a separate dataset from /home?
Please post exactly how you manually mount the dataset after boot. I'm guessing you umount /home and then zfs mount the dataset.
As for L2ARC... if it's a desktop or a laptop, I don't see the point; I don't think there would be enough data access to necessitate such a large, fast cache. Likewise for a home server.
Afaik it's a feature oriented more at big production servers with terabytes of data constantly being hammered by tons of users, which require very good IOPS. |
Aonoa Guru
Joined: 23 May 2002 Posts: 589
Posted: Sat Aug 30, 2014 7:33 pm Post subject: |
mrbassie wrote: | I don't use pulseaudio, no. I don't see why that would cause this, though; wouldn't that mean that pulseaudio tries to write data to an unmounted filesystem during shutdown, and so the system creates a new directory for it to write to? |
Yes, kind of. The datasets are probably unmounted while some process(es) aren't entirely done shutting down, while / is still available.
mrbassie wrote: | Is /home/username also a separate dataset from /home? |
Yes, it is. My desktop hasn't been operational (broken hardware) for a while, so the laptop I have been using in the meantime has thus far only had issues mounting /root.
mrbassie wrote: | Please post exactly how you manually mount the dataset after boot. I'm guessing you umount /home and then zfs mount the dataset. |
I don't need to unmount anything. For /root I just "rm -rf /root/.* ; zfs mount rpool/HOME/root"; seeing as it's just a few dot files/folders, there's no harm in removing them. As long as /root is empty, the mounting works.
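A slightly safer version of that workaround can be sketched as a script. The dataset name rpool/HOME/root is taken from the post above; the already-mounted check and the find(1) variant are my additions, and DRY_RUN=1 only prints the commands instead of running them:

```shell
#!/bin/sh
# Sketch of the /root pre-mount cleanup described above.
# Adjust rpool/HOME/root for your own layout.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# bail out if the dataset is somehow already mounted there
if zfs list -H -o mounted rpool/HOME/root 2>/dev/null | grep -qx yes; then
    echo "rpool/HOME/root already mounted, nothing to do" >&2
    exit 0
fi

# find(1) instead of the /root/.* glob: the glob can miss names and,
# depending on the shell, tries to match . and ..
run find /root -mindepth 1 -xdev -delete
run zfs mount rpool/HOME/root
```

Run it with DRY_RUN=0 once the printed commands look right.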
mrbassie wrote: | As for L2ARC... if it's a desktop or a laptop, I don't see the point; I don't think there would be enough data access to necessitate such a large, fast cache. Likewise for a home server.
Afaik it's a feature oriented more at big production servers with terabytes of data constantly being hammered by tons of users, which require very good IOPS. |
Indeed, I don't think I will bother using L2ARC at all. |
mrbassie l33t
Joined: 31 May 2013 Posts: 772 Location: over here
Posted: Sun Aug 31, 2014 11:20 am Post subject: |
You could try rebuilding your initramfs once all the datasets are mounted. I don't know if that will work or not. I'm just wondering if it's something to do with the zpool cache. That's just a wild guess.
Other than that I don't know what to suggest. |
Aonoa Guru
Joined: 23 May 2002 Posts: 589
Posted: Wed Sep 10, 2014 7:33 pm Post subject: |
mrbassie wrote: | You could try rebuilding your initramfs once all the datasets are mounted. I don't know if that will work or not. I'm just wondering if it's something to do with the zpool cache. That's just a wild guess.
Other than that I don't know what to suggest. |
I have been experimenting on my newly built system, and only /root is currently having a problem mounting during startup. The directory "/root/.pulse" is created during boot along with two files inside it, somehow prior to /root being mounted. The /root directory is its own ZFS dataset, and /home/user is similarly its own dataset, but it mounts properly. Funnily enough, there is a similar "/home/user/.pulse" folder with more files in it.
The culprit seems related to alsa-utils, because /root mounts properly if I unmerge alsa-utils. With alsa-utils gone, there is no "/root/.pulse" folder being created at all. By the way, I don't have alsa stuff in my runlevels. I guess I don't really need alsa-utils at the moment, but I would like to figure out exactly what is going on, and why /home/user mounts but not /root.
Rebuilding the initramfs file or /etc/zfs/zpool.cache has no effect at all. |
kernelOfTruth Watchman
Joined: 20 Dec 2005 Posts: 6111 Location: Vienna, Austria; Germany; hello world :)
Posted: Thu Oct 02, 2014 1:24 am Post subject: |
Hi guys,
is it normal that (small, correctable) errors occur from time to time?
Quote: | pool: WD30EFRX
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: resilvered 320K in 0h0m with 0 errors on Mon Aug 25 17:50:39 2014
config:
NAME STATE READ WRITE CKSUM
WD30EFRX ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
wd30efrx_002 ONLINE 0 0 0
wd30efrx ONLINE 0 0 2
cache
intelSSD180 ONLINE 0 0 0
errors: No known data errors |
Quote: |
zpool status -v
pool: WD30EFRX
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: scrub repaired 140K in 6h6m with 0 errors on Thu Oct 2 03:20:51 2014
config:
NAME STATE READ WRITE CKSUM
WD30EFRX ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
wd30efrx_002 ONLINE 0 0 4
wd30efrx ONLINE 0 0 0
cache
intelSSD180 ONLINE 0 0 0
errors: No known data errors
|
This system is run by a Xeon CPU with ECC RAM.
Seems like I might have to give the RAM a check over the weekend ... _________________ https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa
Hardcore Gentoo Linux user since 2004 |
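For reference, the follow-up that the status output itself suggests (after checking SMART data and cabling) might look like this sketch. The pool name WD30EFRX is from the output above; DRY_RUN=1 only prints the commands:

```shell
#!/bin/sh
# Sketch: clear the error counters, then scrub to see whether they
# come back. Pool name WD30EFRX is from the zpool status output above.
DRY_RUN=${DRY_RUN:-1}
run() { [ "$DRY_RUN" = 1 ] && echo "would run: $*" || "$@"; }

run zpool clear WD30EFRX
run zpool scrub WD30EFRX
run zpool status -v WD30EFRX   # CKSUM column should stay at 0 afterwards
```

If the counters climb again after a clean scrub, suspect the drive, cable, or controller rather than ZFS.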
Aonoa Guru
Joined: 23 May 2002 Posts: 589
Posted: Mon Oct 13, 2014 12:57 pm Post subject: |
kernelOfTruth wrote: | Hi guys,
is it normal that (small, correctable) errors occur from time to time? |
I do not get "normal" errors on my ZFS pools, at least not yet. I had errors occur a few times on an old ZFS pool, but the SSDs were dying and hardware problems were the cause. |
kernelOfTruth Watchman
Joined: 20 Dec 2005 Posts: 6111 Location: Vienna, Austria; Germany; hello world :)
Posted: Mon Oct 13, 2014 2:57 pm Post subject: |
@Aonoa:
Oh - indeed! I've had 2 hard drives dying already after showing this kind of symptom by the dozens - however, this pattern is surprising.
I haven't tested the RAM yet (low priority: as it's ECC, it surely would have either reported an error or hung the system).
In retrospect, it seems to occur either during periods of heavy stress (scrubbing, transferring all of the data [close to 2TB]) or after longer uptime.
Although the self-tests don't indicate any errors, this is kind of strange - the longer-uptime pattern could be related to regressions introduced by patches in the kernel.
I'll take a look.
thank you! |
WWWW Tux's lil' helper
Joined: 30 Nov 2014 Posts: 143
Posted: Sun Nov 30, 2014 6:25 pm Post subject: inquiry |
hello,
ZFS is interesting, but I am still struggling to get it working smoothly. There are a few rough edges which I hope can be overcome.
The first question is about the scheduler option in the kernel. The interwebs say the best one to use is:
noop
Is this true? Picking the correct Linux option is made harder by multiple guides, from over the years and for different OSes, saying conflicting things.
What about CFQ? What's the best scheduler to use with ZFS under a modern Linux kernel?
One more question: why isn't ZFS working on kernel 3.17?? I want the latest kernel DRM, to be honest.
As I said, ZFS is a curious solution. Having been burned several times by BTRFS with disastrous corruptions, it was nice to see ZFS behaving like LVM, with volumes as well.
Another question: can partition 9 be deleted and ZFS grown to use that last bit? I heard it's a Solaris remnant.
thank you. |
mrbassie l33t
Joined: 31 May 2013 Posts: 772 Location: over here
Posted: Mon Dec 01, 2014 1:46 pm Post subject: |
noop probably is indeed the way to go. ZFS is very independent, and I believe it handles scheduling itself.
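Checking and switching the elevator for a disk backing the pool can be sketched like this. The disk name sda is an assumption, and $SYS is just an overridable variable so the helpers can be exercised outside /sys; run the real thing as root. I believe ZFS on Linux already flips whole-disk vdevs to noop on import, so this mainly matters for partition vdevs:

```shell
# Sketch: per-disk I/O scheduler helpers. ZFS schedules its own I/O,
# so noop simply stays out of its way.
SYS=${SYS:-/sys}

# print the available/current elevators, e.g. "noop deadline [cfq]"
show_sched() { cat "$SYS/block/$1/queue/scheduler"; }
# select a new elevator for the given disk
set_sched()  { echo "$2" > "$SYS/block/$1/queue/scheduler"; }

# usage (as root):
#   show_sched sda
#   set_sched sda noop
```

The boot-time equivalent is the elevator=noop kernel parameter, which sets the default for all disks.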
I have a little issue myself I'd like to put out there:
I've got two Gentoo boxes on ZFS, one a replica of the other (stage4). They both have a zvol for swap; on one of them, swapon fails during boot, but I can run it manually after logon. |
WWWW Tux's lil' helper
Joined: 30 Nov 2014 Posts: 143
Posted: Mon Dec 01, 2014 4:34 pm Post subject: |
mrbassie wrote: | noop probably is indeed the way to go. ZFS is very independent, and I believe it handles scheduling itself.
I have a little issue myself I'd like to put out there:
I've got two Gentoo boxes on ZFS, one a replica of the other (stage4). They both have a zvol for swap; on one of them, swapon fails during boot, but I can run it manually after logon. |
Sounds like an init-ordering problem, or the zvol isn't included in the initramfs.
Did you add the swap entry in /etc/fstab?
Maybe fstab is processed before ZFS activation or something. Check this:
rc-update show |
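If it does turn out to be ordering, one OpenRC-style fix to try is declaring a dependency so swapon waits for the zfs service. This is an untested guess that assumes the stock swap service reads /etc/conf.d/swap:

```shell
# /etc/conf.d/swap -- make the swap service depend on zfs, so the
# zvol device (e.g. /dev/zvol/tank/swap) exists before swapon runs
rc_need="zfs"
```

After adding it, `rc-update show boot` should still list both swap and zfs in the boot runlevel.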
mrbassie l33t
Joined: 31 May 2013 Posts: 772 Location: over here
Posted: Mon Dec 01, 2014 4:57 pm Post subject: |
It's identical on the two machines.
Code: | # <fs> <mountpoint> <type> <opts> <dump/pass>
# NOTE: If your BOOT partition is ReiserFS, add the notail option to opts.
/dev/sda1 /boot ext2 defaults 0 2
/dev/zvol/tank/swap none swap sw 0 0
/dev/cdrom /mnt/cdrom auto users,exec,rw 0 0
/dev/sdb1 /media/usb auto rw,users,noauto,nodev,nosuid 1 2
tmpfs /tmp tmpfs rw,nodev,nosuid,size=128M |
Code: | bootmisc | boot
consolekit | default
dbus | default
devfs | sysinit
dmesg | sysinit
hostname | boot
keymaps | boot
killprocs | shutdown
loopback | boot
microcode_ctl | boot
modules | boot
mount-ro | shutdown
mtab | boot
net.lo | boot
net.wlan0 | default
numlock | default
preload | boot
procfs | boot
root | boot
savecache | shutdown
swap | boot
swapfiles | boot
sysctl | boot
sysfs | sysinit
termencoding | boot
tmpfiles.dev | sysinit
tmpfiles.setup | boot
udev | default
udev-mount | sysinit
udev-postmount | default
urandom | boot
xdm | default
zfs | boot |
WWWW Tux's lil' helper
Joined: 30 Nov 2014 Posts: 143
Posted: Mon Dec 01, 2014 7:02 pm Post subject: |
I can think of two more things:
hostid
zpool.cache
Re-create them for the new box. |
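Roughly, that might look like the sketch below. The pool name "tank" is assumed, the hostid handling is my guess at what "re-create" means here, and DRY_RUN=1 only prints the commands:

```shell
#!/bin/sh
# Sketch: re-create the per-host state a stage4 copy drags along
# from the donor machine. Pool name "tank" is assumed.
DRY_RUN=${DRY_RUN:-1}
run() { [ "$DRY_RUN" = 1 ] && echo "would run: $*" || "$@"; }

run zpool set cachefile=/etc/zfs/zpool.cache tank
run rm -f /etc/hostid   # drop a hostid file copied from the donor box
run hostid              # show the id the pool will now be imported with
```

Rebuilding the initramfs afterwards, so it carries the fresh zpool.cache, would complete the picture.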
mrbassie l33t
Joined: 31 May 2013 Posts: 772 Location: over here
Posted: Mon Dec 01, 2014 7:31 pm Post subject: |
WWWW wrote: | I can think of two more things:
hostid
zpool.cache
Re-create them for the new box. |
I don't know how to do that. They're both single-disk workstations with 120G SSDs. |
WWWW Tux's lil' helper
Joined: 30 Nov 2014 Posts: 143
Posted: Mon Dec 01, 2014 8:51 pm Post subject: |
zpool set cachefile=/etc/zfs/zpool.cache <pool>
Also check your hostid with
zpool status
Force export and force import (with a liveCD), then reboot.
Here's a good link with a checklist:
https://github.com/zfsonlinux/zfs/issues/599
Another suspect could be udev. Try regenerating your initramfs after re-creating the zpool.cache file.
Gone are the days when you could simply dd a system from one hdd to another and Linux would boot without a hitch. Now, with the plethora of UUIDs down to your fingerprints, cloning ain't that straightforward.
To date I have not tried cloning a GPT-formatted drive. I wonder how that could work, since UEFI is the king of UUIDs. |
mrbassie l33t
Joined: 31 May 2013 Posts: 772 Location: over here
Posted: Tue Dec 02, 2014 9:42 am Post subject: |
I'll have a look at that when I get home from work. Not sure what you mean by recreating the hostid. Are you talking about the name of the pool?
I didn't actually clone one to the other. Originally my laptop was on JFS. I built a machine at work (which is now at home) and wanted to play with ZFS, so I built Gentoo from scratch on a zpool, copying my config files across from the laptop with a USB stick (everything in /etc/portage, the kernel config, the world file, etc.). So when I say they're identical, I don't mean bit for bit; I mean they're configured identically (other than a couple of in-kernel drivers).
When I decided it was useful and stable enough, I made a stage4 of my home laptop installation, created a pool and datasets on the disk, mounted them all and just untarred the stage4, built the kernel, and that was that.
Didn't dd anything. |
WWWW Tux's lil' helper
Joined: 30 Nov 2014 Posts: 143
Posted: Tue Dec 02, 2014 10:30 am Post subject: |
New version out!!
zfs-0.6.3-1.1
Are gentoo maintainers sleeping or something?
There are many goodies and 3.17 support!!
please add an ebuild within the next 8 hours. Exactly in 8 hours I will sync portage to confirm.
thanks |
peje Tux's lil' helper
Joined: 11 Jan 2003 Posts: 100
Posted: Tue Dec 02, 2014 10:56 am Post subject: |
@WWWW please mind your words:
Quote: | New version out!!
zfs-0.6.3-1.1
Are gentoo maintainers sleeping or something?
There are many goodies and 3.17 support!!
please add an ebuild within the next 8 hours. Exactly in 8 hours I will sync portage to confirm.
thanks |
that's not the way it works; no one earns anything doing work for Gentoo
cu Peje |
WWWW Tux's lil' helper
Joined: 30 Nov 2014 Posts: 143
Posted: Fri Dec 05, 2014 12:31 pm Post subject: |
I wanted to know whether zfs-0.6.3-r1 fixes the hard-coded 4GB max RAM limit?
I have a problem where any value over 512MB eventually fills ALL RAM. This leads to numerous performance issues.
Since I have a nice amount of RAM, I wanted to leverage that fact to have ZFS performing optimally and more.
RAM fills quickly when traversing half a million indexed rows with MySQL.
I am not using ARC cache.
When I set the limit to 4GB, ZFS seems to understand 8GB. I don't mind ZFS using all the RAM it wants, as long as it releases some.
Qemu is not able to start because it can't find a contiguous RAM allocation.
A curious fact: after emerging something, RAM suddenly empties out.
I don't know how to control this behavior.
thanks. |
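For what it's worth, the ARC (which is always in play; zfs_arc_max is its cap) is limited via a module parameter rather than a kernel option. A sketch of the config fragment, with 4 GiB as the example value:

```shell
# /etc/modprobe.d/zfs.conf -- cap the ARC at 4 GiB (value in bytes);
# takes effect when the zfs module is (re)loaded
options zfs zfs_arc_max=4294967296
```

At runtime the same knob is exposed as /sys/module/zfs/parameters/zfs_arc_max. Note that ARC memory is reclaimable but isn't always released fast enough for large contiguous allocations, which would fit the Qemu symptom described above.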
Ant P. Watchman
Joined: 18 Apr 2009 Posts: 6920
Posted: Fri Dec 05, 2014 3:54 pm Post subject: |
WWWW wrote: | New version out!!
zfs-0.6.3-1.1
Are gentoo maintainers sleeping or something?
There are many goodies and 3.17 support!!
please add an ebuild within the next 8 hours. Exactly in 8 hours I will sync portage to confirm.
thanks |
Patches welcome. |
WWWW Tux's lil' helper
Joined: 30 Nov 2014 Posts: 143
Posted: Fri Dec 05, 2014 9:30 pm Post subject: |
Ant P. wrote: |
Patches welcome. |
It's in portage already and 3.17 compatible.
What about my last question? |
traq9 n00b
Joined: 16 Jul 2013 Posts: 2 Location: Mesa, AZ
Posted: Tue Dec 09, 2014 5:27 pm Post subject: Kernel Oops using spl-0.6.3-r1/zfs-kmod-0.6.3-r1 |
Issues with ZFS spl-0.6.3-r1/zfs-kmod-0.6.3-r1.
Quote: | [285511.517924] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[285511.517955] IP: [] feature_do_action+0x23/0x2b0 [zfs]
[285511.517989] PGD 74ad3067 PUD 74ad6067 PMD 0
[285511.518005] Oops: 0000 [#1] SMP |
See https://github.com/zfsonlinux/zfs/issues/2946 for details.
Step lightly for those of you depending on your ZFS pools. |
WWWW Tux's lil' helper
Joined: 30 Nov 2014 Posts: 143
Posted: Thu Dec 11, 2014 8:09 pm Post subject: |
Oh man!! I got an even scarier one!!
Code: |
INFO: rcu_sched self-detected stall on CPU
\x092: (84003 ticks this GP) idle=cbf/140000000000001/0 softirq=2798109/2798109
\x09 (t=84004 jiffies g=1276523 c=1276522 q=18648)
Task dump for CPU 2:
zvol/26 R running task 12768 2646 2 0x00000008
06edf9e7cf55bbf4 ffff88009c204200 ffffffff945fa800 ffff88023ed03db8
ffffffff9407b671 0000000000000002 ffffffff945fa800 ffff88023ed03dd0
ffffffff9407dd04 0000000000000003 ffff88023ed03e00 ffffffff94098cc0
Call Trace:
<IRQ> [<ffffffff9407b671>] sched_show_task+0xc1/0x130
[<ffffffff9407dd04>] dump_cpu_task+0x34/0x40
[<ffffffff94098cc0>] rcu_dump_cpu_stacks+0x90/0xd0
[<ffffffff9409c13c>] rcu_check_callbacks+0x44c/0x6d0
[<ffffffff9407eaea>] ? account_system_time+0x8a/0x160
[<ffffffff9409e883>] update_process_times+0x43/0x70
[<ffffffff940ad331>] tick_sched_handle.isra.18+0x41/0x50
[<ffffffff940ad379>] tick_sched_timer+0x39/0x60
[<ffffffff9409eea1>] __run_hrtimer.isra.34+0x41/0xf0
[<ffffffff9409f715>] hrtimer_interrupt+0xe5/0x220
[<ffffffff940227e2>] local_apic_timer_interrupt+0x32/0x60
[<ffffffff94022d9f>] smp_apic_timer_interrupt+0x3f/0x60
[<ffffffff94428c7b>] apic_timer_interrupt+0x6b/0x70
<EOI> [<ffffffff94427a0e>] ? _raw_spin_lock+0x1e/0x30
[<ffffffff94425f47>] __mutex_unlock_slowpath+0x17/0x40
[<ffffffff94425f8d>] mutex_unlock+0x1d/0x20
[<ffffffffc0405cb9>] dbuf_clear+0xd9/0x160 [zfs]
[<ffffffffc0405d50>] dbuf_evict+0x10/0x400 [zfs]
[<ffffffffc0405911>] dbuf_rele_and_unlock+0xb1/0x350 [zfs]
[<ffffffffc0405ca2>] dbuf_clear+0xc2/0x160 [zfs]
[<ffffffffc0405d50>] dbuf_evict+0x10/0x400 [zfs]
[<ffffffffc0405911>] dbuf_rele_and_unlock+0xb1/0x350 [zfs]
[<ffffffffc04a8f70>] ? dsl_dataset_get_holds+0x17b0/0x2fe1e [zfs]
[<ffffffffc0405bd1>] dmu_buf_rele+0x21/0x30 [zfs]
[<ffffffffc0419f58>] dmu_tx_assign+0x8e8/0xc60 [zfs]
[<ffffffffc041a30c>] dmu_tx_hold_write+0x3c/0x50 [zfs]
[<ffffffffc04a27e8>] zrl_is_locked+0xa78/0x1880 [zfs]
[<ffffffffc0292b66>] taskq_cancel_id+0x2a6/0x5b0 [spl]
[<ffffffff9407bb10>] ? wake_up_state+0x20/0x20
[<ffffffffc02929d0>] ? taskq_cancel_id+0x110/0x5b0 [spl]
[<ffffffff940733e4>] kthread+0xc4/0xe0
[<ffffffff94073320>] ? kthread_create_on_node+0x160/0x160
[<ffffffff94427ec4>] ret_from_fork+0x74/0xa0
[<ffffffff94073320>] ? kthread_create_on_node+0x160/0x160
INFO: rcu_sched self-detected stall on CPU
\x092: (20999 ticks this GP) idle=cbf/140000000000001/0 softirq=2798109/2798109
\x09 (t=21000 jiffies g=1276523 c=1276522 q=5491)
Task dump for CPU 2:
zvol/26 R running task 12768 2646 2 0x00000008
06edf9e7cf55bbf4 ffff88009c204200 ffffffff945fa800 ffff88023ed03db8
ffffffff9407b671 0000000000000002 ffffffff945fa800 ffff88023ed03dd0
ffffffff9407dd04 0000000000000003 ffff88023ed03e00 ffffffff94098cc0
Call Trace:
<IRQ> [<ffffffff9407b671>] sched_show_task+0xc1/0x130
[<ffffffff9407dd04>] dump_cpu_task+0x34/0x40
[<ffffffff94098cc0>] rcu_dump_cpu_stacks+0x90/0xd0
[<ffffffff9409c13c>] rcu_check_callbacks+0x44c/0x6d0
[<ffffffff9407eaea>] ? account_system_time+0x8a/0x160
[<ffffffff9409e883>] update_process_times+0x43/0x70
[<ffffffff940ad331>] tick_sched_handle.isra.18+0x41/0x50
[<ffffffff940ad379>] tick_sched_timer+0x39/0x60
[<ffffffff9409eea1>] __run_hrtimer.isra.34+0x41/0xf0
[<ffffffff9409f715>] hrtimer_interrupt+0xe5/0x220
[<ffffffff940227e2>] local_apic_timer_interrupt+0x32/0x60
[<ffffffff94022d9f>] smp_apic_timer_interrupt+0x3f/0x60
[<ffffffff94428c7b>] apic_timer_interrupt+0x6b/0x70
<EOI> [<ffffffff94427a0e>] ? _raw_spin_lock+0x1e/0x30
[<ffffffff94425f47>] __mutex_unlock_slowpath+0x17/0x40
[<ffffffff94425f8d>] mutex_unlock+0x1d/0x20
[<ffffffffc0405cb9>] dbuf_clear+0xd9/0x160 [zfs]
[<ffffffffc0405d50>] dbuf_evict+0x10/0x400 [zfs]
[<ffffffffc0405911>] dbuf_rele_and_unlock+0xb1/0x350 [zfs]
[<ffffffffc0405ca2>] dbuf_clear+0xc2/0x160 [zfs]
[<ffffffffc0405d50>] dbuf_evict+0x10/0x400 [zfs]
[<ffffffffc0405911>] dbuf_rele_and_unlock+0xb1/0x350 [zfs]
[<ffffffffc04a8f70>] ? dsl_dataset_get_holds+0x17b0/0x2fe1e [zfs]
[<ffffffffc0405bd1>] dmu_buf_rele+0x21/0x30 [zfs]
[<ffffffffc0419f58>] dmu_tx_assign+0x8e8/0xc60 [zfs]
[<ffffffffc041a30c>] dmu_tx_hold_write+0x3c/0x50 [zfs]
[<ffffffffc04a27e8>] zrl_is_locked+0xa78/0x1880 [zfs]
[<ffffffffc0292b66>] taskq_cancel_id+0x2a6/0x5b0 [spl]
[<ffffffff9407bb10>] ? wake_up_state+0x20/0x20
[<ffffffffc02929d0>] ? taskq_cancel_id+0x110/0x5b0 [spl]
[<ffffffff940733e4>] kthread+0xc4/0xe0
[<ffffffff94073320>] ? kthread_create_on_node+0x160/0x160
[<ffffffff94427ec4>] ret_from_fork+0x74/0xa0
[<ffffffff94073320>] ? kthread_create_on_node+0x160/0x160
general protection fault: 0000 [#4] SMP
CPU: 3 PID: 2625 Comm: zvol/5 Tainted:
task: ffff88009c1b1080 ti: ffff88009c1b1608 task.ti: ffff88009c1b1608
RIP: 0010:[<ffffffff94425f54>] [<ffffffff94425f54>] __mutex_unlock_slowpath+0x24/0x40
RSP: 0000:ffffc90015cb3b78 EFLAGS: 00010283
RAX: fefefefefefefefe RBX: ffff8801e52ab1b0 RCX: ffff880225a820c0
RDX: ffff8801e52ab1b8 RSI: ffff8800734d6b90 RDI: ffff8801e52ab1b4
RBP: ffffc90015cb3b80 R08: 00000000000823d1 R09: 0000000000000000
R10: ffff880225ac1818 R11: 000000000000000e R12: 0000000000000002
R13: ffff880226537930 R14: ffff880225feaa68 R15: ffff880226537948
FS: 00007137d8f89740(0000) GS:ffff88023ed80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00006ff3987db000 CR3: 00000001dc357000 CR4: 00000000000407f0
Stack:
ffff8801e52ab1b0 ffffc90015cb3b98 ffffffff94425f8d ffff8801e52ab158
ffffc90015cb3bb8 ffffffffc04059d9 ffff8800734d6b90 ffff8801e52ab158
ffffc90015cb3be8 ffffffffc0405ca2 ffff8800734d6b90 0000000000000000
Call Trace:
[<ffffffff94425f8d>] mutex_unlock+0x1d/0x20
[<ffffffffc04059d9>] dbuf_rele_and_unlock+0x179/0x350 [zfs]
[<ffffffffc0405ca2>] dbuf_clear+0xc2/0x160 [zfs]
[<ffffffffc0405d50>] dbuf_evict+0x10/0x400 [zfs]
[<ffffffffc0405911>] dbuf_rele_and_unlock+0xb1/0x350 [zfs]
[<ffffffffc04a8f70>] ? dsl_dataset_get_holds+0x17b0/0x2fe1e [zfs]
[<ffffffffc0405bd1>] dmu_buf_rele+0x21/0x30 [zfs]
[<ffffffffc0419f58>] dmu_tx_assign+0x8e8/0xc60 [zfs]
[<ffffffffc041a30c>] dmu_tx_hold_write+0x3c/0x50 [zfs]
[<ffffffffc04a27e8>] zrl_is_locked+0xa78/0x1880 [zfs]
[<ffffffffc0292b66>] taskq_cancel_id+0x2a6/0x5b0 [spl]
[<ffffffff9407bb10>] ? wake_up_state+0x20/0x20
[<ffffffffc02929d0>] ? taskq_cancel_id+0x110/0x5b0 [spl]
[<ffffffff940733e4>] kthread+0xc4/0xe0
[<ffffffff94073320>] ? kthread_create_on_node+0x160/0x160
[<ffffffff94427ec4>] ret_from_fork+0x74/0xa0
[<ffffffff94073320>] ? kthread_create_on_node+0x160/0x160
Code: 1f 84 00 00 00 00 00 55 48 89 e5 53 48 89 fb 48 8d 7b 04 c7 03 01 00 00 00 e8 a9 1a 00 00 48 8b 43 08 48 8d 53 08 48 39 d0 74 09 <48> 8b 78 10 e8 53 5b c5 ff 80 43 04 01 5b 5d c3 66 66 66 2e 0f
RIP [<ffffffff94425f54>] __mutex_unlock_slowpath+0x24/0x40
RSP <ffffc90015cb3b78>
---[ end trace 8fc20d6e09e2d611 ]---
general protection fault: 0000 [#3] SMP
CPU: 1 PID: 2623 Comm: zvol/3 Tainted:
task: ffff88009c1b0000 ti: ffff88009c1b0588 task.ti: ffff88009c1b0588
RIP: 0010:[<ffffffff94425f54>] [<ffffffff94425f54>] __mutex_unlock_slowpath+0x24/0x40
RSP: 0000:ffffc90015ca3b78 EFLAGS: 00010287
RAX: fefefefefefefefe RBX: ffff8800934e22a8 RCX: ffff880225a820c0
RDX: ffff8800934e22b0 RSI: ffff8801e268dbc0 RDI: ffff8800934e22ac
RBP: ffffc90015ca3b80 R08: 000000000007e19b R09: 0000000000000000
R10: ffff880225ac1818 R11: 000000000000000e R12: 0000000000000002
R13: ffff880226537930 R14: ffff880225feaa68 R15: ffff880226537948
FS: 00007137d8f89740(0000) GS:ffff88023ec80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00006ff3987db000 CR3: 00000001dc357000 CR4: 00000000000407f0
Stack:
ffff8800934e22a8 ffffc90015ca3b98 ffffffff94425f8d ffff8800934e2250
ffffc90015ca3bb8 ffffffffc04059d9 ffff8801e268dbc0 ffff8800934e2250
ffffc90015ca3be8 ffffffffc0405ca2 ffff8801e268dbc0 0000000000000000
Call Trace:
[<ffffffff94425f8d>] mutex_unlock+0x1d/0x20
[<ffffffffc04059d9>] dbuf_rele_and_unlock+0x179/0x350 [zfs]
[<ffffffffc0405ca2>] dbuf_clear+0xc2/0x160 [zfs]
[<ffffffffc0405d50>] dbuf_evict+0x10/0x400 [zfs]
[<ffffffffc0405911>] dbuf_rele_and_unlock+0xb1/0x350 [zfs]
[<ffffffffc04a8f70>] ? dsl_dataset_get_holds+0x17b0/0x2fe1e [zfs]
[<ffffffffc0405bd1>] dmu_buf_rele+0x21/0x30 [zfs]
[<ffffffffc0419f58>] dmu_tx_assign+0x8e8/0xc60 [zfs]
[<ffffffffc041a30c>] dmu_tx_hold_write+0x3c/0x50 [zfs]
[<ffffffffc04a27e8>] zrl_is_locked+0xa78/0x1880 [zfs]
[<ffffffffc0292b66>] taskq_cancel_id+0x2a6/0x5b0 [spl]
[<ffffffff9407bb10>] ? wake_up_state+0x20/0x20
[<ffffffffc02929d0>] ? taskq_cancel_id+0x110/0x5b0 [spl]
[<ffffffff940733e4>] kthread+0xc4/0xe0
[<ffffffff94073320>] ? kthread_create_on_node+0x160/0x160
[<ffffffff94427ec4>] ret_from_fork+0x74/0xa0
[<ffffffff94073320>] ? kthread_create_on_node+0x160/0x160
Code: 1f 84 00 00 00 00 00 55 48 89 e5 53 48 89 fb 48 8d 7b 04 c7 03 01 00 00 00 e8 a9 1a 00 00 48 8b 43 08 48 8d 53 08 48 39 d0 74 09 <48> 8b 78 10 e8 53 5b c5 ff 80 43 04 01 5b 5d c3 66 66 66 2e 0f
RIP [<ffffffff94425f54>] __mutex_unlock_slowpath+0x24/0x40
RSP <ffffc90015ca3b78>
---[ end trace 8fc20d6e09e2d610 ]---
|
This is what happened:
Installed spl-0.6.3-r1/zfs-kmod-0.6.3-r1. Then upgraded to kernel 3.17.
During this upgrade I decided to forgo low-latency pre-emption and went with voluntary pre-emption or no pre-emption. Perhaps it doesn't matter, because kernel 3.16 was fully pre-emptible and never segfaulted or oopsed.
Upon seeing that scary fault and the "rcu_sched self-detected stall on CPU" messages in dmesg, I backtracked as fast as possible.
I thought it had to be related to RCU and pre-emption, since the RCU options in the kernel change according to the pre-emption model.
But the problem persists )':
This zvol is formatted with ext4.
Does anybody know if this is fixable with the proper RCU option? To be honest, I don't know how to best configure the RCU options for ZFS.
thanks |