Gentoo Forums
Disk I/O locks up system

chaoscommander
Tux's lil' helper
Joined: 15 Oct 2012
Posts: 106

Posted: Tue Mar 10, 2015 11:14 am    Post subject: Disk I/O locks up system

I often get lockups/hangs that last for a few seconds to several minutes, whenever a process has to do a lot of disk I/O. I took a look with iotop to see if it's one specific process hogging the disk. Sometimes it's a regular update process, like mlocate, makewhatis, baloo etc., but sometimes (especially when waking up from hibernation) it's a bunch of userspace processes (firefox, libreoffice, kde itself, whatever has been hibernating) doing whatever it is they are doing with the complete HDD bandwidth.

I first suspected it was due to my machine being a little older, but it's always the disk, not the CPU, that is at 100% capacity. It also happens with a fairly fast ThinkPad that has an SSD in it (albeit more rarely, so far only when large packages are being merged). Is there a way to reconfigure the I/O scheduler so it will allocate bandwidth more equally instead of letting single threads block the entire disk? I'm using the CFQ I/O scheduler.
Or am I looking for the problem in an entirely wrong direction?
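
For reference, a minimal sketch of how to check which scheduler each disk is actually using before changing anything. The kernel lists the available schedulers in /sys/block/<dev>/queue/scheduler and marks the active one with square brackets. The helper name active_scheduler is made up for this illustration.

```sh
# Print the scheduler marked with [brackets] in a sysfs scheduler line,
# e.g. "noop deadline [cfq]" -> "cfq". Helper name is illustrative only.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Report the active scheduler for every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
done
```

The loop only reads sysfs, so it can be run as a normal user.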

eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Tue Mar 10, 2015 2:00 pm

If your machine completely hangs, I'd say it's a hardware issue and perhaps it's time to replace the machine. My machines can take 100% disk I/O load just fine. Is it just Gentoo that exhibits this failure mode? Once it comes back from hanging, does 'dmesg' report anything?

If it's just "so slow that there's virtually no forward progress" that's a different case. There's ionice, or you can set up cgroups to throttle I/O, but neither applies to swapping or to kernel I/O (resume from disk), so if it's crashing during either of these, see above.
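
A minimal sketch of the ionice route, assuming the util-linux ionice tool. Class 3 is the idle class, so the command only gets disk time when nothing else wants it. As far as the ionice man page goes, these classes are honoured by CFQ; other schedulers may ignore them. The wrapper name run_io_idle is hypothetical.

```sh
# Run a command in the idle I/O class when ionice is installed and usable,
# otherwise just run it unchanged. Wrapper name is illustrative only.
run_io_idle() {
    if command -v ionice >/dev/null 2>&1 && ionice -c 3 true 2>/dev/null; then
        ionice -c 3 "$@"
    else
        "$@"
    fi
}

# Example (not executed here): run_io_idle updatedb
# An already running process can be demoted by PID: ionice -c 3 -p <PID>
```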

Make sure you're using the right kernel disk io driver too, or try a different one (use specific driver instead of generic, or vice versa).
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

chithanh
Developer
Joined: 05 Aug 2006
Posts: 2158
Location: Berlin, Germany

Posted: Tue Mar 10, 2015 2:50 pm

The system should not lock up fully, even during disk I/O.

Try to enable voluntary or full kernel preemption. Also in my experience, the deadline scheduler behaves better with heavy I/O load and interactive processes.
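
To see which preemption model a kernel was built with, something like the following sketch can help. It assumes the usual x86 config symbols (CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY, CONFIG_PREEMPT) and a readable kernel config. The function name preempt_model is made up here.

```sh
# Read a kernel .config on stdin and name the preemption model it selects.
# Function name is illustrative only.
preempt_model() {
    awk '
        /^CONFIG_PREEMPT_NONE=y/      { m = "none" }
        /^CONFIG_PREEMPT_VOLUNTARY=y/ { m = "voluntary" }
        /^CONFIG_PREEMPT=y/           { m = "full" }
        END { print (m == "" ? "unknown" : m) }
    '
}

# Examples (not executed here):
#   preempt_model < /usr/src/linux/.config
#   zcat /proc/config.gz | preempt_model   # needs CONFIG_IKCONFIG_PROC
```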

Goverp
Advocate
Joined: 07 Mar 2007
Posts: 2006

Posted: Wed Mar 11, 2015 11:02 am

I suspect the problem is swapping. Using firefox and libreoffice on my (1 GB RAM) laptop shows typical symptoms: you can use one or the other continuously, but changing between them causes the disk light to come on hard, and it takes a minute or so before the system starts to respond again.
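
A quick way to test the swapping theory is to look at how much swap is in use while the system is sluggish. The sketch below reads the SwapTotal and SwapFree fields of /proc/meminfo. The function name swap_used_kb is made up for this example.

```sh
# Print swap in use (kB) from a /proc/meminfo-style file.
# Defaults to /proc/meminfo; function name is illustrative only.
swap_used_kb() {
    awk '/^SwapTotal:/ { t = $2 }
         /^SwapFree:/  { f = $2 }
         END { print t - f }' "${1:-/proc/meminfo}"
}

# Example (not executed here): swap_used_kb
```

Running "vmstat 1" during a hang shows the same thing live: sustained non-zero values in the si/so columns mean pages are being swapped in and out.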

I have a suspicion that Javascript is implicated somewhere in firefox - my laptop frequently slows right down on some web pages that appear pretty innocuous, so I wonder if there are memory leaks or alternatively wasteful algorithms that run fast on machines with lots of RAM but slow where RAM is constrained.
_________________
Greybeard

Ant P.
Watchman
Joined: 18 Apr 2009
Posts: 6920

Posted: Wed Mar 11, 2015 7:43 pm    Post subject: Re: Disk I/O locks up system

chaoscommander wrote:
Is there a way to reconfigure the I/O scheduler so it will allocate bandwidth more equally instead of letting single threads block the entire disk? I'm using the CFQ I/O scheduler.
Or am I looking for the problem in an entirely wrong direction?

No, you're looking in the right place. CFQ is an utter trainwreck for desktop use; switch to BFQ (available in any sufficiently advanced patchset - genpatches has it) and these problems will be far less frequent.

Cyker
Veteran
Joined: 15 Jun 2006
Posts: 1746

Posted: Wed Mar 11, 2015 9:06 pm

Also, check the power management setup for your ThinkPad; some of them, especially newer ones, have known issues with ASPM which can cause weird problems like this.

I had to disable ASPM completely because every time I plugged in an ExpressCard it would make the whole system 'pause' every couple of seconds.

(Mine is an X230 for reference)
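
The post does not say how ASPM was disabled. One common way is the pcie_aspm=off kernel boot parameter. The current policy can be read from /sys/module/pcie_aspm/parameters/policy. The sketch below only checks what a given command line requests; the helper name aspm_param is hypothetical.

```sh
# Print the value of pcie_aspm= from a kernel command line string,
# or "unset" when the parameter is absent. Helper name is illustrative only.
aspm_param() {
    for word in $1; do
        case $word in
            pcie_aspm=*) printf '%s\n' "${word#pcie_aspm=}"; return 0 ;;
        esac
    done
    printf 'unset\n'
}

# Example (not executed here): aspm_param "$(cat /proc/cmdline)"
```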

Mr. M
Tux's lil' helper
Joined: 18 Sep 2004
Posts: 89
Location: USA

Posted: Thu Mar 12, 2015 4:22 pm

I'm also seeing lock-ups on a ThinkPad T420s. This started happening after upgrading my system a few days ago (kernel from 3.14.33 to 3.18.7; I hadn't updated my system for a while). The system seems to freeze completely when e.g. emerging packages in a KDE terminal. The emerge job continues (I see the HD LED blinking) but everything else is frozen (mouse pointer does not move, cannot switch windows using Alt-Tab, cannot switch VT using e.g. Alt-F1). The system does come back after a while (5-10 minutes).

I never had this problem before to this degree, so I wonder if it is just the IO scheduler (did it get worse for desktop use recently?). I will try using the BFQ scheduler. What is the best way for enabling this? Is it just USE=experimental for gentoo-sources and then enable it in the kernel config (where?)?

chaoscommander
Tux's lil' helper
Joined: 15 Oct 2012
Posts: 106

Posted: Thu Mar 12, 2015 10:33 pm

Quote:
If it's just "so slow that there's virtually no forward progress" that's a different case.

This is the case. I can still move the mouse, but it will take seconds to react and/or feel like moving through syrup. The desktop doesn't respond, switching to a TTY takes up to tens of seconds.

Quote:
Try to enable voluntary or full kernel preemption. Also in my experience, the deadline scheduler behaves better with heavy I/O load and interactive processes.

Kernel is fully preemptible.

Quote:
No, you're looking in the right place. CFQ is an utter trainwreck for desktop use; switch to BFQ (available in any sufficiently advanced patchset - genpatches has it) and these problems will be far less frequent.


Okay. Trying BFQ and/or deadline goes on the to-do list; I will report back.

Mr. M's problem sounds very similar to mine, it also just started a while ago.

Quote:
I will try using the BFQ scheduler. What is the best way for enabling this? Is it just USE=experimental for gentoo-sources and then enable it in the kernel config (where?)?

Yes, that's it, found it at "Enable the Block Layer -> IO Schedulers"
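
Once a kernel with BFQ is booted (the option in the BFQ patches of that era is, as far as I know, CONFIG_IOSCHED_BFQ), the scheduler can also be switched per disk at runtime through sysfs, which is handy for comparing schedulers without rebooting. A minimal sketch follows. The function name set_scheduler and the overridable sysfs root are assumptions added for illustration, and the write needs root on a real system.

```sh
# Switch the I/O scheduler of one disk at runtime (needs root on a real system).
# The optional third argument overrides the sysfs root, useful for dry runs.
# Function name is illustrative only.
set_scheduler() {
    dev=$1 sched=$2 root=${3:-/sys}
    file=$root/block/$dev/queue/scheduler
    [ -w "$file" ] || { echo "cannot write $file" >&2; return 1; }
    # Only accept schedulers the running kernel actually offers.
    case " $(tr -d '[]' < "$file") " in
        *" $sched "*) printf '%s\n' "$sched" > "$file" ;;
        *) echo "$sched not available for $dev" >&2; return 1 ;;
    esac
}

# Example (not executed here; sda is a placeholder): set_scheduler sda bfq
```

The change does not survive a reboot; the elevator= boot parameter or the default scheduler chosen in the kernel config makes it permanent.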

Mr. M
Tux's lil' helper
Joined: 18 Sep 2004
Posts: 89
Location: USA

Posted: Fri Mar 13, 2015 12:47 pm

I noticed that the "baloo_file_extr" process was constantly running and creating a lot of IO. After disabling it, my system seems to be more responsive again.

You can disable it by adding "Indexing-Enabled=false" to ~/.kde4/share/config/baloofilerc
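
For anyone checking their own setup, a small sketch that reports whether a given baloofilerc already carries that setting. It assumes the KDE4-era layout where the key sits in the [Basic Settings] group. The function name baloo_indexing is made up here.

```sh
# Report whether a baloofilerc disables indexing.
# The expected fragment (KDE4-era Baloo, an assumption of this sketch) is:
#   [Basic Settings]
#   Indexing-Enabled=false
# Function name is illustrative only.
baloo_indexing() {
    if grep -qx 'Indexing-Enabled=false' "$1" 2>/dev/null; then
        echo disabled
    else
        echo enabled-or-unset
    fi
}

# Example (not executed here): baloo_indexing ~/.kde4/share/config/baloofilerc
```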

Mr. M
Tux's lil' helper
Joined: 18 Sep 2004
Posts: 89
Location: USA

Posted: Mon Mar 16, 2015 6:35 pm

Switching to the BFQ scheduler made a huge difference in terms of responsiveness :).

chaoscommander
Tux's lil' helper
Joined: 15 Oct 2012
Posts: 106

Posted: Sat Mar 28, 2015 12:32 am

I second that. It still isn't perfect, but much, much better. Waking up from hibernation now is about 5 times faster. (1 minute instead of 5).

chaoscommander
Tux's lil' helper
Joined: 15 Oct 2012
Posts: 106

Posted: Tue Mar 31, 2015 9:12 am

Hm.. it still hangs massively on big emerges, though.

I just found this in dmesg, about 20sec after wakeup from hibernation, but I'm not sure if it's related.

Code:
[192649.197026] ------------[ cut here ]------------
[192649.197039] WARNING: CPU: 1 PID: 2917 at drivers/gpu/drm/i915/intel_display.c:901 intel_wait_for_vblank+0x1ed/0x200()
[192649.197042] vblank wait on pipe A timed out
[192649.197046] CPU: 1 PID: 2917 Comm: upowerd Tainted: G        W      3.18.9-gentoo_chaosbox64 #1
[192649.197048] Hardware name: Zepto Orion/Zepto, BIOS A15 27/08/2008
[192649.197051]  0000000000000009 ffff88013a60bb18 ffffffff818ed125 0000000000000001
[192649.197056]  ffff88013a60bb68 ffff88013a60bb58 ffffffff8104c637 0000000000000000
[192649.197061]  ffff8800bac10000 0000000000070040 000000000000000a 000000010b77040c
[192649.197066] Call Trace:
[192649.197073]  [<ffffffff818ed125>] dump_stack+0x4f/0x7c
[192649.197079]  [<ffffffff8104c637>] warn_slowpath_common+0x77/0xa0
[192649.197083]  [<ffffffff8104c6d1>] warn_slowpath_fmt+0x41/0x50
[192649.197087]  [<ffffffff8149248d>] intel_wait_for_vblank+0x1ed/0x200
[192649.197092]  [<ffffffff814cceef>] intel_tv_detect+0x21f/0x590
[192649.197098]  [<ffffffff8142703c>] status_show+0x3c/0x80
[192649.197103]  [<ffffffff814d66db>] dev_attr_show+0x1b/0x60
[192649.197108]  [<ffffffff81338897>] ? debug_smp_processor_id+0x17/0x20
[192649.197114]  [<ffffffff811c69df>] sysfs_kf_seq_show+0xaf/0x140
[192649.197118]  [<ffffffff811c561b>] kernfs_seq_show+0x1b/0x20
[192649.197123]  [<ffffffff811758aa>] seq_read+0xea/0x370
[192649.197127]  [<ffffffff811c5d55>] kernfs_fop_read+0xf5/0x160
[192649.197137]  [<ffffffff81153b87>] vfs_read+0x97/0x180
[192649.197140]  [<ffffffff81154131>] SyS_read+0x41/0xb0
[192649.197143]  [<ffffffff818f61d6>] system_call_fastpath+0x16/0x1b
[192649.197145] ---[ end trace 6303ad86b167098e ]---

chaoscommander
Tux's lil' helper
Joined: 15 Oct 2012
Posts: 106

Posted: Sat Apr 04, 2015 9:35 pm

That trace dump has not recurred -> seems unrelated.
Still: my responsiveness issues persist, albeit to a lesser extent than before. Does anyone have any other ideas?

Yamakuzure
Advocate
Joined: 21 Jun 2006
Posts: 2284
Location: Adendorf, Germany

Posted: Fri May 22, 2015 9:53 am

Mr. M wrote:
Switching to the BFQ scheduler made a huge difference in terms of responsiveness :).
I can not concur. My system is on ZFS and I have a VMware Workstation with Windows 7 running when I am at work, and using BFQ is a huge pain. Just switching between the vmware and any other window takes 20-30 seconds with BFQ, but <1 second with CFQ.

Has anybody had the same experience?
_________________
Important German:
  1. "Aha" - German reaction to pretend that you are really interested while giving no f*ck.
  2. "Tja" - German reaction to the apocalypse, nuclear war, an alien invasion or no bread in the house.

kernelOfTruth
Watchman
Joined: 20 Dec 2005
Posts: 6111
Location: Vienna, Austria; Germany; hello world :)

Posted: Fri May 22, 2015 7:04 pm

Yamakuzure wrote:
Mr. M wrote:
Switching to the BFQ scheduler made a huge difference in terms of responsiveness :).
I can not concur. My system is on ZFS and I have a VMware Workstation with Windows 7 running when I am at work, and using BFQ is a huge pain. Just switching between the vmware and any other window takes 20-30 seconds with BFQ, but <1 second with CFQ.

Has anybody had the same experience?


Looks like you're experiencing quite the opposite


Can you please report this to the BFQ (bfq-iosched) Mailing list ?

Ideally with a reproducer ?

https://groups.google.com/forum/#!forum/bfq-iosched


Paolo Valente (who is probably the most widely known name in relation to BFQ) and the others working on it and using it would highly appreciate it (me too :wink: ), I'm sure.
_________________
https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa

Hardcore Gentoo Linux user since 2004 :D

shazeal
Apprentice
Joined: 03 May 2006
Posts: 206
Location: New Zealand

Posted: Fri May 22, 2015 8:17 pm

Yamakuzure wrote:
Mr. M wrote:
Switching to the BFQ scheduler made a huge difference in terms of responsiveness :).
I can not concur. My system is on ZFS and I have a VMware Workstation with Windows 7 running when I am at work, and using BFQ is a huge pain. Just switching between the vmware and any other window takes 20-30 seconds with BFQ, but <1 second with CFQ.

Has anybody had the same experience?


I have the same issue; I just haven't really looked into it, as I've been coding most of the time recently and so haven't run into it much. I switched to ZFS a few weeks ago for all my disks, root included, and have noticed that when the computer is under heavy load with VirtualBox running I get large delays before actions are completed (windows, bash, other I/O tasks, etc.).
I never thought to attribute it to BFQ since it's always solved the problem ;)
_________________
CFLAGS="-OmgWTFR1CE --fun-lol-loops --march=asmx86go"

shazeal
Apprentice
Joined: 03 May 2006
Posts: 206
Location: New Zealand

Posted: Fri May 22, 2015 8:26 pm

Found the solution to the ZFS issue! It's not BFQ's fault at all.

https://bbs.archlinux.org/viewtopic.php?id=196439

Quote:
Thanks for the quick response and suggestion. Quick googling revealed it is ZFS that sets "noop" scheduler as long as its pools are occupying the whole disk device.


So if you have ZFS as a partition rather than full disk you need to force the noop scheduler.

I have root/swap/zfs partitions on my root drive, hence it was selecting BFQ as the scheduler.
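
The scheduler is a property of the whole disk, not of a partition, so forcing noop for a ZFS partition means writing to the parent disk's queue. A sketch of finding that parent through sysfs, where partitions carry a "partition" attribute and live under their disk's directory. The function name parent_disk and the device names are hypothetical.

```sh
# Print the whole-disk name for a block device: "sda3" -> "sda", "sda" -> "sda".
# Partitions carry a "partition" attribute in sysfs and sit under their parent disk.
# The optional second argument overrides the sysfs root; name is illustrative only.
parent_disk() {
    dev=$1 root=${2:-/sys}
    path=$(readlink -f "$root/class/block/$dev") || return 1
    if [ -e "$path/partition" ]; then
        basename "$(dirname "$path")"
    else
        basename "$path"
    fi
}

# Example (not executed here, needs root; sda3 is a placeholder):
#   echo noop > "/sys/block/$(parent_disk sda3)/queue/scheduler"
```

Note that this changes the scheduler for every partition on that disk, including root and swap.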
_________________
CFLAGS="-OmgWTFR1CE --fun-lol-loops --march=asmx86go"

kernelOfTruth
Watchman
Joined: 20 Dec 2005
Posts: 6111
Location: Vienna, Austria; Germany; hello world :)

Posted: Fri May 22, 2015 9:54 pm

shazeal wrote:
Found the solution to the ZFS issue! It's not BFQ's fault at all.

https://bbs.archlinux.org/viewtopic.php?id=196439

Quote:
Thanks for the quick response and suggestion. Quick googling revealed it is ZFS that sets "noop" scheduler as long as its pools are occupying the whole disk device.


So if you have ZFS as a partition rather than full disk you need to force the noop scheduler.

I have root/swap/zfs partitions on my root drive, hence it was selecting BFQ as the scheduler.



Oh - I thought this was known or given :oops:

I forgot that it also took me quite some time to stumble upon this :lol:


Code:
echo "bfq" > /sys/module/zfs/parameters/zfs_vdev_scheduler

_________________
https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa

Hardcore Gentoo Linux user since 2004 :D

shazeal
Apprentice
Joined: 03 May 2006
Posts: 206
Location: New Zealand

Posted: Sat May 23, 2015 7:05 am

kernelOfTruth wrote:

Oh - I thought this was known or given :oops:

I forgot that it also took me quite some time to stumble upon this :lol:


Code:
echo "bfq" > /sys/module/zfs/parameters/zfs_vdev_scheduler


This confuses me more :lol:

So the system scheduler should be noop, but you can set the vdev scheduler to bfq??

What does the vdev scheduler normally use? I thought ZFS had its own scheduler? I don't see any documentation on this, apart from the man page, which only contains this...

Quote:
zfs_vdev_scheduler (charp)
I/O scheduler

Default value: noop.


So verbose :roll:
_________________
CFLAGS="-OmgWTFR1CE --fun-lol-loops --march=asmx86go"

energyman76b
Advocate
Joined: 26 Mar 2003
Posts: 2048
Location: Germany

Posted: Mon May 25, 2015 8:50 pm

Yamakuzure wrote:
Mr. M wrote:
Switching to the BFQ scheduler made a huge difference in terms of responsiveness :).
I can not concur. My system is on ZFS and I have a VMware Workstation with Windows 7 running when I am at work, and using BFQ is a huge pain. Just switching between the vmware and any other window takes 20-30 seconds with BFQ, but <1 second with CFQ.

Has anybody had the same experience?


with zfs please use NO io scheduler.
_________________
Study finds stunning lack of racial, gender, and economic diversity among middle-class white males

I identify as a dirty penismensch.

kernelOfTruth
Watchman
Joined: 20 Dec 2005
Posts: 6111
Location: Vienna, Austria; Germany; hello world :)

Posted: Mon May 25, 2015 9:42 pm

energyman76b wrote:
Yamakuzure wrote:
Mr. M wrote:
Switching to the BFQ scheduler made a huge difference in terms of responsiveness :).
I can not concur. My system is on ZFS and I have a VMware Workstation with Windows 7 running when I am at work, and using BFQ is a huge pain. Just switching between the vmware and any other window takes 20-30 seconds with BFQ, but <1 second with CFQ.

Has anybody had the same experience?


with zfs please use NO io scheduler.


meaning ?

blk-mq ?

setting it to noop ? (noop still does some minor ordering)


Thanks !
_________________
https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa

Hardcore Gentoo Linux user since 2004 :D

energyman76b
Advocate
Joined: 26 Mar 2003
Posts: 2048
Location: Germany

Posted: Mon May 25, 2015 9:54 pm

noop
_________________
Study finds stunning lack of racial, gender, and economic diversity among middle-class white males

I identify as a dirty penismensch.

Yamakuzure
Advocate
Joined: 21 Jun 2006
Posts: 2284
Location: Adendorf, Germany

Posted: Wed May 27, 2015 8:38 am

Strange. I will try with noop, then. But why is CFQ working just fine?

Has anybody ever compared using noop versus setting zfs_vdev_scheduler to bfq?

Edit: KernelOfTruth wrote: "when using BFQ, switching the vdev-scheduler to BFQ also should significantly reduce latency: echo bfq > /sys/module/zfs/parameters/zfs_vdev_scheduler"

So, better than noop? Riddles...
_________________
Important German:
  1. "Aha" - German reaction to pretend that you are really interested while giving no f*ck.
  2. "Tja" - German reaction to the apocalypse, nuclear war, an alien invasion or no bread in the house.

haarp
Guru
Joined: 31 Oct 2007
Posts: 535

Posted: Wed May 27, 2015 9:40 am

Also, if you're using an SSD and ext4 with the discard option, deleting lots of files can cause considerable delays as they get TRIMmed from the disk.
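
A small sketch for checking whether any filesystem is currently mounted with discard, by scanning the options column of /proc/mounts. The function name mounts_with_discard is made up for this example.

```sh
# List mount points whose options include "discard" in a /proc/mounts-style file.
# Defaults to /proc/mounts; function name is illustrative only.
mounts_with_discard() {
    awk '{
        n = split($4, o, ",")
        for (i = 1; i <= n; i++)
            if (o[i] == "discard") print $2
    }' "${1:-/proc/mounts}"
}

# Example (not executed here): mounts_with_discard
```

A common alternative to the discard mount option is to drop it and run fstrim (from util-linux) on the filesystem periodically instead.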

krinn
Watchman
Joined: 02 May 2003
Posts: 7470

Posted: Wed May 27, 2015 12:43 pm

Or maybe you are blaming the wrong thing?

While the scheduler should switch to each task and give it some time to work, if the scheduler is unable to switch, it doesn't matter which scheduler you use: it won't switch.

Why couldn't it switch? If your hardware is busy and blocks interrupts, the scheduler (whichever one you pick) must wait for the hardware to stop doing that.
What should you look at? Things like enabling MSI, using the APIC instead of the PIC, and IRQ conflicts on older hardware (hardware that only uses the PIC).
If you have an IRQ conflict and two devices try to work at once, one device may hide the other device's request and, worse, may re-run the first one.
Take a mouse and an HDD: when the HDD is writing and the mouse is moved, the mouse movement may simply be ignored, because every time the mouse moves the conflict might hand control back to the HDD. Result: the HDD is busy writing, the mouse moves, the HDD is asked to continue and the mouse is ignored. The user sees the HDD doing its work while the mouse gets no response at all; meanwhile the poor scheduler is waiting for the HDD to finish before it can work.
Of course a mouse has no IRQ of its own, but a USB mouse uses the USB controller's IRQ, and poorly designed motherboards commonly have the USB controller share one with the HDD controller.
You should also look at the broken APIC/PIC option in the kernel (CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS), and maybe try HPET, which gives a finer timer resolution and may help the scheduler balance tasks.
If you lack memory and the task was swapped out, there's nothing you can do: the HDD must read the swap to restore the memory because the scheduler is switching to that task. In this case, I suppose that if the noop scheduler does less switching, it would be the best one to use.

For your information, I have never had any problem like this using the deadline scheduler; however, I don't think the scheduler type makes any difference here (as long as memory is not an issue).
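
To check for the kind of IRQ sharing described above, /proc/interrupts can be scanned for lines where one IRQ serves several handlers; the kernel separates the handler names with ", " on such lines. A minimal sketch follows. The function name shared_irqs is made up here.

```sh
# Print lines of a /proc/interrupts-style file where one IRQ serves several
# handlers; the kernel separates their names with ", ".
# Defaults to /proc/interrupts; function name is illustrative only.
shared_irqs() {
    grep -E '^ *[0-9]+:.*, ' "${1:-/proc/interrupts}"
}

# Example (not executed here): shared_irqs
```

A line that lists, say, a USB host controller and the disk controller together would match the scenario from the post. Lines marked PCI-MSI are not shared this way.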

All times are GMT