Gentoo Forums
AMD64 system slow/unresponsive during disk access (Part 2)

Gentoo Forums Forum Index » Gentoo on AMD64
tallica
Apprentice
Joined: 27 Jul 2007
Posts: 152
Location: Lublin, POL

PostPosted: Tue Nov 23, 2010 8:52 am

Looks like there are some problems with patch v4: http://lkml.org/lkml/2010/11/21/41
_________________
Gentoo ~AMD64 | Audacious
kernelOfTruth
Watchman
Joined: 20 Dec 2005
Posts: 5696
Location: Vienna, Austria; Germany; hello world :)

PostPosted: Fri Dec 03, 2010 6:36 pm

If you're rsyncing regularly and apps suffer during heavy reads, try the following:

[PATCH v3 0/3] f/madvise(DONTNEED) support
http://marc.info/?l=linux-kernel&m=129104424110018&w=2
http://marc.info/?l=linux-kernel&m=129104424210023&w=2
http://marc.info/?l=linux-kernel&m=129104424210027&w=2

There are several other potential improvements in this area, but most if not all that apply to 2.6.36 as of today are already incorporated in the zen-kernel, so give it a try:

git.zen-kernel.org

www.zen-kernel.org

dm crypt: scale to multiple CPUs
might be useful if you need it - in my experience it probably increases latency a little

There also seems to be some potential filesystem corruption being triggered with ext4 on 2.6.37-rc* right now, which is under investigation, so don't use ext4 with 2.6.37-rc* - yet.
_________________
https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD
2.6.37.2_plus_v1: BFS, CFS,THP,compaction, zcache or TOI
Hardcore Linux user since 2004 :D
pste
Tux's lil' helper
Joined: 14 Dec 2004
Posts: 90

PostPosted: Tue Mar 01, 2011 6:44 am    Post subject: What about 64-bit kernels and I/O...

I think this is a bug, an inconsistency, or an unfortunate piece of system design that has been around for quite a while, and it makes me very annoyed, not least because it pokes a hole in the hallmark of Linux - speed and stability - and it never seems to get fixed!?!? Unfortunately I'm in no position (time-wise or knowledge-wise) to pursue this problem myself, which leaves me no option but to provide background, some additional information and experiences, and hope that someone clever can get something out of it and accept the challenge!

Quick summary:
Probably since kernel 2.6.18 (googling seems to converge on this version) there's been some kind of problem with I/O (disk I/O, ...??) on 64-bit kernels, resulting in high cpu load and a lagging or even hanging system. The problem seems to begin when I/O reaches a certain volume, like copying many large files (taking a backup...) or doing many things concurrently, like copying files while rsync'ing through a vpn tunnel (high cpu load in general). It seems to have something to do with the I/O scheduler, but I experience problems even with the "simplest" deadline scheduler. To me, zen-sources also works slightly better than gentoo-sources.

It's not a hardware problem, because everything works fine under Windows (which is kind of extra annoying, isn't it?), and my impression (not thoroughly tested, though my home servers, both running 32-bit gentoo-sources with several usb drives on usb hubs, seem to work flawlessly) is that 32-bit kernels are unaffected. The external usb drives I'm using are of different brands and models, and all show the same behaviour. I've tried -a lot- of different kernel settings, including many minimal ones, but I cannot find a pattern.

What happens?
Personally I think it's connected with usb and usb hard drives (at least that's a failsafe way to make the problem show!). Every time I make a backup, copying goes at 20MB/s plus for a while, but then it starts to slow down. I cannot say exactly when, but it's either when the copied size reaches, say, 8GB (just a guess - sometimes more) or when the file count gets big (no number...). Then transfer speed falls to a few MB/s (2-5MB/s perhaps), the system starts to lag, and cpu is at 100%. Often (not always) I get something about "reset usb-device" in the log, sometimes "I/O-error on device", forcing me to shut down the drives and the computer, start over, and fsck. Occasionally I get a complete system lock-up (my computer freezes and all the leds on the keyboard light up).

Making backups between two usb drives on the same usb hub seems to create the most problems, including total system hang-ups.

Hopefully someone who thinks this is something that needs fixing can make something out of this and start digging - I'm cheering loudly in that case!

Good luck!

/pste
idella4
Veteran
Joined: 09 Jun 2006
Posts: 1587
Location: Australia, Perth

PostPosted: Tue Mar 01, 2011 9:48 am

pste,

Very general. Use top and iotop, and set up the conditions under which it occurs; that way you can at least capture some snapshots of the system state. Then grab the dmesg output. We need some sort of baseline.
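Something like this rough sketch can grab those snapshots while a stall is happening (the output directory, interval and sample count are arbitrary choices; iotop needs root and taskstats support in the kernel):

```shell
#!/bin/sh
# Capture a few timestamped snapshots of system state during an I/O stall.
# OUT defaults to /tmp/io-stall-logs; override via the environment.
OUT=${OUT:-/tmp/io-stall-logs}
mkdir -p "$OUT"
for i in 1 2 3; do
    ts=$(date +%H%M%S)
    top -b -n 1 > "$OUT/top.$ts" 2>&1
    # iotop may not be installed; skip it quietly if it isn't
    command -v iotop >/dev/null 2>&1 && iotop -b -n 1 > "$OUT/iotop.$ts" 2>&1
    dmesg 2>/dev/null | tail -n 50 > "$OUT/dmesg.$ts"
    sleep 2
done
```

Run it from a second terminal (or over ssh) once the lag starts, then attach the files here.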
_________________
idella4@aus
frostschutz
Advocate
Joined: 22 Feb 2005
Posts: 2448
Location: Germany

PostPosted: Tue Mar 01, 2011 9:59 am    Post subject: Re: What about 64-bit kernels and I/O...

pste wrote:
Making backups between two usb-drives on the same usb-hub, seems to create most problems


That's a total bottleneck on the hardware side, though. Even in ideal conditions you shouldn't see more than 10MB/s for usb-to-usb transfers, especially with a hub involved. You're talking to both disks over a link that can ideally transfer 40MB/s, which means 20MB/s per drive; then in come protocol overhead, filesystem overhead, hub overhead and context-switch overhead, and you end up with the extremely slow speeds typical of USB... Add unreliable hardware to that (such as an overheating usb hub) and you're in data-corruption land...

Performance issues in the kernel are a possibility - they happen all the time - but if you're also getting strange stuff in dmesg, it's much more likely that your hardware is somehow the culprit.
ppurka
Advocate
Joined: 26 Dec 2004
Posts: 3205

PostPosted: Tue Mar 01, 2011 10:14 am

See http://forums.gentoo.org/viewtopic-t-793263.html
_________________
emerge --quiet redefined | E17 vids: I, II
tomk
Administrator
Joined: 23 Sep 2003
Posts: 7219
Location: Sat in front of my computer

PostPosted: Tue Mar 01, 2011 10:40 am

Merged from here.
_________________
Search | Read | Answer | Report | Strip
pste
Tux's lil' helper
Joined: 14 Dec 2004
Posts: 90

PostPosted: Tue Mar 01, 2011 11:17 am

ppurka - yes, that (this) thread is one of the sources that made me call this a long-standing problem! Google gives you more...

frostschutz - yes, I know that the setup with usb drives is a hardware bottleneck, but I believe that should mean nothing more than that the backup takes a long time; it should -not- make the entire system lag or hang! I think this is caused by some kind of race condition that occurs in 64-bit kernels... And NO, it's not the hardware: I wrote above that the same setup works fine in windows and (in similar setups) with 32-bit kernels! But yes, it may be hardware related in the sense of (kernel) driver problems, perhaps? I agree that overheating is a possible explanation for the I/O-error situations, but I find it strange that this differs between OSes (or kernel types). Furthermore, the problem also occurs without the hub (e.g. copying from the system drive to a usb drive, or between usb drives on different usb ports of the computer); I gave that example because my impression is that it's the quickest way to create problems...

idella4 - sure, I'll try to capture something, although not today... (I need to recompile the kernel with a few new flags - the iotop emerge told me so - but I need my computer running a while longer...) One problem is that for the worst case (the most interesting one) I'd have to trigger one of those total lock-ups and then hand-copy (or photograph) the top screens and dmesg output, precisely because the system is frozen, and it feels a little risky to have to recover the filesystem(s) every time it hangs... A concrete example (for anyone to try): start rsync -avh --progress /home /media/your-usb-drive/ (or similar) and wait (of course, /home must be many GB large!). I'm doing precisely this at the moment! For me this command keeps showing about 15-20MB/s for every file, but after a while the system starts to lag (rsync keeps running at the same speed, though). If I then try starting a movie in vlc (which has to read a big file from the hard drive), playback is (naturally) really slow, but sometimes the movie hangs, and closing vlc takes about 5 minutes! When the sync is finished, the system is back to normal responsiveness...

Thanks for the response!

/pste
joeklow
n00b
Joined: 23 Jan 2011
Posts: 46

PostPosted: Tue Mar 01, 2011 6:29 pm

Reporting on 2.6.36 ck-sources running on a multicore Phenom II.

Recompiled this kernel, switching from the deprecated SATA support ("ATA/ATAPI support") over to the new serial ATA/PATA drivers.
I/O scheduler: BFQ (was CFQ)
Profile: Desktop (was Server)
CPU scheduler: CFS+autogroups
Timer: 1000 Hz (was 200)

Also, /etc/init.d/local.start contains the following to disable the drive's write cache (XFS loves to flush data once an hour, and it would be pointless to let the flushed data sit in the drive's cache).
Code:

hdparm -W0 /dev/sda


Now I can emerge -u world on the host and in a virtual machine, and run a Windows virtual machine simultaneously, and the remaining resources are sufficient for a far better response (I can surf and code).
Without those tricks I was unable to do anything while merging something, and the system was almost unresponsive while emerge --sync'ing.
Yamakuzure
Veteran
Joined: 21 Jun 2006
Posts: 1385
Location: Bardowick, Germany

PostPosted: Wed Mar 02, 2011 4:32 pm

Huh? I haven't had any lag since gentoo-sources-2.6.36-rsomething, and with gentoo-sources-2.6.37 (okay, with the cgroups hack) I have no lag even if I do a huge parallel merge (load between 25 and 40 on an i7 dual-core laptop with HT) while VMware runs WindowsXP.

Is it just this cgroups stuff? I am basically using what is described here: http://forums.gentoo.org/viewtopic-t-852922.html
(And no, I do not have any problems with Amarok, DragonPlayer or any other multi media stuff)
_________________
I *do* know that I easily aggravate people due to my condensed writing. Rule of thumb: If I wrote anything that can be understood in two different ways, and one way offends you, then I meant the other! ;)
devsk
Advocate
Joined: 24 Oct 2003
Posts: 2734
Location: Bay Area, CA

PostPosted: Tue Mar 29, 2011 7:19 am

2.6.38 with AUTOGROUP helps a lot with this issue.
devsk
Advocate
Joined: 24 Oct 2003
Posts: 2734
Location: Bay Area, CA

PostPosted: Tue Apr 19, 2011 6:34 am

Any news on this front? Does AUTOGROUP help people with this issue? Or this is a non-issue now?
kernelOfTruth
Watchman
Joined: 20 Dec 2005
Posts: 5696
Location: Vienna, Austria; Germany; hello world :)

PostPosted: Tue Apr 19, 2011 6:26 pm

devsk wrote:
Any news on this front? Does AUTOGROUP help people with this issue? Or this is a non-issue now?


autogroup definitely does help.

Deactivating invalidated pages helps too with heavy rsync jobs (the fadvise support).

But there are still hiccups and short interruptions when listening to music while the kernel is heavily flushing to disk, so the problem is still present but has become a lot lighter.
TimeManx
n00b
Joined: 11 Jul 2011
Posts: 55

PostPosted: Sun Jan 01, 2012 11:02 am

I've configured 3.1.6 with autogroup, CFQ, zram (128 MB as swap), zcache, transparent hugepages (madvise), memory compaction, and a preemptible kernel on a system with 2 GB of RAM. The system is quite responsive for the first half hour after boot, but performance keeps deteriorating.
Also, copying large amounts of data from one drive to another proceeds in bursts, and the drives become inaccessible during that time, which causes Dolphin to freeze.
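One quick way to check whether those freezes line up with writeback storms is to sample the kernel's dirty-page counters during a copy (a sketch; the sample count and interval are arbitrary):

```shell
#!/bin/sh
# Sample the Dirty/Writeback lines from /proc/meminfo; if Dirty climbs
# to hundreds of MB before it drains, the stalls coincide with big
# flush storms rather than with the copy itself.
watch_dirty() {
    n=${1:-5}
    i=0
    while [ "$i" -lt "$n" ]; do
        grep -E '^(Dirty|Writeback):' /proc/meminfo
        sleep 1
        i=$((i + 1))
    done
}
# watch_dirty 120   # e.g. sample for two minutes while copying
```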
TinheadNed
Guru
Joined: 05 Apr 2003
Posts: 339
Location: Farnborough, UK

PostPosted: Thu Jan 05, 2012 8:45 pm

From the changelog of the 3.2 kernel: "desktop responsiveness in the presence of heavy writes has been improved"
Holysword
l33t
Joined: 19 Nov 2006
Posts: 792
Location: Greece

PostPosted: Fri Feb 17, 2012 6:05 pm

Is this problem still around? I'm suffering from unresponsiveness very often; even silly facebook flash games can bring my i7 with 4GB down - and let's say I've got practically infinite swap.

I tried blaming the kernel (it was zen-sources-something; I can't remember the version, but I updated it less than two weeks ago), but gentoo-sources also freezes/slows down. I was using BFQ+BFS, and now I'm on CFQ+CFS: same. SLUB to SLAB? Same. Since BFQ is incompatible with cgroups, I was not using cgroups initially; now I am using cgroups+autogroups. Nothing. My main system lives on a ReiserFS3 partition, not ext4, though.

I wouldn't call this merely annoying; it's getting dangerous for me, since at least once a day my system crashes hopelessly and I have to hard-reboot it. I've lost a couple of files so far. I was considering a depclean + emerge -e world, but now I wonder whether that's worth trying. Four months ago I didn't have this problem (I was using zen-sources back then), and some of you seem to have been having it for years...
_________________
"Nolite arbitrari quia venerim mittere pacem in terram non veni pacem mittere sed gladium" (Yeshua Ha Mashiach)
depontius
Advocate
Joined: 05 May 2004
Posts: 2518

PostPosted: Fri Feb 17, 2012 6:38 pm

Are you running /home from a network drive?

My performance problems were from /home being mounted on nfsv4, and were related to firefox and its sqlite sync() behavior. A year or two back I moved .mozilla and .thunderbird to local disk, then symlinked the nfs-mounted .mozilla and .thunderbird directories to the local ones. Problem gone.

Some time after moving that system to 3.2.x I saw the notice of improved responsiveness, and tried moving .mozilla back to nfs. My performance problems came back, though they didn't seem quite as bad. The other night I moved .mozilla back to local disk.

Other than that, I'm happy and even with that I wasn't having problems with crashing. Have you tried memtest86+?
_________________
.sigs waste space and bandwidth
Holysword
l33t
Joined: 19 Nov 2006
Posts: 792
Location: Greece

PostPosted: Fri Feb 17, 2012 6:55 pm

depontius wrote:
Are you running /home from a network drive?

My performance problems were from /home being mounted on nfsv4, and were related to firefox and its sqlite sync() behavior. A year or two back I moved .mozilla and .thunderbird to local disk, then symlinked the nfs-mounted .mozilla and .thunderbird directories to the local ones. Problem gone.

Some time after moving that system to 3.2.x I saw the notice of improved responsiveness, and tried moving .mozilla back to nfs. My performance problems came back, though they didn't seem quite as bad. The other night I moved .mozilla back to local disk.

Other than that, I'm happy and even with that I wasn't having problems with crashing. Have you tried memtest86+?

No, everything is local. I also use Chrome, not Firefox, but I have tested a few times with Firefox and it seems to show the same behaviour.
Holysword
l33t
Joined: 19 Nov 2006
Posts: 792
Location: Greece

PostPosted: Fri Feb 24, 2012 2:16 pm

Okay folks... it seems it was both my kernel (it was zen-sources with BFS+BFQ), as seen in this thread, and a memory leak problem with my wm (as seen here).
xman1
n00b
Joined: 11 Apr 2004
Posts: 58

PostPosted: Thu Apr 26, 2012 4:14 pm

Has this been solved yet? I had these same issues, and it turned out my Western Digital hard drive has a bug with APM. Pop into the pm-utils default config and set APM to 255 to disable it; everything works well now.

You can also do this with hdparm:

Code:
hdparm -B 255 /dev/sda


Maybe this will help someone as the pauses are quite annoying.

-X

PS. I forgot to mention that the pauses were affecting things system-wide: the whole system would wait on the APM bug. Thanks, WD.
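To make the setting stick across reboots and suspend/resume, a udev rule along these lines should work (the file name is my own choice, and the KERNEL match assumes the drive really is sda - adjust both):

```
# /etc/udev/rules.d/69-disk-apm.rules  (file name is an assumption)
# Re-apply "hdparm -B 255" whenever the disk (re)appears, so the APM
# fix survives reboots and suspend/resume:
ACTION=="add", KERNEL=="sda", RUN+="/sbin/hdparm -B 255 /dev/%k"
```

Running hdparm -B /dev/sda afterwards should report APM_level = off.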
smlbstcbr
n00b
Joined: 08 Apr 2006
Posts: 34

PostPosted: Wed Jun 27, 2012 4:52 pm

Bump. I still have those issues in 3.3.8-gentoo.
kernelOfTruth
Watchman
Joined: 20 Dec 2005
Posts: 5696
Location: Vienna, Austria; Germany; hello world :)

PostPosted: Wed Jul 04, 2012 10:37 am

In total we're trading some throughput for interactivity & responsiveness.

Those of you having the problems could give the following tweaks a try:

Code:
echo cfq > /sys/block/sda/queue/scheduler
echo 10000 > /sys/block/sda/queue/iosched/fifo_expire_async
echo 250 > /sys/block/sda/queue/iosched/fifo_expire_sync
echo 80 > /sys/block/sda/queue/iosched/slice_async
echo 1 > /sys/block/sda/queue/iosched/low_latency
echo 6 > /sys/block/sda/queue/iosched/quantum
echo 5 > /sys/block/sda/queue/iosched/slice_async_rq
echo 3 > /sys/block/sda/queue/iosched/slice_idle
echo 100 > /sys/block/sda/queue/iosched/slice_sync
hdparm -q -M 254 /dev/sda


(source: http://unix.stackexchange.com/questions/30286/can-i-configure-my-linux-system-for-more-aggressive-file-system-caching)


I'm currently using all of them except the last one.
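If you've got more than one spinning disk, something like this applies a subset of those knobs to each of them instead of hard-coding sda (a sketch: the sysfs root is a parameter purely so you can dry-run it against a fake tree, and the real run needs root):

```shell
#!/bin/sh
# Apply some of the CFQ tweaks above to every rotational sd* disk.
# Pass an alternative root directory to dry-run against a fake tree;
# with no argument it touches the real /sys/block.
tune_cfq() {
    root=${1:-/sys/block}
    for dev in "$root"/sd*; do
        [ -d "$dev" ] || continue
        # skip SSDs and anything that doesn't report as rotational
        [ "$(cat "$dev/queue/rotational" 2>/dev/null)" = "1" ] || continue
        echo cfq   > "$dev/queue/scheduler"
        echo 10000 > "$dev/queue/iosched/fifo_expire_async"
        echo 250   > "$dev/queue/iosched/fifo_expire_sync"
        echo 1     > "$dev/queue/iosched/low_latency"
    done
}
# tune_cfq    # the real thing, as root
```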



In addition I'm using the ck2 patchset for the 3.4* kernels, with the BFS cpu scheduler patched up to version 424,

and added Chen's O(1) tweak:

http://pastebin.com/ixw9PXAw


(thread: http://phoronix.com/forums/showthread.php?71658-RIFS-ES-Linux-Kernel-Scheduler-Released/page7 )


this helps A LOT


edit:

some additional stuff

when your system uses swap heavily, raise page-cluster:

Code:
echo "12" > /proc/sys/vm/page-cluster


or

Code:
echo "10" > /proc/sys/vm/page-cluster


helps with interactivity issues for me



keep swapping low if possible:

Code:
echo "15" > /proc/sys/vm/swappiness


Con Kolivas afaik recommends 10

Code:
echo "10" > /proc/sys/vm/swappiness




keep dirty_background_ratio and dirty_ratio low:


Code:
echo "5" > /proc/sys/vm/dirty_background_ratio


and


Code:
echo "9"   > /proc/sys/vm/dirty_ratio



also make sure that pdflush/bdflush doesn't write out data too seldom:

Code:
echo "300"  > /proc/sys/vm/dirty_writeback_centisecs


300 (3 seconds) should be the default; AFAIK powertop and other tools recommend 1500 (15 seconds).
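To make all of these vm knobs survive a reboot, the /etc/sysctl.conf equivalents look like this (same values as the echoes above; note that page-cluster keeps its dash in the sysctl key name):

```
# /etc/sysctl.conf equivalents of the /proc/sys/vm tweaks above
vm.swappiness = 10
vm.dirty_background_ratio = 5
vm.dirty_ratio = 9
vm.dirty_writeback_centisecs = 300
vm.page-cluster = 10
```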


edit:

added some settings I'm currently playing around with

edit2:

set

Code:
echo "300"  > /proc/sys/vm/dirty_writeback_centisecs


instead of 500; that seems to improve the stalls.


Last edited by kernelOfTruth on Sat Jul 07, 2012 11:40 am; edited 1 time in total
smlbstcbr
n00b
Joined: 08 Apr 2006
Posts: 34

PostPosted: Sat Jul 07, 2012 2:09 am

I'll see how that works on my machine. How unfortunate to have such issues in the Gentoo kernel. It seems to me that it has slowed down since the change to 3.x kernels.
kernelOfTruth
Watchman
Joined: 20 Dec 2005
Posts: 5696
Location: Vienna, Austria; Germany; hello world :)

PostPosted: Sat Jul 07, 2012 11:39 am

smlbstcbr wrote:
I'll see how that works on my machine. How unfortunate to have such issues in the Gentoo kernel. It seems to me that it has slowed down since the change to 3.x kernels.


try setting dirty_writeback_centisecs even lower

I just set it to 300 yesterday, and it seems to play a very important role:


Code:
echo "300"  > /proc/sys/vm/dirty_writeback_centisecs

smlbstcbr
n00b
Joined: 08 Apr 2006
Posts: 34

PostPosted: Sun Jul 08, 2012 3:24 pm

Well, I'm trying your solutions (thank you for posting them). There's a slight improvement, but it's not as smooth as it used to be.
EDIT: I have been using a value of 200 for the last parameter and my system has improved significantly, though there's still some lag when switching windows or opening some documents.
All times are GMT
Page 5 of 7

 