Gentoo Forums
Optimizing for system responsiveness
helmers
Guru
Joined: 16 Sep 2002
Posts: 548
Location: Oslo, Norway

Posted: Mon Oct 21, 2002 9:07 pm    Post subject: Optimizing for system responsiveness

To increase your system responsiveness, you can lower the latency of the kernel scheduler by raising its timer frequency. (With lolo-sources, you can set a numerical value. The default is 100; I use 1500 right now, which might be a bit too much.)
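
As a rough illustration of what that number means, assuming it is the kernel timer frequency in ticks per second (which is how later replies in this thread treat it), the time between ticks is one second divided by the value. The helper name `tick_interval_us` is made up for this sketch:

```shell
#!/bin/sh
# Sketch: time between timer ticks for a given frequency.
# Assumption: the configured value is ticks per second (Hz).
# tick_interval_us is a hypothetical helper, not part of any kernel tool.
tick_interval_us() {
    # 1 second = 1000000 microseconds; integer division rounds down.
    echo $(( 1000000 / $1 ))
}

# Values mentioned in this thread.
for hz in 100 500 512 1000 1500; do
    echo "$hz Hz -> $(tick_interval_us "$hz") us between ticks"
done
```

At the default of 100 the kernel is interrupted every 10 ms; at 1500 the gap is well under 1 ms, which is where both the better responsiveness and the extra overhead come from.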

The second thing is to set a shorter read-ahead value with hdparm. A value of 2 will be okay.

"hdparm -a2 /dev/hda", where /dev/hda is your hard drive.

You should be aware that this increases system overhead and reduces overall throughput. If you want it the other way, set read-ahead much higher, up to 255, which is the maximum. 128 is a good value for servers.

"hdparm -a128 /dev/hda", where /dev/hda is your hard drive.

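To put the read-ahead values in perspective: hdparm's -a option takes a count of sectors, so the numbers can be converted to bytes. This is a small sketch assuming 512-byte sectors; `readahead_bytes` is a made-up helper name, and the hdparm commands are left as comments because they need root and a real drive:

```shell
#!/bin/sh
# Sketch: convert an hdparm -a value (a sector count) into bytes.
# Assumption: 512-byte sectors. readahead_bytes is a hypothetical helper.
readahead_bytes() {
    echo $(( $1 * 512 ))
}

# The three values discussed above.
for sectors in 2 128 255; do
    echo "-a$sectors -> $(readahead_bytes "$sectors") bytes of read-ahead"
done

# To inspect or change the real setting (needs root, and /dev/hda must be
# your drive), the commands from the post would be:
#   hdparm -a /dev/hda       # show the current read-ahead
#   hdparm -a2 /dev/hda      # short read-ahead
#   hdparm -a128 /dev/hda    # long read-ahead
```

So -a2 reads ahead only 1 KiB, while -a128 reads ahead 64 KiB per request.
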
And finally, since the biggest slowdown in most systems is the hard drive, the "-Os" compiler flag is a very good one. It makes smaller executables, which means less memory used and less to read from the hard drive. As evidence that it is a sound choice, gentoo-sources also uses this flag.

Please let me know if you have any comments. I'm posting because I've been wondering about these things myself, and I hope you find it useful.


--
Regards,
Helmers

pjp
Administrator
Joined: 16 Apr 2002
Posts: 16090
Location: Colorado

Posted: Mon Oct 21, 2002 10:05 pm

Wouldn't compiling for speed, as opposed to Os, be more of an advantage with modern hardware?
_________________
lolgov. 'cause where we're going, you don't have civil liberties.

In Loving Memory
1787 - 2008

rac
Bodhisattva
Joined: 30 May 2002
Posts: 6553
Location: Japanifornia

Posted: Mon Oct 21, 2002 10:43 pm

kanuslupus wrote:
Wouldn't compiling for speed, as opposed to Os, be more of an advantage with modern hardware?

People can and probably have written some PhD theses on similar subjects. Optimization is hard. Two places I'm aware of where you typically see compilers trade space for speed are inlining functions and unrolling loops. The idea is to avoid the overhead of subroutine calls, setting up local stack frames, jumping, paging, etc. in the inlining case, and to avoid the comparison and branch steps in the loop unrolling.

Loops are not as costly as they used to be on earlier processors, thanks to speculative execution and branch prediction. Overaggressive inlining and unrolling can actually hurt performance if things start spilling out of caches.

And every time I've tried to really optimize code, it came down to a few functions that needed special treatment. Choices that might be appropriate for those few inner loops would not be so for the entire program - it might bloat it so badly that it would take forever to load from disk and would cause the system to thrash unnecessarily.

helmers, do you have information as to why the gentoo-kernel patches (18_gcc3-compile-opts in particular) prefer -Os to -O2? I wonder if it might be because of GCC 3.x bugs wrt optimization?
_________________
For every higher wall, there is a taller ladder

borenson
n00b
Joined: 16 Jul 2002
Posts: 17

Posted: Tue Oct 22, 2002 12:48 am

It's set to -Os because that minimises the overhead of cache invalidation caused by the kernel butting in 100 (or 1500!) times a second.

ghetto
Guru
Joined: 10 Jul 2002
Posts: 369
Location: BC, Canada

Posted: Sun Jan 05, 2003 10:14 pm    Post subject: like usual

This is just my humble opinion, but as usual when it comes to such things, the sensible path lies somewhere in the middle.

Options such as -funroll-loops and -O3 supposedly speed things up by optimizing the binary itself, which has the sorry side effect of causing a great amount of bloat. The -Os optimization makes smaller binaries, which can load much faster because of the reduced footprint, but the program itself will not be as responsive as something compiled with -O3. In either case, personally, I can barely notice the difference on a modern piece of hardware. I know there is a difference, because I have experimented with different compiler options and compared both the resulting binary sizes and how long the program takes to load. Judging the program's responsiveness once loaded is a bit harder, as it would require some actual benchmarking.
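
The size comparison described above could be scripted roughly like this. It is only a sketch: `compare_sizes` and `demo.c` are made-up names, it assumes gcc is installed, and it measures the whole file with `wc -c` rather than just the code section:

```shell
#!/bin/sh
# Sketch: build one source file at several optimization levels and print
# the size of each resulting binary.
# compare_sizes is a hypothetical helper; pass it any C source file.
compare_sizes() {
    src=$1
    # Skip quietly on machines without a compiler.
    command -v gcc >/dev/null 2>&1 || { echo "gcc not found, skipping"; return 0; }
    tmp=$(mktemp -d) || return 1
    for opt in -Os -O2 -O3; do
        if gcc "$opt" -o "$tmp/out$opt" "$src"; then
            printf '%s\t%s bytes\n' "$opt" "$(wc -c < "$tmp/out$opt")"
        fi
    done
    rm -rf "$tmp"
}

# Usage (demo.c is a placeholder for your own source file):
#   compare_sizes demo.c
```

On a tiny program the three sizes may be nearly identical; the differences tend to show up on larger code with many loops and small functions. The `size` tool from binutils gives a finer breakdown if you only care about the code itself.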

I'm not a compiler buff; feel free to refute me if I'm wrong. However, please provide an explanation if you do so.

my cflags:
"-march=athlon -O2 -pipe -frerun-loop-opt -frerun-cse-after-loop
-fexpensive-optimizations -fprefetch-loop-arrays -falign-functions=4
-Wno-deprecated"
With these options I believe I've managed to create a binary that is about 1/10th smaller but just as responsive once loaded (it even loads faster) than something compiled with plain:
"-march=athlon -O3 -pipe"
The flag I'm most suspicious of is '-falign-functions=4'. I've read what the manual says about it, but if someone could put it in simple terms and give an example of how it works I would be much obliged. I do know it is doing something good, because my binaries are more responsive without a size increase.
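
In simple terms, -falign-functions=4 asks GCC to start every function at an address that is a multiple of 4, inserting padding bytes where needed. The arithmetic can be sketched as below; `align_up` is a made-up helper that mimics the rounding for power-of-two alignments, not GCC's own code, and the addresses are arbitrary examples:

```shell
#!/bin/sh
# Sketch: round an address up to the next multiple of an alignment.
# Assumption: the alignment is a power of two. align_up is a hypothetical
# helper that mimics the address arithmetic, not GCC's own code.
align_up() {
    addr=$1
    align=$2
    echo $(( (addr + align - 1) & ~(align - 1) ))
}

# Where a function that would otherwise start at each address ends up
# with 4-byte and with 16-byte alignment.
for addr in 4096 4097 4098 4101; do
    echo "start $addr -> align 4: $(align_up "$addr" 4), align 16: $(align_up "$addr" 16)"
done
```

With an alignment of 4, at most 3 padding bytes are added per function, whereas a 16-byte alignment can add up to 15. If the compiler's default alignment at -O2 is larger than 4 on your target (it is machine-dependent), asking for 4 would plausibly explain binaries that stay small.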
_________________
Blizzard you suck.

kerframil
l33t
Joined: 19 Apr 2002
Posts: 710
Location: London, UK

Posted: Sun Jan 05, 2003 10:43 pm

1500 sounds a bit excessive. YMMV, but I wouldn't go above 1000 unless there's a strong indication that it is beneficial, and I would recommend starting at around the 500 mark. Red Hat uses 512 Hz for i686 kernels and, personally, I trust Red Hat to pick something sensible.

idl
Retired Dev
Joined: 24 Dec 2002
Posts: 1728
Location: Nottingham, UK

Posted: Sun Jan 05, 2003 11:37 pm

You can also try using a different filesystem. ReiserFS and XFS are fast, popular alternatives to the extX filesystems.

PhilCl
n00b
Joined: 02 Jan 2003
Posts: 15

Posted: Mon Jan 06, 2003 6:07 pm

To illustrate the point: I used all the optimisations I could, -funroll-loops etc., including options like -foptimize-sibling-calls. It's great for some small routines, but when applied to XFree86 it had a memory footprint of 80 MB. This all comes down to the trade-off between cache size, cache architecture (hence alignment), and the latency difference between registers, cache, memory, and storage systems.

My feeling is that a small function or program which fits onto a few cache lines can be optimized to hell, but as soon as it becomes part of a larger program this causes more problems than it's worth. (It's only really better if the routine is run many times, so that the advantage is worthwhile; otherwise the penalty for a cache miss is too large.)

hope that makes a bit of sense - it's the computer architecture problem

keratos68
Guru
Joined: 27 Dec 2002
Posts: 561
Location: Blackpool, Lancashire, UK.

Posted: Thu Jan 09, 2003 1:57 pm

rac wrote:
kanuslupus wrote:
Wouldn't compiling for speed, as opposed to Os, be more of an advantage with modern hardware?

People can and probably have written some PhD theses on similar subjects. Optimization is hard. Two places I'm aware of where you typically see compilers trade space for speed are inlining functions and unrolling loops. The idea is to avoid the overhead of subroutine calls, setting up local stack frames, jumping, paging, etc. in the inlining case, and to avoid the comparison and branch steps in the loop unrolling.

Loops are not as costly as they used to be on earlier processors, thanks to speculative execution and branch prediction. Overaggressive inlining and unrolling can actually hurt performance if things start spilling out of caches.

And every time I've tried to really optimize code, it came down to a few functions that needed special treatment. Choices that might be appropriate for those few inner loops would not be so for the entire program - it might bloat it so badly that it would take forever to load from disk and would cause the system to thrash unnecessarily.

helmers, do you have information as to why the gentoo-kernel patches (18_gcc3-compile-opts in particular) prefer -Os to -O2? I wonder if it might be because of GCC 3.x bugs wrt optimization?


Consider the following please:

o That larger (optimised via inline/unroll) code requires significant symbol resolution and lookups by the dynamic linker.

o Larger code occupies more disk space. Disk blocks may not be (and usually are not) contiguous; however, this can be mitigated to a point by sound partitioning principles, e.g. 'tmp' directories mounted on a different filesystem than 'data' directories. More blocks = longer load times = more disk/CPU/swap activity.

o Optimisations can be overly-aggressive, leading to seg faults or unstable O/S. Not always the case, but "-O4" has demonstrated (to me) that a number of servers/workstations here 'can' become 'unfriendly'.


Of course, all these factors can be mitigated by additional/upgraded hardware. I find that 2GB RAM and dual-CPU motherboards assist in reducing such overheads. For those on the other side of the bleeding edge, may I suggest, as our colleague above does, the "-Os" option.

I am currently documenting this topic as part of my PhD, in a white paper to be published mid-year at King's College London (KCL), Department of Computer Science. Naturally this addresses not just Gentoo, not even just Linux, but current methods and practice employed by today's operating systems.
_________________
Someone told me that "..they only ever made one mistake...."

...and that's when they said they were wrong!!

red_over_blue
Guru
Joined: 16 Dec 2002
Posts: 310

Posted: Thu Jan 09, 2003 2:20 pm

Dazzle68,

You obviously seem to know what you are talking about. What cflags would you recommend for a 1.4GHz Athlon T-Bird with 512 megs of ram and 1024 megs of swap? I would like stability over speed, but currently only use

CFLAGS="-mcpu=i686 -O2 -pipe -fomit-frame-pointer"

since I was reading another post about -O3 introducing some kind of software/hardware performance deficiency as compared to -O2.

I know I could read the entire man page for gcc... but it is very cryptic to someone who is not a full-time or hobby programmer. I have programming knowledge, but not to that extent.



Thanks for any reply.

kerframil
l33t
Joined: 19 Apr 2002
Posts: 710
Location: London, UK

Posted: Thu Jan 09, 2003 8:15 pm

Quote:
o That larger (optimised via inline/unroll) code requires significant symbol resolution and lookups by the dynamic linker.

Very interesting indeed.
Quote:
o Larger code occupies more disk space. Disk blocks may not be (and usually are not) contiguous; however, this can be mitigated to a point by sound partitioning principles, e.g. 'tmp' directories mounted on a different filesystem than 'data' directories. More blocks = longer load times = more disk/CPU/swap activity.

I agree with this. It's all too easy to just put the entire root filesystem on one partition, but given enough space there are some pretty legitimate reasons for doing so. I think some people are put off by a lack of basic grounding in terms of choosing appropriate partition sizes for the various elements of the filesystem layout. Furthermore, I am slightly bothered by Gentoo's habit of keeping certain things outside of /var when it probably shouldn't - although one can modify things easily enough.

Quote:
Of course, all these factors can be mitigated by additional/upgraded hardware. I find that 2GB RAM and dual-CPU motherboards assist in reducing such overheads. For those on the other side of the bleeding edge, may I suggest, as our colleague above does, the "-Os" option.

I was wondering, is it not the case that simply "-O" can be a reasonable compromise also?

keratos68
Guru
Joined: 27 Dec 2002
Posts: 561
Location: Blackpool, Lancashire, UK.

Posted: Thu Jan 09, 2003 9:41 pm

red_over_blue wrote:
Dazzle68,

You obviously seem to know what you are talking about. What cflags would you recommend for a 1.4GHz Athlon T-Bird with 512 megs of ram and 1024 megs of swap?


Gosh, I'm only one of many SW and Sys Engineers here. The beauty of engineering is that it is artistic, innovative and a TEAM EFFORT. I'm new to Gentoo but do have experience in *nix, AS400, RISC, Windows, CPM and PRIME. I think it would be inappropriate for me to recommend a configuration for a system that I have had little experience with in terms of architectural analysis. I think what you have, red_over_blue, is perhaps in line with your goal of stability plus performance. There are many flags that can be employed to enact various behaviours in the GCC compiler, and you are spot-on: there are so many that I think one could devote a "lifetime" to them. I've spent the best part of 26 months on it to date, but I'll be throwing the towel in soon :)

kerframil wrote:
I agree with this. It's all too easy to just put the entire root filesystem on one partition, but given enough space there are some pretty legitimate reasons for doing so. I think some people are put off by a lack of basic grounding in terms of choosing appropriate partition sizes for the various elements of the filesystem layout. Furthermore, I am slightly bothered by Gentoo's habit of keeping certain things outside of /var when it probably shouldn't - although one can modify things easily enough.


Totally agree. Further, I don't know about you, Kerframil, but I believe that perhaps Gentoo relies too heavily on the */share and /etc directories for "end-user" configuration rather than leaving these for BASE Gentoo System stuff. For example, perhaps components such as DNS, NAT, NETFILTER, IPCFG, Sound etc. would be better located under /var? Maybe the Gentoo Devs might consider this in a future release.

Oh, and the "-O" flag alone: correct me if I am wrong, but on GNU GCC it is the same as "-O1", where the compiler tries to reduce code size and execution time without performing optimisations that take a great deal of compilation time. (It is with no "-O" option at all that the compiler's goal is to reduce the cost of compilation and keep debugging straightforward.) So it sits below "-O2" and "-O3". Personally I use "-Os" to reduce disk usage, load time and swap activity. To mitigate any loss in performance, I installed a dual-CPU motherboard! It seems reasonable.

Thanks for an interesting thread guys :)
_________________
Someone told me that "..they only ever made one mistake...."

...and that's when they said they were wrong!!

All times are GMT