Gentoo Forums
Is the -O3 speed increase worth the cost in disk size?
Antimatter
Guru
Joined: 11 Aug 2003
Posts: 463

Posted: Mon Jan 31, 2005 10:39 am

Right now I have my system compiled with a pretty minimalist USE flag setting and what is, IMHO, a reasonable CFLAGS setting.

My CFLAGS are mainly "-mcpu=pentium4 -march=pentium4 -O3 -pipe", plus one more flag, I think for removing debugging symbols.

My USE flags are something like "-* ssl pam mysql .... " and a few others. I mainly use local USE flags on my system; that is much more flexible than global USE flag settings.

The overall system spec is a Pentium 4 2.8 GHz with Hyper-Threading, 2 GB of RAM, and for now a 36 GB 10k rpm Raptor SATA hard drive for the Gentoo installation.

Anyway, it's a fairly standard user system: OpenOffice installed, a game or two, and some programming IDEs such as Eclipse, running Fluxbox as the WM on Xorg.

When I have some time and get my hands on my new firewall/router/fileserver, I'll rebuild the system, removing certain USE flags and certain packages that aren't needed on this computer (they will be moved over to the fileserver). I thought it would be a good time to review my CFLAGS and do a bit of research on the -O# settings.

On my old 366 MHz laptop I ran the system with -Os and it ran wonderfully.

What I want to know is whether the speed increase from -O3 is worth the larger libraries and binaries on my system, or whether I would be better off with something more sane such as -O2, or maybe even -Os.

Is there any good general benchmark that compares -Os, -O1, -O2 and -O3 in terms of runtime performance vs. loading time vs. disk space?
Jinidog
Guru
Joined: 26 Nov 2003
Posts: 593
Location: Berlin

Posted: Mon Jan 31, 2005 11:11 am

So, let's do it.

Quote:

AMD2800+ bin # CFLAGS="-march=athlon-xp -Os -pipe -fomit-frame-pointer" emerge nbench
...
AMD2800+ bin # ls -l nbench
-rwxr-xr-x 1 root root 36060 31. Jan 11:52 nbench
AMD2800+ bin # nbench

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          1632.4  :      41.86  :      13.75
STRING SORT         :          106.92  :      47.77  :       7.39
BITFIELD            :      2.0411e+08  :      35.01  :       7.31
FP EMULATION        :           97.96  :      47.01  :      10.85
FOURIER             :           17067  :      19.41  :      10.90
ASSIGNMENT          :          18.847  :      71.72  :      18.60
IDEA                :          2357.2  :      36.05  :      10.70
HUFFMAN             :          1239.5  :      34.37  :      10.98
NEURAL NET          :          25.529  :      41.01  :      17.25
LU DECOMPOSITION    :          576.24  :      29.85  :      21.56
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 43.453
FLOATING-POINT INDEX: 28.749
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : AuthenticAMD AMD Athlon(tm) XP 2800+ 2140MHz
L2 Cache : 512 KB
OS : Linux 2.6.10-gentoo-r4
C compiler : 3.4.3
libc :
MEMORY INDEX : 10.020
INTEGER INDEX : 11.505
FLOATING-POINT INDEX: 15.945
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38


Quote:

AMD2800+ ktvtoday # CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer" emerge nbench
AMD2800+ bin # ls -l nbench
-rwxr-xr-x 1 root root 48348 31. Jan 11:58 nbench
AMD2800+ bin # nbench

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          1768.3  :      45.35  :      14.89
STRING SORT         :          132.53  :      59.22  :       9.17
BITFIELD            :      4.3283e+08  :      74.25  :      15.51
FP EMULATION        :          135.72  :      65.12  :      15.03
FOURIER             :           19637  :      22.33  :      12.54
ASSIGNMENT          :          23.475  :      89.33  :      23.17
IDEA                :          3362.6  :      51.43  :      15.27
HUFFMAN             :          1392.2  :      38.61  :      12.33
NEURAL NET          :          25.648  :      41.20  :      17.33
LU DECOMPOSITION    :          1097.5  :      56.86  :      41.06
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 58.350
FLOATING-POINT INDEX: 37.400
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : AuthenticAMD AMD Athlon(tm) XP 2800+ 2140MHz
L2 Cache : 512 KB
OS : Linux 2.6.10-gentoo-r4
C compiler : 3.4.3
libc :
MEMORY INDEX : 14.878
INTEGER INDEX : 14.327
FLOATING-POINT INDEX: 20.743
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38


As we see, the binary was about 34% bigger with -O3 than with -Os.
The MEMORY INDEX ran about 48% faster, the INTEGER INDEX (first one) about 34% faster, the FLOATING-POINT INDEX (both) about 30% faster, and the INTEGER INDEX (last one) about 25% faster.
So roughly a third more binary size made the binary roughly a third faster.
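
The exact ratios can be recomputed from the two runs above. A small sketch using awk; the helper name pct_gain is made up for this example and is not part of nbench:

```shell
# pct_gain OLD NEW: print how much larger NEW is than OLD, in percent,
# rounded to one decimal place.
pct_gain() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new / old - 1) * 100 }'
}

pct_gain 36060 48348     # binary size in bytes, -Os vs -O3
pct_gain 10.020 14.878   # MEMORY INDEX
pct_gain 11.505 14.327   # INTEGER INDEX (Linux baseline)
pct_gain 15.945 20.743   # FLOATING-POINT INDEX (Linux baseline)
pct_gain 43.453 58.350   # INTEGER INDEX (original BYTEmark)
pct_gain 28.749 37.400   # FLOATING-POINT INDEX (original BYTEmark)
```

Keep in mind this is a single run of a single synthetic benchmark on one machine, so the percentages are only a rough indication.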

Let's look at it from another side.
Bigger binaries mean longer load times and boot times, but once they are in memory (and you have enough of it) they stay there and their size is no longer a problem.
I'm personally using -O3, but this benchmark surprised me a bit.

So, if somebody can prove to me that -Os is faster than -O3 on a P3 650 with a disk read speed of 17 MB/s, I will switch.
_________________
Just unused Microsoft-Software is good Microsoft-Software
Gentree
Watchman
Joined: 01 Jul 2003
Posts: 5350
Location: France, Old Europe

Posted: Mon Jan 31, 2005 6:17 pm

General wisdom on this suggests that -O2 plus a couple of other flags is a better trade-off than either extreme.

-Os would probably only be used for a very small, specialised system where you don't have a choice due to limited hardware.

I spent about a month playing with CFLAGS about a year ago and "wasted" more time than I will ever gain from having a faster system. But it was fun.

Then I moved from gcc 3.3 to 3.4 and all my analysis became invalid. :lol:


@Antimatter

It seems you may be a bit confused about the USE variable. It is not like CFLAGS. In general it just selects some build options. (Though sometimes ones like -kde -gnome can lighten things considerably.) Two that are worth looking at are nptl and nptlonly. You can search for the details.

HTH
_________________
Linux, because I'd rather own a free OS than steal one that's not worth paying for.
Gentoo because I'm a masochist
AthlonXP-M on A7N8X. Portage ~x86
Antimatter
Guru
Joined: 11 Aug 2003
Posts: 463

Posted: Mon Jan 31, 2005 6:53 pm

Jinidog: ooh, cool test. I'll have to check that out and play around with it to see what sort of information I can get out of it.

Gentree: I know, but overall a system with minimal USE flags will ultimately be more secure, because there is less code, and less code means fewer bugs and thus less chance of an exploit. It's not always that extreme, but in the end, IMHO, any excess that gets chopped off is a minute well spent. I like to run a minimalist system myself, and it pains me if I'm forced to emerge heaps of packages to get something to work.


I'm just curious about what compile options are available. When I rebuild that system I plan to look through the GCC manual to see what options are available to me, but in the end I will probably settle for the default CFLAGS I already had.
Jinidog
Guru
Joined: 26 Nov 2003
Posts: 593
Location: Berlin

Posted: Mon Jan 31, 2005 8:25 pm

Here you can find the flags:
http://gcc.gnu.org/onlinedocs/gcc-3.4.3/gcc/Optimize-Options.html#Optimize-Options

The ones not implied by any -O level are at the bottom.
Generally, anything other than -O3, -fomit-frame-pointer and -funroll-loops will not improve performance that much.
I benchmarked a lot and ended up with these CFLAGS:
CFLAGS="-march=athlon-xp -O3 -pipe -ftracer -fomit-frame-pointer -frerun-cse-after-loop -ffast-math -funroll-loops -fgcse -mfpmath=387,sse -fforce-addr -frerun-loop-opt -fmove-all-movables -funit-at-a-time"
They are around 10% faster in nbench than just -march, -O3 and -fomit-frame-pointer.
But who knows whether they are faster in real-world applications; I cannot really benchmark that.
Perhaps I really should tone them down.
With them, the binary is 73 KB (which is 100% more than with -Os).

That is caused by -funroll-loops.
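
A claim like this can be checked by building the same source with and without the flag and comparing file sizes. A minimal sketch, assuming gcc (or whatever $CC names) is on the PATH; the function name size_with_flags and the file bench.c in the usage lines are hypothetical and not part of nbench or Portage:

```shell
# size_with_flags SRC [FLAGS...]: compile SRC with the given flags and
# print the size of the resulting binary in bytes.
size_with_flags() {
    src=$1
    shift
    out=$(mktemp) || return 1
    if "${CC:-gcc}" "$@" -o "$out" "$src"; then
        wc -c < "$out" | tr -d ' '
        rm -f "$out"
    else
        rm -f "$out"
        return 1
    fi
}

# Example usage (bench.c is a placeholder for your own source file):
#   size_with_flags bench.c -O3
#   size_with_flags bench.c -O3 -funroll-loops
```

Size on disk is only one side of it. A file-size comparison says nothing about run time, so pair it with a timing run before drawing conclusions.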
Gentree
Watchman
Joined: 01 Jul 2003
Posts: 5350
Location: France, Old Europe

Posted: Mon Jan 31, 2005 8:30 pm

1) Fewer items in the USE variable does not mean less code. Trivial example: if you compile qtparted with -xfs -jfs, it has less code. With no USE flags set, that support is included.

Equally, and less trivially, if you don't use KDE, adding -kde -kde-games -kde-edu -kde-multimedia will really lighten your load.

Also, the default behaviour is now to include both LinuxThreads AND nptl, so adding what I suggested will cut major cruft if you are on nptl.


2) To keep things tidy, use /etc/portage/package.use to supply USE specs for individual packages. Many of the USE values only apply to one specific package, e.g. 3dnowex and sse2 are only applied to mplayer, so there is little point in bloating USE in make.conf.



3) If you want to experiment, check out acovea (emerge acovea; see the forum for more info). The acovea web site also has a nice table listing all the compiler options and exactly what is included by -Os, -O1, -O2 and -O3. Well worth a read if you want to play.
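
Point 2 could look something like the following. This is a sketch, assuming the usual package.use line format of a package atom followed by its flags. The flag names are the ones mentioned above and may not exist in current ebuilds. The PACKAGE_USE variable and the scratch path are hypothetical, used here so the snippet can be tried without touching the real /etc/portage/package.use:

```shell
# Hypothetical scratch file; on a real system the target is /etc/portage/package.use.
PACKAGE_USE="${PACKAGE_USE:-./package.use.example}"

# One line per package: <category>/<package> followed by the USE flags to set.
cat >> "$PACKAGE_USE" <<'EOF'
media-video/mplayer 3dnowex sse2
EOF

# The global USE in make.conf can then stay short, for example:
#   USE="-kde -gnome"
```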


HTH with your minimalist aspirations. :wink:
spb
Retired Dev
Joined: 02 Jan 2004
Posts: 2135
Location: Cambridge, UK

Posted: Mon Jan 31, 2005 8:35 pm

The best CFLAGS setting depends a lot on the circumstances. If you're running SMP on x86, then caching is a major issue and you want your binaries to be as small as possible, so -Os is probably best there. That's mainly a result of the fact that x86 as an architecture sucks. If you're running on P4s or other chips with massively long pipelines, then larger binaries might be helpful since you want the pipeline to be as full as possible. Again, that's mainly a result of x86 being a crappy architecture. Other flags will vary drastically depending on the particular code: a flag that makes one program fast will likely not affect another program, and might well slow it down.

Bottom line: don't do anything fancy with CFLAGS unless you know exactly what you are doing, and 99% of people don't. Stick with "-march=<blah> -O2 -fomit-frame-pointer -pipe" unless you have a very good reason not to.
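
For reference, that conservative baseline could be written in /etc/make.conf roughly as follows. This is a sketch only: the -march value is a placeholder assumption (athlon-xp is borrowed from the benchmark posts earlier in the thread; the original poster's P4 would use pentium4), and mirroring CFLAGS into CXXFLAGS is a common convention rather than something stated in this thread:

```shell
# /etc/make.conf fragment (sketch). make.conf uses shell-style assignments.
# ASSUMPTION: athlon-xp is only an example CPU type; substitute your own.
CFLAGS="-march=athlon-xp -O2 -fomit-frame-pointer -pipe"
# Common convention, not from the thread: reuse the same flags for C++.
CXXFLAGS="${CFLAGS}"
```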
All times are GMT