yngwin Retired Dev
Joined: 19 Dec 2002 Posts: 4572 Location: Suzhou, China
|
Posted: Thu Aug 26, 2004 6:01 pm Post subject: Re: Why specify -fomit-frame-pointer when its implied -O/-O2 |
|
|
carpman wrote: | From the comments and reading i have done -02 would be best! |
I don't think so! There is no such thing as -02 (digit zero)... You probably mean -O2 (capital letter O); this is a common mistake... _________________ "Those who deny freedom to others deserve it not for themselves." - Abraham Lincoln
Free Culture | Defective by Design | EFF |
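A minimal sketch of how the -02 versus -O2 mix-up could be caught before it reaches make.conf. The helper name `find_zero_typos` is hypothetical and not part of Portage or GCC; it only looks for an optimization level typed with the digit zero.

```python
import re

# Matches an optimization flag typed with the digit zero, e.g. "-02" or "-0s".
_ZERO_FOR_O = re.compile(r"^-0([0-3s])$")


def find_zero_typos(cflags: str) -> list:
    """Return (typo, suggested_fix) pairs for flags like -02 that were meant as -O2."""
    fixes = []
    for flag in cflags.split():
        match = _ZERO_FOR_O.match(flag)
        if match:
            fixes.append((flag, "-O" + match.group(1)))
    return fixes


print(find_zero_typos("-02 -march=pentium3 -pipe"))
```

A correctly spelled string such as "-O2 -pipe" yields an empty list.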
|
augury l33t
Joined: 22 May 2004 Posts: 722 Location: philadelphia
|
Posted: Sat Aug 28, 2004 6:47 pm Post subject: |
|
|
CFLAGS="--param max-unrolled-insns=16 -fsingle-precision-constant -DNO_DEBUG -g0 -funroll-loops -ffast-math -finline-limit=4096 -march=pentium4 -fomit-frame-pointer -pipe -ftracer -O3 -fmerge-all-constants -fprefetch-loop-arrays"
CXXFLAGS="-fabi-version=0 -ffunction-sections -DNO_DEBUG -g0 --param max-unrolled-insns=16 -funroll-loops -ffast-math -finline-limit=4096 -march=pentium4 -fomit-frame-pointer -fprefetch-loop-arrays -pipe -fsingle-precision-constant -ftracer -O3 -fmerge-all-constants -fpermissive -fno-enforce-eh-specs"
I've been using these flags to compile boxes with gcc-3.3.3 and gcc-3.3.4 and they've worked well on almost all ebuilds.
gcc won't build with -DNO_DEBUG and -g0 because it uses the debugger to build itself backwards. i don't think it lets -fsingle-precision-constant through the autoconf either.
glibc won't let -ffast-math through and --param's get bjorked in its makefiles somehow. same thing happens in openoffice but only in one section.
usually if a module specifically needs double precision it stops, and the compile script can be cut and pasted minus -fsingle-precision-constant. once the module's built the ebuild can be resumed with no harm done (anyone know if there's a way around this?). javascript decides whether to use single or double precision at run time though, so it can be broken without stopping the build. kjs in kdelibs and mozkjs (i think) in mozilla (and thunderbird and firefox) are the only js engines i know of.
tetex can be broken by -O3 sometimes.
i use --param max-unrolled-insns=4 for pentium3's, only because intel says that those are the most loops that do any good and i figure they'd know. i don't even know if it's loops or instructions that i'm limiting, but it keeps the bloat down a little and should still allow for loop prediction. my desktop runs nicer than windows ever did so i guess i'm happy enough compiling this way. |
|
augury l33t
Joined: 22 May 2004 Posts: 722 Location: philadelphia
|
Posted: Sat Aug 28, 2004 6:55 pm Post subject: |
|
|
oh yeah. i tried gcc-3.4.0. it compiles much faster and everything runs faster but it seems like things got mixed up with two different runtimes. i remember kdedev couldn't link properly. i don't know if it's something i did or if the aclocal files and such needed to be rebuilt or what. |
|
blackcat4 n00b
Joined: 27 Nov 2003 Posts: 8 Location: UK
|
Posted: Sat Aug 28, 2004 11:39 pm Post subject: Benchmark script for cflags and compilers. |
|
|
I've written a small script to test different cflags combinations (or even different versions of gcc). If anyone is interested, it's at:
http://blackcat.ca/dist/bench_gcc
To see how to use it, run it with:
|
|
Isaiah Guru
Joined: 25 Feb 2003 Posts: 359
|
Posted: Sun Sep 05, 2004 2:27 am Post subject: |
|
|
nmcsween wrote: | Remember these are HIGHLY optimized flags and some packages won't compile with them. You'll need to take out a flag or two to get some packages to compile. Try this on compiles that fail: -march=X -O2 -pipe -ftracer -fno-crossjumping -maccumulate-outgoing-args -fmove-all-movables -freduce-all-givs. If it's still giving you problems then you know it has to do with one of the last three. |
One needs only to remove "-freduce-all-givs" from your "highly" optimized flags for glibc-2.3.4.20040808 to compile cleanly. For what it's worth, I'm rebuilding this system and haven't had to use the "reduced" flags yet.
Edit: The same goes for "libfame-0.9.0-r1" (remove "-freduce-all-givs")
Edit: Just found out compiling "binutils" with "-freduce-all-givs" is a bad thing (will get errors compiling kernels)
Edit: Emerging bash-3.0-r6 with "-freduce-all-givs" is a VERY BAD THING (emerge is not working)
Last edited by Isaiah on Thu Sep 30, 2004 6:01 pm; edited 1 time in total |
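The hunt for a single breaking flag described above can be sketched as a remove-one-at-a-time loop. This is only an illustration: `find_breaking_flags` and `fake_build` are hypothetical names, and `fake_build` stands in for a real package build, which is not shown here.

```python
from typing import Callable, List


def find_breaking_flags(flags: List[str],
                        builds_ok: Callable[[List[str]], bool]) -> List[str]:
    """Return each flag whose removal alone turns a failing build into a passing one."""
    if builds_ok(flags):
        return []  # the full set already builds, nothing to hunt for
    culprits = []
    for i, flag in enumerate(flags):
        trial = flags[:i] + flags[i + 1:]  # same flags minus the one under test
        if builds_ok(trial):
            culprits.append(flag)
    return culprits


# Stand-in for a real build: pretend the package fails only with -freduce-all-givs.
def fake_build(flags: List[str]) -> bool:
    return "-freduce-all-givs" not in flags


flags = ("-march=X -O2 -pipe -ftracer -fno-crossjumping "
         "-maccumulate-outgoing-args -fmove-all-movables -freduce-all-givs").split()
print(find_breaking_flags(flags, fake_build))
```

Note that this only finds a flag that breaks the build on its own; if two flags each break it independently, no single removal helps and the list comes back empty.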
|
drakkan Apprentice
Joined: 21 Jun 2004 Posts: 232
|
Posted: Sun Sep 12, 2004 7:45 am Post subject: |
|
|
hi all
I'm buying a new laptop. I chose a Pentium M 735 (1.70 GHz, 2 MB L2
cache, 400 MHz FSB); my problem is which CFLAGS I should use.
I've read a lot, and my conclusion is that for the Pentium M the best CFLAGS
are the following:
CHOST="i686-pc-linux-gnu"
CFLAGS="-march=pentium3 -pipe -O2 -fomit-frame-pointer -fforce-addr
-frename-registers -fprefetch-loop-arrays -falign-functions=64"
CXXFLAGS="${CFLAGS}"
I would like to choose -O2 because I read that the highest optimization level (-O3)
produces large binaries, so applications are slow to start.
If I use gcc 3.4 (by changing the make.profile link
to /usr/portage/profiles/default-linux/x86/2004.2/), can I use
-march=pentium-m?
Currently I have a P3 1 GHz, 512 MB RAM, GeForce2 Go 32 MB. I compiled Gentoo
from stage1 with the following make.conf:
CFLAGS="-O3 -march=pentium3 -pipe -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
CXXFLAGS="${CFLAGS}"
MAKEOPTS="-j2"
USE="-gtk -gnome qt kde alsa samba acl php perl cups mysql pam-mysql imap
libwww maildir sasl ssl"
FEATURES="sandbox strict sfperms"
LINGUAS="it"
ALSA_CARDS="maestro3"
GENTOO_MIRRORS="http://www.die.unipd.it/pub/Linux/distributions/gentoo-sources/"
Some applications such as mozilla seem slow to start (DMA is enabled); I
think it's because of the -O3 option. Is it correct to use -O3 or -O2 in make.conf and then
build applications such as mozilla and openoffice with -Os to reduce startup
time? For other huge packages such as xorg or kde, are the default make.conf options
OK, or is -Os better than -O3 in this case too?
Some people say prebuilt packages (pkg) are the best choice, because the system is up in little time
and these packages are built by expert developers, so they have package-specific
build options that optimize them and make a very fast system. Is that
right?
I'm very confused, so I would like to have the opinion of
Gentoo's experts.
Thanks
drakkan |
|
d4n1el Tux's lil' helper
Joined: 21 Jun 2004 Posts: 76
|
Posted: Sun Sep 12, 2004 8:59 am Post subject: |
|
|
I don't think my settings are optimized... if someone knows what is best, I'd be glad for some advice.
I've got an Athlon XP 2500+ Barton.
My settings are:
Quote: | CFLAGS="-O2 -mcpu=i686 -fomit-frame-pointer"
CHOST="i386-pc-linux-gnu"
CXXFLAGS="${CFLAGS}"
|
Some tips? |
|
spb Retired Dev
Joined: 02 Jan 2004 Posts: 2135 Location: Cambridge, UK
|
Posted: Sun Sep 12, 2004 12:11 pm Post subject: |
|
|
To start with, ditch -mcpu=i686, and replace it with -march=athlon-xp -mcpu=athlon-xp (normally this would be redundant (-march= implies -mcpu), but some ebuilds filter -march and don't replace it with -mcpu). -O2 is probably a decent enough setting if you don't have a particular reason for wanting anything else. -fomit-frame-pointer is a good option to have, so keep that. Then add -pipe to speed up compilation. So, we end up with "-O2 -march=athlon-xp -mcpu=athlon-xp -fomit-frame-pointer -pipe".
Next thing: your CHOST. Really that ought to be i686-pc-linux-gnu, but changing it is a non-trivial task. If you want to change it, then do so, but you might have to rebootstrap and rebuild some stuff afterwards. It's probably not worth it really. |
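The flag rewrite walked through above can be sketched as a small string transformation. `retarget_flags` is a hypothetical helper written for this illustration, not something Portage provides; it drops any existing -mcpu=/-march=, adds both for the chosen CPU, and appends -pipe.

```python
def retarget_flags(cflags: str, cpu: str) -> str:
    """Drop existing -mcpu=/-march= flags, then add matching -march/-mcpu and -pipe."""
    kept = [f for f in cflags.split()
            if not f.startswith(("-mcpu=", "-march="))]
    # Keep the optimization level first, as in the post's final string.
    opt = [f for f in kept if f.startswith("-O")]
    rest = [f for f in kept if not f.startswith("-O")]
    result = opt + [f"-march={cpu}", f"-mcpu={cpu}"] + rest
    if "-pipe" not in result:
        result.append("-pipe")
    return " ".join(result)


print(retarget_flags("-O2 -mcpu=i686 -fomit-frame-pointer", "athlon-xp"))
```

Run on d4n1el's original CFLAGS, this reproduces the string the post ends up with.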
|
nxsty Veteran
Joined: 23 Jun 2004 Posts: 1556 Location: .se
|
Posted: Sun Sep 12, 2004 6:18 pm Post subject: Re: Benchmark script for cflags and compilers. |
|
|
blackcat4 wrote: | I've written a small script to test different cflags combinations (or even different versions of gcc). If anyone is interested, it's at:
http://blackcat.ca/dist/bench_gcc
To see how to use it, run it with:
|
Cool! Thanks
It would be nice if it also could display changes in binary size! |
|
el_compa n00b
Joined: 28 Jan 2004 Posts: 65 Location: France
|
Posted: Mon Sep 13, 2004 6:11 pm Post subject: Help/comments on specs for a Celeron (P4 core) |
|
|
Hi, I've got a Celeron @ 2.2 GHz (P4 core) system with 628 MB of RAM.
I'm using gcc 3.4.2-r1
My CFLAGS are:
CFLAGS="-O2 -pipe -ftracer -march=pentium3 -fmove-all-movables -fomit-frame-pointer -fprefetch-loop-arrays"
I'm using -march=pentium3 because of crashes with qt apps.
Please, can anyone comment on these CFLAGS? I used to use these (and more flags) because I found them using acovea, but that was for gcc 3.3.x. Right now I don't want to let the machine run acovea for three days to find the best possible flags, so I'm hoping someone else has.
Thanks,
Mario |
|
AceOfAces_TS n00b
Joined: 13 Sep 2004 Posts: 31 Location: Minnesota, USA
|
Posted: Mon Sep 13, 2004 9:38 pm Post subject: AthlonXP-m |
|
|
What would you do for an Athlon XP-M (mobile) with 512 MB RAM on GCC 3.3?
Here's my /proc/cpuinfo:
Code: | processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 10
model name : mobile AMD Athlon(tm) XP2800+
stepping : 0
cpu MHz : 2120.194
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow
bogomips : 4194.30 |
I have been running
Code: | CFLAGS="-O3 -mcpu=athlon-xp -march=athlon-xp -pipe -mfpmath=sse -mmmx -msse -m3dnow -fforce-addr -fomit-frame-pointer -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -falign-functions=4 -maccumulate-outgoing-args -ffast-math -fprefetch-loop-arrays" |
Anything better I can do to it? |
|
nxsty Veteran
Joined: 23 Jun 2004 Posts: 1556 Location: .se
|
Posted: Wed Sep 15, 2004 11:43 am Post subject: Re: AthlonXP-m |
|
|
AceOfAces_TS wrote: | Code: | CFLAGS="-O3 -mcpu=athlon-xp -march=athlon-xp -pipe -mfpmath=sse -mmmx -msse -m3dnow -fforce-addr -fomit-frame-pointer -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -falign-functions=4 -maccumulate-outgoing-args -ffast-math -fprefetch-loop-arrays" |
Anything better I can do to it? |
Having a lot of optimizations only bloats your binaries and makes them slower. Remove -funroll-loops and replace -O3 with -O2 for a start.
These flags can also be removed since they are implied by -O* and -march:
-mmmx -msse -m3dnow -frerun-loop-opt -maccumulate-outgoing-args -mcpu=athlon-xp -frerun-cse-after-loop |
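As a rough sketch, the suggested cleanup can be applied mechanically to the quoted CFLAGS. The `IMPLIED` set is copied from the list in this post (with -maccumulate-outgoing-args spelled as in the quoted CFLAGS) and has not been checked against GCC itself; `slim_flags` is a hypothetical helper name.

```python
# Flags this post lists as implied by -O* and -march (taken from the post, not verified).
IMPLIED = {
    "-mmmx", "-msse", "-m3dnow", "-frerun-loop-opt",
    "-maccumulate-outgoing-args", "-mcpu=athlon-xp", "-frerun-cse-after-loop",
}


def slim_flags(cflags: str) -> str:
    """Apply the advice above: -O3 becomes -O2, -funroll-loops and implied flags go."""
    out = []
    for flag in cflags.split():
        if flag == "-O3":
            flag = "-O2"
        if flag == "-funroll-loops" or flag in IMPLIED:
            continue
        out.append(flag)
    return " ".join(out)


original = ("-O3 -mcpu=athlon-xp -march=athlon-xp -pipe -mfpmath=sse -mmmx -msse "
            "-m3dnow -fforce-addr -fomit-frame-pointer -funroll-loops "
            "-frerun-cse-after-loop -frerun-loop-opt -falign-functions=4 "
            "-maccumulate-outgoing-args -ffast-math -fprefetch-loop-arrays")
print(slim_flags(original))
```

The flags that survive are the ones this post does not comment on; whether -ffast-math and the rest are worth keeping is a separate question.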
|
Deranger Veteran
Joined: 26 Aug 2004 Posts: 1215
|
Posted: Thu Sep 16, 2004 1:23 pm Post subject: |
|
|
Currently using:
Code: |
CFLAGS="-O2 -march=athlon-xp -fomit-frame-pointer -ftracer -fprefetch-loop-arrays -pipe"
CXXFLAGS="${CFLAGS} -fvisibility-inlines-hidden"
|
Well-tested flags for GCC 3.4, posted by robmoss, one post below
Last edited by Deranger on Sat Oct 02, 2004 11:58 am; edited 2 times in total |
|
robmoss Retired Dev
Joined: 27 May 2003 Posts: 2634 Location: Jesus College, Oxford
|
Posted: Sat Sep 18, 2004 8:43 am Post subject: |
|
|
Don't know if anyone's bothered, but if you're using GCC 3.4, these are very well tested, as I'm a gcc-porting dev and these are mine. Nothing should break. For my Athlon XP:
Code: | CFLAGS="-O2 -march=athlon-xp -fomit-frame-pointer -ftracer -fprefetch-loop-arrays -pipe"
CXXFLAGS="${CFLAGS} -fvisibility-inlines-hidden" |
And my Athlon 64:
Code: | CFLAGS="-O2 -march=athlon64 -ftracer -fprefetch-loop-arrays -pipe"
CXXFLAGS="${CFLAGS} -fvisibility-inlines-hidden" |
You can probably replace -O2 with -O3 if you like. But the increase in compile time is nontrivial, and considering how much time I spend compiling stuff, that's not acceptable for me... _________________ Reality is for those who can't face Science Fiction.
emerge -U will kill your Gentoo
ecatmur, Lord of Portage Bash Scripts |
|
LockeAverame Tux's lil' helper
Joined: 14 Jul 2003 Posts: 108
|
Posted: Sat Sep 18, 2004 10:57 am Post subject: |
|
|
i would suggest -funroll-loops; in nearly all benchmarks it gives a 10% or more speedup.
mostly -O3 is faster than -O2 but not in all cases. in some benchmark runs where i automatically test different cflag combinations with freebench and nbench, i found that -mfpmath=sse gives a 3% improvement. all numbers mentioned here are averages over all test cases in these benchmarks. -ftracer mostly doesn't do very much; the same is true for -fmove-all-movables.
-ffast-math should only be used if you know what you are doing.
my cflags for athlon-xp and gcc-3.3.4 are:
-O3 -march=athlon-xp -mfpmath=sse -ffast-math -fmove-all-movables -fomit-frame-pointer -fprefetch-loop-arrays -ftracer -funroll-loops -pipe
everything works fine. with gcc-4.0.0 -fmove-all-movables should be removed from this, because it has a new vectorizing and peeling loop-unroller.
the binaries get bigger, but many people underestimate the 512KB cache of a barton core; it's huge if you think in terms of functions. gcc-4.0.0 will improve many parts of the code, but it's a pity that some optimizations won't make it in, especially the aligning part. i hope that the vectorizer will do a good job. i will do some tests after a stable release comes out. i'll upgrade to gcc-3.4 once some big regressions are fixed. |
|
KayZee Apprentice
Joined: 15 Oct 2003 Posts: 202 Location: Arlington, VA
|
Posted: Sun Sep 19, 2004 7:23 pm Post subject: |
|
|
If I make changes to the CFLAGS values in make.conf, do I have to recompile everything with the new CFLAGS values? Or can the existing packages keep the CFLAGS they were compiled with while new or updated packages use the new CFLAGS values? Will Gentoo become unstable if mixed CFLAGS values are used?
--Karl |
|
LockeAverame Tux's lil' helper
Joined: 14 Jul 2003 Posts: 108
|
Posted: Mon Sep 20, 2004 3:23 pm Post subject: |
|
|
no, mostly it won't affect your system. only some flags like -malign-double or other alignment flags which break the normal binary formats will lead to problems of this kind, but you shouldn't use these flags on your system anyway; they should only be used for special applications like number crunching which don't need to link to existing libraries. |
|
Gentree Watchman
Joined: 01 Jul 2003 Posts: 5350 Location: France, Old Europe
|
Posted: Wed Sep 22, 2004 3:37 pm Post subject: |
|
|
robmoss wrote: | Don't know if anyone's bothered, but if you're using GCC 3.4, these are very well tested, as I'm a gcc-porting dev and these are mine. Nothing should break. For my Athlon XP:
Code: | CFLAGS="-O2 -march=athlon-xp -fomit-frame-pointer -ftracer -fprefetch-loop-arrays -pipe"
CXXFLAGS="${CFLAGS} -fvisibility-inlines-hidden" |
And my Athlon 64:
Code: | CFLAGS="-O2 -march=athlon64 -ftracer -fprefetch-loop-arrays -pipe"
CXXFLAGS="${CFLAGS} -fvisibility-inlines-hidden" |
You can probably replace -O2 with -O3 if you like. But the increase in compile time is nontrivial, and considering how much time I spend compiling stuff, that's not acceptable for me... |
Hi Rob,
re GCC 3.4 on athlon-xp: what happened to -mtune=athlon-xp? Before the summer I thought this was considered a hot flag.
Has it fallen into ill repute?
Thanks. _________________ Linux, because I'd rather own a free OS than steal one that's not worth paying for.
Gentoo because I'm a masochist
AthlonXP-M on A7N8X. Portage ~x86 |
|
tcbounce Tux's lil' helper
Joined: 18 Nov 2003 Posts: 86 Location: South Korea
|
Posted: Wed Sep 22, 2004 4:46 pm Post subject: -fstack-protector and -O3 or higher? |
|
|
Hello,
I heard -fstack-protector is always on in hardened gentoo profiles.
Is it safe to use this option with optimisation higher than -O2? I read that using higher than -O2 can break things at runtime and during linking. The Debian and Linux From Scratch boys and girls have been talking about it.
http://www.trl.ibm.com/projects/security/ssp/node4.html <- see optimization section update June 2004
Cheers,
Luke |
|
asph l33t
Joined: 25 Aug 2003 Posts: 741 Location: Barcelona, Spain
|
Posted: Wed Sep 22, 2004 5:28 pm Post subject: |
|
|
my new cflags for gcc 3.4.1-r2
Code: | CFLAGS="-O3 -march=pentium-m -mtune=pentium-m -pipe -ftracer -fomit-frame-pointer" |
finally i can use pentium-m, all packages compile without problems and run smoothly _________________ gentoo sex is updatedb; locate; talk; date; cd; strip; look; touch; finger; unzip; uptime; gawk; head; emerge --oneshot condom; mount; fsck; gasp; more; yes; yes; yes; more; umount; emerge -C condom; make clean; sleep |
|
LockeAverame Tux's lil' helper
Joined: 14 Jul 2003 Posts: 108
|
Posted: Wed Sep 22, 2004 11:44 pm Post subject: |
|
|
http://www.trl.ibm.com/projects/security/ssp/node4.html only says that some protections can't be established, and mostly the code doesn't get crappy, except for xorg-x11 which doesn't like -fstack-protector at all (modules fail).
most of the optimizations shouldn't kill the stack-protection (even though it is not of very much help, but better than nothing).
-mtune is the same as -mcpu, there is no magic behind it.
and after all it's not a good idea to tune your system to a behaviour you don't know. mostly it's very annoying if people use flags like -ffast-math or -malign-double without knowing anything about their meaning. |
|
Gentree Watchman
Joined: 01 Jul 2003 Posts: 5350 Location: France, Old Europe
|
Posted: Thu Sep 23, 2004 12:33 am Post subject: |
|
|
Quote: | -mtune is the same as -mcpu, there is no magic behind it. |
I'm sure I knew that when I was doing all that stuff, but it's a while back. Thanks for reminding me.
I spent quite a lot of time tuning with acovea tests to optimise the flags. Then moved to gcc 3.4 and it all became irrelevant.
But I did establish you quickly waste more time than you will ever gain with a faster computer even if it's fun trying.
-O2 -pipe plus a couple of others can give immediate gains but after that it's the law of diminishing returns. |
|
blackcat4 n00b
Joined: 27 Nov 2003 Posts: 8 Location: UK
|
Posted: Thu Sep 23, 2004 1:48 am Post subject: Re: Benchmark script for cflags and compilers. |
|
|
nxsty wrote: | It would be nice if it also could display changes in binary size! |
Done, the new version gives output like:
Code: |
Stats Method: average
Number of iterations: 3
Base Path: default
Base Options: -O1
Base Compile Time: 40.37 seconds
Base Run Time: 33.91 seconds
Base Code Size: 319191 bytes
Peak Path: default
Peak Options: -O3
Peak Compile Time: 48.57 seconds
Peak Run Time: 30.86 seconds
Peak Code Size: 328284 bytes
Diff Compile 8.20 seconds (16.88% increase)
Diff Run: 3.05 seconds (-9.87% decrease)
Diff Code Size: -9093 bytes (2.77% increase)
|
Available from: http://blackcat.ca/lifeline/query.php?tag=BENCHGCC |
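For reading the Diff lines, note that the percentages in the sample output appear to be taken relative to the peak value (8.20 / 48.57 is about 16.88%). A minimal sketch of the same comparison measured against the base run instead; `percent_change` is a hypothetical helper and not part of bench_gcc.

```python
def percent_change(base: float, peak: float) -> float:
    """Signed change from base to peak, as a percentage of the base value."""
    if base == 0:
        raise ValueError("base must be non-zero")
    return (peak - base) / base * 100.0


# Figures copied from the sample output above (-O1 as base, -O3 as peak).
compile_pct = percent_change(40.37, 48.57)
run_pct = percent_change(33.91, 30.86)
size_pct = percent_change(319191, 328284)
print(f"compile {compile_pct:+.2f}%  run {run_pct:+.2f}%  size {size_pct:+.2f}%")
```

A positive result means the peak build is slower or larger than the base build, a negative one means it is faster or smaller.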
|
Larcen Apprentice
Joined: 21 Mar 2004 Posts: 174
|
Posted: Thu Sep 23, 2004 3:34 am Post subject: |
|
|
I've been using the same CFLAGS since I first built my Gentoo, sadly I never thought about changing them. Other than the -fomit flag, can anyone else recommend flags to help speed my Machine along? Also, should I upgrade GCC to a masked version, other than latest stable? If so, which ebuild?
Current Flags:
Code: | CFLAGS="-O2 -march=pentium4"
CHOST="i686-pc-linux-gnu"
CXXFLAGS="${CFLAGS}" |
Computer Specs:
Code: | P4 1.6
1.5mb PC2100 DDR RAM
Maxtor 160gig HDD, 8mb cache @ 7200rpm |
She isn't exactly slow as it is, but any performance enhancement would be well appreciated. |
|
gentood Apprentice
Joined: 16 Mar 2004 Posts: 157 Location: Sweden
|
Posted: Thu Sep 23, 2004 8:03 pm Post subject: |
|
|
Quote: |
i would suggest -funroll-loops in nearly all benchmarks it gives a 10% or more speedup
|
I don't agree with you there, at least not according to the gentoo-wiki:
Quote: |
Layout Pentium-4
I've also tested acovea (http://www.coyotegulch.com/products/acovea/index.html) against the Pentium-4 layout, so here are the results:
optimistic options:
-fno-if-conversion2 (1.291)
-foptimize-sibling-calls (1.0
-fcse-follow-jumps (1.417)
-fgcse (2.261)
-frerun-cse-after-loop (1.46)
-fschedule-insns (1.164)
-fstrict-aliasing (1.333)
-freorder-functions (1.0
-frename-registers (1.417)
-mno-align-stringops (1.164)
-minline-all-stringops (1.544)
pessimistic options:
-fno-if-conversion (-1.619)
-fstrength-reduce (-1.071)
-fpeephole2 (-1.534)
-fschedule-insns2 (-1.197)
-falign-labels (-1.113)
-funroll-loops (-1.703)
-funroll-all-loops (-1.703)
-mfpmath=sse (-1.956)
-mfpmath=sse,387 (-1.914)
-fomit-frame-pointer (-1.619)
-momit-leaf-frame-pointer (-1.534)
-funsafe-math-optimizations (-1.028)
|
acovea is the benchmark program used.
But then again, you use different bench programs, so that could be the reason. |
|