sleek n00b
Joined: 09 Jan 2003 Posts: 71
|
Posted: Mon Feb 09, 2004 2:42 am Post subject: |
|
|
What would be the best CFLAGS line for my CPU based on the information below:
Code: | craig@sleekdesign craig $ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Celeron (Coppermine)
stepping : 3
cpu MHz : 593.202
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 1182.92 |
_________________ Yesterday was the deadline for all complaints |
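For the original question, a rough starting point can be read off the flags line itself: it lists mmx and sse but not sse2, which matches a Pentium III-class core, so -march=pentium3 is a plausible base. The sketch below encodes that reasoning. suggest_march is a hypothetical helper name, and its mapping is an assumption that only holds for Intel P6-and-later parts; it is not a complete detection scheme.

```shell
#!/bin/sh
# Hypothetical helper: map a /proc/cpuinfo "flags" line to a rough -march guess.
# Assumes an Intel P6-or-later CPU; anything else needs a manual look.
suggest_march() {
    # Pad with spaces so each flag can be matched as a whole word.
    flags=" $1 "
    case "$flags" in
        *" sse2 "*) echo "-march=pentium4" ;;
        *" sse "*)  echo "-march=pentium3" ;;
        *" mmx "*)  echo "-march=pentium2" ;;
        *)          echo "-march=i686" ;;
    esac
}

# The flags line posted above:
suggest_march "fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse"
# prints -march=pentium3
```

Treat the result as a first guess to benchmark from, not as an answer.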
soaringcondor Tux's lil' helper
Joined: 16 Dec 2003 Posts: 103
|
Posted: Mon Feb 09, 2004 6:58 am Post subject: |
|
|
Actually, -Os disables a few of the optimizations that -O2 enables; other than that they are basically the same. Those optimizations speed up the program but, in doing so, increase the file size. |
robmoss Retired Dev
Joined: 27 May 2003 Posts: 2634 Location: Jesus College, Oxford
|
Posted: Mon Feb 09, 2004 4:50 pm Post subject: |
|
|
I'd be interested to see that gimp benchmark run with -O3 and with -O3 -ftracer and would wager that a significant speed boost would result. The -O* optimization levels are correct, and -O2 is still faster than -Os on a P4 in general. Check out Scott Robert Ladd's ACOVEA website. It should be clear to anyone that no amount of trying to actually work out how GCC works will result in a definitive answer even for a simple piece of code as to what will run the fastest; in fact, the interoperation of optimizations is enough to confuse a hardened GCC dev, no matter how much he/she knows about the architecture and operation of GCC. Evolutionary analysis is the only way to go... _________________ Reality is for those who can't face Science Fiction.
emerge -U will kill your Gentoo
ecatmur, Lord of Portage Bash Scripts |
sapphirecat Guru
Joined: 15 Jan 2003 Posts: 376
|
Posted: Mon Feb 09, 2004 6:47 pm Post subject: |
|
|
soaringcondor wrote: | Actually, -Os disables a few of the optimizations that -O2 enables; other than that they are basically the same. Those optimizations speed up the program but, in doing so, increase the file size. |
Got a reference? I don't really know my way around the gcc source; I got to toplev.c from the Freshmeat article which calls it topolev.c. _________________ Former Gentoo user; switched to Kubuntu 7.04 when I got sick of waiting on gcc. Chance of thread necro if you reply now approaching 100%... |
rosinante n00b
Joined: 26 Dec 2003 Posts: 5
|
Posted: Mon Feb 09, 2004 8:37 pm Post subject: |
|
|
Some questions:
Which is the default compiler with Gentoo: gcc 3.2.3 or 3.3.2?
Is -fpic mandatory for prelinking?
I have a Pentium 4 Celeron with SSE/SSE2/MMX capabilities; does GCC *effectively* pass these options with '-march=pentium4'?
I have a very small L2 cache (128 KB) and limited RAM (128 MB, shared with video). Which would be optimal:
'-Os' with '-mfpmath=sse' and '-funroll-loops'
or
'-O2' with '-mfpmath=sse'
Are '-finline-functions' and '-frename-registers' any good on my low-performance box?
Any pointers are greatly appreciated - this gcc-tweaking is a mess :) |
robmoss Retired Dev
Joined: 27 May 2003 Posts: 2634 Location: Jesus College, Oxford
|
Posted: Mon Feb 09, 2004 9:14 pm Post subject: |
|
|
rosinante wrote: | Some questions:
Which is the default compiler with gentoo: gcc 3.2.3 or 3.3.2 ?
Is -fpic mandatory for prelinking?
I have a pentium4 celeron with SSE/SSE2/MMX capabilities; does GCC *effectively* pass these options with '-march=pentium4' ?
I have a very small L2 cache (128KB) and limited ram (128MB, shared for video), what would be optimum:
-'Os' with '-mfpmath=sse' and '-funroll-loops'
or
-'O2' with '-mfpmath=sse'
Are '-finline-functions' and '-frename-registers' any good on my low-performance box?
Any pointers are greatly appreciated - this gcc-tweaking is a mess |
People will give you all sorts of silly answers to this, so I'll try and pre-empt them and give you something useful.
The default compiler for Gentoo is now GCC 3.3.2. This shift occurred only a couple of days ago, so there are still some teething troubles.
With regards your CFLAGS - I don't know anything about pre-linking, so I can't help you there.
If you want to work out which CFLAGS work best for your machine, I'd suggest going to this site (NB - site currently down, Google's cache is here) and running some tests on the example source files. I don't know which of the -mmmx -msse -msse2 flags are switched on by your arch flag, but this code will tell you exactly what is:
Code: | touch ~/input.c   # an empty source file is enough; only the option dump matters
gcc -Q -v `emerge info | grep CFLAGS | sed -e s/CFLAGS// | sed -e s/'\"'// | sed -e s/'\"'// | sed -e s/'\='//` -c ~/input.c |
Look for the options following "options passed:" for what is and isn't enabled. Don't guess and don't listen to anyone else; this varies from architecture to architecture and from GCC version to GCC version (even between minor versions!) so experiment is the only way to go here.
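To make the "options enabled:" part of that output easier to read and compare, it can be split into one flag per line. This is a minimal sketch: enabled_flags is a hypothetical name, and it assumes the dump has the same shape as the GCC 3.3 output quoted in this thread (an "options passed:" section followed by an "options enabled:" section). Other GCC versions may lay the text out differently.

```shell
#!/bin/sh
# Hypothetical helper: read a `gcc -Q -v` dump on stdin and print the flags
# listed after "options enabled:", one per line, sorted and de-duplicated.
enabled_flags() {
    awk '
        /options enabled:/ { on = 1; sub(/.*options enabled:/, "") }
        /options passed:/  { on = 0 }
        # Inside the enabled section, keep only tokens that look like -f/-m flags.
        on { for (i = 1; i <= NF; i++) if ($i ~ /^-[fm]/) print $i }
    ' | LC_ALL=C sort -u
}

# Small self-contained demo on text shaped like the dump quoted in this thread.
printf 'options passed: -O3 -march=athlon-xp\noptions enabled: -fgcse -fdefer-pop\n -fpeephole -mmmx\n' | enabled_flags
```

On a real system the dump arrives on stderr, so the call would look something like `gcc -Q -v $CFLAGS -c input.c 2>&1 | enabled_flags`.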
FWIW, here's the output from mine:
Code: | options passed: -v -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=2
-march=athlon-xp -msse -mmmx -m3dnow -momit-leaf-frame-pointer
-mfpmath=387 -auxbase -O3 -fomit-frame-pointer -funroll-loops -ffast-math
-fprefetch-loop-arrays -freduce-all-givs -finline-limit=600
options enabled: -fdefer-pop -fomit-frame-pointer -foptimize-sibling-calls
-fcse-follow-jumps -fcse-skip-blocks -fexpensive-optimizations
-fthread-jumps -fstrength-reduce -funroll-loops -fprefetch-loop-arrays
-freduce-all-givs -fpeephole -fforce-mem -ffunction-cse
-fkeep-static-consts -fcaller-saves -fpcc-struct-return -fgcse -fgcse-lm
-fgcse-sm -floop-optimize -fcrossjumping -fif-conversion -fif-conversion2
-frerun-cse-after-loop -frerun-loop-opt -fdelete-null-pointer-checks
-fschedule-insns2 -fsched-interblock -fsched-spec -fbranch-count-reg
-freorder-blocks -freorder-functions -frename-registers -fcprop-registers
-fcommon -fgnu-linker -fregmove -foptimize-register-move -fargument-alias
-fstrict-aliasing -fmerge-constants -fzero-initialized-in-bss -fident
-fpeephole2 -ffinite-math-only -fguess-branch-probability
-funsafe-math-optimizations -m80387 -mhard-float -mno-soft-float
-mfp-ret-in-387 -momit-leaf-frame-pointer -maccumulate-outgoing-args -mmmx
-m3dnow -msse -mcpu=athlon-xp -mfpmath=387 -march=athlon-xp |
Here's my current CFLAGS:
Code: | CFLAGS="-O3 -march=athlon-xp -msse -mmmx -m3dnow -momit-leaf-frame-pointer -fomit-frame-pointer -funroll-loops -ffast-math -fprefetch-loop-arrays -freduce-all-givs -finline-limit=600 -mfpmath=387 -pipe" |
Those CFLAGS are probably pretty close to the fastest available as generic optimizations as determined by acovea for my breed of Athlon XP. But since there are such vast differences made even by doing something so trivial as changing the speed of the FSB, if I had a different breed of Athlon XP, my CFLAGS would probably be completely different!
[Note: I should put -ftracer back in my CFLAGS (I have done now, but I'm not going through all that again simply for the sake of correctness) - I was testing GCC 3.2 compatibility with a bit of code, and as such had to take -ftracer, probably the best optimization there is that isn't included in -O3 and much more beneficial in particular than -ffast-math on almost all code, even maths-intensive stuff, temporarily out.]
Have fun experimenting! _________________ Reality is for those who can't face Science Fiction.
emerge -U will kill your Gentoo
ecatmur, Lord of Portage Bash Scripts |
sapphirecat Guru
Joined: 15 Jan 2003 Posts: 376
|
Posted: Tue Feb 10, 2004 5:54 am Post subject: |
|
|
robmoss2k wrote: | I'd be interested to see that gimp benchmark run with -O3 and with -O3 -ftracer and would wager that a significant speed boost would result. |
Well, I tried. I can't get any difference among any of the combinations today. Not between the configurations I saw before, and not even between GCC 3.3 -O3 -ftracer and the runs with the binary where I accidentally left out -march=athlon-xp on GCC 3.2 -O2. This suggests I've done something wrong today, but I have no idea what it might be.
I watched the compilations, and they definitely used the right flags. I did the GCC3.2 builds before upgrading the compiler, and -ftracer worked, so the compilers were right. I've looked at ldd and watched top with full paths. Everything uses the appropriate plugins and libraries. I am going to bed before my head explodes. _________________ Former Gentoo user; switched to Kubuntu 7.04 when I got sick of waiting on gcc. Chance of thread necro if you reply now approaching 100%... |
rosinante n00b
Joined: 26 Dec 2003 Posts: 5
|
Posted: Wed Feb 11, 2004 3:24 pm Post subject: |
|
|
robmoss2k wrote: |
With regards your CFLAGS - I don't know anything about pre-linking, so I can't help you there.
|
I found out about prelinking: -fPIC (which is different from -fpic) is only used for shared libraries (the key to prelinking) or, by the hardened-sources, for compiling ET_DYN ELF binaries.
As far as I can tell, all shared libraries are compiled with -fPIC by default. Can anyone confirm this?
robmoss2k wrote: |
If you want to work out which CFLAGS work best for your machine, I'd suggest going to this site (NB - site currently down, Google's cache is here) and running some tests on the example source files. I don't know which of the -mmmx -msse -msse2 flags are switched on by your arch flag, but this code will tell you exactly what is:
Code: | touch ~/input.c
gcc -Q -v `emerge info | grep CFLAGS | sed -e s/CFLAGS// | sed -e s/'\"'// | sed -e s/'\"'// | sed -e s/'\='//` -c input.c |
|
Nice code, will try this out - as well as the acovea utility.
I'm currently skimming the intel docs on my processor to see if I can tweak something more, currently I have the following possible combinations:
Code: |
CFLAGS="-march=pentium4 -mcpu=pentium4 -pipe -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer
-mmmx -msse -msse2 -mfpmath=sse
-Os"
CFLAGS="-march=pentium4 -mcpu=pentium4 -pipe -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer
-mmmx -msse -msse2 -mfpmath=sse
-Os -finline-functions -frename-registers"
CFLAGS="-march=pentium4 -mcpu=pentium4 -pipe -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer
-mmmx -msse -msse2 -mfpmath=sse
-O3 -fprefetch-loop-arrays -funroll-loops"
|
I also noticed that SSE has a USE flag, so I guess that even compiling with the defaults (as in: use the 387 for floating-point work) gives no guarantee about which instructions a package will actually use.
I could also give up on theoretical research and simply compile several times to see the difference :) |
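The "compile several times and see" approach can be scripted. The sketch below only provides a coarse timer; elapsed_seconds is a hypothetical helper, it relies on `date +%s` (widely available, though not strictly POSIX), and whole-second resolution means the benchmark has to run for a while before differences show up.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print how many whole seconds it took.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1        # run the command, discarding its output
    end=$(date +%s)
    echo $((end - start))
}

# Hypothetical usage; bench.c stands in for code you actually care about:
# for flags in "-Os" "-Os -finline-functions -frename-registers" "-O3 -funroll-loops"; do
#     gcc $flags -o bench bench.c && echo "$flags: $(elapsed_seconds ./bench)s"
# done
```

Running each variant several times and comparing the spread is a reasonable guard against reading noise as a speed-up.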
Nutterpc Tux's lil' helper
Joined: 02 Feb 2004 Posts: 83
|
Posted: Fri Feb 13, 2004 1:13 am Post subject: |
|
|
Good day to one, good day to all
I reckon making the move to Gentoo for me was wise, considering my tweaking background on windows, I thought gentoo would be the best one for me to use .......as that's what it's all about, tweaking your linux distro to how you like it
Anywayz, my CFLAGS I use are as follows
-march=athlon-xp -O3 -fforce-addr -fomit-frame-pointer -foptimize-sibling-calls -fthread-jumps -fgcse-lm -fgcse-sm -frename-registers -mmmx -m3dnow -msse -mfpmath=387 -ffast-math -fmerge-constants -fno-cprop-registers --param max-gcse-memory=512
I've had no probs with this string, me own one....and on a Barton core XP2500 it absolutely hoons , plus having a Gb of DDR400 Kingmax in Dual Channel might also have something to do with it, heeheeheeheehee
Will modify it as I see more ways to get speed outta gentoo
Nutterpc _________________ If it isn't broke, you ain't tweaked it right
Registered Linux User 353232 |
nmcsween Guru
Joined: 12 Nov 2003 Posts: 381
|
Posted: Mon Feb 16, 2004 9:30 am Post subject: |
|
|
Alright, I'm not a gcc developer, but wouldn't it be better to set -falign-functions to 64? The L1 cache on the Athlon XP is 64 KB, and since most code will be exchanged with the L2 cache in maybe 1 or 2 cycles, why go with 16? |
tapted Tux's lil' helper
Joined: 02 Dec 2003 Posts: 122 Location: Sydney, Australia
|
Posted: Wed Feb 18, 2004 10:12 pm Post subject: |
|
|
Ultraoctane.com wrote: | Alright Im not a gcc developer but wouldn't it be better to -falign-functions to equal 64? the L1 cache on the athlon xp is 64kb and since most code will be exchanged in maybe 1 or 2 cycles with the L2 cache so why go with 16? |
... I still don't see the point of -falign-*
64 is silly though. Anything above the size of a 'word' (32 or 64 _bits_ today -- or 4-8 bytes) doesn't make sense at all -- it's just a waste of cache. -falign-* uses _bytes_ [not bits or kB].
So maybe use 4 or 8 ... but why does having things word-aligned help?
If you're optimising for size, set it to 1, so you can save a few bytes here and there.
On RISC processors, all instructions are word-aligned anyway, so meh.
On x86, they don't _need_ to be, but why does making it word-aligned help on a processor that doesn't really care about having things word-aligned?
Anyone? |
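As a rough illustration of what the byte-valued N in -falign-functions=N costs: aligning means padding each function start up to the next multiple of N, so the waste per function is at most N-1 bytes. The sketch below just does that arithmetic; align_up and padding are hypothetical helper names, and N is assumed to be a power of two.

```shell
#!/bin/sh
# Round an address up to the next multiple of a power-of-two alignment (bytes).
align_up() {
    addr=$1
    align=$2    # must be a power of two
    echo $(( (addr + align - 1) & ~(align - 1) ))
}

# Bytes of padding inserted before a function that would otherwise start at $1.
padding() {
    echo $(( $(align_up "$1" "$2") - $1 ))
}

padding 1001 1     # prints 0
padding 1001 16    # prints 7   (1001 -> 1008)
padding 1001 64    # prints 23  (1001 -> 1024)
```

Whether that padding buys anything depends on how the CPU fetches instructions, which is what the benchmarks in this thread are trying to settle.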
robmoss Retired Dev
Joined: 27 May 2003 Posts: 2634 Location: Jesus College, Oxford
|
Posted: Thu Feb 19, 2004 12:07 am Post subject: |
|
|
A brief attack on the acovea sources and a recompile suggests precisely no difference whatsoever with -falign-*=2^n, n integer, n <= 6, * functions / loops / jumps, except for a performance hit when they're all set equal to 1.
So set them to whatever the hell you want, it makes no difference. _________________ Reality is for those who can't face Science Fiction.
emerge -U will kill your Gentoo
ecatmur, Lord of Portage Bash Scripts |
tapted Tux's lil' helper
Joined: 02 Dec 2003 Posts: 122 Location: Sydney, Australia
|
Posted: Thu Feb 19, 2004 3:34 am Post subject: |
|
|
Nutterpc wrote: | Good day to one, good day to all
Anywayz, my CFLAGS I use are as follows
-march=athlon-xp -O3 -fforce-addr -fomit-frame-pointer -foptimize-sibling-calls -fthread-jumps -fgcse-lm -fgcse-sm -frename-registers -mmmx -m3dnow -msse -mfpmath=387 -ffast-math -fmerge-constants -fno-cprop-registers --param max-gcse-memory=512
Nutterpc |
Although gcc doesn't care, some of these are redundant with -O3. This may seem anal, but just so nobody thinks "hey, I didn't use -fsomethingorother. I should recompile my system", here are the redundancies:
-fmerge-constants comes in with -O
-fthread-jumps is meant to as well, if it works on your architecture.
-foptimize-sibling-calls comes in with -O2
-frename-registers comes in with -O3
mmx, 3dnow, sse should come in with -march, IF gcc can use them on your architecture. AFAIK, they get automatically disabled if you try on anything but P4/Athlon64 because only they support 64-bit floats.
The following are new to me, but docs say they could be useful:
-fgcse-lm
-fgcse-sm
-fno-cprop-registers
Anyone got benchmarks for these?
The rest have been looked at pretty thoroughly and are good to use. |
robmoss Retired Dev
Joined: 27 May 2003 Posts: 2634 Location: Jesus College, Oxford
|
Posted: Fri Feb 20, 2004 4:49 pm Post subject: |
|
|
Code: | touch ~/input.c   # an empty source file is enough; only the option dump matters
gcc -Q -v `emerge info | grep CFLAGS | sed -e s/CFLAGS// | sed -e s/'\"'// | sed -e s/'\"'// | sed -e s/'\='//` -c ~/input.c |
Don't forget to use that horrible little bit of code (I'm sure it can be made much neater, I was just trying to be quick) to check what your CFLAGS are actually implying when you change them from one set to another. If there's no difference in what gcc actually uses, then there's no point recompiling everything... _________________ Reality is for those who can't face Science Fiction.
emerge -U will kill your Gentoo
ecatmur, Lord of Portage Bash Scripts |
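Checking whether a CFLAGS change alters what gcc actually enables comes down to comparing two flag lists. A minimal sketch, assuming both lists have already been captured as whitespace-separated strings (only_in_first is a hypothetical name, not part of any tool mentioned here):

```shell
#!/bin/sh
# Print each flag that appears in the first list but not in the second.
# Both arguments are whitespace-separated flag strings.
only_in_first() {
    # Normalise the second list to " flag flag ... " for whole-word matching.
    second=" $(printf '%s ' $2)"
    for f in $1; do
        case "$second" in
            *" $f "*) ;;              # also present in the second list
            *) echo "$f" ;;
        esac
    done
}

only_in_first "-O3 -frename-registers -pipe" "-O3 -pipe"
# prints -frename-registers
```

If the function prints nothing in both directions, the two sets enable the same options and a rebuild should change nothing.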
C1REX l33t
Joined: 02 Jan 2004 Posts: 774 Location: Poland/UK
|
Posted: Mon Feb 23, 2004 12:24 am Post subject: |
|
|
Hi
Here in Poland there's a slightly different fashion as far as flags are concerned, and you can come across somewhat different solutions. As an example, here are mine for a Duron 800. They may look a bit exotic here, but in Poland they can be regarded as pretty standard. -O3 isn't very popular; it's mostly -Os and -O2.
Code: |
CFLAGS="-O2 -march=athlon-tbird -mcpu=athlon-tbird -falign-loops -ffast-math -frename-registers -funroll-all-loops -funroll-loops -pipe -fomit-frame-pointer ${LDFLAGS} -DNDEBUG -DG_DISABLE_ASSERT -DG_DISABLE_CHECKS -DG_DISABLE_CAST_CHECKS"
LDFLAGS="-s -z comreloc"
CXXFLAGS="${CFLAGS}"
|
Greetings from Poland _________________ CLICK HERE to help move gentoo up on distrowatch.
If you like Gentoo you can thank devs here - https://www.gentoo.org/donate/ |
t0mcat Tux's lil' helper
Joined: 12 Feb 2004 Posts: 111 Location: Catania, Italy
|
Posted: Mon Feb 23, 2004 6:28 pm Post subject: |
|
|
lo all,
i'm a noob to the penguin os; i've recently installed gentoo and did the bootstrap with CFLAGS="-march=athlon-xp -O2 -pipe".
i've read so much stuff about CFLAGS that i'm really confused. anyway, after many hours spent reading everywhere, i'm now about to "emerge -e world" with this:
CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer -ffast-math -fmove-all-movables -funroll-loops -fprefetch-loop-arrays -fforce-addr -mmmx -msse -m3dnow -mfpmath=387 -frename-registers -maccumulate-outgoing-args"
i've got an athlon-xp barton 2500@3200.
hope everything will be ok.
btw i've got a n00b question about recompiling the whole mess:
can i "emerge -e world" from an X terminal (gnome)? i'd be quite bored having the pc occupied for more than a whole day, so i'd like to surf or chat while the cpu does all that work. or do i have to do it from the console after a plain boot, without loading any D.E.?
ta for any reply. _________________ il gattaccio
a.k.a etienne |
AnonimoVeneziano n00b
Joined: 25 May 2003 Posts: 65
|
Posted: Mon Feb 23, 2004 8:20 pm Post subject: |
|
|
Hi all
I'm just installing Gentoo now, and I'm doing the bootstrap at the moment (I'm posting from Links).
Anyway, I've compiled with:
"-O3 -mcpu=athlon-xp -march=athlon-xp -fomit-frame-pointer -momit-leaf-frame-pointer -fprefetch-loop-arrays -mfpmath=sse -fforce-addr -pipe"
I would also add "-ftracer", but gcc 3.2.3 (which is on the base CD) doesn't support this option, so I'll add it after bootstrapping gcc 3.3.2.
I've added -mcpu=athlon-xp because I've heard that some ebuilds filter the -march option; is this true? Anyway, -mcpu shouldn't change the binary compared with compiling with "-march" alone, because "-mcpu" is a subset of "-march"; is that right?
Thanks
Bye
Marcello |
nmcsween Guru
Joined: 12 Nov 2003 Posts: 381
|
Posted: Tue Feb 24, 2004 1:14 am Post subject: |
|
|
I don't see the use of adding -mcpu=*, since that is exactly the same as the i686 setting in CHOST, and -march=* turns on that flag as well, so it would just be a waste of typing. |
seppe Guru
Joined: 01 Sep 2003 Posts: 431 Location: Hove, Antwerp, Belgium
|
Posted: Tue Feb 24, 2004 5:06 pm Post subject: |
|
|
Hi, I have a Pentium 3 800 MHz, and I tried these CFLAGS:
CFLAGS="-O3 -march=pentium3 -fprefetch-loop-arrays -funroll-loops -pipe -fomit-frame-pointer -fforce-addr -fmove-all-movables"
and it's still compiling after an 'emerge -e world'. I first tried the same CFLAGS with -freduce-all-givs (which, I read here, gains 35% speed on a Pentium 3), but I got an error when it tried to compile python2.3 (parallel make error, or something).
Now my questions:
1) Do I gain much speed with my current CFLAGS? (I first had the standard 'safe' Pentium 3 CFLAGS suggested at the freehackers site.)
2) How can I compile everything with -freduce-all-givs? I mean: when it fails at python2.3 during an 'emerge -e world', can I compile only python without the -freduce-all-givs flag and continue my 'emerge -e world' without starting all over again? If so, how do I do this?
3) Will -mfpmath=sse,387 gain much speed? And is it safe? Would you suggest using it, or is the speed increase too marginal to be worth the risk?
4) How can I use the acovea program with gcc 3.3.3? I always get 'compilation failed' and a segmentation fault at the point where I should get my 'best' CFLAGS. And can I use acovea with 'emerge -e world'?
Thanks for replying in advance, but keep it simple ... I'm a CFLAG n00b _________________ nitro-sources, because between stable and experimental there exists only speed
Latest release I made: 2.6.13.2-nitro1 |
Malakai Apprentice
Joined: 24 Dec 2002 Posts: 299
|
Posted: Tue Feb 24, 2004 10:40 pm Post subject: |
|
|
robmoss2k wrote: |
Code: | CFLAGS="-O3 -march=athlon-xp -msse -mmmx -m3dnow -momit-leaf-frame-pointer -fomit-frame-pointer -funroll-loops -ffast-math -fprefetch-loop-arrays -freduce-all-givs -finline-limit=600 -mfpmath=387 -pipe" |
|
Quite a few of those settings are redundant.
-march=athlon-xp includes -msse -mmmx -m3dnow.
I don't know exactly how -momit-leaf-frame-pointer works, but afaik -fomit-frame-pointer gets rid of the frame pointer altogether, whereas the leaf one only disables it some of the time. You *shouldn't* need both.
Also, some of the other options you have manually specified are already implied by -O3 in the current gentoo version of GCC.
Just a little fyi, I'm guilty of redundancy in my cflags as well ^_^ |
Penguin_Biker n00b
Joined: 25 Sep 2003 Posts: 30 Location: Portage michigan USA
|
Posted: Wed Feb 25, 2004 2:31 am Post subject: |
|
|
Athlon XP 1700+
i'm also having a problem with -freduce-all-givs
here's my CFLAGS
Code: | CFLAGS="-O3 -march=athlon-xp -pipe -fprefetch-loop-arrays -fomit-frame-pointer -fforce-addr -freduce-all-givs -funroll-loops -maccumulate-outgoing-args -ffast-math" |
How much of a speed gain does -freduce-all-givs give on an athlon-xp? (or if at all)
here's the error i get
Code: | !!! ERROR: dev-lang/python-2.3.3 failed.
!!! Function src_compile, Line 124, Exitcode 2
!!! Parallel make failed
|
i'm doing emerge system now without it, is there a way to fix this or should i just forget it?
EDIT: I had made a note to remove it earlier that i had forgotten about, so my problem is fixed
currently....
Code: | CFLAGS="-O3 -march=athlon-xp -pipe -fprefetch-loop-arrays -fomit-frame-pointer -fforce-addr -funroll-loops -maccumulate-outgoing-args -ffast-math" |
and compiling _________________ A computer without a Microsoft operating system is like a dog without bricks tied to its head.
Registered linux user: #328010 |
tapted Tux's lil' helper
Joined: 02 Dec 2003 Posts: 122 Location: Sydney, Australia
|
Posted: Sat Feb 28, 2004 8:04 am Post subject: |
|
|
C1REX wrote: | Hi
Here In Poland there's a bit different fashion as far as flags are concerned, and you can come across a bit different solutions. For instance I'll give mine for Duron 800. They look a bit exotic here, but in Poland they can be regarded as pretty standard. It isn't very popular to meet O3. Os and O2 mostly.
Code: |
CFLAGS="-O2 -march=athlon-tbird -mcpu=athlon-tbird -falign-loops -ffast-math -frename-registers -funroll-all-loops -funroll-loops -pipe -fomit-frame-pointer ${LDFLAGS} -DNDEBUG -DG_DISABLE_ASSERT -DG_DISABLE_CHECKS -DG_DISABLE_CAST_CHECKS"
LDFLAGS="-s -z comreloc"
CXXFLAGS="${CFLAGS}"
|
Greetings from Poland |
In my experiments -frename-registers was the ONLY flag that -O3 had that -O2 didn't.
https://forums.gentoo.org/viewtopic.php?t=5717&postdays=0&postorder=asc&start=625#793905
So if you use -O3 there would be precisely no difference, but you can drop a redundant flag.
Also, every source I've seen (and my own benchmarks on a raytracer I wrote) advises strongly AGAINST using -funroll-all-loops. It actually slows code down. The reason is this: to figure out where to jump into the unrolled loop, it needs to do a modulus operation on a number only known at runtime. A modulus operation takes 30 clock cycles on a MIPS (where an addition takes 1); on x86 it would be similarly bad. The indeterminate jump also means that the unrolled loop can't be pipelined*. So, unless your _unrolled_ loop executes through lots of times, your code will actually be slower.
-funroll-loops will only unroll loops where the number of iterations is known at compile time, and the division is easy (like a power of two, or a power of two equivalent after an addition**). So it saves time.
Note that the compiler can't really tell the difference between a loop with a high average number of iterations and a low average number of iterations, when the number is only known at runtime. Most loops will be the latter (think library functions), so the net result is slower code because -funroll-all-loops unrolls them all.
Everyone else, look here:
https://forums.gentoo.org/viewtopic.php?t=5717&postdays=0&postorder=asc&start=625#793905
first, for a thread summary. I might do the rest of the thread later.
Footnotes:
* A pipeline panic is usually a hit of about 10 clock cycles.
** Compilers are smart, and know that things like
x * 31 = x * 32 - x
= (x << 5) - x
and can save ~8 clock cycles. There's prolly an equivalent thing for modulus.
Moo. |
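The shift identity in the second footnote is easy to check, and doing so shows why the parentheses matter: in C-style arithmetic (which shell arithmetic follows), subtraction binds more tightly than a shift, so `x << 5 - x` means `x << (5 - x)`. A small sketch, with times31 as a purely illustrative name:

```shell
#!/bin/sh
# Multiply by 31 using a shift and a subtraction: x * 31 == (x << 5) - x.
times31() {
    echo $(( ($1 << 5) - $1 ))
}

x=3
echo $(( x * 31 ))        # prints 93
times31 "$x"              # prints 93
echo $(( x << 5 - x ))    # prints 12, because it parses as x << (5 - x)
```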
tapted Tux's lil' helper
Joined: 02 Dec 2003 Posts: 122 Location: Sydney, Australia
|
Posted: Sat Feb 28, 2004 8:10 am Post subject: |
|
|
Penguin_Biker wrote: | Athlon XP 1700+
i'm also having a problem with -freduce-all-givs
here's my CFLAGS
Code: | CFLAGS="-O3 -march=athlon-xp -pipe -fprefetch-loop-arrays -fomit-frame-pointer -fforce-addr -freduce-all-givs -funroll-loops -maccumulate-outgoing-args -ffast-math" |
How much of a speed gain does -freduce-all-givs give on an athlon-xp? (or if at all)
|
It's experimental. If the gcc developers find it to be advantageous (or bits of it) then it will be incorporated into the loop optimiser in a future version.
Currently, whether there's a performance boost at all depends on the code. Theoretically, however, it should always be an improvement. It's things like caching and pipelining that stuff things up. However, enough people use it to make me think that, on average, there is a not insignificant performance boost.
Penguin_Biker wrote: |
here's the error i get
Code: | !!! ERROR: dev-lang/python-2.3.3 failed.
!!! Function src_compile, Line 124, Exitcode 2
!!! Parallel make failed
|
i'm doing emerge system now without it, is there a way to fix this or should i just forget it?
|
Python has known problems with -freduce-all-givs.
Remove it from your CFLAGS before emerging python, peeps.
perl is also sensitive to some CFLAGS.
Most other things are fine.
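Dropping one flag for a single package does not require editing make.conf by hand each time. The helper below is a sketch only: strip_flag is a hypothetical name, and the usage line assumes that a CFLAGS value set in the environment takes precedence over make.conf for that one emerge run.

```shell
#!/bin/sh
# Remove every exact occurrence of one flag from a list of flags.
# Usage: strip_flag FLAG_TO_DROP flag1 flag2 ...
strip_flag() {
    drop=$1
    shift
    out=
    for f in $*; do
        # Keep everything except the flag being dropped.
        [ "$f" = "$drop" ] || out="$out${out:+ }$f"
    done
    printf '%s\n' "$out"
}

strip_flag -freduce-all-givs "-O3 -freduce-all-givs -funroll-loops"
# prints -O3 -funroll-loops
```

Something like `CFLAGS="$(strip_flag -freduce-all-givs $CFLAGS)" emerge python` would then build python without the flag, leaving the global setting alone.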
Penguin_Biker wrote: |
EDIT: I had made a note to remove it earlier that i had forgotten about, so my problem is fixed
currently....
Code: | CFLAGS="-O3 -march=athlon-xp -pipe -fprefetch-loop-arrays -fomit-frame-pointer -fforce-addr -funroll-loops -maccumulate-outgoing-args -ffast-math" |
and compiling |
spaetz n00b
Joined: 15 Dec 2003 Posts: 16
|
Posted: Tue Mar 02, 2004 7:02 am Post subject: |
|
|
robmoss2k wrote: | Code: | touch ~/input.c
gcc -Q -v `emerge info | grep CFLAGS | sed -e s/CFLAGS// | sed -e s/'\"'// | sed -e s/'\"'// | sed -e s/'\='//` -c input.c |
|
How about this ?
Code: | touch ~/input.c
gcc -Q -v `emerge info | grep CFLAGS | cut -d '"' -f2` -c ~/input.c |
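The cut-based extraction can be tried on its own without Portage installed. This is a sketch that assumes the CFLAGS line in the `emerge info` output has the form CFLAGS="..." at the start of a line, which is what the commands in this thread imply; cflags_from_info is a hypothetical name.

```shell
#!/bin/sh
# Read `emerge info`-style text on stdin and print the quoted CFLAGS value.
cflags_from_info() {
    # Anchor on the variable name, take the first match, then keep the text
    # between the first pair of double quotes.
    grep '^CFLAGS=' | head -n 1 | cut -d '"' -f 2
}

# Demo on made-up input shaped like the output the thread describes.
printf 'CHOST="i686-pc-linux-gnu"\nCFLAGS="-O2 -march=pentium3 -pipe"\nCXXFLAGS="-O2 -pipe"\n' | cflags_from_info
# prints -O2 -march=pentium3 -pipe
```

On a Gentoo box the call would presumably be `emerge info | cflags_from_info`.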
til n00b
Joined: 02 Mar 2004 Posts: 7
|
Posted: Tue Mar 02, 2004 9:24 am Post subject: |
|
|
I also need some help, because I have the same problem as others, even though I thought my CFLAGS were well optimized for my system (Athlon XP 2200+). My Gentoo crashes after compiling for about 3 hours.
For your help, my cflags:
Code: | CHOST="i686-pc-linux-gnu"
CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer"
|
I don't see there any error in my config - do you?
And then, just another question by a newbie:
I got several problems with my keymap (need a german keyboard with umlauts) - first I chose
but this one did not work properly (I had a mixture of an English and a German keyboard: z and y were placed correctly, but the umlaut keys had the American layout), so I changed the keymap to:
Code: | KEYMAP="de-latin1-nodeadkeys" | and now I get umlauts like ä, ö, ü, but the Alt Gr characters, like @ or €, don't work
I think I should post this question in another topic and search for answers, but I thought perhaps somebody could help me so I needn't make the effort (I'm lazy). |