tapted Tux's lil' helper
Joined: 02 Dec 2003 Posts: 122 Location: Sydney, Australia
|
Posted: Tue Mar 02, 2004 10:24 am Post subject: |
|
|
til wrote: | I also need some help, because I have the same problem as others - but I thought my CFLAGS were just optimized for my system (Athlon XP 2200+). Anyway, my Gentoo crashes after compiling for about 3 hours.
|
What kind of crash??
A kernel panic?
I've had none on my P3, but had zounds of them on my P4 (gave up in the end).
Try running the memtest86 program to see if there are memory problems.
Otherwise the kernel you're using may be badly configured. However, chances are it's a live CD kernel or some such, so it should be OK.
Maybe try a live CD with a 2.6 kernel.
til wrote: |
For your help, my cflags:
Code: | CHOST="i686-pc-linux-gnu"
CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer"
|
I don't see any error in my config - do you?
|
These are quite tame, by Gentoo standards.
Sorry, I can't help you with keyboard layouts off the top of my head...
Here in Australia we use US keyboards (pretty much exclusively). We just have issues when we want our dates around a different way, and when we want to spell things in colourful ways. Or write programmes in our pyjamas to honour our neighbours' valour....
or something.
Moo. |
|
Säck Tux's lil' helper
Joined: 13 Dec 2003 Posts: 141 Location: Switzerland
|
Posted: Wed Mar 03, 2004 9:19 am Post subject: |
|
|
I think I'll do a completely new install of my Gentoo system, since I have played around a little bit too much and my HD is full.
I have a Pentium 4-M and I'd like CFLAGS settings that will work without problems.
My current settings are:
CHOST="i686-pc-linux-gnu"
CFLAGS="-march=pentium4 -O3 -pipe -fomit-frame-pointer"
This has worked pretty well in most cases, but not always. OpenOffice didn't compile, and strangely KDE 3.2's KOrganizer doesn't work right. I read in a thread (I can't remember which one) that this might come from -march=pentium4.
Well, my next system should be a system that is optimized but STABLE!!
So I am considering lowering my CFLAGS to
CFLAGS="-march=i686 -O2 -pipe -fomit-frame-pointer"
Now my questions:
- Does the change from -march=pentium4 to -march=i686 decrease performance drastically?
- Shouldn't I actually use -mcpu=i686, since my CPU isn't a Pentium Pro?
- Is this going to result in a more stable system?
And my last question: when I do a stage 3 install, what are the default CFLAGS of the i686 and Pentium 4 stages?
Greetings, and thanks for your help. _________________ Remember: Gentoo Rocks |
|
tapted Tux's lil' helper
Joined: 02 Dec 2003 Posts: 122 Location: Sydney, Australia
|
Posted: Thu Mar 04, 2004 10:56 am Post subject: |
|
|
Säck wrote: | Openoffice didn't compile,
|
It never seems to.
emerge openoffice-bin
seems to make most people happy
Säck wrote: | and strangely KDE 3.2's KOrganizer doesn't work right. I read in a thread (I can't remember which one) that this might come from -march=pentium4.
Well, my next system should be a system that is optimized but STABLE!!
So I am considering lowering my CFLAGS to
CFLAGS="-march=i686 -O2 -pipe -fomit-frame-pointer"
|
These are pretty tame...
Quote: |
now my questions:
- Does the change from -march=pentium4 to -march=i686 decrease performance drastically?
|
Most likely no. P4 extensions are things like SSE2, which mainly affect floating point arithmetic. Unless you're doing ray tracing, something like 95% of operations are integer arithmetic.
However, there should be no reason (in theory) to drop down. -march enhancements rarely affect code semantics.
Säck wrote: |
- Shouldn't I actually use -mcpu=i686, since my CPU isn't a Pentium Pro?
|
AFAIK, for the most part, these are identical. i586 is pentium, i686 is pentium pro. That's a generalisation .. I don't know the details.
I do, however, know the difference between -mcpu and -march. The difference shows up when code is run on a CPU other than the one named in the flag. x86 is backward compatible, so there is no problem if you run i686 code on a P4, say. However, code compiled with -march=i686 will not run on a 486, say, because it may use instructions the 486 lacks. If you use -mcpu instead, the code stays (in theory at least) compatible with all x86 CPUs: GCC only tunes things like instruction scheduling for the specified CPU and does not use its extra instructions, so it should run fastest on that CPU.
My grammar sucks, but that's the gist of it.
Unless you're running the same binary on two different processors/computers, there is not really any point in specifying -mcpu instead of -march (in GCC 3.3, -march=X already implies -mcpu=X).
There might be cause to specify both (with the same argument), in case there's an ebuild that's known to be broken with a particular march, but not the equivalent mcpu, and so filters out the march... That's rare though. I don't bother.
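For what it's worth, here is how the two choices look in make.conf. This is only a sketch for GCC 3.3; the particular flag combinations are assumptions picked for illustration, not settings anyone in this thread has tested.

```shell
# /etc/make.conf fragment (sketch; values are illustrative assumptions)

# Option 1: -march alone. The compiler may use every pentium4 instruction,
# so the result will not run on older CPUs.
CFLAGS="-march=pentium4 -O2 -pipe -fomit-frame-pointer"

# Option 2: -mcpu alone. Scheduling is tuned for the pentium4, but the
# instruction set stays generic, so the binary still runs on older x86 CPUs.
# CFLAGS="-mcpu=pentium4 -O2 -pipe -fomit-frame-pointer"

CXXFLAGS="${CFLAGS}"
```

Option 2 only makes sense if the same binaries have to run on more than one kind of machine.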
Säck wrote: |
-is this going to result in a more stable system?
|
Unlikely. cf Tame.
Säck wrote: |
And my last question: when I do a stage 3 install, what are the default CFLAGS of the i686 and Pentium 4 stages?
Greetings, and thanks for your help. |
AFAIK, P4 stage 3 and the GRP are compiled with something like
Code: |
CFLAGS="-O3 -march=pentium4 -funroll-loops -fprefetch-loop-arrays -pipe"
|
I'm not a gentoo developer though, so it may have changed, and I could be wrong.
For more info (and sorry for referencing my own post again, but it's a new page....) see
https://forums.gentoo.org/viewtopic.php?p=793905#793905
Moo. |
|
n3m0 l33t
Joined: 08 Feb 2004 Posts: 798 Location: Richville, Naples, Italy, Europe
|
Posted: Sat Mar 06, 2004 2:20 pm Post subject: |
|
|
tapted wrote: |
... I still don't see the point of -falign-*
64 is silly though. Anything above the size of a 'word' (32 or 64 _bits_ today -- or 4-8 bytes) doesn't make sense at all -- it's just a waste of cache. -falign-* uses _bytes_ [not bits or kB].
|
According to "AMD Athlon Processor x86 Code Optimization Guide" the AthlonXP processor has a 64-byte cache line.
This should justify the following flag:
-falign-functions=64
But the GCC manual says (about -falign-functions):
"If n is not specified or is zero, use a machine-dependent default"
To me, that means that on a "standard" i686 machine, "-falign-functions" and "-falign-functions=32" are the same thing (right?).
On the AXP, will "-falign-functions" and "-falign-functions=64" be the same thing?
Moreover, the "AMD Athlon Processor x86 Code Optimization Guide" says:
"In program hot spots (as determined by either profiling or loop
nesting analysis), place branch targets at or near the beginning
of 16-byte aligned code windows. This guideline improves
performance inside hotspots by maximizing the number of
instruction fills into the instruction-byte queue and preserves
I-cache space in branch-intensive code outside such hotspots."
This passage seems to justify the following flag:
-falign-jumps=16
...but I'm not sure about this. _________________ Energy is civilization. Leaving it in the hands of arsonists/oilmen is criminal. Why wait for the oil to run out?
The Stone Age did not end for lack of stones. - B.G.
Site/Blog: http://www.neminis.org |
|
sleek n00b
Joined: 09 Jan 2003 Posts: 71
|
Posted: Sun Mar 07, 2004 3:22 pm Post subject: |
|
|
For all those with an Intel Celeron (Coppermine) 600 MHz CPU:
Code: | craig@sleekdesign code $ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Celeron (Coppermine)
stepping : 3
cpu MHz : 593.296
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 1171.45 |
These CFLAGS work great:
Code: | CFLAGS="-O3 -march=pentium3 -fomit-frame-pointer -pipe -mmmx -msse -mfpmath=sse" |
_________________ Yesterday was the deadline for all complaints |
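Before copying flags like -msse or -mfpmath=sse from a post like this, it may be worth confirming that your own CPU reports the feature. A minimal sketch, assuming a /proc/cpuinfo layout like the one shown above; has_cpu_flag and the CPUINFO override are hypothetical names made up for this example, not part of Portage or GCC.

```shell
# Sketch: check whether a feature appears in a /proc/cpuinfo-style "flags" line.
# has_cpu_flag is a hypothetical helper written for illustration.
has_cpu_flag() {
    # $1 = feature name (e.g. sse), $2 = file in /proc/cpuinfo format
    flags_line=$(grep -m 1 '^flags' "$2") || return 1
    # Drop everything up to the colon, then match the name as a whole word.
    case " ${flags_line#*:} " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage: only keep the SSE-specific switches if the CPU reports sse.
cpuinfo=${CPUINFO:-/proc/cpuinfo}
if has_cpu_flag sse "$cpuinfo"; then
    echo "sse present: -msse -mfpmath=sse are usable"
else
    echo "sse missing: drop -msse and -mfpmath=sse"
fi
```

Matching on whole words matters here, otherwise a search for sse would also be satisfied by sse2.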
|
fishhead Apprentice
Joined: 07 Mar 2003 Posts: 162 Location: Pasadena, CA
|
Posted: Tue Mar 09, 2004 3:04 am Post subject: |
|
|
n3m0 wrote: |
According to "AMD Athlon Processor x86 Code Optimization Guide" the AthlonXP processor has a 64-byte cache line.
This should justify the following flag:
-falign-functions=64
But the GCC manual says (about -falign-functions):
"If n is not specified or is zero, use a machine-dependent default"
To me, that means that on a "standard" i686 machine, "-falign-functions" and "-falign-functions=32" are the same thing (right?).
On the AXP, will "-falign-functions" and "-falign-functions=64" be the same thing?
|
I thought so too at first, but you won't really see any advantage from this. The Athlon uses non-RAMBUS memory and can thus specify which part of the cache line to load first (i.e. where the function starts). It's therefore to your advantage to use -falign-functions=16, since the Athlon (as you cite below) fetches from 16-byte boundaries. You can specify a lower alignment and trade slightly more decoding at the beginning of a function for cache space.
GCC's alignment defaults are pretty well tuned already. I use -falign-jumps=16 -falign-loops=16 -falign-functions=16 -falign-labels=1 -- I think only one or two of these are different from what GCC uses by default.
n3m0 wrote: |
Moreover, the "AMD Athlon Processor x86 Code Optimization Guide" says:
"In program hot spots (as determined by either profiling or loop
nesting analysis), place branch targets at or near the beginning
of 16-byte aligned code windows. This guideline improves
performance inside hotspots by maximizing the number of
instruction fills into the instruction-byte queue and preserves
I-cache space in branch-intensive code outside such hotspots."
This passage seems to justify the following flag:
-falign-jumps=16
...but I'm not sure of this. |
See above. I'm almost positive that GCC does this by default for the athlon. |
|
KingPunk Guru
Joined: 22 Jan 2004 Posts: 442 Location: Utica, New York, USA
|
Posted: Tue Mar 09, 2004 8:21 pm Post subject: |
|
|
just thought i'd add my two point two cents
Code: | CFLAGS="-march=athlon-xp -O3 -ffast-math -malign-double -funroll-loops -pipe -fomit-frame-pointer -msse -mfpmath=sse,387"
CHOST="i686-pc-linux-gnu"
CXXFLAGS="${CFLAGS}"
|
is there anything i should add or subtract?
(note that i don't care about having no chance of debugging, or anything like that.
no biggie to me. i just want the code, to F L Y! )
Thanks!
~KingPunk _________________ When the FBI/CIA/NSA/FDA/and other three-letter government agencies come looking, you don't know me, you never saw me, never heard of me. get it? got it? good!
also: ALL YOUR POLLITICAL BASE ARE BELONG TO HILLARY IN '08!! |
|
tapted Tux's lil' helper
Joined: 02 Dec 2003 Posts: 122 Location: Sydney, Australia
|
Posted: Tue Mar 09, 2004 10:23 pm Post subject: |
|
|
KingPunk wrote: | just thought i'd add my two point two cents
Code: | CFLAGS="-march=athlon-xp -O3 -ffast-math -malign-double -funroll-loops -pipe -fomit-frame-pointer -msse -mfpmath=sse,387"
CHOST="i686-pc-linux-gnu"
CXXFLAGS="${CFLAGS}"
|
is there anything i should add or subtract?
|
-mfpmath=sse,387 is bad.
See https://forums.gentoo.org/viewtopic.php?p=796878#796878
and the one two down from that.
-malign-double is also strongly warned AGAINST -- it generally results in slower code... although, admittedly, I can't remember where I saw this or even what -malign-double actually does...
-ffast-math is also debatable.
The rest are good, but there are probably others that you can include -- look back through the thread.
Moo. |
|
KingPunk Guru
Joined: 22 Jan 2004 Posts: 442 Location: Utica, New York, USA
|
Posted: Tue Mar 09, 2004 10:53 pm Post subject: |
|
|
Oddly enough, I've compiled the whole system with it. rofl.
And they say it will in fact make it run slower.
So, what would be the best to use?
Like, if you were picking CFLAGS for a 2500+ Barton, 333 FSB, 512 KB L2,
... what would you run?
I want to get the absolute fastest system going. That way I can get
every edge over the box my friend is building. (We've got a nice little
competition going... and he doesn't know how to do software
optimizations via CFLAGS, so yeah!)
so if i could get ahold of the "best" flags to use, without the need
for debugging, i just want my box to smoke. as long as it isn't menthol,
har har har
thanks much.
~KingPunk |
|
n3m0 l33t
Joined: 08 Feb 2004 Posts: 798 Location: Richville, Naples, Italy, Europe
|
Posted: Wed Mar 10, 2004 8:03 pm Post subject: |
|
|
fishhead wrote: |
GCC's alignment defaults are pretty well tuned already. I use -falign-jumps=16 -falign-loops=16 -falign-functions=16 -falign-labels=1 -- I think only one or two of these is different from what GCC uses by default.
|
Thanks for your hints.
I think I'll leave -falign* on the default value, implied by -O2...It seems the most reasonable choice.
Finally, my definitive flags should be these:
CFLAGS="-march=athlon-xp -O3 -pipe -mfpmath=387 -fforce-addr -fomit-frame-pointer -ffast-math -funroll-loops -fprefetch-loop-arrays -fmove-all-movables"
What do you think about them?
I still have doubts about -O3. I would replace it with "-O2 -frename-registers" (-frename-registers is one of the two flags added when moving from -O2 to -O3).
In fact, I have some doubts about -finline-functions (the other flag implied by -O3 but not by -O2).
It could increase the code size excessively, increasing load time, without providing a noticeable speed-up in the execution of a process.
But I'm quite unsure about this.
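The two variants being weighed here can be written side by side. This is just a make.conf sketch, assuming (as the post states) that under GCC 3.3 the only additions of -O3 over -O2 are -frename-registers and -finline-functions; CFLAGS_O3 and CFLAGS_O2R are hypothetical names used only in this example.

```shell
# make.conf sketch: the two variants discussed above (GCC 3.3 era flags).

# Variant 1: full -O3, i.e. -O2 plus -frename-registers and -finline-functions.
CFLAGS_O3="-march=athlon-xp -O3 -pipe -fomit-frame-pointer"

# Variant 2: -O2 plus only -frename-registers, leaving -finline-functions off
# to avoid the code growth from inlining.
CFLAGS_O2R="-march=athlon-xp -O2 -frename-registers -pipe -fomit-frame-pointer"

# Pick one of the two.
CFLAGS="${CFLAGS_O2R}"
CXXFLAGS="${CFLAGS}"
```

Which one is actually faster would have to be measured on the programs you care about.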
PS: OK, OK, I know, I ask too many questions! |
|
punter Guru
Joined: 25 Nov 2002 Posts: 506
|
Posted: Fri Mar 12, 2004 1:33 pm Post subject: |
|
|
KingPunk wrote: |
so if i could get ahold of the "best" flags to use, without the need
for debugging, i just want my box to smoke. as long as it isn't menthol,
har har har
thanks much.
~KingPunk |
sounds like you need a hand for this small competition of yours,
forget about flags, go to bios and overclock cpu freq 60% higher than average, and bus/ram freq 50% faster.
then buy a floating powder nitrogen spray, take off cpu heatsink, and spray at cpu core, while doing a computationally expensive calc on the computer.
that'll make your computer smoke, as well as fry, and last but not least do the computation ultra-faster. |
|
Gentree Watchman
Joined: 01 Jul 2003 Posts: 5350 Location: France, Old Europe
|
Posted: Fri Mar 12, 2004 5:41 pm Post subject: |
|
|
Quote: | ..and he doesn't know how to do software
optimizations, via cflags, so yeah! |
Neither do you it seems!
Seriously, as the last post said, you'll get far more from overclocking.
I don't know what your mobo is, but I have an Athlon XP 2000+ on a KX7-333 (with a GOOD solid copper heatsink).
If I want to go mad I can push the FSB to 186 and the CPU to 2.323 GHz.
It will fall on its arse if you try to rebuild KDE but will run normal desktop stuff fairly well.
Set up PC Health protection in your BIOS, use lm_sensors to keep an eye on the CPU, and test with burnP6, burnBX et al. (emerge cpuburn, I think).
I got lucky with my CPU, so yours may not get as far.
Have fun. |
|
robmoss Retired Dev
Joined: 27 May 2003 Posts: 2634 Location: Jesus College, Oxford
|
|
n3m0 l33t
Joined: 08 Feb 2004 Posts: 798 Location: Richville, Naples, Italy, Europe
|
Posted: Fri Mar 12, 2004 7:56 pm Post subject: |
|
|
robmoss2k wrote: | I was under the impression that -malign-double was very, very good indeed... when it works. I may have to test this. |
I tried it during the first installation of Gentoo on my Athlon XP 2600.
It broke most of the executables.
Binutils did not work correctly.
Diffutils did not compile.
Etc...etc... |
|
nmcsween Guru
Joined: 12 Nov 2003 Posts: 381
|
Posted: Sat Mar 13, 2004 10:18 am Post subject: |
|
|
n3m0: Aligning functions to the full width of a cache line would cause cache misses and fill the cache with useless data: when a function needs, say, only 8 bytes, the other 56 bytes of the line are filled with junk. To my understanding, -falign-functions and -falign-jumps only align some of the code (function entries and jump targets) to these boundaries, not all of it.
Last edited by nmcsween on Sat Mar 13, 2004 10:25 am; edited 1 time in total |
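The 8-versus-56 arithmetic above is easy to check. A small sketch, assuming the simplified model that each function is padded so the next one starts on an N-byte boundary; pad_bytes is a hypothetical helper, and real padding depends on what the compiler and linker actually emit.

```shell
# Sketch: bytes of padding needed to round a code size up to the next
# alignment boundary. pad_bytes is a hypothetical helper for illustration.
pad_bytes() {
    # $1 = code size in bytes, $2 = alignment in bytes
    echo $(( ($2 - $1 % $2) % $2 ))
}

for align in 4 16 64; do
    echo "8-byte function, -falign-functions=$align: $(pad_bytes 8 "$align") bytes of padding"
done
```

With 64-byte alignment the tiny function drags 56 bytes of filler along with it, while 16-byte alignment costs only 8.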
|
nmcsween Guru
Joined: 12 Nov 2003 Posts: 381
|
Posted: Sat Mar 13, 2004 10:25 am Post subject: |
|
|
As for -malign-double, it aligns double, long double, and long long variables on a two-word boundary instead of the default one-word boundary. This generally breaks the standard alignment; it's not needed. On the other hand, if you feel like riding the really, really wild side of GCC optimizations, try -mregparm=3. This controls how many registers (from 1 to 3) are used to pass integer arguments, which is a good thing, but make sure you do that on a fresh install. |
|
nmcsween Guru
Joined: 12 Nov 2003 Posts: 381
|
Posted: Sat Mar 13, 2004 10:29 am Post subject: |
|
|
If you want to have an ultra optimized system try out these flags:
CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer -fno-crossjumping -falign-functions=16 -falign-loops=16 -falign-jumps=16 -fno-align-labels -mfpmath=sse,387 -maccumulate-outgoing-args -fmove-all-movables -freduce-all-givs"
#-fnew-ra (use -fnew-ra with caution) All these flags optimize without any additional increase in memory or disk usage beyond what -O3 itself causes. |
|
neenee Veteran
Joined: 20 Jul 2003 Posts: 1786
|
Posted: Sat Mar 13, 2004 10:43 am Post subject: |
|
|
i now use:
CFLAGS="-O2 -march=athlon-xp -pipe -fomit-frame-pointer -ftracer" |
|
KingPunk Guru
Joined: 22 Jan 2004 Posts: 442 Location: Utica, New York, USA
|
Posted: Tue Mar 16, 2004 12:58 am Post subject: |
|
|
Ultraoctane.com wrote: | If you want to have an ultra optimized system try out these flags:
CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer -fno-crossjumping -falign-functions=16 -falign-loops=16 -falign-jumps=16 -fno-align-labels -mfpmath=sse,387 -maccumulate-outgoing-args -fmove-all-movables -freduce-all-givs"
#-fnew-ra (use -fnew-ra with caution) All these flags optimize without any additional increase in memory or disk usage beyond what -O3 itself causes. |
Thank you for your tip. Building the system now.
/sings* oh what fun, it is to watch, code compile on the fly, hey! */
~KingPunk |
|
nmcsween Guru
Joined: 12 Nov 2003 Posts: 381
|
Posted: Wed Mar 17, 2004 6:25 am Post subject: |
|
|
Quote: |
Ultraoctane.com wrote:
If you want to have an ultra optimized system try out these flags:
CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer -fno-crossjumping -falign-functions=16 -falign-loops=16 -falign-jumps=16 -fno-align-labels -mfpmath=sse,387 -maccumulate-outgoing-args -fmove-all-movables -freduce-all-givs"
#-fnew-ra (use -fnew-ra with caution) All these flags optimize without any additional increase in memory or disk usage beyond what -O3 itself causes.
Thank you for your tip. Building the system now.
/sings* oh what fun, it is to watch, code compile on the fly, hey! */
~KingPunk
|
I should have added that you need a newer processor to use these flags, and I assumed that everyone knew to edit their -march= flag. |
|
tapted Tux's lil' helper
Joined: 02 Dec 2003 Posts: 122 Location: Sydney, Australia
|
Posted: Wed Mar 17, 2004 8:21 am Post subject: |
|
|
I'll say it again: the consensus seems to be that -mfpmath=387,sse is bad...
According to
http://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Optimize-Options.html
and
http://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/i386-and-x86-64-Options.html
it would also appear that -fomit-frame-pointer implies -momit-leaf-frame-pointer
and -mfpmath=387 is the default for all but the x86-64 compiler.
-ftracer is new in GCC 3.3 and looks good.
-fno-crossjumping and -fno-align-labels are not mentioned directly -- perhaps someone knows benefits/disadvantages.
-maccumulate-outgoing-args also looks handy.
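If -momit-leaf-frame-pointer really is redundant next to -fomit-frame-pointer, it can simply be dropped from the list. A minimal sketch of doing that by script; strip_flag is a hypothetical helper written for this post (Portage ebuilds have their own filter-flags function in the flag-o-matic eclass for the same kind of job), and the CFLAGS value is a shortened version of the set posted earlier.

```shell
# Sketch: drop one flag from a CFLAGS-style string.
# strip_flag is a hypothetical helper written for illustration.
strip_flag() {
    # $1 = flag to remove, $2 = space-separated flag string
    out=""
    for f in $2; do
        # Keep every word that is not exactly the unwanted flag.
        [ "$f" = "$1" ] || out="${out:+$out }$f"
    done
    printf '%s\n' "$out"
}

CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer"
CFLAGS=$(strip_flag -momit-leaf-frame-pointer "$CFLAGS")
echo "$CFLAGS"
```

The comparison is on whole words, so removing one flag leaves -fomit-frame-pointer and everything else untouched.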
Here's a snip
http://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Optimize-Options.html wrote: |
-fnew-ra
Use a graph coloring register allocator. Currently this option is meant for testing, so we are interested to hear about miscompilations with -fnew-ra.
-ftracer
Perform tail duplication to enlarge superblock size. This transformation simplifies the control flow of the function allowing other optimizations to do better job.
-funroll-loops
Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies both -fstrength-reduce and -frerun-cse-after-loop. This option makes code larger, and may or may not make it run faster.
-funroll-all-loops
Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. -funroll-all-loops implies the same options as -funroll-loops,
-fprefetch-loop-arrays
If supported by the target machine, generate instructions to prefetch memory to improve the performance of loops that access large arrays.
Disabled at level -Os.
|
the rest are old hat.
More snips:
http://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/i386-and-x86-64-Options.html wrote: |
-malign-double
-mno-align-double
Control whether GCC aligns double, long double, and long long variables on a two word boundary or a one word boundary. Aligning double variables on a two word boundary will produce code that runs somewhat faster on a Pentium at the expense of more memory.
Warning: if you use the -malign-double switch, structures containing the above types will be aligned differently than the published application binary interface specifications for the 386 and will not be binary compatible with structures in code compiled without that switch.
-mregparm=num
Control how many registers are used to pass integer arguments. By default, no registers are used to pass arguments, and at most 3 registers can be used. You can control this behavior for a specific function by using the function attribute regparm. See Function Attributes.
Warning: if you use this switch, and num is nonzero, then you must build all modules with the same value, including any libraries. This includes the system libraries and startup modules.
-maccumulate-outgoing-args
If enabled, the maximum amount of space required for outgoing arguments will be computed in the function prologue. This is faster on most modern CPUs because of reduced dependencies, improved scheduling and reduced stack usage when preferred stack boundary is not equal to 2. The drawback is a notable increase in code size. This switch implies -mno-push-args.
|
Moo. |
|
nmcsween Guru
Joined: 12 Nov 2003 Posts: 381
|
Posted: Wed Mar 17, 2004 10:56 am Post subject: |
|
|
Quote: |
I'll say it again: the consensus seems to be that -mfpmath=387,sse is bad...
|
There's no way giving extra instruction sets can be bad. It may be a little risky, but if you're going for server stability then you shouldn't be looking at this thread.
Quote: |
-fno-crossjumping and -fno-align-labels are not mentioned directly -- perhaps someone knows benefits/disadvantages.
|
-fcrossjumping has been shown to lessen performance.
Quote: |
-fnew-ra
Use a graph coloring register allocator. Currently this option is meant for testing, so we are interested to hear about miscompilations with -fnew-ra.
|
Seems to kill a large number of compiles but gives better performance.
Quote: |
-funroll-loops
Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies both -fstrength-reduce and -frerun-cse-after-loop. This option makes code larger, and may or may not make it run faster.
-funroll-all-loops
Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. -funroll-all-loops implies the same options as -funroll-loops,
|
These optimizations are a waste of space. They don't give any real performance increase and most likely slow down a computer that uses them.
Quote: |
-mregparm=num
Control how many registers are used to pass integer arguments. By default, no registers are used to pass arguments, and at most 3 registers can be used. You can control this behavior for a specific function by using the function attribute regparm. See Function Attributes.
Warning: if you use this switch, and num is nonzero, then you must build all modules with the same value, including any libraries. This includes the system libraries and startup modules.
|
This is the holy grail of optimizations, but since it never works (I haven't seen it work), it's useless right now. |
|
seppe Guru
Joined: 01 Sep 2003 Posts: 431 Location: Hove, Antwerp, Belgium
|
Posted: Wed Mar 17, 2004 3:06 pm Post subject: |
|
|
What do you guys suggest for this cpu?
Code: |
root@iris seppe # cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 3
cpu MHz : 800.314
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 1568.76
|
I currently have this in make.conf:
Code: |
CFLAGS="-march=pentium3 -mmmx -msse -Os -fomit-frame-pointer -pipe -fforce-addr -fforce-mem -ffast-math -mpush-args -mfpmath=sse -w"
|
But I haven't done an emerge -e world yet, I just want to make sure that these are good. So if you have suggestions for my CFLAGS, let me know.
Btw, I took -Os instead of -O3 because I heard -O3 is bad when you don't have much cache (I have 256 KB). Why not -O2 then? Because I care more about the startup time of my apps than about their general performance, although I'm considering switching to -O2 so that I get more general performance and still a good startup time.
I have read the whole thread, and I have different stuff noted:
genflags suggests me:
Code: | CFLAGS="-march=pentium3 -O3 -pipe" |
another P3 coppermine user has these cflags:
Code: | -march=pentium3 -O2 -fomit-frame-pointer -momit-leaf-frame-pointer -fprefetch-loop-arrays |
another p3 user has this:
Code: | -march=pentium3 -O3 -mmmx -msse -pipe -fomit-frame-pointer -fprefetch-loop-arrays |
another p3 user has:
Code: | CFLAGS="-march=pentium3 -O2 -pipe -frename-registers -mmmx -msse -fmove-all-movables -mfpmath=sse -w" |
another p3 user:
Code: | -march=pentium3 -O3 -pipe -fomit-frame-pointer -fforce-addr -falign-functions=4 -fprefetch-loop-arrays -fexpensive-optimizations |
other stuff I noted:
Quote: | -funroll-loops is probably not good on a P3, due to bandwidth and L1 cache limits |
Quote: | All -fomit-frame-pointer does is free up a register. Free registers + less code on function entrance = very good. Use it! |
Quote: | -mfpmath=sse doesn't improve anything, but -ffast-math can increase the performance by 40% |
Quote: | I should note that on the Pentium 3, -O3 -freduce-all-givs generates code that is 35% faster than -O3 alone |
Quote: | Don't add too many CFLAGS, because that will slow down performance |
OK, what I want to say is that I've read so many suggestions for my P3 800 MHz CPU that I don't really know which flags I *really* should use and which not.
If you know which flags I *really should* use, please tell me.
Remember that I want my apps to start up quickly (so binaries that aren't too large) AND I still want great general performance.
Thanks _________________ nitro-sources, because between stable and experimental there exists only speed
Latest release I made: 2.6.13.2-nitro1 |
|
nmcsween Guru
Joined: 12 Nov 2003 Posts: 381
|
Posted: Wed Mar 17, 2004 3:39 pm Post subject: |
|
|
First off, I have to say: don't listen to a good number of the people here. Some people seem to be giving bad advice. Why? Most likely they don't know what they're talking about (this isn't aimed at anyone in particular). I really don't see why people are telling you to use -Os, since your system is well within the limits of even -O3, and -O3 will add a few much-needed optimizations on top of what your -march flag enables. So, to wrap this up, here's what I recommend:
-march=pentium3 -O3 -pipe -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer -fno-crossjumping -mfpmath=sse -maccumulate-outgoing-args -fmove-all-movables -freduce-all-givs
That will give you a noticeable increase in speed. Also, -ffast-math is totally up to you, but I don't recommend it, since you'll only get a 40% increase in speed on very, very rare occasions. |
|
nmcsween Guru
Joined: 12 Nov 2003 Posts: 381
|
Posted: Wed Mar 17, 2004 3:46 pm Post subject: |
|
|
Quote: |
Quote:
Don't add too many CFLAGS, because that will slow down performance
|
That's simply wrong. If you don't know what you're doing with the CFLAGS, stay out of the kitchen or you'll get burned. |
|