Gentoo Forums
CFLAGS Central (Part 1)

tapted
Tux's lil' helper

Joined: 02 Dec 2003
Posts: 122
Location: Sydney, Australia

Posted: Tue Mar 02, 2004 10:24 am

til wrote:
I also need some help, because I have the same problem as others, though I thought my CFLAGS were simply optimized for my system (Athlon XP 2200+). Anyway, my Gentoo crashes after compiling for about 3 hours.


What kind of crash??

A kernel panic?

I've had none on my p3, but had zounds of them on my p4 (gave up in the end).

Try running the memtest86 program to see if there are memory problems.

Otherwise the kernel you're using may be badly configured. However, chances are it's a live CD kernel or some such... so should be OK.

Maybe try a live CD with a 2.6 kernel.

til wrote:

For your help, my cflags:
Code:
CHOST="i686-pc-linux-gnu"
CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer"


I don't see any error in my config. Do you?

These are quite tame, by Gentoo standards.

Sorry, I can't help you with keyboard layouts off the top of my head...

Here in Australia we use US keyboards (pretty much exclusively). We just have issues when we want our dates around a different way, and when we want to spell things in colourful ways. Or write programmes in our pyjamas to honour our neighbours' valour....

or something.

Moo.

Säck
Tux's lil' helper

Joined: 13 Dec 2003
Posts: 141
Location: Switzerland

Posted: Wed Mar 03, 2004 9:19 am

I think I'll do a completely new install of my Gentoo system, since I have played around a little bit too much and my HD is full.

I have a Pentium 4-M and I'd like CFLAGS settings that will work without problems.
My current settings are:

CHOST="i686-pc-linux-gnu"
CFLAGS="-march=pentium4 -O3 -pipe -fomit-frame-pointer"

This has worked out pretty well in most cases, but not always. OpenOffice didn't compile, and strangely KDE 3.2's KOrganizer doesn't work right. I read in a thread (I can't remember which one) that this might come from -march=pentium4.

Well, my next system should be a system that is optimized but STABLE!!
So I am considering lowering my CFLAGS to

CFLAGS="-march=i686 -O2 -pipe -fomit-frame-pointer"

Now my questions:
- Does changing from -march=pentium4 to -march=i686 decrease performance drastically?
- Shouldn't I actually use -mcpu=i686, since my CPU isn't a Pentium Pro?
- Is this going to result in a more stable system?

And my last question: when I do a stage 3 install, what are the default CFLAGS of the i686 and the Pentium 4 installations?

Greetings, and thanks for your help
_________________
Remember: Gentoo Rocks

tapted
Tux's lil' helper

Joined: 02 Dec 2003
Posts: 122
Location: Sydney, Australia

Posted: Thu Mar 04, 2004 10:56 am

Säck wrote:
Openoffice didn't compile,


It never seems to.

emerge openoffice-bin

seems to make most people happy
Säck wrote:
and strangely KDE 3.2's KOrganizer doesn't work right. I read in a thread (I can't remember which one) that this might come from -march=pentium4.

Well, my next system should be a system that is optimized but STABLE!!
So I am considering lowering my CFLAGS to

CFLAGS="-march=i686 -O2 -pipe -fomit-frame-pointer"

These are pretty tame...
Quote:


now my questions:
- Does changing from -march=pentium4 to -march=i686 decrease performance drastically?


Most likely no. P4 extensions are things like SSE2, which mainly affect floating-point arithmetic. Unless you're doing ray tracing, something like 95% of operations are integer arithmetic.

However, there should be no reason (in theory) to drop down. -march enhancements rarely affect code semantics.

Säck wrote:

- Shouldn't I actually use -mcpu=i686, since my CPU isn't a Pentium Pro?


AFAIK, for the most part, these are identical. i586 is pentium, i686 is pentium pro. That's a generalisation .. I don't know the details.

I do, however, know the difference between -mcpu and -march. The difference shows up when code is run on a processor other than the one named in the flag. x86 is backward compatible, so there is no problem if you run i686 code on a P4, say. However, code compiled with -march=i686 may use instructions that a 486 lacks, so it will not run on a 486. If you use -mcpu instead, the code stays compatible (in theory at least) with all x86 processors, BUT its instruction selection and scheduling are tuned for the specified processor, so it should run fastest on it. Note that -mcpu on its own does not make the compiler use that processor's extra instructions; only -march does.

My grammar sucks, but that's the gist of it.

Unless you're running the same binary on two different processors/computers, there is not really any point in specifying a -mcpu flag over a -march.

There might be cause to specify both (with the same argument), in case there's an ebuild that's known to be broken with a particular march, but not the equivalent mcpu, and so filters out the march... That's rare though. I don't bother.
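
To make the -march / -mcpu distinction above a bit more concrete, here is a toy model in Python. It is only a sketch: the feature sets in CPU_FEATURES are simplified assumptions for illustration (not a complete list of what each processor supports), and runs_on is a hypothetical helper, not anything GCC or Portage provides.

```python
# Toy model: a binary runs on a host CPU only if every instruction-set
# feature it may rely on is present on that host.
# The feature sets below are simplified assumptions for illustration.
CPU_FEATURES = {
    "i486": {"base"},
    "i686": {"base", "cmov"},
    "pentium4": {"base", "cmov", "mmx", "sse", "sse2"},
}


def required_features(march=None, mcpu=None):
    """Features the compiled code may depend on.

    Only -march widens the set. -mcpu is deliberately ignored here,
    because it changes tuning, not the instructions that may be emitted.
    """
    if march is None:
        return {"base"}
    return set(CPU_FEATURES[march])


def runs_on(host, march=None, mcpu=None):
    """True if code built with the given flags can execute on `host`."""
    return required_features(march, mcpu) <= CPU_FEATURES[host]


print(runs_on("pentium4", march="i686"))  # i686 code on a P4
print(runs_on("i486", march="i686"))      # i686 code on a 486
print(runs_on("i486", mcpu="i686"))       # i686-tuned code on a 486
```

In this model, only the middle case fails, which matches the description above: -march decides where the binary can run, -mcpu only decides where it runs best.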

Säck wrote:


- Is this going to result in a more stable system?



Unlikely. cf Tame.

Säck wrote:


And my last question: when I do a stage 3 install, what are the default CFLAGS of the i686 and the Pentium 4 installations?

Greetings, and thanks for your help


AFAIK, P4 stage 3 and the GRP are compiled with something like

Code:

CFLAGS="-O3 -march=pentium4 -funroll-loops -fprefetch-loop-arrays -pipe"


I'm not a gentoo developer though, so it may have changed, and I could be wrong.

For more info (and sorry for referencing my own post again, but it's a new page....) see

https://forums.gentoo.org/viewtopic.php?p=793905#793905

Moo.

n3m0
l33t

Joined: 08 Feb 2004
Posts: 798
Location: Richville, Naples, Italy, Europe

Posted: Sat Mar 06, 2004 2:20 pm

tapted wrote:


... I still don't see the point of -falign-*

64 is silly though. Anything above the size of a 'word' (32 or 64 _bits_ today -- or 4-8 bytes) doesn't make sense at all -- it's just a waste of cache. -falign-* uses _bytes_ [not bits or kB].


According to "AMD Athlon Processor x86 Code Optimization Guide" the AthlonXP processor has a 64-byte cache line.
This should justify the following flag:

-falign-functions=64

But the GCC manual says (about -falign-functions):

"If n is not specified or is zero, use a machine-dependent default"

To me, that means that on the "standard" i686 machine, "-falign-functions" and "-falign-functions=32" are the same thing (right?).
Does it follow that on the AXP, "-falign-functions" and "-falign-functions=64" are the same thing?

:?: :?:

Moreover, the "AMD Athlon Processor x86 Code Optimization Guide" says:

"In program hot spots (as determined by either profiling or loop nesting analysis), place branch targets at or near the beginning of 16-byte aligned code windows. This guideline improves performance inside hotspots by maximizing the number of instruction fills into the instruction-byte queue and preserves I-cache space in branch-intensive code outside such hotspots."


This passage seems to justify the following flag:

-falign-jumps=16

...but I'm not sure of this.
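
For anyone weighing 16 against 32 or 64 here, the cost side of the trade-off is easy to estimate. The sketch below (Python, purely illustrative) adds up the padding the compiler would have to insert so that each function starts on an N-byte boundary. The function sizes are made-up numbers, and the model assumes functions are simply laid out one after another.

```python
def padding(size, align):
    """Bytes of padding after a `size`-byte function that starts on an
    `align`-byte boundary, so that the next function is aligned too."""
    return (-size) % align


def total_padding(sizes, align):
    """Total padding for a sequence of functions laid out back to back."""
    return sum(padding(size, align) for size in sizes)


# Hypothetical function sizes in bytes (an assumption, not measured data).
sizes = [8, 40, 100, 23]

for align in (16, 32, 64):
    print(f"-falign-functions={align}: {total_padding(sizes, align)} bytes of padding")
```

With small functions, the padding grows quickly as the alignment goes up, which is the cache-space cost being discussed. Whether the fetch behaviour makes up for it is a separate question that this sketch does not model.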
_________________
L’energia è la civiltà. Lasciarla in mano ai piromani/petrolieri è criminale. Perché aspettare che finisca il petrolio?
L’età della pietra non è mica finita per mancanza di pietre. - B.G.


Site/Blog: http://www.neminis.org

sleek
n00b

Joined: 09 Jan 2003
Posts: 71

Posted: Sun Mar 07, 2004 3:22 pm

For all those with an Intel Celeron (Coppermine) 600 MHz CPU:

Code:
craig@sleekdesign code $ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Celeron (Coppermine)
stepping        : 3
cpu MHz         : 593.296
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1171.45


These CFLAGS work great:

Code:
CFLAGS="-O3 -march=pentium3 -fomit-frame-pointer -pipe -mmmx -msse -mfpmath=sse"
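
Before copying flags like -msse, it may be worth checking that the "flags" line of /proc/cpuinfo really lists the extension. A minimal sketch in Python follows; cpu_flags and supported are hypothetical helper names, and SAMPLE is just the flags line from the output above pasted into a string so the example runs anywhere.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported(cpuinfo_text, wanted=("mmx", "sse", "sse2")):
    """Map each wanted extension to whether the CPU reports it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


# Sample text taken from the cpuinfo output quoted in this post.
SAMPLE = (
    "model name      : Celeron (Coppermine)\n"
    "flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge "
    "mca cmov pat pse36 mmx fxsr sse\n"
)

print(supported(SAMPLE))
```

On a real box you would pass in open("/proc/cpuinfo").read() instead of SAMPLE. For this Coppermine it reports mmx and sse but not sse2, so -msse2 would not be a safe addition.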

_________________
Yesterday was the deadline for all complaints

fishhead
Apprentice

Joined: 07 Mar 2003
Posts: 162
Location: Pasadena, CA

Posted: Tue Mar 09, 2004 3:04 am

n3m0 wrote:

According to "AMD Athlon Processor x86 Code Optimization Guide" the AthlonXP processor has a 64-byte cache line.
This should justify the following flag:

-falign-functions=64

But the GCC manual says (about -falign-functions):

"If n is not specified or is zero, use a machine-dependent default"

To me, that means that on the "standard" i686 machine, "-falign-functions" and "-falign-functions=32" are the same thing (right?).
Does it follow that on the AXP, "-falign-functions" and "-falign-functions=64" are the same thing?


I thought so too at first, but you won't really see any advantage with this. The Athlon uses non-RAMBUS memory and can thus specify which part of the cache line to load first (i.e. where the function starts). It's thus to your advantage to use -falign-functions=16, since the Athlon (as you cite below) fetches from 16-byte boundaries. You can specify a lower alignment and trade slightly more decoding at the beginning of a function for cache space.

GCC's alignment defaults are pretty well tuned already. I use -falign-jumps=16 -falign-loops=16 -falign-functions=16 -falign-labels=1 -- I think only one or two of these are different from what GCC uses by default.

n3m0 wrote:

Moreover, the "AMD Athlon Processor x86 Code Optimization Guide" says:

"In program hot spots (as determined by either profiling or loop nesting analysis), place branch targets at or near the beginning of 16-byte aligned code windows. This guideline improves performance inside hotspots by maximizing the number of instruction fills into the instruction-byte queue and preserves I-cache space in branch-intensive code outside such hotspots."


This passage seems to justify the following flag:

-falign-jumps=16

...but I'm not sure of this.


See above. I'm almost positive that GCC does this by default for the athlon.

KingPunk
Guru

Joined: 22 Jan 2004
Posts: 442
Location: Utica, New York, USA

Posted: Tue Mar 09, 2004 8:21 pm

just thought i'd add my two point two cents :D

Code:
CFLAGS="-march=athlon-xp -O3 -ffast-math -malign-double -funroll-loops -pipe -fomit-frame-pointer -msse -mfpmath=sse,387"
CHOST="i686-pc-linux-gnu"
CXXFLAGS="${CFLAGS}"


is there anything i should add or subtract?
(note that i don't care about having no chance of debugging, or anything like that.
no biggie to me. i just want the code, to F L Y! :twisted:)

Thanks!

~KingPunk
_________________
When the FBI/CIA/NSA/FDA/and other three-letter government agencies come looking, you don't know me, you never saw me, never heard of me. get it? got it? good!
also: ALL YOUR POLLITICAL BASE ARE BELONG TO HILLARY IN '08!!

tapted
Tux's lil' helper

Joined: 02 Dec 2003
Posts: 122
Location: Sydney, Australia

Posted: Tue Mar 09, 2004 10:23 pm

KingPunk wrote:
just thought i'd add my two point two cents :D

Code:
CFLAGS="-march=athlon-xp -O3 -ffast-math -malign-double -funroll-loops -pipe -fomit-frame-pointer -msse -mfpmath=sse,387"
CHOST="i686-pc-linux-gnu"
CXXFLAGS="${CFLAGS}"


is there anything i should add or subtract?


-mfpmath=sse,387 is bad.

See https://forums.gentoo.org/viewtopic.php?p=796878#796878

and the one two down from that.

-malign-double is also strongly warned AGAINST -- it generally results in slower code.... although, admittedly, I can't remember where I saw this or even what -malign-double actually does...

-ffast-math is also debatable.

The rest are good, but there are probably others that you can include -- look back through the thread.
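
On why -ffast-math is debatable: it lets the compiler apply floating-point transformations that are not guaranteed to preserve IEEE results, and as far as I understand it that can include regrouping additions. The Python snippet below is not GCC output. It only shows that regrouping a floating-point sum by hand can change the answer, which is the kind of difference such transformations may introduce.

```python
# Floating-point addition is not associative: regrouping the same three
# numbers gives results that differ in the last bit.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # evaluated strictly left to right
right = a + (b + c)  # the same sum, regrouped

print(left, right, left == right)
```

For most desktop software a last-bit difference is harmless, but code that compares floats exactly, or that relies on NaN and infinity handling, can misbehave. That is the usual argument for leaving -ffast-math out of a global CFLAGS.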

Moo.

KingPunk
Guru

Joined: 22 Jan 2004
Posts: 442
Location: Utica, New York, USA

Posted: Tue Mar 09, 2004 10:53 pm

oddly enough, i've compiled the whole system with it. rofl.

and they say it will, in fact, make it run slower.
so, what would be the best to use?
like, if you were to pick cflags to run on a 2500+ barton, 333fsb, 512 L2,
... what would you run?

i want to get the absolute fastest system going. that way i can get
every edge over my friend's box he's building. (we got a nice little
competition going :twisted: ..and he doesn't know how to do software
optimizations, via cflags, so yeah!)

so if i could get ahold of the "best" flags to use, without the need
for debugging, i just want my box to smoke. as long as it isn't menthol,
har har har :lol:

thanks much.
~KingPunk
_________________
When the FBI/CIA/NSA/FDA/and other three-letter government agencies come looking, you don't know me, you never saw me, never heard of me. get it? got it? good!
also: ALL YOUR POLLITICAL BASE ARE BELONG TO HILLARY IN '08!!

n3m0
l33t

Joined: 08 Feb 2004
Posts: 798
Location: Richville, Naples, Italy, Europe

Posted: Wed Mar 10, 2004 8:03 pm

fishhead wrote:

GCC's alignment defaults are pretty well tuned already. I use -falign-jumps=16 -falign-loops=16 -falign-functions=16 -falign-labels=1 -- I think only one or two of these is different from what GCC uses by default.


Thanks for your hints.
I think I'll leave -falign* on the default value, implied by -O2...It seems the most reasonable choice.

Finally, my definitive flags should be these:

CFLAGS="-march=athlon-xp -O3 -pipe -mfpmath=387 -fforce-addr -fomit-frame-pointer -ffast-math -funroll-loops -fprefetch-loop-arrays -fmove-all-movables"

What do you think about them?
I still have some doubts about -O3. I would replace it with "-O2 -frename-registers" (-frename-registers is one of the two flags added when moving from -O2 to -O3).
In fact, I have some doubts about -finline-functions (the other flag implied by -O3 and not by -O2).
It could increase the code size excessively, increasing the load time, without providing a noticeable increase in execution speed.
But I'm quite unsure about this.

PS: ok ok, I know, I ask myself too many questions! :)
_________________
L’energia è la civiltà. Lasciarla in mano ai piromani/petrolieri è criminale. Perché aspettare che finisca il petrolio?
L’età della pietra non è mica finita per mancanza di pietre. - B.G.


Site/Blog: http://www.neminis.org

punter
Guru

Joined: 25 Nov 2002
Posts: 506

Posted: Fri Mar 12, 2004 1:33 pm

KingPunk wrote:

so if i could get ahold of the "best" flags to use, without the need
for debugging, i just want my box to smoke. as long as it isn't menthol,
har har har :lol:

thanks much.
~KingPunk


sounds like you need a hand with this small competition of yours.

forget about flags: go into the bios and overclock the cpu freq 60% higher than average, and the bus/ram freq 50% faster.
then buy a floating powder nitrogen spray, take off the cpu heatsink, and spray at the cpu core while doing a computationally expensive calc on the computer.

that'll make your computer smoke, as well as fry, and last but not least do the computation ultra-fast.

Gentree
Watchman

Joined: 01 Jul 2003
Posts: 5350
Location: France, Old Europe

Posted: Fri Mar 12, 2004 5:41 pm

Quote:
..and he doesn't know how to do software
optimizations, via cflags, so yeah!


Neither do you it seems!

Seriously, as the last post said, you'll get far more from overclocking.

I don't know what your mobo is, but I have an Athlon XP 2000+ on a KX7-333 (with a GOOD solid copper heatsink).

If I want to go mad I can push the FSB to 186 and the CPU to 2.323GHz.

It will fall on its arse if you try to rebuild KDE but will run normal desktop stuff fairly well.

Set up PC Health protection in your BIOS, use lm_sensors to keep an eye on the CPU, and test with burnP6 and burnBX et al (emerge cpuburn, I think).

I hit lucky with my CPU, so yours may not get as far.

Have fun.

robmoss
Retired Dev

Joined: 27 May 2003
Posts: 2634
Location: Jesus College, Oxford

Posted: Fri Mar 12, 2004 6:23 pm

I was under the impression that -malign-double was very, very good indeed... when it works. I may have to test this.
_________________
Reality is for those who can't face Science Fiction.

emerge -U will kill your Gentoo
ecatmur, Lord of Portage Bash Scripts

n3m0
l33t

Joined: 08 Feb 2004
Posts: 798
Location: Richville, Naples, Italy, Europe

Posted: Fri Mar 12, 2004 7:56 pm

robmoss2k wrote:
I was under the impression that -malign-double was very, very good indeed... when it works. I may have to test this.


I tried it during the first installation of Gentoo on my Athlon XP 2600.
It broke most of the executables.
The binutils did not function correctly.
Diffutils did not compile.
Etc...etc...
_________________
L’energia è la civiltà. Lasciarla in mano ai piromani/petrolieri è criminale. Perché aspettare che finisca il petrolio?
L’età della pietra non è mica finita per mancanza di pietre. - B.G.


Site/Blog: http://www.neminis.org

nmcsween
Guru

Joined: 12 Nov 2003
Posts: 381

Posted: Sat Mar 13, 2004 10:18 am

n3m0: Aligning functions to the whole width of a cache line would cause cache misses and fill the cache with useless data: when a function needs, say, only 8 bytes, the other 56 bytes of the line are filled with junk. To my understanding, -falign-functions and -falign-jumps only align some of the code to these boundaries, not all of it (the rest of the code is left alone).

Last edited by nmcsween on Sat Mar 13, 2004 10:25 am; edited 1 time in total

nmcsween
Guru

Joined: 12 Nov 2003
Posts: 381

Posted: Sat Mar 13, 2004 10:25 am

As for -malign-double, it aligns data on a two-word boundary instead of the default. This generally maims the alignment, and it's not needed. On the other hand, if you feel like you need to ride the really, really wild side of GCC optimizations, then try -mregparm=3. This controls how many registers (from 1 to 3) are used to pass integer arguments, which is a good thing, but make sure you do that on a fresh install.

nmcsween
Guru

Joined: 12 Nov 2003
Posts: 381

Posted: Sat Mar 13, 2004 10:29 am

If you want to have an ultra optimized system try out these flags:
CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer -fno-crossjumping -falign-functions=16 -falign-loops=16 -falign-jumps=16 -fno-align-labels -mfpmath=sse,387 -maccumulate-outgoing-args -fmove-all-movables -freduce-all-givs"
#-fnew-ra (use -fnew-ra with caution). All these flags optimize without an additional increase in memory usage or drive space usage over what -O3 specifies.

neenee
Veteran

Joined: 20 Jul 2003
Posts: 1786

Posted: Sat Mar 13, 2004 10:43 am

i now use:

CFLAGS="-O2 -march=athlon-xp -pipe -fomit-frame-pointer -ftracer"

KingPunk
Guru

Joined: 22 Jan 2004
Posts: 442
Location: Utica, New York, USA

Posted: Tue Mar 16, 2004 12:58 am

Ultraoctane.com wrote:
If you want to have an ultra optimized system try out these flags:
CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer -fno-crossjumping -falign-functions=16 -falign-loops=16 -falign-jumps=16 -fno-align-labels -mfpmath=sse,387 -maccumulate-outgoing-args -fmove-all-movables -freduce-all-givs"
#-fnew-ra (use -fnew-ra with caution). All these flags optimize without an additional increase in memory usage or drive space usage over what -O3 specifies.


Thank you for your tip. building the system now. :D

/sings* oh what fun, it is to watch, code compile on the fly, hey! */

~KingPunk
_________________
When the FBI/CIA/NSA/FDA/and other three-letter government agencies come looking, you don't know me, you never saw me, never heard of me. get it? got it? good!
also: ALL YOUR POLLITICAL BASE ARE BELONG TO HILLARY IN '08!!

nmcsween
Guru

Joined: 12 Nov 2003
Posts: 381

Posted: Wed Mar 17, 2004 6:25 am

Quote:

Ultraoctane.com wrote:
If you want to have an ultra optimized system try out these flags:
CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer -fno-crossjumping -falign-functions=16 -falign-loops=16 -falign-jumps=16 -fno-align-labels -mfpmath=sse,387 -maccumulate-outgoing-args -fmove-all-movables -freduce-all-givs"
#-fnew-ra (use -fnew-ra with caution). All these flags optimize without an additional increase in memory usage or drive space usage over what -O3 specifies.


Thank you for your tip. building the system now.

/sings* oh what fun, it is to watch, code compile on the fly, hey! */

~KingPunk

I should have added that you need to have a newer processor to use these flags, and I assumed that everyone knew to edit their -march= flag.

tapted
Tux's lil' helper

Joined: 02 Dec 2003
Posts: 122
Location: Sydney, Australia

Posted: Wed Mar 17, 2004 8:21 am

I'll say it again: the consensus seems to be that -mfpmath=387,sse is bad...

According to

http://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Optimize-Options.html
and
http://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/i386-and-x86-64-Options.html

it would also appear that -fomit-frame-pointer implies -momit-leaf-frame-pointer

and -mfpmath=387 is the default for all but the x86-64 compiler

-ftracer is new in GCC 3.3 and looks good.

-fno-crossjumping and -fno-align-labels are not mentioned directly -- perhaps someone knows benefits/disadvantages.

-maccumulate-outgoing-args also looks handy.

Here's a snip

http://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Optimize-Options.html wrote:

-fnew-ra
Use a graph coloring register allocator. Currently this option is meant for testing, so we are interested to hear about miscompilations with -fnew-ra.

-ftracer
Perform tail duplication to enlarge superblock size. This transformation simplifies the control flow of the function allowing other optimizations to do better job.

-funroll-loops
Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies both -fstrength-reduce and -frerun-cse-after-loop. This option makes code larger, and may or may not make it run faster.

-funroll-all-loops
Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. -funroll-all-loops implies the same options as -funroll-loops,

-fprefetch-loop-arrays
If supported by the target machine, generate instructions to prefetch memory to improve the performance of loops that access large arrays.

Disabled at level -Os.


the rest are old hat.


More snips:


http://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/i386-and-x86-64-Options.html wrote:

-malign-double
-mno-align-double
Control whether GCC aligns double, long double, and long long variables on a two word boundary or a one word boundary. Aligning double variables on a two word boundary will produce code that runs somewhat faster on a Pentium at the expense of more memory.

Warning: if you use the -malign-double switch, structures containing the above types will be aligned differently than the published application binary interface specifications for the 386 and will not be binary compatible with structures in code compiled without that switch.

-mregparm=num
Control how many registers are used to pass integer arguments. By default, no registers are used to pass arguments, and at most 3 registers can be used. You can control this behavior for a specific function by using the function attribute regparm. See Function Attributes.

Warning: if you use this switch, and num is nonzero, then you must build all modules with the same value, including any libraries. This includes the system libraries and startup modules.

-maccumulate-outgoing-args
If enabled, the maximum amount of space required for outgoing arguments will be computed in the function prologue. This is faster on most modern CPUs because of reduced dependencies, improved scheduling and reduced stack usage when preferred stack boundary is not equal to 2. The drawback is a notable increase in code size. This switch implies -mno-push-args.
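
The ABI warning on -malign-double is easier to see with numbers. Below is a small Python sketch of struct layout under the two alignment rules. It is a simplified model, not GCC itself: the type sizes are the usual 32-bit x86 ones, the struct is a made-up example, and layout is a hypothetical helper.

```python
# Sizes in bytes on 32-bit x86 (only the two types used in this example).
SIZE = {"int": 4, "double": 8}


def layout(fields, align_of):
    """Compute member offsets and total size of a C-style struct.

    fields:   list of (name, type) in declaration order
    align_of: required alignment in bytes for each type
    """
    offset = 0
    max_align = 1
    offsets = {}
    for name, ctype in fields:
        align = align_of[ctype]
        offset += (-offset) % align       # pad up to the member's alignment
        offsets[name] = offset
        offset += SIZE[ctype]
        max_align = max(max_align, align)
    offset += (-offset) % max_align       # tail padding
    return offsets, offset


# Hypothetical struct: struct sample { int count; double value; };
fields = [("count", "int"), ("value", "double")]

default_abi = layout(fields, {"int": 4, "double": 4})     # one-word boundary
aligned_double = layout(fields, {"int": 4, "double": 8})  # two-word boundary

print("default:        ", default_abi)
print("-malign-double: ", aligned_double)
```

In this model the same struct is 12 bytes in one case and 16 in the other, with the double at a different offset. A program built one way and a library built the other way would disagree about where `value` lives, which would fit the reports earlier in the thread of binaries breaking when only part of the system was built with the flag.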


Moo.

nmcsween
Guru

Joined: 12 Nov 2003
Posts: 381

Posted: Wed Mar 17, 2004 10:56 am

Quote:

I'll say it again: the consensus seems to be that -mfpmath=387,sse is bad...

There's no way giving extra instruction sets can be bad. It may be a little risky, but if you're going for server stability then you shouldn't be looking at this thread.
Quote:

-fno-crossjumping and -fno-align-labels are not mentioned directly -- perhaps someone knows benefits/disadvantages.

-fcrossjumping has been shown to lessen performance.
Quote:

-fnew-ra
Use a graph coloring register allocator. Currently this option is meant for testing, so we are interested to hear about miscompilations with -fnew-ra.

Seems to kill a large amount of compiles but gives better performance.
Quote:

-funroll-loops
Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies both -fstrength-reduce and -frerun-cse-after-loop. This option makes code larger, and may or may not make it run faster.

-funroll-all-loops
Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. -funroll-all-loops implies the same options as -funroll-loops,

These optimizations are a waste of space: they don't give any real performance increase and most likely slow down a computer that uses them.
Quote:

-mregparm=num
Control how many registers are used to pass integer arguments. By default, no registers are used to pass arguments, and at most 3 registers can be used. You can control this behavior for a specific function by using the function attribute regparm. See Function Attributes.

Warning: if you use this switch, and num is nonzero, then you must build all modules with the same value, including any libraries. This includes the system libraries and startup modules.

This is the holy grail of optimizations, but since it never works (I haven't seen it work) it's useless right now.

seppe
Guru

Joined: 01 Sep 2003
Posts: 431
Location: Hove, Antwerp, Belgium

Posted: Wed Mar 17, 2004 3:06 pm

What do you guys suggest for this cpu?

Code:

root@iris seppe # cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 3
cpu MHz         : 800.314
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1568.76


I have now in make.conf:
Code:

CFLAGS="-march=pentium3 -mmmx -msse -Os -fomit-frame-pointer -pipe -fforce-addr -fforce-mem -ffast-math -mpush-args -mfpmath=sse -w"

But I haven't done an emerge -e world yet, I just want to make sure that these are good. So if you have suggestions for my CFLAGS, let me know.
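
One quick sanity check before an emerge -e world is to look for flags that the chosen -march already turns on. The Python sketch below does that with a tiny lookup table. The table is an assumption written for this example (it is not read from GCC, and it only covers a few -march values), and redundant_flags is a hypothetical helper.

```python
import shlex

# Assumed for illustration: extensions that each -march value already enables.
IMPLIED_BY_MARCH = {
    "pentium3": {"-mmmx", "-msse"},
    "pentium4": {"-mmmx", "-msse", "-msse2"},
    "athlon-xp": {"-mmmx", "-msse"},
}


def redundant_flags(cflags):
    """List flags in a CFLAGS string that the -march value already implies."""
    flags = shlex.split(cflags)
    march = next(
        (flag.split("=", 1)[1] for flag in flags if flag.startswith("-march=")),
        None,
    )
    implied = IMPLIED_BY_MARCH.get(march, set())
    return [flag for flag in flags if flag in implied]


cflags = ("-march=pentium3 -mmmx -msse -Os -fomit-frame-pointer -pipe "
          "-fforce-addr -fforce-mem -ffast-math -mpush-args -mfpmath=sse -w")
print(redundant_flags(cflags))
```

Redundant flags are harmless, so this is about keeping the line readable rather than about speed. Note that -mfpmath=sse is not in the table, since it selects which unit does the floating-point math rather than enabling an instruction set.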

Btw, I took -Os instead of -O3 because I heard -O3 is bad when you don't have much cache (I have 256 KB). Why not -O2 then? Because I care more about the startup time of my apps than the general performance of those apps, although I'm considering switching to -O2 so that I have more general performance and still a good startup time.

I have read the whole thread, and I have noted various things:
genflags suggests me:
Code:
CFLAGS="-march=pentium3 -O3 -pipe"


another P3 coppermine user has these cflags:
Code:
-march=pentium3 -O2 -fomit-frame-pointer -momit-leaf-frame-pointer -fprefetch-loop-arrays


another p3 user has this:
Code:
-march=pentium3 -O3 -mmmx -msse -pipe -fomit-frame-pointer -fprefetch-loop-arrays


another p3 user has:
Code:
CFLAGS="-march=pentium3 -O2 -pipe -frename-registers -mmmx -msse -fmove-all-movables -mfpmath=sse -w"


another p3 user:
Code:
-march=pentium3 -O3 -pipe -fomit-frame-pointer -fforce-addr -falign-functions=4 -fprefetch-loop-arrays -fexpensive-optimizations


other stuff I noted:
Quote:
-funroll-loops is probably not good on a p3, due to bandwidth and L1 cache limits


Quote:
All -fomit-frame-pointer does is free up a register. Free registers + less code on function entrance = very good. Use it!


Quote:
-mfpmath=sse doesn't improve anything, but -ffast-math can increase the performance by 40%


Quote:
I should note that on the Pentium 3, -O3 -freduce-all-givs generates code that is 35% faster than -O3 alone


Quote:
Don't add too many CFLAGS because that will slow down the performance


OK, what I want to say is that I've read so many suggestions for my P3 800 MHz CPU that I don't really know now which flags I *really* should take and which not.

If you know which flags I *really should* take, please tell me ;)
Remember that I want my apps to start up quickly (so not too large binaries) AND that I still want great general performance.

Thanks ;)
_________________
nitro-sources, because between stable and experimental there exists only speed

Latest release I made: 2.6.13.2-nitro1

nmcsween
Guru

Joined: 12 Nov 2003
Posts: 381

Posted: Wed Mar 17, 2004 3:39 pm

First off, I have to say: don't listen to a good number of the people here. Some people seem to be giving bad advice. Why? Most likely they don't know what they're talking about (this isn't aimed at anyone in particular). I really don't see why people are telling you to use -Os, since your system is well within the limits of even -O3, and -O3 will add a few much-needed flags to your compiles that your -march flag specifies. So, to wrap this up, here's what I recommend:
-march=pentium3 -O3 -pipe -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer -fno-crossjumping -mfpmath=sse -maccumulate-outgoing-args -fmove-all-movables -freduce-all-givs
That will give you a noticeable increase in speed. Also, -ffast-math is totally up to you, but I don't recommend it, since you'll get a 40% increase in speed only on very, very rare occasions.

nmcsween
Guru

Joined: 12 Nov 2003
Posts: 381

Posted: Wed Mar 17, 2004 3:46 pm

Quote:

Quote:
Don't add too many CFLAGS because that will slow down the performance

That's simply wrong. If you don't know what you're doing with the CFLAGS, stay out of the kitchen or you'll get burned.