gcc optimize for p3, p4 & xp

krinn · Post by **krinn** » Sun Nov 21, 2004 7:12 am

Just found this, looks cool!

from gcc man pages...

`-mfpmath=UNIT'
     Generate floating point arithmetics for selected unit UNIT.  The
     choices for UNIT are:

    `387'
          Use the standard 387 floating point coprocessor present
          majority of chips and emulated otherwise.  Code compiled with
          this option will run almost everywhere.  The temporary
          results are computed in 80bit precision instead of precision
          specified by the type resulting in slightly different results
          compared to most of other chips. See `-ffloat-store' for more
          detailed description.

          This is the default choice for i386 compiler.

    `sse'
          Use scalar floating point instructions present in the SSE
          instruction set.  This instruction set is supported by
          Pentium3 and newer chips, in the AMD line by Athlon-4,
          Athlon-xp and Athlon-mp chips.  The earlier version of SSE
          instruction set supports only single precision arithmetics,
          thus the double and extended precision arithmetics is still
          done using 387.  Later version, present only in Pentium4 and
          the future AMD x86-64 chips supports double precision
          arithmetics too.

          For i387 you need to use `-march=CPU-TYPE', `-msse' or
          `-msse2' switches to enable SSE extensions and make this
          option effective.  For x86-64 compiler, these extensions are
          enabled by default.

          The resulting code should be considerably faster in the
          majority of cases and avoid the numerical instability
          problems of 387 code, but may break some existing code that
          expects temporaries to be 80bit.

          This is the default choice for the x86-64 compiler.

Got it?
Pentium 3, Pentium 4 and Athlon users can use it :p

CFLAGS="-march=pentium4 -mtune=pentium4 -O3 -pipe -msse2 -msse -mfpmath=sse -mmmx"

It should be safe as it's the default choice for x86-64...
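
To check what a given set of flags actually does to floating point, a small probe can help. The following is a minimal sketch, not taken from the man page: the function names are hypothetical, `FLT_EVAL_METHOD` comes from C99 `<float.h>`, and `__SSE2_MATH__` is a macro GCC predefines when doubles go through SSE (older compilers may not define it).

```c
#include <float.h>
#include <stdio.h>

/* Evaluation method of this build: 0 means each operation is rounded to
   its own type (what SSE math gives), 2 means long double intermediates
   (classic 387 behaviour). -1 means indeterminate or macro not defined. */
int eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1;
#endif
}

/* 1 if GCC reports SSE scalar math for doubles (-mfpmath=sse or sse,387
   together with SSE2), 0 otherwise. */
int doubles_use_sse(void)
{
#ifdef __SSE2_MATH__
    return 1;
#else
    return 0;
#endif
}

/* 1 if an intermediate double result visibly kept extra precision.
   1 + 2^-53 rounds back to 1.0 in a 64-bit double but survives in an
   80-bit 387 register. volatile stops the compiler from folding the
   expression at compile time. */
int excess_precision_visible(void)
{
    volatile double one = 1.0;
    volatile double half_ulp = DBL_EPSILON / 2.0;
    return (one + half_ulp) - one != 0.0;
}

/* Print the three findings on one line. */
void report_fp_config(void)
{
    printf("FLT_EVAL_METHOD=%d sse_double_math=%d excess_precision=%d\n",
           eval_method(), doubles_use_sse(), excess_precision_visible());
}
```

Calling `report_fp_config()` from a `main` and building once with `-mfpmath=387` and once with `-mfpmath=sse` should change the first two fields. The third depends on the optimization level and on how the compiler spills registers, so treat it as a hint only.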

krinn · Post by **krinn** » Sun Nov 21, 2004 7:18 am

Well, I wasn't really sure I should post this one (it looks more dangerous than the other), but if you feel crazy enough:

    sse,387
          Attempt to utilize both instruction sets at once.  This
          effectively double the amount of available registers and on
          chips with separate execution units for 387 and SSE the
          execution resources too.  Use this option with care, as it is
          still experimental, because the GCC register allocator does
          not model separate functional units well resulting in
          instable performance.

CFLAGS="-march=pentium4 -mtune=pentium4 -O3 -pipe -msse2 -msse -mfpmath=sse,387 -mmmx"

And don't miss this --> Use this option with care, as it is still experimental

frenkel · Post by **frenkel** » Sun Nov 21, 2004 10:57 am

I've been using the -mfpmath=sse,387 flag since I installed this system about a year ago (Athlon XP 2800+) and have never had any problems with it. I use this system every day.

Frank

Dolio · Post by **Dolio** » Mon Nov 22, 2004 3:48 am

The only flag here that probably does anything is '-mfpmath=sse,387' and that only because it's experimental.

When you set '-march=whatever' it should automatically signal gcc to use '-msse -mmmx' etc. as appropriate to the architecture you specify. The only reason to use those flags is if you want to use -march=i386 and enable everything else manually, or if something weird is going on with your cpu (like you have an Athlon Thunderbird that magically developed sse2 instructions).

Otherwise, it's either redundant (since it's already being specified by march) or potentially dangerous (since you could generate code that doesn't execute on your processor).
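
One way to see whether the extra `-m` flags are redundant is to ask the compiler which instruction-set macros it predefines for a given command line. This is a rough sketch under the assumption that GCC defines `__MMX__`, `__SSE__`, `__SSE2__` and `__SSE3__` when the matching extension is enabled; `build_enables` and `print_enabled` are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Space-delimited list of the extensions enabled for this build, assembled
   at compile time from GCC's predefined macros. */
static const char enabled[] = " "
#ifdef __MMX__
    "mmx "
#endif
#ifdef __SSE__
    "sse "
#endif
#ifdef __SSE2__
    "sse2 "
#endif
#ifdef __SSE3__
    "sse3 "
#endif
    ;

/* Returns 1 if ext (e.g. "sse2") was enabled for this build, else 0.
   The needle is padded with spaces so "sse" does not match "sse2". */
int build_enables(const char *ext)
{
    char needle[16];
    int n = snprintf(needle, sizeof needle, " %s ", ext);

    if (n < 0 || (size_t)n >= sizeof needle)
        return 0; /* name too long to be one of ours */
    return strstr(enabled, needle) != NULL;
}

/* Print the list so two builds can be compared by eye or with diff. */
void print_enabled(void)
{
    printf("enabled:%s\n", enabled);
}
```

If the point above holds, a build with only `-march=pentium4` and a build with `-march=pentium4 -msse -msse2 -mmmx` should print the same list from `print_enabled()`.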

augury · Post by **augury** » Mon Nov 22, 2004 6:59 am

-mfpmath=sse,387 doesn't do anything worth the effort

-msse3 on top of -march=prescott will have an effect if you use gcc-3.4.3.
The devs took it out; I don't know exactly why. I think it was too much when enabled by default, or maybe it was just broken.

frenkel · Post by **frenkel** » Mon Nov 22, 2004 3:54 pm

augury wrote: -mfpmath=sse,387 doesn't do anything worth the effort

What is this based on?

Frank

rhill · Post by **rhill** » Wed Dec 01, 2004 1:27 am

http://www.coyotegulch.com/products/aco ... ginal.html
http://www.coyotegulch.com/products/aco ... vea_4.html

i was also just browsing the gcc mailing list for reference to sse,387 sucking, and instead found an example to the contrary. in fact, for the P4, 'sse,387' > '387' > 'sse'. not right now (they were discussing a recent patch for gcc 4.0), but it's good to see that it's being looked at.

but i've heard a lot about how sse,387 doesn't work, is broken, or runs slower than the defaults. who knows, if it works for you, go for it. as with everything, it depends what you're running and what you're running it on.

MighMoS · Post by **MighMoS** » Wed Dec 01, 2004 2:14 am

This cut GNOME's startup time in half, as well as maploads for UT2k4 (I relinked the libs)

opm8 · Post by **opm8** » Wed Dec 01, 2004 7:13 am

MighMoS,

What's the command to relink libs?

MighMoS wrote:This cut GNOME's startup time in half, as well as maploads for UT2k4 (I relinked the libs)

ARC2300 · Post by **ARC2300** » Sun Dec 05, 2004 10:53 am

opm8 wrote:MighMoS,

What's the command to relink libs?

MighMoS wrote:This cut GNOME's startup time in half, as well as maploads for UT2k4 (I relinked the libs)

I believe you're looking for "ldconfig".

yngwin · Post by **yngwin** » Mon Dec 06, 2004 10:01 am

Actually on athlon-xp -mfpmath=387 is faster than the other options...

thechris · Post by **thechris** » Mon Dec 06, 2004 6:13 pm

in every test i've done and every one i've seen, -mfpmath=anything will be worse than omitting the option. I can only assume the compiler can determine these things better. in the future 387,sse should be faster.

Genkaku · Post by **Genkaku** » Mon Dec 06, 2004 6:59 pm

MighMoS, what CPU do you have? And which did you choose: -mfpmath=387, -mfpmath=sse or -mfpmath=sse,387?

krinn · Post by **krinn** » Wed Dec 08, 2004 11:27 pm

ok, after a few days of testing -mfpmath=sse,387 I can say:

- Speed: well, I can't really see the difference as I haven't tuned it yet, except maybe gnome, which seems to respond faster, but that could be psychological... and loading seems much better...
- Stability: no problems with the binaries so far, no crashes... the code is stable for me...

augury: I'm aware of the flags for prescott, nocona, sse3... but I opened this thread for mfpmath, which I didn't know about; everyone talks about the others, but I had never seen a thread about this one.
Maybe a lot of people know it, but as nobody wrote it down, I didn't find it until now...

dirtyepic: both links are dead, could you post some others?

dolio: yep, but 1/ redundant isn't dangerous (my gcc likes it), and 2/ mtune will automatically set them, not march.
i.e.: -march=pentium4 -msse3 == -march=pentium4 -mtune=prescott
So if you only set -march=pentium4 and have a Prescott, you will not get SSE3 code until -mtune or -msse3 is specified... As you see, march gives general architecture optimization, but you need to tune to your processor implementation.

Anyone got a real test case with "time"?
ps: it should be a program that helps gcc produce code for sse,387... some equations maybe. And the result should fail a "diff nonoptimizedversion optimizedversion"
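
As a starting point for such a test case, here is a minimal sketch of a floating-point-heavy kernel with a timer around it. The kernel (a partial sum of the Leibniz series for pi) and the names `leibniz_pi` and `time_kernel` are illustrative choices only, not something from this thread, and `clock()` measures CPU time with coarse resolution.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical FP-heavy kernel: 4 * (1 - 1/3 + 1/5 - ...) over `terms`
   terms. Nothing but double adds and divides, so the choice of FP unit
   is what the loop exercises. */
double leibniz_pi(long terms)
{
    double sum = 0.0;
    double sign = 1.0;
    long k;

    for (k = 0; k < terms; k++) {
        sum += sign / (2.0 * (double)k + 1.0);
        sign = -sign;
    }
    return 4.0 * sum;
}

/* Runs the kernel once, stores its value in *result and returns the
   CPU time used, in seconds. */
double time_kernel(long terms, double *result)
{
    clock_t start = clock();

    *result = leibniz_pi(terms);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Prints the value with 17 significant digits so that last-bit
   differences between builds can show up in a diff. */
void report_kernel(long terms)
{
    double value;
    double seconds = time_kernel(terms, &value);

    printf("value=%.17g seconds=%.3f\n", value, seconds);
}
```

Calling `report_kernel()` with a large term count from a `main`, then building the same file with `-mfpmath=387`, `-mfpmath=sse` and `-mfpmath=sse,387`, gives one line per build to compare. Whether the printed values differ at all depends on the compiler version and optimization level.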

bi3l · Post by **bi3l** » Wed Dec 08, 2004 11:49 pm

krinn wrote: dolio: yep, but 1/ redundant isn't dangerous (my gcc likes it), and 2/ mtune will automatically set them, not march.
i.e.: -march=pentium4 -msse3 == -march=pentium4 -mtune=prescott
So if you only set -march=pentium4 and have a Prescott, you will not get SSE3 code until -mtune or -msse3 is specified... As you see, march gives general architecture optimization, but you need to tune to your processor implementation.

That's not exactly true as you can just set -march=prescott and according to the man page of gcc:

specifying -march=cpu-type implies -mtune=cpu-type.

krinn · Post by **krinn** » Thu Dec 09, 2004 2:07 am

good catch