Gentoo Forums
Really optimize for Athlon 1GHz?
garyura
n00b
Joined: 18 Apr 2002
Posts: 12
Posted: Thu Apr 18, 2002 1:15 am    Post subject: Really optimize for Athlon 1GHz?

Can someone help me configure my /etc/make.conf so that Gentoo compiles all my apps optimized for my actual hardware instead of the default "i686"? My hardware is:

- Athlon 1GHz
- 512 MB Ram 133 MHz
- UltraATA 100
- TNT2 Nvidia
TQ for any help :?:
monkeyboy
n00b
Joined: 18 Apr 2002
Posts: 29
Location: Denver
Posted: Thu Apr 18, 2002 2:33 am

These are the flags I use and they seem to work fine:

CHOST="i686-pc-linux-gnu"
CFLAGS="-march=i686 -O3 -pipe -fomit-frame-pointer -funroll-loops -fforce-addr -frerun-cse-after-loop -frerun-loop-opt -malign-functions=4"
CXXFLAGS="-march=i686 -O3 -pipe -fomit-frame-pointer -funroll-loops -fforce-addr -frerun-cse-after-loop -frerun-loop-opt -malign-functions=4"

I've compiled X and KDE with -09 instead of -03 and haven't had any problems with those either. I have an Athlon 1.4 with 512 MB RAM.
taskara
Advocate
Joined: 10 Apr 2002
Posts: 3763
Location: Australia
Posted: Thu Apr 18, 2002 3:15 am

Can I change my variables once I've installed Gentoo?

I.e., can I change them from the standard ones to your optimised ones? Will this work?

I tried to compile something with these variables, and it failed.

Is this because they're not right for my system, or because my main system was compiled with different flags?
vilanox
n00b
Joined: 16 Apr 2002
Posts: 12
Location: nyc
Posted: Thu Apr 18, 2002 4:00 am

Would that setup work with mine also?

Athlon 1700
512 DDR
GeForce 2 MMX

TIA,
vilanox
c_kuzmanic
Guest





Posted: Thu Apr 18, 2002 6:08 am

Yes, since your Processor also belongs to the Athlon family, these optimizations will work for you.
vilanox
n00b
Joined: 16 Apr 2002
Posts: 12
Location: nyc
Posted: Thu Apr 18, 2002 6:18 am

c_kuzmanic wrote:
Yes, since your Processor also belongs to the Athlon family, these optimizations will work for you.


Thanks a bunch,

vilanox
theotherphil
n00b
Joined: 18 Apr 2002
Posts: 7
Posted: Thu Apr 18, 2002 9:47 am

Are there any specific optimisations for the XP/MP range? Also, how do I get make to run with 3 threads by default instead of having to specify -j3 every time? I run a dual XP1800 system, so I may as well take advantage of its performance :)
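
On the -j3 question: Portage reads a MAKEOPTS variable from /etc/make.conf and passes its value to make for every build, so the flag need not be typed each time. A minimal sketch follows; suggest_makeopts is a hypothetical helper name, and the "CPUs + 1" figure is only the usual rule of thumb, not a measured optimum.

```shell
# Hypothetical helper: print a MAKEOPTS line for a given CPU count,
# using the common "number of CPUs plus one" rule of thumb.
suggest_makeopts() {
    printf 'MAKEOPTS="-j%d"\n' "$(( $1 + 1 ))"
}

# Assumed usage for a dual-CPU box:
#   suggest_makeopts 2
```

For the dual-CPU system described above, `suggest_makeopts 2` prints MAKEOPTS="-j3", which is the line to add to /etc/make.conf.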
slik
n00b
Joined: 18 Apr 2002
Posts: 48
Location: Alberta, Canada
Posted: Thu Apr 18, 2002 10:08 am

monkeyboy wrote:

CHOST="i686-pc-linux-gnu"
CFLAGS="-march=i686 -O3 -pipe -fomit-frame-pointer -funroll-loops -fforce-addr -frerun-cse-after-loop -frerun-loop-opt -malign-functions=4"

From GCC info pages:
Quote:
When you specify `-O', the compiler turns on ... `-fomit-frame-pointer' on machines that can support debugging even without a frame pointer.
All other -O? levels include this as well. So you might not want to set this flag explicitly if you want to debug something linked against libs compiled with it; instead, let the compiler determine whether it's safe to do so. If you're not going to be programming any C/C++, don't worry about it and include it.

And from GCC HOWTO:
Quote:
Internally, gcc translates these (-O?) to a series of -f and -m options. You can see exactly which -O levels map to which options by running gcc with the -v flag and the (undocumented) -Q flag.

My results (not the HOWTO's) for passing ONLY -O3 to gcc 2.95.3:
Quote:
options passed: -O3
options enabled: -fdefer-pop -fcse-follow-jumps -fcse-skip-blocks -fexpensive-optimizations -fthread-jumps -fstrength-reduce -fpeephole -fforce-mem -ffunction-cse -finline-functions -finline -fkeep-static-consts -fcaller-saves -fpcc-struct-return -fgcse -frerun-cse-after-loop -frerun-loop-opt -fschedule-insns2 -fcommon -fgnu-linker -fregmove -foptimize-register-move -fargument-alias -fident -m80387 -mhard-float -mno-soft-float -mieee-fp -mfp-ret-in-387 -mschedule-prologue -mcpu=pentiumpro -march=pentium

I suspect that -mcpu and -march are filled in by gcc by looking at information made available by the OS; you can see this with:
Code:
uname --machine --processor

I have an AMD Duron, so it's not a Pentium or Pentium Pro. It'd be interesting to see what gcc 3.0.4 would fill in.

More from the GCC HOWTO:

Quote:

There is currently no -mpentium or -m586 (Has not been updated in a while I guess). Linus suggests using -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2, to get 486 code optimisations but without the big gaps for alignment (which the pentium doesn't need). Michael Meissner (of Cygnus) says

My hunch is that -mno-strength-reduce also results in faster code on the x86 (note, I'm not talking about the strength reduction bug, which is another issue). This is because the x86 is rather register starved (and GCC's method of grouping registers into spill registers vs. other registers doesn't help either). Strength reduction typically results in using additional registers to replace multiplications with addition. I also suspect -fcaller-saves may also be a loss.
Another hunch is that -fomit-frame-pointer might or might not be a win. On the one hand, it can mean that another register is available for allocation. On the other hand, the way the x86 encodes its instruction set, means that stack relative addresses take more space instead of frame relative addresses, which means slightly less Icache available to the program. Also, -fomit-frame-pointer, means that the compiler has to constantly adjust the stack pointer after calls, while with a frame, it can let the stack accumulate for a few calls.

The final word on this subject is from Linus again:

Note that if you want to get optimal performance, don't believe me: test. There are lots of gcc compiler switches, and it may be that a particular set gives the best optimizations for you.


monkeyboy wrote:
I've compiled X and kde with -09 instead of -03 and haven't had any problems with those either.


Anything higher than -O3 is still -O3, gentoo does not use pgcc. Again, from GCC HOWTO:
Quote:
Using an optimization level higher than your compiler supports (e.g. -O6) will have exactly the same effect as using the highest level that it does support.


For more info on the available gcc optimization options, info:gcc?Optimize_Options in the Galeon location bar works, or use
Code:
info --file=gcc --node="Optimize Options"
from the command line.

Last edited by slik on Thu Apr 18, 2002 1:16 pm; edited 1 time in total
Guest






Posted: Thu Apr 18, 2002 12:54 pm

Those -Q and -v flags seem to work differently with gcc3.(0|1).
I always get the following output, no matter what -O flags.

Reading specs from /usr/lib/gcc-lib/athlon-mandrake-linux-gnu/3.0.4/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share/gcc-3.0.4 --enable-shared --enable-threads=posix --disable-checking --enable-long-long --enable-cstdio=stdio --enable-clocale=generic --enable-languages=c,c++,f77,objc,java --program-suffix=-3.0.4 --enable-objc-gc --host=athlon-mandrake-linux-gnu --with-system-zlib
Thread model: posix

Cheers,
A.
slik
n00b
Joined: 18 Apr 2002
Posts: 48
Location: Alberta, Canada
Posted: Thu Apr 18, 2002 1:13 pm

Anonymous wrote:
Those -Q and -v flags seem to work differently with gcc3.(0|1).
I always get the following output, no matter what -O flags.


Well, I did have to pass a file to gcc to get that output (it was buried in other compiler output).
This is what I did:

Code:
gcc -v -Q -O3 blah.c

where blah.c consisted of
Code:
int main(void) { return 0; }

This produces a do-nothing a.out file that you can delete afterwards.
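
Because the interesting part is buried in other compiler output, it can help to filter out just the "options enabled" list. The following is a rough sketch, assuming the output layout shown in this thread (an "options enabled:" line followed by indented continuation lines); extract_enabled is a hypothetical helper name, and the layout may differ between gcc versions.

```shell
# Hypothetical helper: read gcc -v -Q output on stdin and print the
# "options enabled" flags, one per line, sorted.
extract_enabled() {
    awk '
        /^options enabled:/ { on = 1; sub(/^options enabled:/, ""); print; next }
        on && /^[ \t]/      { print; next }   # indented continuation line
                            { on = 0 }        # anything else ends the list
    ' | tr -s '[:space:]' '\n' | sed '/^$/d' | LC_ALL=C sort
}

# Assumed usage (the verbose output may go to stderr, hence 2>&1):
#   gcc -v -Q -O3 blah.c 2>&1 | extract_enabled
```

Sorting the flags makes it easier to compare the lists produced by two different -O levels or two compiler versions.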
taskara
Advocate
Joined: 10 Apr 2002
Posts: 3763
Location: Australia
Posted: Thu Apr 18, 2002 1:28 pm

so without sounding like a complete n00b, and without annoying anyone,

what USE variables, and flags should I have in my make.conf file for best performance out of my system, which is as follows:

amd XP 1800+
asus a7v133 mainboard
512Mb pc150 sdram
40Gb ide barracuda hdd 7200rpm
gf3 ti200
sblive! 5.1
promise ata100 controller (onboard)
realtek network card
USB s400 printer
Pioneer 16X dvd
ricoh 24x cdrw
microsoft internet pro keyboard
logitech dual optical usb mouse
19" lg 995E monitor

if someone can take the time to go through this for me, that would be much much appreciated :D
Guest






Posted: Thu Apr 18, 2002 2:11 pm

Quote:
Well, I did have to pass a file to gcc to get that output (it was buried in other compiler output).


That did the trick. Thanks.

Cheers,
Andreas
slik
n00b
Joined: 18 Apr 2002
Posts: 48
Location: Alberta, Canada
Posted: Thu Apr 18, 2002 3:20 pm

Andreas wrote:
That did the trick. Thanks.

Care to share what gcc 3.0.4 comes up with for -O3 optimization? And what is your processor?
devurandom
n00b
Joined: 08 Jan 2004
Posts: 63
Posted: Wed Apr 21, 2004 1:47 pm

Nice Thread...

Would "-mfpmath=sse" help anything?
Or is it counterproductive?

I have also enabled:
-O2 (instead of "-O3", see below)
-fomit-frame-pointer (which I think is standard)
-fmove-all-movables
-funroll-loops
-fPIC -DPIC (for prelinking)

I have NOT enabled:
-O3 (because several people say it won't help anything or can even make programs slower, and some programs on my system didn't compile with it (e.g. binutils ;) ))

Is this ok? Or are there other CFLAGS I should enable?

System:
Athlon-XP 2000+ (1.6GHz)
512 MB DDR-RAM (133 MHz FSB)
nVidia nForce 2 (ASUS A7N8X)
ATI Radeon 9000

Output of "gcc -v -Q -O2 -march=athlon-xp test.c"
Code:
GNU C version 3.3.3 20040217 (Gentoo Linux 3.3.3, propolice-3.3-7) (i686-pc-linux-gnu)
        compiled by GNU C version 3.3.3 20040217 (Gentoo Linux 3.3.3, propolice-3.3-7).
GGC heuristics: --param ggc-min-expand=64 --param ggc-min-heapsize=64488
options passed:  -v -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=3
 -march=athlon-xp -auxbase -O2
options enabled:  -fdefer-pop -foptimize-sibling-calls -fcse-follow-jumps
 -fcse-skip-blocks -fexpensive-optimizations -fthread-jumps
 -fstrength-reduce -fpeephole -fforce-mem -ffunction-cse
 -fkeep-static-consts -fcaller-saves -fpcc-struct-return -fgcse -fgcse-lm
 -fgcse-sm -floop-optimize -fcrossjumping -fif-conversion -fif-conversion2
 -frerun-cse-after-loop -frerun-loop-opt -fdelete-null-pointer-checks
 -fschedule-insns2 -fsched-interblock -fsched-spec -fbranch-count-reg
 -freorder-blocks -freorder-functions -fcprop-registers -fcommon
 -fgnu-linker -fregmove -foptimize-register-move -fargument-alias
 -fstrict-aliasing -fmerge-constants -fzero-initialized-in-bss -fident
 -fpeephole2 -fguess-branch-probability -fmath-errno -ftrapping-math
 -m80387 -mhard-float -mno-soft-float -mieee-fp -mfp-ret-in-387
 -maccumulate-outgoing-args -mmmx -m3dnow -msse -mcpu=athlon-xp
 -march=athlon-xp
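
The -O2 versus -O3 question can be narrowed down by comparing the "options enabled" lists that gcc prints for each level. A minimal sketch, assuming both lists have already been captured as whitespace-separated strings; flags_added is a hypothetical helper name, not part of gcc or Portage.

```shell
# Hypothetical helper: print the flags that appear in the second list
# but not in the first, one per line, in the order of the second list.
flags_added() {
    printf '%s\n' "$1" "---" "$2" | tr -s '[:space:]' '\n' | awk '
        $0 == "---" { second = 1; next }     # separator between the lists
        !second     { seen[$0] = 1; next }   # remember first-list flags
        !($0 in seen) && $0 != "" { print }  # new in the second list
    '
}

# Assumed usage, with o2 and o3 holding the two captured lists:
#   flags_added "$o2" "$o3"
```

The comparison is only meaningful when both lists come from the same gcc version, since the set of flags behind each -O level changes between releases.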
GoodStuff
n00b
Joined: 09 Apr 2004
Posts: 15
Location: Belgium
Posted: Wed Apr 21, 2004 3:04 pm

I don't think it's possible to get much more than

-march=athlon-xp -O3 -fomit-frame-pointer -pipe

for your arch. Any other suggestions?
yngwin
Retired Dev
Joined: 19 Dec 2002
Posts: 4572
Location: Suzhou, China
Posted: Wed Apr 21, 2004 3:45 pm

There are several threads on this matter; search for them and read! I have the following flags on my Athlon-XP, and they work well:
Code:
CFLAGS="-march=athlon-xp -mmmx -msse -m3dnow -O2 -pipe -fomit-frame-pointer -finline-functions -falign-jumps=16 -falign-loops=16 -falign-functions=64 -funroll-loops -ftracer -mfpmath=387"

-mfpmath=sse and -O3 can be counterproductive, and some say -mmmx -msse -m3dnow are included in -march=athlon-xp, but I haven't found definitive information on that. -ftracer is for gcc-3.3+ only, but can give really good results.
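
Whether -march=athlon-xp already implies -mmmx -msse -m3dnow can be checked with the -v -Q technique from earlier in the thread: pass only -march=athlon-xp and look for those flags under "options enabled". In devurandom's listing above, which passed only -march=athlon-xp and -O2, all three do appear. A small sketch, with has_flag as a hypothetical helper name:

```shell
# Hypothetical helper: succeed if FLAG appears as a whole word in the
# text read from stdin (so -msse does not match -msse2).
has_flag() {
    tr -s '[:space:]' '\n' | grep -qxF -e "$1"
}

# Assumed usage, with test.c being any small C file:
#   gcc -v -Q -O2 -march=athlon-xp test.c 2>&1 | has_flag -msse && echo "implied"
```

The exact output format depends on the gcc version, so treat a negative result with some suspicion and read the raw output as well.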
Nate_S
Guru
Joined: 18 Mar 2004
Posts: 414
Posted: Wed Apr 21, 2004 4:34 pm

Quote:
So you might not want to explicitly set this flag if you want to debug something linked against libs compiled with this and instead let the compiler determine if it's safe to do this.


From what I'd heard, -fomit-frame-pointer always interferes with debugging on the x86, so it is never enabled by default. Setting just -O will not put it in. Also, don't the variables in make.conf just apply to ebuilds? Or are they systemwide? I'd assume that you could compile your system with it and just take it out when compiling programs you're trying to debug.

-Nate
Cossins
Veteran
Joined: 21 Mar 2003
Posts: 1136
Location: Copenhagen, Denmark
Posted: Wed Apr 21, 2004 5:25 pm

monkeyboy wrote:
I've compiled X and kde with -09 instead of -03 and haven't had any problems with those either.

First of all, that's an O (as in the letter) not a zero, as you wrote...
Second, setting it higher than 3 is unnecessary, and will fall back to 3 (the maximum). See the gcc man page if you don't believe me... ;)

- Simon
_________________
who cares
ikaro
Advocate
Joined: 14 Jul 2003
Posts: 2526
Location: Denmark
Posted: Wed Apr 21, 2004 11:40 pm

taskara wrote:
so without sounding like a complete n00b, and without annoying anyone,

what USE variables, and flags should I have in my make.conf file for best performance out of my system, which is as follows:

amd XP 1800+
asus a7v133 mainboard
512Mb pc150 sdram
40Gb ide barracuda hdd 7200rpm
gf3 ti200
sblive! 5.1
promise ata100 controller (onboard)
realtek network card
USB s400 printer
Pioneer 16X dvd
ricoh 24x cdrw
microsoft internet pro keyboard
logitech dual optical usb mouse
19" lg 995E monitor

if someone can take the time to go through this for me, that would be much much appreciated :D




I have the same CPU, and these are the CFLAGS I use after running the Acovea benchmarks (search for acovea and you will find the thread with some ebuilds):

Code:

CFLAGS="-march=athlon-xp -O3 -pipe -fno-cprop-registers -fno-thread-jumps -fno-defer-pop -maccumulate-outgoing-args -fno-if-conversion2 -fno-delayed-branch -fno-crossjumping -fno-merge-constants -fno-omit-frame-pointer -ftracer -finline-limit=600 -minline-all-stringops -mno-push-args -fmove-all-movables -mno-align-stringops"

_________________
linux: #232767
robmoss
Retired Dev
Joined: 27 May 2003
Posts: 2634
Location: Jesus College, Oxford
Posted: Thu Apr 22, 2004 12:06 am

Code:
CFLAGS="-O3 -march=athlon-xp -momit-leaf-frame-pointer -fomit-frame-pointer -funroll-loops -ffast-math -ftracer -fprefetch-loop-arrays -finline-limit=600 -mfpmath=387 -pipe"


That's the set of CFLAGS I have. And I know about these things better than most; I'm forever tinkering with code to see what the best optimization set is (I use Acovea - search on Google for it if you don't know what it is). These are a very good global set for an Athlon XP.

-O3 is consistently faster than -O2 on all architectures in all configurations. People keep trying to decry -O3, but I have consistent evidence that tells me that I'm right and they're wrong. I don't think that -O3 is faster, I know that -O3 is faster (in general). So you should use -O3.

-march=athlon-xp does indeed include -mmmx -m3dnow -msse.

-mfpmath=387 is faster than all the others thus far. sse and sse,387 still don't work properly and generally create slower code. I've yet to see any code which is faster when using sse or sse,387.

-ftracer gives you a bigger speed-up than any other flag. Hopefully, it will be included in -O3 when GCC 3.5 is released.
_________________
Reality is for those who can't face Science Fiction.

emerge -U will kill your Gentoo
ecatmur, Lord of Portage Bash Scripts
Liff
Tux's lil' helper
Joined: 08 Oct 2002
Posts: 111
Posted: Thu Apr 22, 2004 1:55 am

In my experience, -Os is faster than -O3
_________________
A smoking section in a restaurant is like a urinating section in a swimming pool.
All times are GMT