Gentoo Forums
CFLAGS and CXXFLAGS for Duron?
Rasputin
Tux's lil' helper
Joined: 10 Dec 2003
Posts: 109
Location: Ukraine

Posted: Wed Jan 14, 2004 4:22 pm    Post subject: CFLAGS and CXXFLAGS for Duron?

Hi all!
I have a system with a Duron 1100 CPU. Can somebody help me with CFLAGS and CXXFLAGS?
What type is my CPU?
_________________
Many receive advice, only the wise profit by it.
TheCoop
Veteran
Joined: 15 Jun 2002
Posts: 1814
Location: Where you least expect it

Posted: Wed Jan 14, 2004 4:30 pm

'-march=athlon -Os -pipe -fomit-frame-pointer' should work (you don't want -O3: the Duron has a small cache, so the smaller the code, the better on an old processor)
_________________
95% of all computer errors occur between chair and keyboard (TM)

"One World, One web, One program" - Microsoft Promo ad.
"Ein Volk, Ein Reich, Ein Führer" - Adolf Hitler

Change the world - move a rock
Rasputin
Tux's lil' helper
Joined: 10 Dec 2003
Posts: 109
Location: Ukraine

Posted: Wed Jan 14, 2004 4:37 pm

TheCoop wrote:
'-march=athlon -Os -pipe -fomit-frame-pointer' should work (you don't want -O3: the Duron has a small cache, so the smaller the code, the better on an old processor)

Sorry, -Os? So that's an 's', not a '3'?
TheCoop
Veteran
Joined: 15 Jun 2002
Posts: 1814
Location: Where you least expect it

Posted: Wed Jan 14, 2004 4:39 pm

Yup, -Os means optimize for size.
Rasputin
Tux's lil' helper
Joined: 10 Dec 2003
Posts: 109
Location: Ukraine

Posted: Wed Jan 14, 2004 4:45 pm

Thanks!
And what about the other flags? Do I need them?
TheCoop
Veteran
Joined: 15 Jun 2002
Posts: 1814
Location: Where you least expect it

Posted: Wed Jan 14, 2004 4:56 pm

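-pipe only speeds up the compilation itself (gcc passes data between its stages through pipes instead of temporary files); it doesn't change the generated code. -fomit-frame-pointer frees up a CPU register. For the full list of options, have a look at the gcc manpage.

FYI, my CFLAGS are: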
CFLAGS="-march=athlon-xp -Os -pipe -fomit-frame-pointer -ffast-math -fforce-addr -ftracer -fstack-protector -mfpmath=sse,387"
CXXFLAGS="${CFLAGS}"
Rasputin
Tux's lil' helper
Joined: 10 Dec 2003
Posts: 109
Location: Ukraine

Posted: Wed Jan 14, 2004 4:58 pm

Ok! Thanks... very march! :D
robmoss
Retired Dev
Joined: 27 May 2003
Posts: 2634
Location: Jesus College, Oxford

Posted: Wed Jan 14, 2004 5:44 pm

-O3 still generates faster code on a Duron than -Os. If you don't believe me, try running acovea on one - there's generally between 2% and 25% speed improvement using -O3 over -Os.
_________________
Reality is for those who can't face Science Fiction.

emerge -U will kill your Gentoo
ecatmur, Lord of Portage Bash Scripts
robmoss
Retired Dev
Joined: 27 May 2003
Posts: 2634
Location: Jesus College, Oxford

Posted: Wed Jan 14, 2004 5:46 pm

Oh, and my Duron uses the following CFLAGS/CXXFLAGS:

Code:
CFLAGS="-O3 -march=athlon -momit-leaf-frame-pointer -fomit-frame-pointer -funroll-loops -ftracer -ffast-math -fprefetch-loop-arrays -freduce-all-givs -finline-limit=600 -mfpmath=387 -pipe"
CXXFLAGS="${CFLAGS}"


If you want to know what these are, try:

Code:
info gcc


and then go to Invoking GCC; Optimization.
Carlo
Developer
Joined: 12 Aug 2002
Posts: 3356

Posted: Wed Jan 14, 2004 6:30 pm

@robmoss2k: Unrolling loops on a CPU with a cache as small as the Duron's isn't a good idea.


Carlo
_________________
Please make sure that you have searched for an answer to a question after reading all the relevant docs.
sindre
Guru
Joined: 01 Nov 2002
Posts: 315
Location: Norway

Posted: Wed Jan 14, 2004 8:45 pm

I would be sceptical about using -mfpmath=sse,387. First of all, I don't think the Duron supports SSE (if it did, I think athlon-xp would be the appropriate arch). Second, all the benchmarks I've seen (sorry for not having links) show a performance decrease with anything but the default -mfpmath=387 on any processor.
TheCoop
Veteran
Joined: 15 Jun 2002
Posts: 1814
Location: Where you least expect it

Posted: Wed Jan 14, 2004 8:56 pm

Durons don't even have an SSE register set to use.

From what I've seen, sse,387 increases performance a little (and it should, since it doubles the registers used).
robmoss
Retired Dev
Joined: 27 May 2003
Posts: 2634
Location: Jesus College, Oxford

Posted: Wed Jan 14, 2004 11:14 pm

Carlo wrote:
@robmoss2k: Unrolling loops on a CPU with a cache as small as the Duron's isn't a good idea.


Wrong! This is typical GCC "conventional wisdom". I used to think the same myself, but it turns out that it's wrong. The same goes for -mfpmath=sse,387. Using -funroll-all-loops causes a performance hit on a Duron, but -funroll-loops gives a pretty large performance increase. -mfpmath=sse,387 uses more registers but actually produces code which runs slower.

Remember - I'm not getting my flags from conventional (and incorrect) wisdom, I'm getting them by experimentation and benchmarking. I've used acovea on a LOT of the important files in a LOT of different packages to determine what the best CFLAGS/CXXFLAGS are to use on those particular packages. Whilst the results vary between a Duron and an Athlon XP, they don't vary that much - both suggested that -funroll-loops is always a good idea.

I'm not going to try and understand what the code does. That would be insane. Nor am I going to try and understand how the 27 quadrillion different possible combinations of the basic flags interact. All I'm telling you is what works.
syscrash
Guru
Joined: 14 Apr 2003
Posts: 541

Posted: Thu Jan 15, 2004 12:34 am

Morgan and Applebred Durons DO have SSE.
Code:
syscrash2k@epsilon syscrash2k $ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 7
model name      : AMD Duron(tm) Processor
stepping        : 1
cpu MHz         : 1313.741
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 2572.28


Here's what I use:
Code:
CFLAGS="-march=athlon-xp -O3 -pipe"

:wink:
Rasputin
Tux's lil' helper
Joined: 10 Dec 2003
Posts: 109
Location: Ukraine

Posted: Thu Jan 15, 2004 3:21 pm

Which instruction set extensions does the Duron 1000 support?
SSE?
MMX?
3DNOW?

Please help me :?
TheCoop
Veteran
Joined: 15 Jun 2002
Posts: 1814
Location: Where you least expect it

Posted: Thu Jan 15, 2004 6:41 pm

Run cat /proc/cpuinfo and have a look at the 'flags' line.
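If you'd rather not read the whole flags line by eye, something like the following could pick out the three extensions asked about. This is only a sketch: the helper name cpu_has is made up for illustration, and it assumes a Linux /proc/cpuinfo with an x86-style 'flags' line.

```shell
#!/bin/sh
# cpu_has (hypothetical helper): report which extensions appear in a
# /proc/cpuinfo-style flags string.
# $1 = the flags string, remaining arguments = extension names to look for.
cpu_has() {
    flags=" $1 "; shift
    for ext in "$@"; do
        # Pad with spaces so "sse" does not match "sse2", "3dnow" not "3dnowext".
        case "$flags" in
            *" $ext "*) echo "$ext: yes" ;;
            *)          echo "$ext: no" ;;
        esac
    done
}

if [ -r /proc/cpuinfo ]; then
    # Use the flags line of the first CPU only.
    line=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    cpu_has "$line" mmx sse 3dnow
fi
```

Each extension is printed on its own line with yes or no next to it.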
Target
Apprentice
Joined: 25 Apr 2002
Posts: 200

Posted: Sun Jan 25, 2004 9:00 am

-funroll-loops should almost always improve speed. Remember, you're reclaiming a register (counter) and dropping lots of JMP instructions. Even a Duron's cache is enormous compared to the majority of loops this flag will unroll.

-funroll-all-loops is just silly because it unrolls loops where you don't know the number of iterations. As you can imagine, this means there's no counter to reclaim, and the jumps are replaced with even bigger code to test for terminating conditions & keep the "unrolled" loop looping... That sort of defeats the purpose.

Oh, a note on some flags to avoid redundancy or counterproductive settings:

-O activates:
-fdefer-pop -fmerge-constants -fthread-jumps -floop-optimize -fif-conversion -fif-conversion2 -fdelayed-branch -fguess-branch-probability -fcprop-registers
(it also activates -fomit-frame-pointer, but only on architectures where that doesn't interfere with debugging. On x86 it does interfere, so you have to add the flag yourself)

-O2 activates -O plus:
-fforce-mem -foptimize-sibling-calls -fstrength-reduce -fcse-follow-jumps -fcse-skip-blocks -frerun-cse-after-loop -frerun-loop-opt -fgcse -fgcse-lm -fgcse-sm -fgcse-las -fdelete-null-pointer-checks -fexpensive-optimizations -fregmove -fschedule-insns -fschedule-insns2 -fsched-interblock -fsched-spec -fcaller-saves -fpeephole2 -freorder-blocks -freorder-functions -fstrict-aliasing -funit-at-a-time -falign-functions -falign-jumps -falign-loops -falign-labels -fcrossjumping

-O3 activates -O2 plus:
-finline-functions -fweb (new, very nice register optimizations) -frename-registers

-Os activates -O2 minus:
-falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -fprefetch-loop-arrays
(which all increase size)
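
The lists above reflect the gcc of the time, and the exact sets change between releases. On a newer compiler you can ask gcc itself. A minimal sketch, assuming GCC 4.3 or later (the -Q --help=optimizers query did not exist when this was posted); the function names enabled_flags and only_in_first are made up for illustration:

```shell
#!/bin/sh
# enabled_flags (hypothetical helper): read `gcc -Q --help=optimizers`
# output on stdin and print, sorted, the options marked "[enabled]".
enabled_flags() {
    awk '$2 == "[enabled]" { print $1 }' | sort
}

# only_in_first (hypothetical helper): list options enabled at level $1
# but not at level $2, for example "-O2" versus "-Os".
only_in_first() {
    a=$(mktemp) && b=$(mktemp) || return 1
    gcc -Q --help=optimizers "$1" 2>/dev/null | enabled_flags > "$a"
    gcc -Q --help=optimizers "$2" 2>/dev/null | enabled_flags > "$b"
    comm -23 "$a" "$b"    # lines that appear only in the first listing
    rm -f "$a" "$b"
}

if command -v gcc >/dev/null 2>&1; then
    echo "Enabled by -O2 but not by -Os:"
    only_in_first -O2 -Os
fi
```

What it prints depends entirely on your gcc version, so don't expect it to match the lists above exactly.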

Some useful optimizations not covered by -O flags are:
-ffast-math (not covered because it violates IEEE fp standards, but results in a big speedup)
-funroll-loops (not covered because it makes binaries huge. It also forces -frerun-cse-after-loop, no matter the -O flag you're using)
-ftracer (new, helps the other optimizations work better)
-funswitch-loops (new, moves branches whose condition doesn't change inside the loop out of the loop, leaving one copy of the loop for each outcome. this can potentially make your binaries huge and the performance gain is questionable)
-fpeel-loops (complete unroll of tiny loops, size for performance tradeoff)
-fmove-all-movables (forces all loop-invariant computations to be moved out of the loop, without duplicating the loop the way -funswitch-loops does)
-fforce-addr (forces pointer arithmetic into registers. complements -fforce-mem. may clobber better optimizations on procs with few registers)

Some architectures imply flags:

-march=athlon implies -mmmx and -m3dnow, and will work for all Durons.
-march=athlon-xp implies -mmmx, -m3dnow and -msse, and will work for newer Durons with sse (check /proc/cpuinfo).
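
As a rough sketch of that rule in script form (the function name suggest_march is hypothetical, and it assumes you already know the CPU is a Duron or another Athlon-family part; it does not check the vendor or model):

```shell
#!/bin/sh
# suggest_march (hypothetical helper): given the 'flags' line from
# /proc/cpuinfo of a Duron, print the -march value discussed above.
suggest_march() {
    case " $1 " in
        *" sse "*) echo "athlon-xp" ;;   # newer Durons with SSE
        *)         echo "athlon" ;;      # safe for all Durons
    esac
}

# Flags line taken from the Duron /proc/cpuinfo output posted earlier in the thread.
suggest_march "fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow"
# prints: athlon-xp
```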

-mfpmath defaults to 387 on x86, and you should probably leave it that way except for apps that specifically use sse instructions. sse is surprisingly bad as a general-purpose fp unit (at least as gcc currently builds), and unfortunately gcc isn't very good yet at picking the best combination of registers to use when you try to use both (sse,387). Using both also risks instability.
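
Putting the notes in this post together, a conservative make.conf for any Duron might look like the sketch below. The -O2 level is an assumption chosen as a middle ground (both -Os and -O3 are argued for earlier in the thread), so benchmark before trusting it.

```shell
# /etc/make.conf fragment (a sketch, not a tested recommendation)

# -march=athlon works on every Duron; switch to athlon-xp only if
# /proc/cpuinfo lists sse.
# -O2 is an assumed middle ground between -Os and -O3.
# -fomit-frame-pointer is listed explicitly because -O levels do not
# enable it on x86.
# -mfpmath is left at its default (387).
CFLAGS="-march=athlon -O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}"
```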
All times are GMT