Gentoo Forums
CFLAGS Central (Part 1)
Portage & Programming (this topic is locked)

sleek
n00b
Joined: 09 Jan 2003
Posts: 71

Posted: Mon Feb 09, 2004 2:42 am

What would be the best CFLAGS line for my CPU based on the information below:

Code:
craig@sleekdesign craig $ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Celeron (Coppermine)
stepping        : 3
cpu MHz         : 593.202
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1182.92
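
A first step for a CPU like this is checking which SIMD extensions it actually reports. Below is a small shell sketch (`simd_flags` is a made-up helper, not a Gentoo tool) that pulls them out of /proc/cpuinfo-style text. On the output above it would list mmx and sse but not sse2. It only reports what the CPU advertises; it does not choose an -march value for you.

```shell
# simd_flags (hypothetical helper): read /proc/cpuinfo-style text on stdin
# and print the x86 SIMD extensions named on the first "flags" line.
simd_flags() {
  sed -n 's/^flags[[:space:]]*:[[:space:]]*//p' |
    head -n 1 |
    tr ' ' '\n' |
    grep -x -e mmx -e sse -e sse2 -e 3dnow -e 3dnowext
}

# Usage: simd_flags < /proc/cpuinfo
```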

_________________
Yesterday was the deadline for all complaints

soaringcondor
Tux's lil' helper
Joined: 16 Dec 2003
Posts: 103

Posted: Mon Feb 09, 2004 6:58 am

Actually, -Os disables a couple of optimizations that -O2 enables; other than that, the two are basically the same. Those optimizations speed up the program but increase the file size in doing so.

robmoss
Retired Dev
Joined: 27 May 2003
Posts: 2634
Location: Jesus College, Oxford

Posted: Mon Feb 09, 2004 4:50 pm

I'd be interested to see that gimp benchmark run with -O3 and with -O3 -ftracer and would wager that a significant speed boost would result. The -O* optimization levels are correct, and -O2 is still faster than -Os on a P4 in general. Check out Scott Robert Ladd's ACOVEA website. It should be clear to anyone that no amount of trying to actually work out how GCC works will result in a definitive answer even for a simple piece of code as to what will run the fastest; in fact, the interoperation of optimizations is enough to confuse a hardened GCC dev, no matter how much he/she knows about the architecture and operation of GCC. Evolutionary analysis is the only way to go...
_________________
Reality is for those who can't face Science Fiction.

emerge -U will kill your Gentoo
ecatmur, Lord of Portage Bash Scripts

sapphirecat
Guru
Joined: 15 Jan 2003
Posts: 376

Posted: Mon Feb 09, 2004 6:47 pm

soaringcondor wrote:
Actually -Os will disable a couple of functions that -O2 enabled, other than that they are basically the same. Those functions speed up the program but in doing so increase the file size.


Got a reference? I don't really know my way around the gcc source; I got to toplev.c from the Freshmeat article which calls it topolev.c.
_________________
Former Gentoo user; switched to Kubuntu 7.04 when I got sick of waiting on gcc. Chance of thread necro if you reply now approaching 100%...

rosinante
n00b
Joined: 26 Dec 2003
Posts: 5

Posted: Mon Feb 09, 2004 8:37 pm

Some questions:


Which is the default compiler with Gentoo: gcc 3.2.3 or 3.3.2?


Is -fpic mandatory for prelinking?


I have a Pentium 4 Celeron with SSE/SSE2/MMX capabilities; does GCC *effectively* pass these options with '-march=pentium4'?


I have a very small L2 cache (128KB) and limited RAM (128MB, shared for video); what would be optimal:
- '-Os' with '-mfpmath=sse' and '-funroll-loops'
or
- '-O2' with '-mfpmath=sse'


Are '-finline-functions' and '-frename-registers' any good on my low-performance box?


Any pointers are greatly appreciated - this gcc-tweaking is a mess :)

robmoss
Retired Dev
Joined: 27 May 2003
Posts: 2634
Location: Jesus College, Oxford

Posted: Mon Feb 09, 2004 9:14 pm

rosinante wrote:
Some questions:


Which is the default compiler with gentoo: gcc 3.2.3 of 3.3.2 ?


Is -fpic mandatory for prelinking?


I have a pentium4 celeron with SSE/SSE2/MMX capabilities; does GCC *effectively* passes these options with '-march=pentium4' ?


I have a very small L2 cache (128KB) and limited ram (128MB, shared for video), what would be optimum:
-'Os' with '-mfpmath=sse' and '-funroll-loops'
or
-'O2' with '-mfpmath=sse'


Are '-finline-functions' and '-frename-registers' any good on my low-performance box?


Any pointers are greatly appreciated - this gcc-tweaking is a mess :)


People will give you all sorts of silly answers to this, so I'll try and pre-empt them and give you something useful.

The default compiler for Gentoo is now GCC 3.3.2. This shift occurred only a couple of days ago, so there are still some teething troubles.

With regard to your CFLAGS: I don't know anything about pre-linking, so I can't help you there.

If you want to work out which CFLAGS work best for your machine, I'd suggest going to this site (NB - site currently down, Google's cache is here) and running some tests on the example source files. I don't know which of the -mmmx -msse -msse2 flags are switched on by your arch flag, but this code will tell you exactly which are:

Code:
touch ~/input.c
gcc -Q -v `emerge info | grep CFLAGS | sed -e s/CFLAGS// | sed -e s/'\"'// | sed -e s/'\"'// | sed -e s/'\='//` -c ~/input.c


Look for the options following "options passed:" for what is and isn't enabled. Don't guess and don't listen to anyone else; this varies from architecture to architecture and from GCC version to GCC version (even between minor versions!) so experiment is the only way to go here.
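
If you would rather not eyeball that output, the "options enabled:" section can be picked out mechanically. A sketch, assuming the report looks like the sample in this post (a line starting with "options enabled:" followed by continuation lines that begin with a space) and that gcc writes it to stderr. The format differs between GCC versions, and `enabled_flags` / `flag_enabled` are made-up names.

```shell
# enabled_flags (hypothetical): read a "gcc -Q -v" report on stdin and print
# every flag from the "options enabled:" section, one per line.
enabled_flags() {
  awk '
    # Start of the section: strip the label, keep the flags on that line.
    sub(/^options enabled:/, "") { on = 1 }
    # Any line that does not start with a space ends the section.
    /^[^ ]/ { on = 0 }
    on { for (i = 1; i <= NF; i++) if ($i ~ /^-/) print $i }
  '
}

# flag_enabled FLAG (hypothetical): succeed if FLAG is listed as enabled
# in the report on stdin.
flag_enabled() {
  enabled_flags | grep -qxF -- "$1"
}

# Usage, assuming CFLAGS holds your flags and ~/input.c exists:
#   gcc -Q -v $CFLAGS -c ~/input.c 2>&1 | flag_enabled -msse2 && echo on
```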

FWIW, here's the output from mine:

Code:
options passed:  -v -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=2
 -march=athlon-xp -msse -mmmx -m3dnow -momit-leaf-frame-pointer
 -mfpmath=387 -auxbase -O3 -fomit-frame-pointer -funroll-loops -ffast-math
 -fprefetch-loop-arrays -freduce-all-givs -finline-limit=600
options enabled:  -fdefer-pop -fomit-frame-pointer -foptimize-sibling-calls
 -fcse-follow-jumps -fcse-skip-blocks -fexpensive-optimizations
 -fthread-jumps -fstrength-reduce -funroll-loops -fprefetch-loop-arrays
 -freduce-all-givs -fpeephole -fforce-mem -ffunction-cse
 -fkeep-static-consts -fcaller-saves -fpcc-struct-return -fgcse -fgcse-lm
 -fgcse-sm -floop-optimize -fcrossjumping -fif-conversion -fif-conversion2
 -frerun-cse-after-loop -frerun-loop-opt -fdelete-null-pointer-checks
 -fschedule-insns2 -fsched-interblock -fsched-spec -fbranch-count-reg
 -freorder-blocks -freorder-functions -frename-registers -fcprop-registers
 -fcommon -fgnu-linker -fregmove -foptimize-register-move -fargument-alias
 -fstrict-aliasing -fmerge-constants -fzero-initialized-in-bss -fident
 -fpeephole2 -ffinite-math-only -fguess-branch-probability
 -funsafe-math-optimizations -m80387 -mhard-float -mno-soft-float
 -mfp-ret-in-387 -momit-leaf-frame-pointer -maccumulate-outgoing-args -mmmx
 -m3dnow -msse -mcpu=athlon-xp -mfpmath=387 -march=athlon-xp


Here's my current CFLAGS:

Code:
CFLAGS="-O3 -march=athlon-xp -msse -mmmx -m3dnow -momit-leaf-frame-pointer -fomit-frame-pointer -funroll-loops -ffast-math -fprefetch-loop-arrays -freduce-all-givs -finline-limit=600 -mfpmath=387 -pipe"


Those CFLAGS are probably pretty close to the fastest available as generic optimizations as determined by acovea for my breed of Athlon XP. But since there are such vast differences made even by doing something so trivial as changing the speed of the FSB, if I had a different breed of Athlon XP, my CFLAGS would probably be completely different!

[Note: I should put -ftracer back in my CFLAGS (I have done now, but I'm not going through all that again simply for the sake of correctness). I was testing GCC 3.2 compatibility with a bit of code, and as such had to take -ftracer out temporarily; it is probably the best optimization there is that isn't included in -O3, and in particular much more beneficial than -ffast-math on almost all code, even maths-intensive stuff.]

Have fun experimenting!

sapphirecat
Guru
Joined: 15 Jan 2003
Posts: 376

Posted: Tue Feb 10, 2004 5:54 am

robmoss2k wrote:
I'd be interested to see that gimp benchmark run with -O3 and with -O3 -ftracer and would wager that a significant speed boost would result.


Well, I tried. I can't get any difference among any of the combinations today. Not between the configurations I saw before, and not even between GCC 3.3 -O3 -ftracer and the runs with the binary where I accidentally left out -march=athlon-xp on GCC 3.2 -O2. This suggests I've done something wrong today, but I have no idea what it might be. :(

I watched the compilations, and they definitely used the right flags. I did the GCC 3.2 builds before upgrading the compiler, and -ftracer worked, so the compilers were right. I've looked at ldd and watched top with full paths. Everything uses the appropriate plugins and libraries. I am going to bed before my head explodes.

rosinante
n00b
Joined: 26 Dec 2003
Posts: 5

Posted: Wed Feb 11, 2004 3:24 pm

robmoss2k wrote:

With regards your CFLAGS - I don't know anything about pre-linking, so I can't help you there.

I found out about prelinking: -fPIC (which is different from -fpic) is only used for shared libraries (the key to prelinking), or by hardened-sources for compiling ET_DYN ELF binaries.

As far as I found out, all shared libraries are compiled with -fPIC by default. Can anyone confirm this?

robmoss2k wrote:

If you want to work out which CFLAGS work best for your machine, I'd suggest going to this site (NB - site currently down, Google's cache is here) and running some tests on the example source files. I don't know which of the -mmmx -msse -msse2 flags are switched on by your arch flag, but this code will tell you exactly what is:
Code:
touch ~/input.c
gcc -Q -v `emerge info | grep CFLAGS | sed -e s/CFLAGS// | sed -e s/'\"'// | sed -e s/'\"'// | sed -e s/'\='//` -c input.c



Nice code, will try this out - as well as the acovea utility.

I'm currently skimming the intel docs on my processor to see if I can tweak something more, currently I have the following possible combinations:

Code:

   CFLAGS="-march=pentium4 -mcpu=pentium4 -pipe -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer
           -mmmx -msse -msse2 -mfpmath=sse
           -Os"

   CFLAGS="-march=pentium4 -mcpu=pentium4 -pipe -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer
           -mmmx -msse -msse2 -mfpmath=sse
           -Os -finline-functions -frename-registers"

   CFLAGS="-march=pentium4 -mcpu=pentium4 -pipe -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer
           -mmmx -msse -msse2 -mfpmath=sse
           -O3 -fprefetch-loop-arrays -funroll-loops"


I also noticed that SSE has a USE flag, so I guess that even compiling with the defaults (as in: use the 387 for floating-point math) doesn't guarantee which instructions programs will actually use.

I could also give up on theoretical research and simply compile several times to see the difference :)

Nutterpc
Tux's lil' helper
Joined: 02 Feb 2004
Posts: 83

Posted: Fri Feb 13, 2004 1:13 am

Good day to one, good day to all 8)

I reckon making the move to Gentoo was wise for me: considering my tweaking background on Windows, I thought Gentoo would be the best one for me to use :wink: ... as that's what it's all about, tweaking your Linux distro to how you like it.

Anywayz, the CFLAGS I use are as follows:

-march=athlon-xp -O3 -fforce-addr -fomit-frame-pointer -foptimize-sibling-calls -fthread-jumps -fgcse-lm -fgcse-sm -frename-registers -mmmx -m3dnow -msse -mfpmath=387 -ffast-math -fmerge-constants -fno-cprop-registers --param max-gcse-memory=512

I've had no probs with this string, me own one....and on a Barton core XP2500 it absolutely hoons :twisted: , plus having a Gb of DDR400 Kingmax in Dual Channel might also have something to do with it, heeheeheeheehee :lol:

Will modify it as I see more ways to get speed outta gentoo

Nutterpc
_________________
If it isn't broke, you ain't tweaked it right
Registered Linux User 353232

nmcsween
Guru
Joined: 12 Nov 2003
Posts: 381

Posted: Mon Feb 16, 2004 9:30 am

Alright, I'm not a GCC developer, but wouldn't it be better to set -falign-functions=64? The L1 cache on the Athlon XP is 64 KB, and since most code will be exchanged with the L2 cache in maybe 1 or 2 cycles, why go with 16?

tapted
Tux's lil' helper
Joined: 02 Dec 2003
Posts: 122
Location: Sydney, Australia

Posted: Wed Feb 18, 2004 10:12 pm

Ultraoctane.com wrote:
Alright Im not a gcc developer but wouldn't it be better to -falign-functions to equal 64? the L1 cache on the athlon xp is 64kb and since most code will be exchanged in maybe 1 or 2 cycles with the L2 cache so why go with 16?


... I still don't see the point of -falign-*

64 is silly though. Anything above the size of a 'word' (32 or 64 _bits_ today -- or 4-8 bytes) doesn't make sense at all -- it's just a waste of cache. -falign-* uses _bytes_ [not bits or kB].
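
To put a number on the "waste of cache" point: moving a start address up to the next n-byte boundary costs (n - addr mod n) mod n bytes of padding, so at most n - 1. A sketch of that arithmetic (`align_pad` is a made-up helper; n is assumed to be a power of two). Note that 64 KB is the Athlon XP's L1 cache size, whereas the unit alignment is usually weighed against is the 64-byte cache line, not the whole cache.

```shell
# align_pad ADDR N (hypothetical): bytes of padding needed to move ADDR up
# to the next multiple of N. Assumes N is a power of two.
align_pad() {
  echo $(( ($2 - $1 % $2) % $2 ))
}

# Worst case is N - 1 bytes: with N=64 a function can be pushed forward
# by up to 63 bytes, with N=16 by up to 15.
```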

So maybe use 4 or 8 ... but why does having things word-aligned help?

If you're optimising for size, set it to 1, so you can save a few bytes here and there.

On RISC processors, all instructions are word-aligned anyway, so meh.

On x86, they don't _need_ to be, but why does making it word-aligned help on a processor that doesn't really care about having things word-aligned?

Anyone?

robmoss
Retired Dev
Joined: 27 May 2003
Posts: 2634
Location: Jesus College, Oxford

Posted: Thu Feb 19, 2004 12:07 am

A brief attack on the acovea sources and a recompile suggests precisely no difference whatsoever with -falign-*=2^n (n an integer, n <= 6; * = functions, loops or jumps), except for a performance hit when they're all set equal to 1.

So set them to whatever the hell you want, it makes no difference.

tapted
Tux's lil' helper
Joined: 02 Dec 2003
Posts: 122
Location: Sydney, Australia

Posted: Thu Feb 19, 2004 3:34 am

Nutterpc wrote:
Good day to one, good day to all 8)
Anywayz, my CFLAGS I use are as follows

-march=athlon-xp -O3 -fforce-addr -fomit-frame-pointer -foptimize-sibling-calls -fthread-jumps -fgcse-lm -fgcse-sm -frename-registers -mmmx -m3dnow -msse -mfpmath=387 -ffast-math -fmerge-constants -fno-cprop-registers --param max-gcse-memory=512

Nutterpc


Although gcc doesn't care, some of these are redundant with -O3. This may seem anal, but just so nobody thinks "hey, I didn't use -fsomethingorother. I should recompile my system", here are the redundancies:

-fmerge-constants comes in with -O
-fthread-jumps is meant to as well, if it works on your architecture.

-foptimize-sibling-calls comes in with -O2

-frename-registers comes in with -O3

mmx, 3dnow, sse should come in with -march, IF gcc can use them on your architecture. AFAIK, they get automatically disabled if you try on anything but P4/Athlon64 because only they support 64-bit floats.

The following are new to me, but docs say they could be useful:

-fgcse-lm
-fgcse-sm
-fno-cprop-registers

Anyone got benchmarks for these?

The rest have been looked at pretty thoroughly and are good to use.

robmoss
Retired Dev
Joined: 27 May 2003
Posts: 2634
Location: Jesus College, Oxford

Posted: Fri Feb 20, 2004 4:49 pm

Code:
touch ~/input.c
gcc -Q -v `emerge info | grep CFLAGS | sed -e s/CFLAGS// | sed -e s/'\"'// | sed -e s/'\"'// | sed -e s/'\='//` -c ~/input.c


Don't forget to use that horrible little bit of code (I'm sure it can be made much neater, I was just trying to be quick) to check what your CFLAGS are actually implying when you change them from one set to another. If there's no difference in what gcc actually uses, then there's no point recompiling everything...
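
Comparing two flag sets can be automated once each "options enabled:" list has been saved with one flag per line (for example by pasting the flags into a file and running it through tr ' ' '\n'). A sketch using sort and comm; `flag_diff` is a hypothetical name. Empty output would mean gcc ends up enabling the same options either way.

```shell
# flag_diff OLD NEW (hypothetical): OLD and NEW are files listing one flag
# per line, e.g. the "options enabled:" flags for two CFLAGS sets.
# Prints flags that disappear ("removed") and flags that appear ("added").
flag_diff() {
  local old new
  old=$(mktemp)
  new=$(mktemp)
  # comm needs both inputs sorted with the same collation, so pin LC_ALL=C.
  LC_ALL=C sort -u "$1" > "$old"
  LC_ALL=C sort -u "$2" > "$new"
  LC_ALL=C comm -23 "$old" "$new" | sed 's/^/removed /'
  LC_ALL=C comm -13 "$old" "$new" | sed 's/^/added /'
  rm -f "$old" "$new"
}
```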

C1REX
l33t
Joined: 02 Jan 2004
Posts: 774
Location: Poland/UK

Posted: Mon Feb 23, 2004 12:24 am

Hi
Here in Poland there's a somewhat different fashion as far as flags are concerned, and you can come across somewhat different solutions. For instance, I'll give mine for a Duron 800. They look a bit exotic here, but in Poland they can be regarded as pretty standard. -O3 isn't very popular; it's mostly -Os and -O2.
Code:

CFLAGS="-O2 -march=athlon-tbird -mcpu=athlon-tbird -falign-loops -ffast-math -frename-registers -funroll-all-loops -funroll-loops -pipe -fomit-frame-pointer ${LDFLAGS} -DNDEBUG -DG_DISABLE_ASSERT -DG_DISABLE_CHECKS -DG_DISABLE_CAST_CHECKS"

LDFLAGS="-s -z combreloc"

CXXFLAGS="${CFLAGS}"



Greetings from Poland
_________________
CLICK HERE to help move gentoo up on distrowatch.

If you like Gentoo you can thank devs here - https://www.gentoo.org/donate/

t0mcat
Tux's lil' helper
Joined: 12 Feb 2004
Posts: 111
Location: Catania, Italy

Posted: Mon Feb 23, 2004 6:28 pm

lo all,
I'm a noob to the penguin OS; I've recently installed Gentoo and did the bootstrap with CFLAGS="-march=athlon-xp -O2 -pipe".

I've read so much stuff about CFLAGS that I'm really confused. Anyway, after many hours spent reading everywhere, I'm now about to "emerge -e world" with this:

CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer -ffast-math -fmove-all-movables -funroll-loops -fprefetch-loop-arrays -fforce-addr -mmmx -msse -m3dnow -mfpmath=387 -frename-registers -maccumulate-outgoing-args"

I've got an Athlon XP Barton 2500@3200.

Hope everything will be OK.

BTW, I've got a n00b question about recompiling the whole mess:
can I run "emerge -e world" from an X terminal (GNOME)? I'd be quite bored having the PC occupied for more than a whole day, so I'd like to surf or chat while the CPU does such a huge amount of work. Or must I do it from the console after a plain boot, without loading any desktop environment?

ta for any reply.
_________________
il gattaccio
a.k.a etienne

AnonimoVeneziano
n00b
Joined: 25 May 2003
Posts: 65

Posted: Mon Feb 23, 2004 8:20 pm

Hi all :)

I'm just installing Gentoo now, and I'm doing the bootstrap at the moment (I'm posting from Links :P )

Anyway, I've compiled with:

"-O3 -mcpu=athlon-xp -march=athlon-xp -fomit-frame-pointer -momit-leaf-frame-pointer -fprefetch-loop-arrays -mfpmath=sse -fforce-addr -pipe"

I would also add "-ftracer", but gcc 3.2.3 (the version on the base CD) doesn't support this option, so I'll add it after bootstrapping gcc 3.3.2.

I've added -mcpu=athlon-xp because I've heard that some ebuilds filter the -march option; is this true? Anyway, -mcpu shouldn't change the binary produced by compiling with only "-march", because "-mcpu" is a subset of "-march", right?

Thanks

Bye

Marcello

nmcsween
Guru
Joined: 12 Nov 2003
Posts: 381

Posted: Tue Feb 24, 2004 1:14 am

I don't see the use of putting -mcpu=*, since that is exactly the same as the i686 setting in CHOST, and -march=* turns on that flag as well, so it would just be a waste of typing.

seppe
Guru
Joined: 01 Sep 2003
Posts: 431
Location: Hove, Antwerp, Belgium

Posted: Tue Feb 24, 2004 5:06 pm

Hi, I have a Pentium III 800 MHz, and I tried these CFLAGS:

CFLAGS="-O3 -march=pentium3 -fprefetch-loop-arrays -funroll-loops -pipe -fomit-frame-pointer -fforce-addr -fmove-all-movables"

and it's still compiling after an 'emerge -e world'. I first tried the same CFLAGS with -freduce-all-givs (which, I read here, gains 35% speed on a Pentium III), but I get an error when it tries to compile python2.3 (a parallel make error, or something).

Now my questions:
1) Do I gain much speed with my current CFLAG? (I had the standard 'safe' pentium3 CFLAG first which was suggested at the freehackers site)
2) How can I compile everything with the -freduce-all-givs? I mean: when it fails at python2.3 during a 'emerge -e world', can I compile python only without the -freduce-all-givs flag and continue my 'emerge -e world' without starting all over again? If so, how do I do this?
3) Will -mfpmath=sse,387 gain much speed? And is it safe? Do you suggest me to use it, or is the increase of speed marginal to risk this?
4) How can I use the acovea program with gcc 3.3.3? I get always 'compilation failed' and a segmentation fault when I should get my 'best' CFLAGS, and can I use acovea with 'emerge -e world'?

Thanks for replying in advance, but keep it simple ... I'm a CFLAG n00b ;)
_________________
nitro-sources, because between stable and experimental there exists only speed

Latest release I made: 2.6.13.2-nitro1

Malakai
Apprentice
Joined: 24 Dec 2002
Posts: 299

Posted: Tue Feb 24, 2004 10:40 pm

robmoss2k wrote:


Code:
CFLAGS="-O3 -march=athlon-xp -msse -mmmx -m3dnow -momit-leaf-frame-pointer -fomit-frame-pointer -funroll-loops -ffast-math -fprefetch-loop-arrays -freduce-all-givs -finline-limit=600 -mfpmath=387 -pipe"



Quite a few of those settings are redundant.

-march=athlon-xp includes -msse -mmmx -m3dnow.

I don't know exactly how -momit-leaf-frame-pointer works, but AFAIK -fomit-frame-pointer gets rid of the frame pointer altogether, whereas the leaf one only omits it in leaf functions. You *shouldn't* need both.

Also, some of the other options you have manually specified are already implied by -O3 in the current gentoo version of GCC.

Just a little fyi, I'm guilty of redundancy in my cflags as well ^_^

Penguin_Biker
n00b
Joined: 25 Sep 2003
Posts: 30
Location: Portage, Michigan, USA

Posted: Wed Feb 25, 2004 2:31 am

Athlon XP 1700+

I'm also having a problem with -freduce-all-givs

here's my CFLAGS

Code:
CFLAGS="-O3 -march=athlon-xp -pipe -fprefetch-loop-arrays -fomit-frame-pointer -fforce-addr -freduce-all-givs -funroll-loops -maccumulate-outgoing-args -ffast-math"


How much of a speed gain does -freduce-all-givs give on an athlon-xp? (or if at all)

here's the error i get
Code:
!!! ERROR: dev-lang/python-2.3.3 failed.
!!! Function src_compile, Line 124, Exitcode 2
!!! Parallel make failed


I'm doing emerge system now without it; is there a way to fix this, or should I just forget it?

EDIT: I had made a note earlier to remove it, which I had forgotten about, so my problem is fixed

currently....
Code:
CFLAGS="-O3 -march=athlon-xp -pipe -fprefetch-loop-arrays -fomit-frame-pointer -fforce-addr -funroll-loops -maccumulate-outgoing-args -ffast-math"


and compiling
_________________
A computer without a Microsoft operating system is like a dog without bricks tied to its head.

Registered linux user: #328010

tapted
Tux's lil' helper
Joined: 02 Dec 2003
Posts: 122
Location: Sydney, Australia

Posted: Sat Feb 28, 2004 8:04 am

C1REX wrote:
Hi
Here In Poland there's a bit different fashion as far as flags are concerned, and you can come across a bit different solutions. For instance I'll give mine for Duron 800. They look a bit exotic here, but in Poland they can be regarded as pretty standard. It isn't very popular to meet O3. Os and O2 mostly.
Code:

CFLAGS="-O2 -march=athlon-tbird -mcpu=athlon-tbird -falign-loops -ffast-math -frename-registers -funroll-all-loops -funroll-loops -pipe -fomit-frame-pointer ${LDFLAGS} -DNDEBUG -DG_DISABLE_ASSERT -DG_DISABLE_CHECKS -DG_DISABLE_CAST_CHECKS"

LDFLAGS="-s -z combreloc"

CXXFLAGS="${CFLAGS}"



Greetings from Poland


In my experiments -frename-registers was the ONLY flag that -O3 had that -O2 didn't.

https://forums.gentoo.org/viewtopic.php?t=5717&postdays=0&postorder=asc&start=625#793905

So if you use -O3 there would be precisely no difference, but you can drop a redundant flag.

Also, every source I've seen (and my own benchmarks on a raytracer I wrote) advises strongly AGAINST using -funroll-all-loops. It actually slows code down. The reason is this: to figure out where to jump into the unrolled loop, it needs to do a modulus operation on a number only known at runtime. A modulus operation takes 30 clock cycles on a MIPS (where an addition takes 1) ... on x86 it would be similarly bad. The indeterminate jump also means that the unrolled loop can't be pipelined*. So, unless your _unrolled_ loop executes through lots of times, your code will actually be slower.

-funroll-loops will only unroll loops where the number of iterations is known at compile time, and the division is easy (like a power of two, or a power of two equivalent after an addition**). So it saves time.

Note that the compiler can't really tell the difference between a loop with a high average number of iterations and a low average number of iterations, when the number is only known at runtime. Most loops will be the latter (think library functions), so the net result is slower code because -funroll-all-loops unrolls them all.
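
The runtime bookkeeping described above can be sketched in a few lines: when the trip count n is only known at run time, the n mod k leftover iterations have to be handled separately from the unrolled body. A shell illustration of unrolling by 4 with such a prologue (`sum_unrolled` is a made-up name); it shows the idea only, and the code GCC actually generates differs.

```shell
# sum_unrolled (hypothetical): add up its arguments with the loop body
# unrolled by a factor of 4. The count is only known at run time, so the
# n % 4 leftover values are consumed first, one at a time.
sum_unrolled() {
  local rem s
  rem=$(( $# % 4 ))
  s=0
  # Prologue: handle the leftovers so the remaining count is a multiple of 4.
  while [ "$rem" -gt 0 ]; do
    s=$(( s + $1 ))
    shift
    rem=$(( rem - 1 ))
  done
  # Unrolled body: four additions per pass, one loop test per four values.
  while [ "$#" -gt 0 ]; do
    s=$(( s + $1 + $2 + $3 + $4 ))
    shift 4
  done
  echo "$s"
}
```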

Everyone else, look here:

https://forums.gentoo.org/viewtopic.php?t=5717&postdays=0&postorder=asc&start=625#793905

first, for a thread summary. I might do the rest of the thread later.

Footnotes:

* A pipeline panic is usually a hit of about 10 clock cycles.
** Compilers are smart, and know that things like

x * 31 = x * 32 - x
= (x << 5) - x

and can save ~8 clock cycles. There's prolly an equivalent thing for modulus.
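
Both tricks are easy to check numerically. As for the modulus: for a power-of-two divisor and a non-negative (or unsigned) x, x % 2^k equals x & (2^k - 1). A sketch with hypothetical helper names that compares the shift-and-subtract form of x * 31 with plain multiplication, and x & 7 with x % 8; the range starts at 0 because the mask form does not match % for negative values.

```shell
# Strength reduction: x * 31 rewritten as one shift and one subtraction.
mul31_shift() { echo $(( ($1 << 5) - $1 )); }

# Power-of-two modulus: for non-negative x, x % 8 equals x & 7.
mod8_mask() { echo $(( $1 & 7 )); }

# check_identities (hypothetical): compare both forms with the plain
# operators for x = 0..255; returns non-zero on the first mismatch.
check_identities() {
  local x=0
  while [ "$x" -le 255 ]; do
    [ "$(mul31_shift "$x")" -eq $(( x * 31 )) ] || return 1
    [ "$(mod8_mask "$x")" -eq $(( x % 8 )) ] || return 1
    x=$(( x + 1 ))
  done
}
```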

Moo.

tapted
Tux's lil' helper
Joined: 02 Dec 2003
Posts: 122
Location: Sydney, Australia

Posted: Sat Feb 28, 2004 8:10 am

Penguin_Biker wrote:
Athlon XP 1700+

i'm also having a problem with -freduce-all-givs

here's my CFLAGS

Code:
CFLAGS="-O3 -march=athlon-xp -pipe -fprefetch-loop-arrays -fomit-frame-pointer -fforce-addr -freduce-all-givs -funroll-loops -maccumulate-outgoing-args -ffast-math"


How much of a speed gain does -freduce-all-givs give on an athlon-xp? (or if at all)


It's experimental. If the gcc developers find it to be advantageous (or bits of it) then it will be incorporated into the loop optimiser in a future version.

Currently, whether there's a performance boost at all depends on the code. Theoretically, however, it should always be an improvement. It's things like caching and pipelining that stuff things up. However, enough people use it to make me think that, on average, there is a not insignificant performance boost.

Penguin_Biker wrote:


here's the error i get
Code:
!!! ERROR: dev-lang/python-2.3.3 failed.
!!! Function src_compile, Line 124, Exitcode 2
!!! Parallel make failed


i'm doing emerge system now without it, is there a way to fix this or should i just forget it?


Python has known problems with -freduce-all-givs.

Remove it from your CFLAGS before emerging python, peeps.

perl is also sensitive to some CFLAGS.

Most other things are fine.
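
One way to drop a flag for a single package without editing make.conf is to build a filtered copy of your flags and pass it through the environment. A sketch, assuming CFLAGS in your shell already holds your usual flags and that Portage honours CFLAGS set in the environment (commonly relied on, but check it on your setup). `strip_flag` is a made-up helper.

```shell
# strip_flag FLAG "FLAGS" (hypothetical): print FLAGS with every word equal
# to FLAG removed. Assumes the flags contain no glob characters (* ? [).
strip_flag() {
  local drop out f
  drop=$1
  out=
  for f in $2; do          # unquoted on purpose: split FLAGS into words
    [ "$f" = "$drop" ] || out="${out:+$out }$f"
  done
  printf '%s\n' "$out"
}

# Usage, assuming CFLAGS already holds your usual flags:
#   CFLAGS="$(strip_flag -freduce-all-givs "$CFLAGS")" emerge dev-lang/python
```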

Penguin_Biker wrote:


EDIT: I had made a note to remove it earlier that i had forgotten about, so my problem is fixed

currently....
Code:
CFLAGS="-O3 -march=athlon-xp -pipe -fprefetch-loop-arrays -fomit-frame-pointer -fforce-addr -funroll-loops -maccumulate-outgoing-args -ffast-math"


and compiling

spaetz
n00b
Joined: 15 Dec 2003
Posts: 16

Posted: Tue Mar 02, 2004 7:02 am

robmoss2k wrote:
Code:
touch ~/input.c
gcc -Q -v `emerge info | grep CFLAGS | sed -e s/CFLAGS// | sed -e s/'\"'// | sed -e s/'\"'// | sed -e s/'\='//` -c input.c


How about this :) ?
Code:
touch ~/input.c
gcc -Q -v `emerge info | grep CFLAGS | cut -d '"' -f2` -c input.c
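
A slightly stricter variant of the same idea, wrapped in a function so the extraction can be tested on its own. It assumes `emerge info` prints the setting as a line beginning with CFLAGS="..." and anchors the match there; `cflags_from_info` is a hypothetical name. The usage line compiles ~/input.c by its full path, so it works from any directory.

```shell
# cflags_from_info (hypothetical): print the value of the first CFLAGS="..."
# line found in "emerge info" output on stdin.
cflags_from_info() {
  grep '^CFLAGS=' | head -n 1 | cut -d '"' -f 2
}

# Usage (the $(...) is left unquoted on purpose so the flags split into words):
#   touch ~/input.c
#   gcc -Q -v $(emerge info | cflags_from_info) -c ~/input.c
```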

til
n00b
Joined: 02 Mar 2004
Posts: 7

Posted: Tue Mar 02, 2004 9:24 am

I also need some help, because I have the same problem as others, although I thought my CFLAGS were just optimized for my system (Athlon XP 2200+): my Gentoo crashes after compiling for about 3 hours.

For reference, my CFLAGS:
Code:
CHOST="i686-pc-linux-gnu"
CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer"


I don't see any error in my config; do you?

And then, just another question from a newbie:
I've had several problems with my keymap (I need a German keyboard with umlauts). First I chose
Code:
KEYMAP="de"
but this one didn't quite work (I had a mixture of an English and a German keyboard: z and y were in the right places, but the umlaut keys gave the American layout), so I changed the keymap to:
Code:
KEYMAP="de-latin1-nodeadkeys"
and now I get umlauts like ä, ö, ü, but I don't have the AltGr characters, like @ or €.
I think I should post this question in another topic and search for answers, but I thought perhaps somebody could help me so I needn't make the effort (I'm lazy :P).

All times are GMT
Page 27 of 37