zinion Guru
Joined: 27 Oct 2004 Posts: 541 Location: Ruhgebietshausen
|
Posted: Fri Nov 26, 2004 9:29 am Post subject: [gcc 3.4] AMD's Recommended CFLAGS |
|
|
Code: |
CFLAGS="-O3 -march=athlon64 -ffast-math -funroll-all-loops -funit-at-a-time -fpeel-loops -ftracer -funswitch-loops -fomit-frame-pointer -pipe"
|
AMD (2004): Compiler Usage Guidelines for 64-bit Operating Systems on AMD64 Platforms (online)(16.11.2004) http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/32035.pdf
(Page 29)
Maybe somebody can make this sticky? _________________ Es ist schön und warm
hier im Gentoo-Land |
|
|
|
AndersH n00b
Joined: 15 Oct 2003 Posts: 3 Location: Göteborg, Sweden
|
Posted: Fri Nov 26, 2004 10:04 am Post subject: |
|
|
Looks interesting, but isn't -ffast-math dangerous in certain situations, or am I misremembering something? Does anyone dare to try these flags and tell us the results? |
|
|
|
LordArthas Guru
Joined: 01 Nov 2004 Posts: 500 Location: Maniago, Friûl, Italia
|
Posted: Fri Nov 26, 2004 10:29 am Post subject: |
|
|
Hi!
AndersH wrote: | Looks interesting, but isn't -ffast-math dangerous in certain situations, or am I misremembering something? Does anyone dare to try these flags and tell us the results? |
I just updated my /etc/make.conf with these flags. I'll take a look at what happens as soon as I emerge something.
Michele. |
|
|
|
zinion Guru
Joined: 27 Oct 2004 Posts: 541 Location: Ruhgebietshausen
|
Posted: Fri Nov 26, 2004 11:01 am Post subject: |
|
|
Since I use 3.4, I have these CFLAGS in my make.conf and everything runs fine. I recompiled my system (emerge -e world) and there were one or two GNOME things that didn't compile, but I don't remember which, and I don't know if the CFLAGS are the reason. Because I use KDE, I didn't investigate any further and used emerge --resume --skipfirst. _________________ Es ist schön und warm
hier im Gentoo-Land |
|
|
|
barry Apprentice
Joined: 01 May 2002 Posts: 170 Location: UK
|
Posted: Fri Nov 26, 2004 5:14 pm Post subject: |
|
|
Those flags may be recommended for optimal performance, but they're likely to break some packages. It'd be safer to stick to -fomit-frame-pointer -march=k8 -O2 -pipe, and perhaps -frename-registers, -fweb and -funit-at-a-time. |
|
|
|
get sirius Guru
Joined: 27 Apr 2002 Posts: 316 Location: Madison, WI
|
Posted: Fri Nov 26, 2004 6:06 pm Post subject: |
|
|
"-funit-at-a-time" is included in the -O2 optimizations and, by extension, in the -O3 optimizations as well. I've used -ffast-math in my u/p box and it sure made Setiathome fly .
EDIT: And -frename-registers and -fweb are included in the O3 optimizations.
Last edited by get sirius on Fri Nov 26, 2004 7:50 pm; edited 1 time in total |
|
|
|
barry Apprentice
Joined: 01 May 2002 Posts: 170 Location: UK
|
Posted: Fri Nov 26, 2004 6:43 pm Post subject: |
|
|
I didn't realise -funit-at-a-time was included in -O2. -ffast-math will give good boosts for certain packages, but will probably break others.
A lot of ebuilds will enable -ffast-math if it'll help anyway, so I'd recommend leaving it out of /etc/make.conf. |
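To make the -ffast-math concern concrete, here is a minimal sketch (not from the thread; the function names are made up for illustration) of code whose correctness depends on strict floating-point semantics. Compensated (Kahan) summation relies on an expression that is algebraically zero, and -ffast-math permits the compiler to reassociate it away, which would quietly turn it back into the naive sum.

```c
#include <assert.h>

/* Hypothetical helper: add x to an accumulator n times, the naive way.
   Rounding error accumulates with every addition. */
float naive_sum(float x, long n)
{
    assert(n >= 0);
    float sum = 0.0f;
    for (long i = 0; i < n; i++)
        sum += x;
    return sum;
}

/* Hypothetical helper: the same sum with Kahan compensation.
   c tracks the low-order bits lost in each addition. */
float kahan_sum(float x, long n)
{
    assert(n >= 0);
    float sum = 0.0f;
    float c = 0.0f;
    for (long i = 0; i < n; i++) {
        float y = x - c;
        float t = sum + y;
        c = (t - sum) - y;  /* algebraically zero; unsafe-math optimisations may fold it to 0 */
        sum = t;
    }
    return sum;
}
```

Built without -ffast-math, kahan_sum(0.1f, 10000000L) should land very close to 1000000, while naive_sum drifts. Whether a particular GCC version actually removes the compensation under -ffast-math is something to check in the generated assembly rather than assume.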
|
|
|
toofastforyahuh Apprentice
Joined: 18 May 2004 Posts: 164
|
Posted: Sat Nov 27, 2004 8:02 am Post subject: |
|
|
Is it even necessary to use -fomit-frame-pointer?
As per the kernel changelogs:
http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.9
Quote: |
[PATCH] x86_64: remove CONFIG_FRAME_POINTER
CONFIG_FRAME_POINTER has never worked on x86-64 because it never passed
-fno-omit-frame-pointer to the compiler, and that is the only way to get a
frame pointer on x86-64.
|
|
|
|
|
AnXa Apprentice
Joined: 06 Apr 2004 Posts: 250
|
Posted: Sun Nov 28, 2004 11:14 am Post subject: |
|
|
I got a working machine when compiling with those options. It's fast, stable, and everything you can expect from a 64-bit OS. _________________ The idea isn't about how do you see or hear it, it's about how do you experience it... |
|
|
|
GentooBox Veteran
Joined: 22 Jun 2003 Posts: 1168 Location: Denmark
|
Posted: Sun Nov 28, 2004 2:10 pm Post subject: |
|
|
Can someone please post the CFLAGS from the PDF file? _________________ Encrypt, lock up everything and duct tape the rest |
|
|
|
barry Apprentice
Joined: 01 May 2002 Posts: 170 Location: UK
|
Posted: Sun Nov 28, 2004 8:57 pm Post subject: |
|
|
Just because something appears to be stable doesn't necessarily mean it is. There may be subtle bugs introduced by using aggressive optimisations. It's best to optimise packages individually with options like these.
There's also no doubt that -funroll-all-loops will cause performance degradation with some packages. |
|
|
|
>Octoploid< n00b
Joined: 27 Jun 2004 Posts: 57
|
Posted: Sun Nov 28, 2004 10:38 pm Post subject: |
|
|
It's stupid to unroll loops and do software pipelining on
a massive out of order processor, such as the Athlon64.
You will gain nothing and, due to unrolling, your instruction
cache will miss more often (because you increase the amount
of code) making your code run slower.
Let the hardware make the decisions. It is faster and more
competent than the compiler...
BTW the AMD guideline is for micro kernels and benchmarks,
not for general purpose programs. |
|
|
|
toofastforyahuh Apprentice
Joined: 18 May 2004 Posts: 164
|
Posted: Tue Nov 30, 2004 7:27 am Post subject: |
|
|
Not entirely true.
Unrolling loops helps the hardware make the right decisions. It's a lot easier to parallelize and you avoid loop overhead.
However, it does impact L1 cache as you mentioned. This whole issue is discussed in another PDF on AMD's website, along with other optimizations. Software dos and don'ts.
As for whether or not it helps in real applications, I decided to find out for myself. I generally choose xmame for benchmarking instead of the strange abstract benchmarks that have no meaning to me. (Linpack? SPEC? It means something but not everything.)
http://www.anthrofox.org/code/mame/xmame64_bench88.html
The bottom table shows a decent sample of MAME drivers, and demonstrates:
1. -fomit-frame-pointer really is useless on X86-64
2. 64-bit itself is more important than all the silly ricer CFLAGS I've tried so far.
3. Although not shown, -O3 does perform better than -O2 or -O1. This is not always true outside of xmame, though.
4. -funroll-all-loops does provide a few percent increase, and only the occasional decrease
5. -funroll-all-loops -ftracer -fpeel-loops provides a few percent increase pretty consistently and seldom hurts. Generally the damage is under 1% when it hurts. It definitely helps the games that need it the most, too.
The next question is what is the impact of -funswitch-loops and does -funroll-loops perform better than -funroll-all-loops?
Intuition says -funroll-loops might be safer than blindly unrolling all loops.
I do not edit my /etc/make.conf to have ricer CFLAGS settings, though. I keep the system conservative. amd64 itself is generally a bigger and safer win than exotic and dangerous CFLAGS.
Last edited by toofastforyahuh on Tue Nov 30, 2004 10:07 am; edited 2 times in total |
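On the -funroll-loops versus -funroll-all-loops question: going by the GCC manual's description, -funroll-loops targets loops whose iteration count is known at compile time or on entry to the loop, while -funroll-all-loops also unrolls loops whose count is uncertain on entry. A rough sketch of the two kinds of loop, with function names invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Trip count (n) is known when the loop is entered:
   the kind of loop -funroll-loops is described as handling. */
long sum_array(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Trip count depends on the data being scanned, so it is unknown on entry:
   only -funroll-all-loops is described as unrolling loops like this one. */
size_t string_length(const char *s)
{
    assert(s != NULL);
    size_t len = 0;
    while (s[len] != '\0')
        len++;
    return len;
}
```

Which loops a given GCC version really unrolls depends on its heuristics, so treat this as an illustration of the distinction, not a prediction of the output.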
|
|
|
>Octoploid< n00b
Joined: 27 Jun 2004 Posts: 57
|
Posted: Tue Nov 30, 2004 7:44 am Post subject: |
|
|
Unfortunately the link you provided is unreachable from anywhere outside your own machine. |
|
|
|
mooseboy n00b
Joined: 14 Oct 2004 Posts: 26
|
Posted: Tue Nov 30, 2004 8:11 am Post subject: |
|
|
>Octoploid< wrote: |
Unfortunately the link you provided is unreachable from anywhere outside your own machine. | |
|
|
|
toofastforyahuh Apprentice
Joined: 18 May 2004 Posts: 164
|
Posted: Tue Nov 30, 2004 10:04 am Post subject: |
|
|
Sorry, wrong window. Fixed the link above and here.
http://www.anthrofox.org/code/mame/xmame64_bench88.html
Also added a third table with -funroll-loops and -funswitch-loops results.
Seems neither is a huge win. They add a bit of noise to the scores. Sometimes -funroll-loops is better than -funroll-all-loops, but not always. Adding -funswitch-loops is also not a guaranteed win for xmame.
In previous experiments I also compared gcc-3.3.x to gcc-3.4.0 and found almost no difference with -march=k8. Again, a fair amount of noise, but not really harmful. Just disappointing to those of us who were expecting more. Sometimes reality hurts. |
|
|
|
barry Apprentice
Joined: 01 May 2002 Posts: 170 Location: UK
|
Posted: Tue Nov 30, 2004 10:45 am Post subject: |
|
|
Loop unrolling does do more. Take this simple C example:
Code: | /* Count x down from 9000 to zero, one step per iteration. */
int main(void)
{
    int x = 9000;
    while (x != 0) {
        x = x - 1;
    }
    return 0;
} |
The function should perform a loop which decreases x by 1 until it reaches zero. If you compile this with -O2 or -O3, that's exactly what happens, as you can see from the generated assembly:
However, with -funroll-loops, the compiler can optimise the loop by decreasing x by 8 each time instead, and this is the result:
Code: | subl $8, %eax
jne .L4
|
So without -funroll-loops, the loop is performed 9000 times. With -funroll-loops, it's performed 1125 times.
Of course this kind of optimisation is rare - the compiler must be able to change the code without it breaking anything, as it can in this example. |
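The iteration counts quoted above can be checked with a hand-written equivalent. This is only a sketch of what the transformation amounts to at the source level, not compiler output, and both function names are made up:

```c
#include <assert.h>

/* One decrement per iteration, as in the original loop.
   Returns the number of iterations executed. */
int countdown_rolled(int x)
{
    assert(x >= 0);
    int iterations = 0;
    while (x != 0) {
        x = x - 1;
        iterations++;
    }
    return iterations;
}

/* Hand-unrolled by 8: eight decrements folded into one subtraction.
   Only valid when x is a non-negative multiple of 8, which is the kind of
   condition the compiler has to establish before it may do the same. */
int countdown_unrolled8(int x)
{
    assert(x >= 0 && x % 8 == 0);
    int iterations = 0;
    while (x != 0) {
        x = x - 8;
        iterations++;
    }
    return iterations;
}
```

With x = 9000 the first version runs 9000 iterations and the second 1125, since 9000 / 8 = 1125 with no remainder.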
|
|
|
lavish Bodhisattva
Joined: 13 Sep 2004 Posts: 4296
|
Posted: Tue Nov 30, 2004 10:15 pm Post subject: |
|
|
I would like to thank toofastforyahuh and barry... very interesting posts!
Now... is it possible to draw any conclusions?
-march=athlon64 -O3 -funroll-all-loops -ftracer -fpeel-loops -pipe
is a good way to go?
My current CFLAGS are quite conservative but they have worked fine (for me, at least):
-march=athlon64 -O2 -fweb -frename-registers -ftracer -pipe _________________ minimalblue.com | secgroup.github.io/ |
|
|
|
toofastforyahuh Apprentice
Joined: 18 May 2004 Posts: 164
|
Posted: Wed Dec 01, 2004 8:40 am Post subject: |
|
|
Is it possible to make conclusions?
For xmame on current K8 CPUs (all Athlon64/Opteron/FX models) it seems like that is the best way to go. It's a lot of really intense vanilla C code. Various data structures and the occasional arrays. I built my machine for xmame first and video work second, so that's what I focus on for now.
As for generalizing beyond xmame, I can't give you any firm conclusions. Different programs *will* behave differently. Sometimes -O2 really is better than -O3. And I strongly caution you exotic CFLAGS can and do break things on some programs. I have personally witnessed gcc-3.2.x screw up xmame on MIPS/IRIX just by using -O3 alone. And if you add flags willy nilly all over the place---even if it seems safe---you can be asking for trouble. For example, -funit-at-a-time is built into -O2 and -O3, but if you also add it again to the command line along with -O3 (as AMD's PDF improperly suggested!), gcc-3.4.3 will die during xmame compilation! You'd think it'd be safe, but gcc barfs anyway.
My recommendation is to keep /etc/make.conf something safe and simple. I keep mine at -march=k8 -O2 -pipe. Then for the programs I care about and spend a lot of CPU cycles on--like xmame--I try to optimize those individually. But admittedly 3-4% is far, far, far less significant than going amd64 in the first place. Just using amd64 is the real win here. Everything else I've done here is just gravy on top of it and academic.
-march=k8 -O2 -pipe <--should probably be safe and good all around
-march=k8 -O3 -pipe <---MIGHT be better and still safe, depending on the program
-march=k8 -O3 -funroll-all-loops -fpeel-loops -ftracer -pipe <---nitro burning for xmame, at least |
|
|
|
barry Apprentice
Joined: 01 May 2002 Posts: 170 Location: UK
|
Posted: Wed Dec 01, 2004 11:17 am Post subject: |
|
|
I would agree with the last post. Keep /etc/make.conf simple. By adding -ffancy-obscure-optimisation you might gain an overall 1% speed increase, but what's the point if you've now introduced subtle bugs into perl, python, X.org, and bash? You'll notice that many programs that benefit greatly from optimisations like -funroll-loops include it automatically during the build process.
'-march=k8 -O2 -fweb -frename-registers -pipe' seems safe. Both -fweb and -frename-registers are included in -O3 - both make debugging more difficult, but don't increase code size like -finline-functions, which is also part of -O3.
Also remember that the x86-64 platform is not nearly as well-tested as x86, which makes using aggressive CFLAGS even less wise. |
|
|
|
Trevoke Advocate
Joined: 04 Sep 2004 Posts: 4099 Location: NY, NY
|
Posted: Wed Dec 01, 2004 6:27 pm Post subject: |
|
|
What's the difference between
-march=athlon-64
-march=k8
And whatever other options are available for AMD64? _________________ Votre moment detente
What is the nature of conflict? |
|
|
|
lavish Bodhisattva
Joined: 13 Sep 2004 Posts: 4296
|
Posted: Wed Dec 01, 2004 7:34 pm Post subject: |
|
|
Trevoke wrote: | What's the difference between
-march=athlon-64
-march=k8
And whatever other options are available for AMD64? |
There are no differences
read the technotes _________________ minimalblue.com | secgroup.github.io/ |
|
|
|
Trevoke Advocate
Joined: 04 Sep 2004 Posts: 4099 Location: NY, NY
|
Posted: Wed Dec 01, 2004 7:55 pm Post subject: |
|
|
I have; if there are no differences, then why keep several separate options? _________________ Votre moment detente
What is the nature of conflict? |
|
|
|
barry Apprentice
Joined: 01 May 2002 Posts: 170 Location: UK
|
Posted: Wed Dec 01, 2004 8:20 pm Post subject: |
|
|
They are identical (though it's athlon64, not athlon-64). There are other cases where you can use equivalent options, like pentiumpro/i686 or pentium/i586. This is all explained in the GCC info pages. |
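One way to see which target a file was really built for is to look at the preprocessor macros GCC predefines. The sketch below assumes that GCC defines `__k8__` for the K8 family of -march values (k8, athlon64, opteron) and `__x86_64__` for 64-bit x86 builds; the function names are invented, and the exact macro set is worth confirming against the GCC version in use.

```c
/* Returns 1 if the compiler was told to target the K8 family, else 0.
   Assumption: GCC predefines __k8__ for -march=k8 and its aliases. */
int built_for_k8(void)
{
#if defined(__k8__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 for a 64-bit x86 build, else 0. */
int built_for_x86_64(void)
{
#if defined(__x86_64__)
    return 1;
#else
    return 0;
#endif
}
```

If -march=k8 and -march=athlon64 really are aliases, compiling this file with either flag should make built_for_k8() return the same value.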
|
|
|
lavish Bodhisattva
Joined: 13 Sep 2004 Posts: 4296
|
Posted: Wed Dec 01, 2004 9:08 pm Post subject: |
|
|
Trevoke wrote: | I have; if there are no differences, then why keep several separate options? |
U have? lol eheh
There will be some differences in gcc >=4.0 _________________ minimalblue.com | secgroup.github.io/ |
|
|
|
|