Gentoo Forums
[gcc 3.4] AMD's Recommended CFLAGS
zinion
Guru


Joined: 27 Oct 2004
Posts: 541
Location: Ruhgebietshausen

Posted: Fri Nov 26, 2004 9:29 am    Post subject: [gcc 3.4] AMD's Recommended CFLAGS

Code:

CFLAGS="-O3 -march=athlon64 -ffast-math -funroll-all-loops -funit-at-a-time -fpeel-loops -ftracer -funswitch-loops -fomit-frame-pointer -pipe"


AMD (2004): Compiler Usage Guidelines for 64-bit Operating Systems on AMD64 Platforms (online)(16.11.2004) http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/32035.pdf
(Page 29)

Maybe somebody can make this sticky? ;)
_________________
It is nice and warm
here in Gentoo land
AndersH
n00b


Joined: 15 Oct 2003
Posts: 3
Location: Göteborg, Sweden

Posted: Fri Nov 26, 2004 10:04 am

Looks interesting, but isn't -ffast-math dangerous in certain situations, or am I misremembering something? Does anyone dare to try these flags and tell us the results?
LordArthas
Guru


Joined: 01 Nov 2004
Posts: 500
Location: Maniago, Friûl, Italia

Posted: Fri Nov 26, 2004 10:29 am

Hi!

AndersH wrote:
Looks interesting, but isn't -ffast-math dangerous in certain situations, or am I misremembering something? Does anyone dare to try these flags and tell us the results?


I just updated my /etc/make.conf with these flags; I'll see what happens as soon as I emerge something. ;-)

Michele.
zinion
Guru


Joined: 27 Oct 2004
Posts: 541
Location: Ruhgebietshausen

Posted: Fri Nov 26, 2004 11:01 am

Since I use GCC 3.4, I have these CFLAGS in my make.conf and everything runs fine. I recompiled my system (emerge -e world) and one or two GNOME packages didn't compile, but I don't remember which ones and I don't know whether the CFLAGS were the reason. Because I use KDE, I didn't investigate any further and just ran emerge --resume --skipfirst.
_________________
It is nice and warm
here in Gentoo land
barry
Apprentice


Joined: 01 May 2002
Posts: 170
Location: UK

Posted: Fri Nov 26, 2004 5:14 pm

Those flags may be recommended for optimal performance, but they're likely to break some packages. It'd be safer to stick to -fomit-frame-pointer -march=k8 -O2 -pipe, and perhaps -frename-registers, -fweb and -funit-at-a-time.
get sirius
Guru


Joined: 27 Apr 2002
Posts: 316
Location: Madison, WI

Posted: Fri Nov 26, 2004 6:06 pm

"-funit-at-a-time" is included in the -O2 optimizations and, by extension, in the -O3 optimizations as well. I've used -ffast-math in my u/p box and it sure made Setiathome fly :D .

EDIT: And -frename-registers and -fweb are included in the -O3 optimizations.


Last edited by get sirius on Fri Nov 26, 2004 7:50 pm; edited 1 time in total
barry
Apprentice


Joined: 01 May 2002
Posts: 170
Location: UK

Posted: Fri Nov 26, 2004 6:43 pm

I didn't realise -funit-at-a-time was included in -O2. -ffast-math will give good boosts for certain packages, but will probably break others.

A lot of ebuilds will enable -ffast-math if it'll help anyway, so I'd recommend leaving it out of /etc/make.conf.
toofastforyahuh
Apprentice


Joined: 18 May 2004
Posts: 164

Posted: Sat Nov 27, 2004 8:02 am

Is it even necessary to use -fomit-frame-pointer?

As per the kernel changelogs:

http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.9


Quote:

[PATCH] x86_64: remove CONFIG_FRAME_POINTER

CONFIG_FRAME_POINTER has never worked on x86-64 because it never passed
-fno-omit-frame-pointer to the compiler, and that is the only way to get a
frame pointer on x86-64.
AnXa
Apprentice


Joined: 06 Apr 2004
Posts: 250

Posted: Sun Nov 28, 2004 11:14 am

I got a working machine when compiling with those options. It's fast, stable, and everything you can expect from a 64-bit OS. ;)
_________________
The idea isn't about how do you see or hear it, it's about how do you experience it...
GentooBox
Veteran


Joined: 22 Jun 2003
Posts: 1168
Location: Denmark

Posted: Sun Nov 28, 2004 2:10 pm

Can someone please post the CFLAGS from the PDF file?
_________________
Encrypt, lock up everything and duct tape the rest
barry
Apprentice


Joined: 01 May 2002
Posts: 170
Location: UK

Posted: Sun Nov 28, 2004 8:57 pm

Just because something appears to be stable doesn't necessarily mean it is. There may be subtle bugs introduced by using aggressive optimisations. It's best to optimise packages individually with options like these.

There's also no doubt that -funroll-all-loops will cause performance degradation with some packages.
>Octoploid<
n00b


Joined: 27 Jun 2004
Posts: 57

Posted: Sun Nov 28, 2004 10:38 pm

It's stupid to unroll loops and do software pipelining on
a massively out-of-order processor such as the Athlon64.
You will gain nothing and, due to unrolling, your instruction
cache will miss more often (because you increase the amount
of code), making your code run slower.
Let the hardware make the decisions. It is faster and more
competent than the compiler...

BTW the AMD guideline is for micro kernels and benchmarks,
not for general purpose programs.
toofastforyahuh
Apprentice


Joined: 18 May 2004
Posts: 164

Posted: Tue Nov 30, 2004 7:27 am

Not entirely true.

Unrolling loops helps the hardware make the right decisions. The unrolled code is a lot easier to parallelize, and you avoid loop overhead.
However, it does impact L1 cache as you mentioned. This whole issue is discussed in another PDF on AMD's website, along with other optimizations. Software dos and don'ts.

As for whether or not it helps in real applications, I decided to find out for myself. I generally choose xmame for benchmarking instead of the strange abstract benchmarks that have no meaning to me. (Linpack? SPEC? It means something but not everything.)

http://www.anthrofox.org/code/mame/xmame64_bench88.html

The bottom table shows a decent sample of MAME drivers, and demonstrates:
1. -fomit-frame-pointer really is useless on X86-64
2. 64-bit itself is more important than all the silly ricer CFLAGS I've tried so far.
3. Although not shown, -O3 does perform better than -O2 or -O1. This is not always true outside of xmame, though.
4. -funroll-all-loops does provide a few percent increase, and only the occasional decrease
5. -funroll-all-loops -ftracer -fpeel-loops provides a few percent increase pretty consistently and seldom hurts. Generally the damage is under 1% when it hurts. It definitely helps the games that need it the most, too.

The next question is what is the impact of -funswitch-loops and does -funroll-loops perform better than -funroll-all-loops?

Intuition says -funroll-loops might be safer than blindly unrolling all loops.

I do not edit my /etc/make.conf to have ricer CFLAGS settings, though. I keep the system conservative. amd64 itself is generally a bigger and safer win than exotic and dangerous CFLAGS.


Last edited by toofastforyahuh on Tue Nov 30, 2004 10:07 am; edited 2 times in total
>Octoploid<
n00b


Joined: 27 Jun 2004
Posts: 57

Posted: Tue Nov 30, 2004 7:44 am

toofastforyahuh wrote:
Not entirely true.
file:///home/sbehling/mame88/xmame64_bench88.html


Unfortunately the link you provided is unreachable from anywhere outside your own machine.
mooseboy
n00b


Joined: 14 Oct 2004
Posts: 26

Posted: Tue Nov 30, 2004 8:11 am

>Octoploid< wrote:
toofastforyahuh wrote:
Not entirely true.
file:///home/sbehling/mame88/xmame64_bench88.html


Unfortunately the link you provided is unreachable from anywhere outside your own machine.
:lol: :lol: :lol:
toofastforyahuh
Apprentice


Joined: 18 May 2004
Posts: 164

Posted: Tue Nov 30, 2004 10:04 am

Sorry, wrong window. Fixed the link above and here.
http://www.anthrofox.org/code/mame/xmame64_bench88.html

Also added a third table with -funroll-loops and -funswitch-loops results.
Seems neither is a huge win. They add a bit of noise to the scores. Sometimes -funroll-loops is better than -funroll-all-loops, but not always. Adding -funswitch-loops is also not a guaranteed win for xmame.


In previous experiments I also compared gcc-3.3.x to gcc-3.4.0 and found almost no difference with -march=k8. Again, a fair amount of noise, but not really harmful. Just disappointing to those of us who were expecting more. Sometimes reality hurts.
barry
Apprentice


Joined: 01 May 2002
Posts: 170
Location: UK

Posted: Tue Nov 30, 2004 10:45 am

Loop unrolling does do more. Take this simple C example:

Code:
int main(void)
{
  int x = 9000;        /* loop counter */
  while (x != 0) {     /* count down until x reaches zero */
    x = x - 1;
  }
  return 0;
}


The function should perform a loop which decreases x by 1 until it reaches zero. If you compile this with -O2 or -O3, that's exactly what happens, as you can see from the generated assembly:

Code:
decl    %eax
jne     .L4


However, with -funroll-loops, the compiler can optimise the loop by decreasing x by 8 each time instead, and this is the result:

Code:
subl    $8, %eax
jne     .L4


So without -funroll-loops, the loop is performed 9000 times. With -funroll-loops, it's performed 1125 times.

Of course this kind of optimisation is rare - the compiler must be able to change the code without it breaking anything, as it can in this example.
lavish
Bodhisattva


Joined: 13 Sep 2004
Posts: 4296

Posted: Tue Nov 30, 2004 10:15 pm

I would like to thank toofastforyahuh and barry... very interesting posts!
Now... is it possible to draw any conclusions?


-march=athlon64 -O3 -funroll-all-loops -ftracer -fpeel-loops -pipe


Is that a good way to go?

My current CFLAGS are quite conservative, but they have worked fine (for me, at least):
-march=athlon64 -O2 -fweb -frename-registers -ftracer -pipe
_________________
minimalblue.com | secgroup.github.io/
toofastforyahuh
Apprentice


Joined: 18 May 2004
Posts: 164

Posted: Wed Dec 01, 2004 8:40 am

Is it possible to make conclusions?

For xmame on current K8 CPUs (all Athlon64/Opteron/FX models) it seems like that is the best way to go. It's a lot of really intense vanilla C code: various data structures and the occasional array. I built my machine for xmame first and video work second, so that's what I focus on for now.

As for generalizing beyond xmame, I can't give you any firm conclusions. Different programs *will* behave differently. Sometimes -O2 really is better than -O3. And I strongly caution you that exotic CFLAGS can and do break things on some programs. I have personally witnessed gcc-3.2.x screw up xmame on MIPS/IRIX just by using -O3 alone. And if you add flags willy-nilly all over the place (even if it seems safe) you can be asking for trouble. For example, -funit-at-a-time is built into -O2 and -O3, but if you also add it again to the command line along with -O3 (as AMD's PDF improperly suggested!), gcc-3.4.3 will die during xmame compilation! You'd think it'd be safe, but gcc barfs anyway.

My recommendation is to keep /etc/make.conf something safe and simple. I keep mine at -march=k8 -O2 -pipe. Then for the programs I care about and spend a lot of CPU cycles on, like xmame, I try to optimize those individually. But admittedly 3-4% is far, far, far less significant than going amd64 in the first place. Just using amd64 is the real win here. Everything else I've done here is just gravy on top of it and academic.

-march=k8 -O2 -pipe <--should probably be safe and good all around
-march=k8 -O3 -pipe <---MIGHT be better and still safe, depending on the program
-march=k8 -O3 -funroll-all-loops -fpeel-loops -ftracer -pipe <---nitro burning for xmame, at least
barry
Apprentice


Joined: 01 May 2002
Posts: 170
Location: UK

Posted: Wed Dec 01, 2004 11:17 am

I would agree with the last post. Keep /etc/make.conf simple. By adding -ffancy-obscure-optimisation you might gain an overall 1% speed increase, but what's the point if you've now introduced subtle bugs into perl, python, X.org, and bash? You'll notice that many programs that benefit greatly from optimisations like -funroll-loops include it automatically during the build process.

'-march=k8 -O2 -fweb -frename-registers -pipe' seems safe. Both -fweb and -frename-registers are included in -O3 - both make debugging more difficult, but don't increase code size like -finline-functions, which is also part of -O3.

Also remember that the x86-64 platform is not nearly as well-tested as x86, which makes using aggressive CFLAGS even less wise.
Trevoke
Advocate


Joined: 04 Sep 2004
Posts: 4099
Location: NY, NY

Posted: Wed Dec 01, 2004 6:27 pm

What's the difference between
-march=athlon-64
-march=k8

And whatever other options are available for AMD64?
_________________
Votre moment detente
What is the nature of conflict?
lavish
Bodhisattva


Joined: 13 Sep 2004
Posts: 4296

Posted: Wed Dec 01, 2004 7:34 pm

Trevoke wrote:
What's the difference between
-march=athlon-64
-march=k8

And whatever other options are available for AMD64?


There are no differences ;)

read the technotes
_________________
minimalblue.com | secgroup.github.io/
Trevoke
Advocate


Joined: 04 Sep 2004
Posts: 4099
Location: NY, NY

Posted: Wed Dec 01, 2004 7:55 pm

I have; if there are no differences, then why keep several separate options?
_________________
Votre moment detente
What is the nature of conflict?
barry
Apprentice


Joined: 01 May 2002
Posts: 170
Location: UK

Posted: Wed Dec 01, 2004 8:20 pm

They are identical (though it's athlon64, not athlon-64). There are other cases where you can use equivalent options, like pentiumpro/i686 or pentium/i586. This is all explained in the GCC info pages.
lavish
Bodhisattva


Joined: 13 Sep 2004
Posts: 4296

Posted: Wed Dec 01, 2004 9:08 pm

Trevoke wrote:
I have; if there are no differences, then why keep several separate options?


U have? lol eheh
There will be some differences in gcc >=4.0
_________________
minimalblue.com | secgroup.github.io/