[gcc 3.4] AMD's Recommended CFLAGS


Post by zinion » Fri Nov 26, 2004 9:29 am

CFLAGS="-O3 -march=athlon64 -ffast-math -funroll-all-loops -funit-at-a-time -fpeel-loops -ftracer -funswitch-loops -fomit-frame-pointer -pipe" 
AMD (2004): Compiler Usage Guidelines for 64-bit Operating Systems on AMD64 Platforms (online)(16.11.2004) http://www.amd.com/us-en/assets/content ... /32035.pdf
(Page 29)

Maybe somebody can make this sticky? ;)
It's nice and warm
here in Gentoo land

Post by AndersH » Fri Nov 26, 2004 10:04 am

Looks interesting, but isn't -ffast-math dangerous in certain situations, or am I misremembering something? Does anyone dare to try these flags and tell us the results?

Post by LordArthas » Fri Nov 26, 2004 10:29 am

Hi!
AndersH wrote:Looks interesting, but isn't -ffast-math dangerous in certain situations, or am I misremembering something? Does anyone dare to try these flags and tell us the results?
I just updated my /etc/make.conf with these flags. I'll see what happens as soon as I emerge something. ;-)

Michele.

Post by zinion » Fri Nov 26, 2004 11:01 am

Ever since I switched to gcc 3.4, I've had these CFLAGS in my make.conf and everything runs fine. I recompiled my whole system (emerge -e world) and one or two GNOME things failed to compile, but I don't remember which ones, and I don't know whether the CFLAGS were the reason. Because I use KDE, I didn't investigate any further and just continued with emerge --resume --skipfirst.

Post by barry » Fri Nov 26, 2004 5:14 pm

Those flags may be recommended for optimal performance, but they're likely to break some packages. It'd be safer to stick to -fomit-frame-pointer -march=k8 -O2 -pipe, and perhaps -frename-registers, -fweb and -funit-at-a-time.

Post by get sirius » Fri Nov 26, 2004 6:06 pm

"-funit-at-a-time" is included in the -O2 optimizations and, by extension, in the -O3 optimizations as well. I've used -ffast-math in my u/p box and it sure made Setiathome fly :D .

EDIT: And -frename-registers and -fweb are included in the O3 optimizations.
Last edited by get sirius on Fri Nov 26, 2004 7:50 pm, edited 1 time in total.

Post by barry » Fri Nov 26, 2004 6:43 pm

I didn't realise -funit-at-a-time was included in -O2. -ffast-math will give good boosts for certain packages, but will probably break others.

A lot of ebuilds will enable -ffast-math if it'll help anyway, so I'd recommend leaving it out of /etc/make.conf.

Post by toofastforyahuh » Sat Nov 27, 2004 8:02 am

Is it even necessary to use -fomit-frame-pointer?

As per the kernel changelogs:

http://www.kernel.org/pub/linux/kernel/ ... eLog-2.6.9

[PATCH] x86_64: remove CONFIG_FRAME_POINTER

CONFIG_FRAME_POINTER has never worked on x86-64 because it never passed
-fno-omit-frame-pointer to the compiler, and that is the only way to get a
frame pointer on x86-64.

Post by AnXa » Sun Nov 28, 2004 11:14 am

I got a working machine when compiling with those options. It's fast, stable, and everything you'd expect from a 64-bit OS. ;)
The idea isn't about how do you see or hear it, it's about how do you experience it...

Post by GentooBox » Sun Nov 28, 2004 2:10 pm

Can someone please post the CFLAGS from the PDF file?
Encrypt, lock up everything and duct tape the rest

Post by barry » Sun Nov 28, 2004 8:57 pm

Just because something appears to be stable doesn't necessarily mean it is. Aggressive optimisations may introduce subtle bugs. It's best to apply options like these to individual packages.

There's also no doubt that -funroll-all-loops will cause performance degradation with some packages.

Post by >Octoploid< » Sun Nov 28, 2004 10:38 pm

It's stupid to unroll loops and do software pipelining on a massively out-of-order processor such as the Athlon64. You gain nothing and, because unrolling increases the amount of code, your instruction cache will miss more often, making your code run slower. Let the hardware make the decisions. It is faster and more competent than the compiler...

BTW, the AMD guideline is for micro kernels and benchmarks, not for general-purpose programs.

Post by toofastforyahuh » Tue Nov 30, 2004 7:27 am

Not entirely true.

Unrolling loops helps the hardware make the right decisions. Unrolled code is a lot easier to parallelize, and you avoid loop overhead.
However, it does impact the L1 cache, as you mentioned. This whole issue is discussed in another PDF on AMD's website, along with other optimizations: software dos and don'ts.

As for whether or not it helps in real applications, I decided to find out for myself. I generally choose xmame for benchmarking instead of the strange abstract benchmarks that have no meaning to me. (Linpack? SPEC? It means something but not everything.)

http://www.anthrofox.org/code/mame/xmame64_bench88.html

The bottom table shows a decent sample of MAME drivers, and demonstrates:
1. -fomit-frame-pointer really is useless on X86-64
2. 64-bit itself is more important than all the silly ricer CFLAGS I've tried so far.
3. Although not shown, -O3 does perform better than -O2 or -O1. This is not always true outside of xmame, though.
4. -funroll-all-loops does provide a few percent increase, and only the occasional decrease
5. -funroll-all-loops -ftracer -fpeel-loops provides a few percent increase pretty consistently and seldom hurts. Generally the damage is under 1% when it hurts. It definitely helps the games that need it the most, too.

The next question is what is the impact of -funswitch-loops and does -funroll-loops perform better than -funroll-all-loops?

Intuition says -funroll-loops might be safer than blindly unrolling all loops.

I do not edit my /etc/make.conf to have ricer CFLAGS settings, though. I keep the system conservative. amd64 itself is generally a bigger and safer win than exotic and dangerous CFLAGS.
Last edited by toofastforyahuh on Tue Nov 30, 2004 10:07 am, edited 2 times in total.

Post by >Octoploid< » Tue Nov 30, 2004 7:44 am

toofastforyahuh wrote:Not entirely true.
[url]file:///home/sbehling/mame88/xmame64_bench88.html[/url]
Unfortunately the link you provided is unreachable from anywhere outside your own machine.

Post by mooseboy » Tue Nov 30, 2004 8:11 am

>Octoploid< wrote:
toofastforyahuh wrote:Not entirely true.
[url]file:///home/sbehling/mame88/xmame64_bench88.html[/url]
Unfortunately the link you provided is unreachable from anywhere outside your own machine.
:lol: :lol: :lol:

Post by toofastforyahuh » Tue Nov 30, 2004 10:04 am

Sorry, wrong window. Fixed the link above and here.
http://www.anthrofox.org/code/mame/xmame64_bench88.html

Also added a third table with -funroll-loops and -funswitch-loops results.
Seems neither is a huge win. They add a bit of noise to the scores. Sometimes -funroll-loops is better than -funroll-all-loops, but not always. Adding -funswitch-loops is also not a guaranteed win for xmame.


In previous experiments I also compared gcc-3.3.x to gcc-3.4.0 and found almost no difference with -march=k8. Again, a fair amount of noise, but not really harmful. Just disappointing to those of us who were expecting more. Sometimes reality hurts.

Post by barry » Tue Nov 30, 2004 10:45 am

Loop unrolling does do more. Take this simple C example:

int main(void)
{
  int x = 9000;
  /* count x down to zero, one step per iteration */
  while (x != 0) {
    x = x - 1;
  }
  return 0;
}
The function should perform a loop which decreases x by 1 until it reaches zero. If you compile this with -O2 or -O3, that's exactly what happens, as you can see from the generated assembly:

decl    %eax
jne     .L4
However, with -funroll-loops, the compiler can optimise the loop by decreasing x by 8 each time instead, and this is the result:

subl    $8, %eax
jne     .L4
So without -funroll-loops, the loop is performed 9000 times. With -funroll-loops, it's performed 1125 times.

Of course this kind of optimisation is rare - the compiler must be able to change the code without it breaking anything, as it can in this example.

Post by lavish » Tue Nov 30, 2004 10:15 pm

I would like to thank toofastforyahuh and barry... very interesting posts!
Now... is it possible to draw any conclusions?


-march=athlon64 -O3 -funroll-all-loops -ftracer -fpeel-loops -pipe


Is that a good way to go?

My current CFLAGS are quite conservative, but they have worked fine (for me, at least):
-march=athlon64 -O2 -fweb -frename-registers -ftracer -pipe
minimalblue.com | secgroup.github.io/

Post by toofastforyahuh » Wed Dec 01, 2004 8:40 am

Is it possible to make conclusions?

For xmame on current K8 CPUs (all Athlon64/Opteron/FX models) it seems like that is the best way to go. It's a lot of really intense vanilla C code: various data structures and the occasional array. I built my machine for xmame first and video work second, so that's what I focus on for now.

As for generalizing beyond xmame, I can't give you any firm conclusions. Different programs *will* behave differently. Sometimes -O2 really is better than -O3. And I strongly caution you that exotic CFLAGS can and do break things in some programs. I have personally witnessed gcc-3.2.x screw up xmame on MIPS/IRIX just by using -O3 alone. And if you add flags willy-nilly all over the place, even if it seems safe, you can be asking for trouble. For example, -funit-at-a-time is built into -O2 and -O3, but if you also add it again to the command line along with -O3 (as AMD's PDF improperly suggested!), gcc-3.4.3 will die during xmame compilation! You'd think it'd be safe, but gcc barfs anyway.

My recommendation is to keep /etc/make.conf safe and simple. I keep mine at -march=k8 -O2 -pipe. Then for the programs I care about and spend a lot of CPU cycles on, like xmame, I try to optimize those individually. But admittedly 3-4% is far, far less significant than going amd64 in the first place. Just using amd64 is the real win here. Everything else I've done here is just gravy on top of it, and academic.

-march=k8 -O2 -pipe <--- should probably be safe and good all around
-march=k8 -O3 -pipe <--- MIGHT be better and still safe, depending on the program
-march=k8 -O3 -funroll-all-loops -fpeel-loops -ftracer -pipe <--- nitro burning for xmame, at least

Post by barry » Wed Dec 01, 2004 11:17 am

I would agree with the last post. Keep /etc/make.conf simple. By adding -ffancy-obscure-optimisation you might gain an overall 1% speed increase, but what's the point if you've now introduced subtle bugs into perl, python, X.org, and bash? You'll notice that many programs that benefit greatly from optimisations like -funroll-loops include it automatically during the build process.

'-march=k8 -O2 -fweb -frename-registers -pipe' seems safe. Both -fweb and -frename-registers are included in -O3 - both make debugging more difficult, but don't increase code size like -finline-functions, which is also part of -O3.

Also remember that the x86-64 platform is not nearly as well-tested as x86, which makes using aggressive CFLAGS even less wise.

Post by Trevoke » Wed Dec 01, 2004 6:27 pm

What's the difference between
-march=athlon-64
-march=k8

And whatever other options are available for AMD64?
Your moment of relaxation
What is the nature of conflict?

Post by lavish » Wed Dec 01, 2004 7:34 pm

Trevoke wrote:What's the difference between
-march=athlon-64
-march=k8

And whatever other options are available for AMD64?
There are no differences ;)

read the technotes

Post by Trevoke » Wed Dec 01, 2004 7:55 pm

I have; if there are no differences, then why keep several separate options?

Post by barry » Wed Dec 01, 2004 8:20 pm

They are identical (though it's athlon64, not athlon-64). There are other cases where you can use equivalent options, like pentiumpro/i686 or pentium/i586. This is all explained in the GCC info pages.

Post by lavish » Wed Dec 01, 2004 9:08 pm

Trevoke wrote:I have; if there are no differences, then why keep several separate options?
U have? lol eheh
There will be some differences in gcc >=4.0