[gcc 3.4] AMD's Recommended CFLAGS

Locked
117 posts
gian
Apprentice
Posts: 212
Joined: Mon Jul 26, 2004 2:35 pm
Location: Europe

fast math problems

Post by gian » Wed Dec 01, 2004 9:13 pm

I got strange results using the -ffast-math option with our home-made circuit simulator (we use some LAPACK, BLAS and NAG-like functions).
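
One plausible cause, offered as an assumption since the post does not say which routine misbehaved: -ffast-math lets the compiler rearrange floating-point arithmetic as if addition were associative, which it is not. A minimal sketch using awk's double-precision arithmetic (the function name sum_orders is made up for the illustration) evaluates the same sum in two orders:

```shell
#!/bin/sh
# Illustration only: evaluate the same three-term sum in two orders.
# awk is used because POSIX shell arithmetic is integer-only.
sum_orders() {
    awk 'BEGIN {
        big = 1e16; small = 1.0
        printf "%.1f\n", (big + small) - big   # small is absorbed by rounding
        printf "%.1f\n", (big - big) + small   # small survives
    }'
}

sum_orders
```

With IEEE double precision this prints 0.0 and then 1.0. Numerical code such as a circuit simulator solving ill-conditioned systems can be sensitive to exactly this kind of reordering.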

toofastforyahuh
Apprentice
Posts: 172
Joined: Tue May 18, 2004 6:46 am

Post by toofastforyahuh » Tue Dec 07, 2004 8:39 pm

Just to be fair I just ran another datapoint. Since I also built my box for video, it's interesting to see what kind of performance I get.
I compiled mjpegtools 3 times and tested each with kino.

This is realistic since I actually use kino to do this stuff and building a DVD with mpeg2enc takes basically all day, so any performance difference is noticeable and welcome.

I took a 30.33 second DV file and ran it through kino's export MPEG tool.
I set it to type 8 (DVD) and used the following flags:

Code:

mpeg2enc -v 0 -4 2 -2 1 -q 3 -D 10 -H -c -b 8000
mp2enc -v 0 -r 48000 -b 224

These are decent quality settings for video and passable settings for audio.

With mjpegtools compiled with the following flags:

Code:

-march=k8 -O2 -pipe:                                           3:44 encode time
-march=k8 -O3 -pipe:                                           3:39 encode time
-march=k8 -O3 -funroll-all-loops -fpeel-loops -ftracer -pipe:  3:28 encode time

I repeated the last run and got exactly the same result, so I think the margin of error is low.

So here, too, my nitro CFLAGS actually do provide a benefit (about 7%). Again, since DVD encoding with mpeg2enc is an all-day affair I think here 7% savings will add up quickly to a noticeable wall-clock savings.
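
For reference, the "about 7%" figure can be checked from the encode times quoted above. This is only a worked-arithmetic sketch; the helper names to_seconds and percent_saved are made up for the example:

```shell
#!/bin/sh
# Convert an m:ss time to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# Percentage of wall-clock time saved going from a baseline to a candidate.
percent_saved() {
    base=$(to_seconds "$1")
    cand=$(to_seconds "$2")
    awk -v b="$base" -v c="$cand" 'BEGIN { printf "%.1f\n", (b - c) * 100 / b }'
}

percent_saved 3:44 3:39   # -O3 versus -O2
percent_saved 3:44 3:28   # -O3 plus the loop flags versus -O2
```

This prints 2.2 and 7.1: plain -O3 saves roughly 2% over -O2, and the extra loop flags bring the saving to roughly 7%.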

But I still advise caution for the rest of the system. Don't go gung-ho on CFLAGS for everything. It can backfire if you aren't careful.

toofastforyahuh
Apprentice
Posts: 172
Joined: Tue May 18, 2004 6:46 am

Post by toofastforyahuh » Wed Dec 08, 2004 8:29 am

Aha! And a good case where it seems to backfire is gzip.
For me gzip is consistently faster with -O2 than with -O3 or my nitro flags.
So there you have it. No universal answer except: stay sane. Build your system sanely and safely, then only focus on the few programs you really care about and do some careful benchmark profiling. Don't add random CFLAGS just because they sound good. You'll only give the "Gentoo is Rice" crowd a laugh if it blows up on you.

barry
Apprentice
Posts: 170
Joined: Wed May 01, 2002 10:18 pm
Location: UK

Post by barry » Wed Dec 08, 2004 12:51 pm

You should avoid -funroll-all-loops generally. The GCC documentation advises against using it. -funroll-loops has more chance of being beneficial, but shouldn't be used in CFLAGS because it can break packages and may slow down just as much software as it speeds up.

-O2 together with -fweb and -frename-registers is probably the safest bet.

toofastforyahuh
Apprentice
Posts: 172
Joined: Tue May 18, 2004 6:46 am

Post by toofastforyahuh » Wed Dec 08, 2004 4:45 pm

Check the benchmarks above. So far, for the programs I've cared about, I've seen only a positive trend with unrolling loops, and with -O3, so it's unlikely -O2 -frename-registers is better still, but you're welcome to try xmame and mjpegtools to see if you can rise above my nitro flags. Again, I don't use crazy flags for the whole system. I refuse to compromise stability. Benchmarks on specific programs, though, definitely point toward some gravy to be had. So far that's all I can call it, though---gravy. The real meal is in going amd64 in the first place.

barry
Apprentice
Posts: 170
Joined: Wed May 01, 2002 10:18 pm
Location: UK

Post by barry » Wed Dec 08, 2004 10:27 pm

In GCC 3.4, the only difference between -O2 and -O3 is the inclusion of -fweb, -frename-registers and -finline-functions. -funroll-loops can certainly help speed up certain applications, so it's a good idea to use it on those, but definitely not system-wide. The same goes for -ffast-math.

zinion
Guru
Posts: 541
Joined: Wed Oct 27, 2004 10:39 pm
Location: Ruhgebietshausen

Post by zinion » Fri Dec 10, 2004 10:37 am

What exactly can happen if I use them system-wide? I do, and my system has been running really well lately...
It is nice and warm
here in Gentoo land

toofastforyahuh
Apprentice
Posts: 172
Joined: Tue May 18, 2004 6:46 am

Post by toofastforyahuh » Fri Dec 10, 2004 10:59 am

Are you sure? How do you know? I could play tempest on xmame if I compiled it with -O3 on my old SGI but the vector coordinates and colors were corrupted! Just because you haven't had a failure yet doesn't mean you don't have one just waiting, lurking in the shadows.

Using wonky CFLAGS can lead to disaster if you aren't careful.

Besides, it isn't always a win. -O2 is sometimes better (as I just proved with gzip), and if you believe acovea sometimes -O1 with a few flags is better than anything else. If you just use some tricked out quirky flags blindly you're just gambling that all programs will 1. compile at all, 2. compile correctly, 3. work better. All programs do not behave the same.

It is much wiser to build a stable system with safer CFLAGS and then profile your usage.
It's not worth your time to build a 3% faster alsamixer and risk its stability, but it definitely is worth your time to understand how a bottleneck program behaves. For example, if you spend a day just waiting for mpeg2enc to finish encoding a DVD, a 7% speedup makes sense because you really need it.
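
The "profile your usage" advice can be sketched as a small timing loop. This is a minimal sketch under stated assumptions: time_runs is a hypothetical helper name, `date +%s` is assumed to be available (GNU and BSD date support it, but it is not required by POSIX), and one-second resolution is assumed to be good enough for workloads that run for minutes.

```shell
#!/bin/sh
# Run a command several times and print the wall-clock seconds of each run.
# Usage: time_runs COUNT COMMAND [ARGS...]
time_runs() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1        # the workload; its output is discarded
        end=$(date +%s)
        echo "run $i: $((end - start)) s"
        i=$((i + 1))
    done
}

# Stand-in workload; substitute the program you actually care about.
time_runs 2 sleep 1
```

Rebuild the one package with each candidate set of CFLAGS, rerun the same loop on the same input, and compare. Repeating each run, as was done for mjpegtools earlier in the thread, gives a rough idea of the noise.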

NismoC32
Apprentice
Posts: 222
Joined: Mon Apr 07, 2003 12:10 pm

Post by NismoC32 » Fri Dec 10, 2004 11:53 am

What CFLAGS should I use during installation of Gentoo x86_64?
I'm having problems getting it installed.
It stops on "gettext" if I use stage 1, on "cracklib" with stage 2 and on "ucl" with stage 3.

I don't know if it has something to do with the CFLAGS setting or other things, though.

I have used these settings:
"-O2 -march=athlon64 -ftrace -fprefetch-loop-arrays -pipe".

toofastforyahuh
Apprentice
Posts: 172
Joined: Tue May 18, 2004 6:46 am

Post by toofastforyahuh » Fri Dec 10, 2004 5:30 pm

Start with -O2 -march=athlon64 -pipe and see if that helps.

superwutze
Tux's lil' helper
Posts: 137
Joined: Thu Dec 09, 2004 8:02 pm
Location: Europe/Vienna

Post by superwutze » Fri Dec 10, 2004 5:40 pm

I did a stage 1 install with 2004.3 and used these settings from the start:
CFLAGS="-O3 -march=opteron -funroll-loops -pipe -ftracer"
There was no problem then and there is none now.
bill who? micro what?

borkdox
Tux's lil' helper
Posts: 123
Joined: Fri Jan 16, 2004 8:14 pm

Post by borkdox » Sat Dec 11, 2004 5:03 pm

NismoC32 wrote:What CFLAGS should I use during installation of Gentoo x86_64?
I'm having problems getting it installed.
It stops on "gettext" if I use stage 1, on "cracklib" with stage 2 and on "ucl" with stage 3.

I don't know if it has something to do with the CFLAGS setting or other things, though.

I have used these settings:
"-O2 -march=athlon64 -ftrace -fprefetch-loop-arrays -pipe".

I installed Gentoo with "-march=athlon64 -O2 -pipe -fweb -frename-registers" and it was a very smooth install overall.

teilo
Apprentice
Posts: 276
Joined: Fri Jun 20, 2003 2:36 pm
Location: Minneapolis, MN

Post by teilo » Sat Dec 11, 2004 6:47 pm

elocal wrote:
NismoC32 wrote:What CFLAGS should I use during installation of Gentoo x86_64?
I'm having problems getting it installed.
It stops on "gettext" if I use stage 1, on "cracklib" with stage 2 and on "ucl" with stage 3.

I don't know if it has something to do with the CFLAGS setting or other things, though.

I have used these settings:
"-O2 -march=athlon64 -ftrace -fprefetch-loop-arrays -pipe".
I installed Gentoo with "-march=athlon64 -O2 -pipe -fweb -frename-registers" and it was a very smooth install overall.

I use the same, but with -ftracer as well. The whole system is compiled with the same flags (except where they are filtered, obviously). Everything runs smoothly. Everything is stable.
Teilo who is called Teilo

NismoC32
Apprentice
Posts: 222
Joined: Mon Apr 07, 2003 12:10 pm

Post by NismoC32 » Mon Dec 13, 2004 7:41 am

I had to remove all CFLAGS except -O2 and -pipe, and skip emerge sync during installation, to get it to work.
After the reboot I could put in more flags. I now have:
-O2 -march=athlon64 -fweb -funroll-loops -pipe and it works OK.
I have synced the portage tree and emerged all new packages.

And I now have a smooth-working, fast, fine-tuned monster file server...

Thanks guys...

cybrjackle
Apprentice
Posts: 248
Joined: Thu Jan 09, 2003 3:37 pm
Location: USA

Post by cybrjackle » Wed Feb 02, 2005 3:09 am

Code:

CFLAGS="-march=athlon64 -O2 -fweb -frename-registers -ftracer -pipe"

Been using those for a while now and the box stays pretty sane :lol:

silverpig
Tux's lil' helper
Posts: 143
Joined: Wed Dec 10, 2003 4:31 am
Location: Vancouver BC

Post by silverpig » Wed Feb 02, 2005 6:08 am

-march=athlon64 -O2 -pipe -fomit-frame-pointer

I did the install initially with -O3 and almost everything worked. I couldn't compile Firefox though. I changed to -O2 and it emerged just fine.
'Cause I can.

WuppieCat
n00b
Posts: 38
Joined: Thu Oct 17, 2002 3:22 pm
Location: Cheshire, UK

Post by WuppieCat » Thu Apr 28, 2005 2:08 pm

lavish wrote:
Trevoke wrote:I have; if there are no differences, then why keep several separate options?
U have? lol eheh
There will be some differences in gcc >=4.0

I thought I had read the same, i.e. that gcc-4.0 would differentiate between the four possible -march values for AMD x86-64 chips (k8, opteron, athlon64 and athlon-fx). However, when I tried to find the quote I couldn't; in fact everything, including the release docs for gcc-4.0, points to there still being no difference between them. Is this something that has slipped to gcc-4.1, or have I just imagined it?
What also makes me curious about this (now that I have considered it) is that there is no difference in the feature set between the various chips; they are differentiated by additional HT links, different cache sizes and so on. In fact there is more difference between chip revisions, e.g. the new Venice cores having SSE3, enhanced branch prediction etc. Does anyone know the answer to this conundrum?

hvengel
Guru
Posts: 515
Joined: Sun Sep 19, 2004 1:29 am

Post by hvengel » Sat Apr 30, 2005 3:01 am

The AMD document also recommends the use of their AMD Core Math Library (ACML). Just for giggles I tried emerge -pv acml and it is masked by ~amd64. I have some stuff that I run that I think might benefit from faster math libraries. Has anyone unmasked this? Does it help? Do you need to set a USE flag?

ozbird
Apprentice
Posts: 187
Joined: Tue Oct 21, 2003 11:46 am

Post by ozbird » Sat Apr 30, 2005 6:42 am

silverpig wrote:-march=athlon64 -O2 -pipe -fomit-frame-pointer

I did the install initially with -O3 and almost everything worked. I couldn't compile Firefox though. I changed to -O2 and it emerged just fine.

I use CFLAGS="-march=athlon64 -O3 -ftracer -pipe" and everything, including Firefox, works fine.

If you're after a benchmark challenge, try the new Povray benchmark http://www.haveland.com/index.htm?povbench/index.php
There are some AMD64 users who claim times under 5 minutes; that's four times faster than I've been able to achieve (20m 46s)
Even when using the same optimisations that some provided, I haven't beaten that time. I suspect shenanigans...

hvengel
Guru
Posts: 515
Joined: Sun Sep 19, 2004 1:29 am

Post by hvengel » Sat Apr 30, 2005 6:52 pm

I have one application, libpano12, that will segfault with anything other than -O0. But everything else on my systems has been built with -O2 without problems. Other amd64 users have also reported the same problem with libpano12 on Gentoo and other distros. Just an FYI that the correct optimizations are more dependent on the specific piece of software than on anything else.

Joffer
Guru
Posts: 585
Joined: Tue Sep 10, 2002 12:02 am
Location: Arendal, Norway

Post by Joffer » Sun Aug 07, 2005 10:24 pm

Reading this thread has made me see that I'm probably a lucky guy. What I mean is that I've compiled my entire system with an insane amount of CFLAGS and it is _stable_ 8):

Code:

CFLAGS="-O3 -march=athlon64 -mtune=athlon64 -ftracer -fprefetch-loop-arrays -pipe -funroll-loops -mfpmath=sse -fweb -frename-registers -fmove-all-movables -fpeel-loops -freduce-all-givs -mno-align-stringops -minline-all-stringops -mno-push-args -momit-leaf-frame-pointer -fomit-frame-pointer"

I am, however, now cleaning up this mess: I will choose saner CFLAGS for the whole system, and later try more tweaking on single applications where I can actually measure a difference, such as DivX/MP3/Ogg encoding.

Your discussion made me think about using these cflags:

Code:

CFLAGS="-march=athlon64 -mtune=athlon64 -O2 -pipe -fweb -frename-registers -ftracer"

As of April 2006 - Athlon64 X2 4200+ 1GB RAM - amd64-2006.0 profiled system with portage 2.1_preX, ck-sources-2.6.16, glibc-2.4-r1 (overlay w/-Bdirect&-hashvals), binutils-2.16.91.0.6 (overlay), gcc-4.1, Xorg 7
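
A sketch of how "sane flags system-wide, tuned flags for single applications" can be expressed with current Portage. Assumptions to note: the package.env mechanism is a feature of present-day Portage that nobody in this thread mentions, and it may not have existed in the Portage versions of that time. The file name nitro.conf is an arbitrary choice, and the flags are the ones toofastforyahuh benchmarked for mjpegtools earlier in the thread.

```shell
# /etc/portage/env/nitro.conf  (the file name is an arbitrary choice)
# Aggressive flags, applied only to packages listed in package.env.
CFLAGS="-march=k8 -O3 -funroll-all-loops -fpeel-loops -ftracer -pipe"
CXXFLAGS="${CFLAGS}"

# /etc/portage/package.env  (one line per package that should use the file above)
# media-video/mjpegtools nitro.conf
```

The global CFLAGS in make.conf stay conservative; only the packages named in package.env are rebuilt with the extra flags, which keeps any breakage confined to them.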

nxsty
Veteran
Posts: 1556
Joined: Wed Jun 23, 2004 7:00 pm
Location: .se

Post by nxsty » Mon Aug 08, 2005 7:53 am

hvengel wrote:The AMD document also recommends the use of their AMD Core Math Library (ACML). Just for giggles I tried emerge -pv acml and it is masked by ~amd64. I have some stuff that I run that I think might benefit from faster math libraries. Has anyone unmasked this? Does it help? Do you need to set a USE flag?

Check out this bug:
http://bugs.gentoo.org/show_bug.cgi?id=100289

lightvhawk0
Guru
Posts: 388
Joined: Fri Nov 07, 2003 12:59 am

Post by lightvhawk0 » Sat Aug 20, 2005 7:31 am

I think the reason most people with crazy flags get away with them is that many packages filter out "unstable" CFLAGS.
Also, is there an overlay for the patched version of glibc?
If God has made us in his image, we have returned him the favor. - Voltaire
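
To make the filtering point concrete: in Gentoo, ebuilds can drop flags through helpers such as filter-flags from the flag-o-matic eclass. The function below is not that code. It is a hypothetical model, with the made-up name filter_cflags, of what such filtering amounts to: removing every flag that matches a blocked glob pattern before the package is compiled.

```shell
#!/bin/sh
# Hypothetical stand-in for the kind of filtering an ebuild can request.
# Usage: filter_cflags "FLAGS" PATTERN...
filter_cflags() {
    flags=$1
    shift
    kept=""
    for flag in $flags; do
        drop=no
        for pattern in "$@"; do
            # Glob-match one flag against one blocked pattern.
            case $flag in
                $pattern) drop=yes ;;
            esac
        done
        [ "$drop" = no ] && kept="$kept $flag"
    done
    # Strip the leading space left by the accumulation above.
    echo "${kept# }"
}

filter_cflags "-O3 -march=athlon64 -ffast-math -funroll-loops -pipe" "-ffast-math" "-funroll-*"
```

The call prints -O3 -march=athlon64 -pipe. A system whose make.conf lists risky flags may therefore be building its most fragile packages without them, which would explain some of the "works fine for me" reports.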

revertex
l33t
Posts: 806
Joined: Wed Apr 23, 2003 9:21 am

Post by revertex » Fri Oct 21, 2005 9:03 am

@Joffer:

Doesn't "-march=athlon64" already include "-mfpmath=sse"?

And why do people still add "-fomit-frame-pointer" to their CFLAGS when they use -O, -O2, -O3 or -Os?

I found this on the GCC online docs page:
-fomit-frame-pointer
Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.

On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro FRAME_POINTER_REQUIRED controls whether a target machine supports this flag. See Register Usage.

Enabled at levels -O, -O2, -O3, -Os.

It seems "-fomit-frame-pointer" is completely redundant if you use any "-O" level.
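
Rather than relying on the documentation alone, the compiler can be asked directly. This is a sketch under an assumption: newer GCC releases accept `-Q --help=optimizers` and print one line per flag with [enabled] or [disabled], but as far as I know the 3.4 series discussed in this thread does not have that option. The helper name status_of is made up for the example.

```shell
#!/bin/sh
# Pick one flag's status out of a "gcc -Q --help=optimizers" report on stdin.
# Each report line is expected to look like: "  -fsome-flag   [enabled]".
status_of() {
    awk -v f="$1" '$1 == f { print $2 }'
}

# Ask the installed gcc, if there is one, what -O2 does with the flag.
if command -v gcc >/dev/null 2>&1; then
    gcc -Q -O2 --help=optimizers 2>/dev/null | status_of -fomit-frame-pointer
else
    echo "gcc not found; nothing to query"
fi
```

The answer depends on the GCC version and the target architecture, so it is worth running on the machine in question instead of assuming the result.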

crazycat
l33t
Posts: 838
Joined: Tue Aug 26, 2003 6:04 pm
Location: Hamburg, Germany

Post by crazycat » Fri Oct 21, 2005 9:25 am

-march=athlon64 uses -mfpmath=sse by default (-mfpmath=387 generally performs better on my system). -fomit-frame-pointer does nothing on amd64, and people who use it just don't know much about CFLAGS. I don't know exactly what -pipe does, but I think it makes gcc use pipes instead of temporary files.

© 2001–2026 Gentoo Foundation, Inc.
