Gentoo Forums
[gcc 3.4] AMD's Recommended CFLAGS
Gentoo Forums Forum Index :: Gentoo on AMD64

gian
Apprentice

Joined: 26 Jul 2004
Posts: 212
Location: Europe

Posted: Wed Dec 01, 2004 9:13 pm    Post subject: fast math problems

I got strange results using the -ffast-math option with our home-made circuit simulator (we use some LAPACK, BLAS and NAG-like functions).
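One way to tell whether "strange results" like these come from -ffast-math (which lets GCC assume there are no NaNs or infinities and apply value-changing floating-point optimizations) is to build the program twice, with and without the flag, run the same input through both, and compare the numbers with a tolerance rather than with diff. A minimal sketch, assuming both builds write whitespace-separated numbers with the same layout to a file; the function name and file names are hypothetical:

```shell
# Count values that differ by more than a relative tolerance between two
# result files with identical layout (same number of fields per line).
# $1 = baseline output, $2 = -ffast-math output, $3 = tolerance (default 1e-9)
count_mismatches() {
    paste -d ' ' "$1" "$2" | awk -v tol="${3:-1e-9}" '
    function abs(x) { return x < 0 ? -x : x }
    {
        half = NF / 2              # first half: baseline, second half: fast-math
        for (i = 1; i <= half; i++) {
            a = $i; b = $(i + half)
            m = abs(a) > abs(b) ? abs(a) : abs(b)
            if (abs(a - b) > tol * m) bad++
        }
    }
    END { print bad + 0 }'
}
```

For example, `count_mismatches ref.out fast.out 1e-6` prints how many values differ by more than one part in a million; zero means the two builds agree to that tolerance on this input.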

toofastforyahuh
Apprentice

Joined: 18 May 2004
Posts: 165

Posted: Tue Dec 07, 2004 8:39 pm

Just to be fair, I ran another data point. Since I also built my box for video, it's interesting to see what kind of performance I get.
I compiled mjpegtools 3 times and tested each with kino.

This is realistic since I actually use kino to do this stuff and building a DVD with mpeg2enc takes basically all day, so any performance difference is noticeable and welcome.

I took a 30.33 second DV file and ran it through kino's export MPEG tool.
I set it to type 8 (DVD) and used the following flags:
Code:

mpeg2enc -v 0 -4 2 -2 1 -q 3 -D 10 -H -c -b 8000
mp2enc -v 0 -r 48000 -b 224


These are decent quality settings for video and passable settings for audio.

With mjpegtools compiled with the following flags:
Code:

-march=k8 -O2 -pipe:     3:44 encode time
-march=k8 -O3 -pipe:    3:39 encode time
-march=k8 -O3 -funroll-all-loops -fpeel-loops -ftracer -pipe:  3:28 encode time


I repeated the latter run again and got exactly the same result, so I think the margin of error is low.

So here, too, my nitro CFLAGS actually do provide a benefit (about 7% over -O2: 3:28 versus 3:44). Again, since DVD encoding with mpeg2enc is an all-day affair, I think a 7% saving will quickly add up to a noticeable wall-clock difference.
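The percentages follow directly from the encode times above. A small hypothetical helper (POSIX sh plus awk) that converts the M:SS times to seconds and prints the percentage reduction:

```shell
# Percent reduction in wall-clock time from a baseline to a new run,
# both given as M:SS (e.g. the mpeg2enc encode times above).
pct_faster() {
    awk -v base="$1" -v new="$2" 'BEGIN {
        split(base, b, ":"); split(new, n, ":")
        t0 = b[1] * 60 + b[2]      # baseline in seconds
        t1 = n[1] * 60 + n[2]      # new run in seconds
        printf "%.1f\n", (t0 - t1) * 100 / t0
    }'
}
```

With the times above, `pct_faster 3:44 3:28` gives 7.1 (224 s versus 208 s), `pct_faster 3:44 3:39` gives 2.2, and `pct_faster 3:39 3:28` gives 5.0, so the extra flags gain about 5% on top of plain -O3.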

But I still advise caution for the rest of the system. Don't go gung-ho on CFLAGS for everything. It can backfire if you aren't careful.

toofastforyahuh
Apprentice

Joined: 18 May 2004
Posts: 165

Posted: Wed Dec 08, 2004 8:29 am

Aha! And a good case where it seems to backfire is gzip.
For me gzip is consistently faster with -O2 than with -O3 or my nitro flags.
So there you have it. No universal answer except: stay sane. Build your system sanely and safely, then only focus on the few programs you really care about and do some careful benchmark profiling. Don't add random CFLAGS just because they sound good. You'll only give the "Gentoo is Rice" crowd a laugh if it blows up on you.
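A careful comparison like the gzip one needs repeated runs, since a single timing can be thrown off by disk cache or background jobs. A minimal sketch that reports the best of N wall-clock runs; it assumes GNU date (the %N nanoseconds format is a GNU extension), and the function name is hypothetical:

```shell
# Run a command N times and print the fastest wall-clock time in milliseconds.
# The command's own output is discarded so only the timing is printed.
best_of() {
    local n=$1 i=0 best="" start end ms
    shift
    while [ "$i" -lt "$n" ]; do
        start=$(date +%s%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s%N)
        ms=$(( (end - start) / 1000000 ))
        if [ -z "$best" ] || [ "$ms" -lt "$best" ]; then
            best=$ms
        fi
        i=$((i + 1))
    done
    echo "$best"
}
```

For example, `best_of 5 gzip -9 -c somefile` times the installed gzip; running the same command against a second gzip built with different CFLAGS gives a like-for-like comparison. Taking the minimum rather than the mean is a deliberate choice: the fastest run is the one least disturbed by unrelated activity.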

barry
Apprentice

Joined: 01 May 2002
Posts: 170
Location: UK

Posted: Wed Dec 08, 2004 12:51 pm

You should avoid -funroll-all-loops generally. The GCC documentation advises against using it. -funroll-loops has more chance of being beneficial, but shouldn't be used in CFLAGS because it can break packages and may slow down just as much software as it speeds up.

-O2 together with -fweb and -frename-registers is probably the safest bet.

toofastforyahuh
Apprentice

Joined: 18 May 2004
Posts: 165

Posted: Wed Dec 08, 2004 4:45 pm

Check the benchmarks above. So far, for the programs I've cared about, I've seen only a positive trend with unrolling loops, and with -O3, so it's unlikely -O2 -frename-registers is better still, but you're welcome to try xmame and mjpegtools to see if you can rise above my nitro flags. Again, I don't use crazy flags for the whole system. I refuse to compromise stability. Benchmarks on specific programs, though, definitely point toward some gravy to be had. So far that's all I can call it, though---gravy. The real meal is in going amd64 in the first place.

barry
Apprentice

Joined: 01 May 2002
Posts: 170
Location: UK

Posted: Wed Dec 08, 2004 10:27 pm

The only difference between -O2 and -O3 is the inclusion of -fweb, -frename-registers and -finline-functions. -funroll-loops can certainly help speed up certain applications, so it's a good idea to use it on those, but definitely not system-wide. The same goes for -ffast-math.
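The "aggressive flags only where they pay off" approach can be sketched as a per-package lookup. This is a hypothetical illustration of the idea, not how Portage works internally; newer Portage versions provide /etc/portage/package.env for per-package settings. The flag sets are examples: the aggressive one uses -funroll-loops rather than -funroll-all-loops, following the advice above, and the package list comes from the benchmarks in this thread:

```shell
# Hypothetical per-package flag selection: aggressive flags only for an
# explicit whitelist, conservative flags for everything else.
SAFE_CFLAGS="-march=k8 -O2 -pipe"
NITRO_CFLAGS="-march=k8 -O3 -funroll-loops -pipe"

cflags_for() {
    case "$1" in
        media-video/mjpegtools|games-emulation/xmame)
            echo "$NITRO_CFLAGS" ;;   # packages benchmarked in this thread
        *)
            echo "$SAFE_CFLAGS" ;;    # everything else, e.g. app-arch/gzip
    esac
}
```

The point of the design is that the default branch is the safe one: a package only gets the aggressive set after someone has added it to the list on the strength of a measurement.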

zinion
Guru

Joined: 27 Oct 2004
Posts: 541
Location: Ruhgebietshausen

Posted: Fri Dec 10, 2004 10:37 am

What exactly can happen if I use them system-wide? I do, and my system has been running really fine so far...
_________________
It is nice and warm
here in Gentoo-land

toofastforyahuh
Apprentice

Joined: 18 May 2004
Posts: 165

Posted: Fri Dec 10, 2004 10:59 am

Are you sure? How do you know? On my old SGI, xmame compiled with -O3 could play Tempest, but the vector coordinates and colors were corrupted! Just because you haven't had a failure yet doesn't mean there isn't one waiting, lurking in the shadows.

Using wonky CFLAGS can lead to disaster if you aren't careful.

Besides, it isn't always a win. -O2 is sometimes better (as I just showed with gzip), and if you believe Acovea, sometimes -O1 with a few flags is better than anything else. If you use tricked-out, quirky flags blindly, you're gambling that every program will 1. compile at all, 2. compile correctly, and 3. work better. Not all programs behave the same.

It is much wiser to build a stable system with safer CFLAGS and then profile your usage.
It's not worth your time to build a 3% faster alsamixer and risk its stability, but it is definitely worth your time to understand how a bottleneck program behaves, for example if you spend a day just waiting for mpeg2enc to finish encoding a DVD. A 7% speedup makes sense there because you really need it.

NismoC32
Apprentice

Joined: 07 Apr 2003
Posts: 214

Posted: Fri Dec 10, 2004 11:53 am

What CFLAGS should I use when installing Gentoo x86_64?
I'm having problems getting it installed.
It stops on "gettext" if I use stage 1, on "cracklib" with stage 2 and on "ucl" with stage 3.

I don't know if it has something to do with the CFLAGS setting or something else.

I have used these settings:
"-O2 -march=athlon64 -ftracer -fprefetch-loop-arrays -pipe".

toofastforyahuh
Apprentice

Joined: 18 May 2004
Posts: 165

Posted: Fri Dec 10, 2004 5:30 pm

Start with -O2 -march=athlon64 -pipe and see if that helps.

superwutze
Tux's lil' helper

Joined: 09 Dec 2004
Posts: 137
Location: Europe/Vienna

Posted: Fri Dec 10, 2004 5:40 pm

I did a stage 1 install with 2004.3 and used these settings from the start:
CFLAGS="-O3 -march=opteron -funroll-loops -pipe -ftracer"
There were no problems, and there still are none.
_________________
bill who? micro what?

borkdox
Tux's lil' helper

Joined: 16 Jan 2004
Posts: 123

Posted: Sat Dec 11, 2004 5:03 pm

NismoC32 wrote:
What CFLAGS should I use when installing Gentoo x86_64?
I'm having problems getting it installed.
It stops on "gettext" if I use stage 1, on "cracklib" with stage 2 and on "ucl" with stage 3.

I don't know if it has something to do with the CFLAGS setting or something else.

I have used these settings:
"-O2 -march=athlon64 -ftracer -fprefetch-loop-arrays -pipe".


I installed gentoo with "-march=athlon64 -O2 -pipe -fweb -frename-registers" and it was very smooth install overall.

teilo
Apprentice

Joined: 20 Jun 2003
Posts: 276
Location: Minneapolis, MN

Posted: Sat Dec 11, 2004 6:47 pm

elocal wrote:
NismoC32 wrote:
What CFLAGS should I use when installing Gentoo x86_64?
I'm having problems getting it installed.
It stops on "gettext" if I use stage 1, on "cracklib" with stage 2 and on "ucl" with stage 3.

I don't know if it has something to do with the CFLAGS setting or something else.

I have used these settings:
"-O2 -march=athlon64 -ftracer -fprefetch-loop-arrays -pipe".


I installed gentoo with "-march=athlon64 -O2 -pipe -fweb -frename-registers" and it was very smooth install overall.


I use the same, but with -ftracer also. Whole system is compiled with the same flags (except for when they are filtered, obviously). Everything runs smoothly. Everything is stable.
_________________
Teilo who is called Teilo

NismoC32
Apprentice

Joined: 07 Apr 2003
Posts: 214

Posted: Mon Dec 13, 2004 7:41 am

I had to remove all CFLAGS except -O2 and -pipe and skip emerge sync during installation to get it to work.
After rebooting I could add more flags; I now have:
-O2 -march=athlon64 -fweb -funroll-loops -pipe and it works OK.
I have synced the Portage tree and emerged all new packages.

And I now have a smooth-running, fast, fine-tuned monster file server...

Thanks guys...

cybrjackle
Apprentice

Joined: 09 Jan 2003
Posts: 248
Location: USA

Posted: Wed Feb 02, 2005 3:09 am

Code:
CFLAGS="-march=athlon64 -O2 -fweb -frename-registers -ftracer -pipe"


Been using those for a while now and the box stays pretty sane :lol:

silverpig
Tux's lil' helper

Joined: 10 Dec 2003
Posts: 143
Location: Vancouver BC

Posted: Wed Feb 02, 2005 6:08 am

-march=athlon64 -O2 -pipe -fomit-frame-pointer

I did the install initially with -O3 and almost everything worked, but I couldn't compile Firefox. I changed to -O2 and it emerged just fine.
_________________
'Cause I can.

WuppieCat
n00b

Joined: 17 Oct 2002
Posts: 38
Location: Cheshire, UK

Posted: Thu Apr 28, 2005 2:08 pm

lavish wrote:
Trevoke wrote:
I have; if there are no differences, then why keep several separate options?


U have? lol eheh
There will be some differences in gcc >=4.0


I thought I had read the same, i.e. that gcc-4.0 would differentiate between the four possible -march values for AMD x86-64 chips. However, when I tried to find the quote I couldn't; in fact everything, including the release docs for gcc-4.0, points to there still being no difference between them. Is this something that has slipped to gcc-4.1, or have I just imagined it?
What also makes me curious about this (now that I have considered it) is that there is no difference in the feature set between the various chips; they are differentiated by additional HT links, different cache sizes, etc. In fact there is more difference between chip revisions, e.g. the new Venice cores having SSE3, enhanced branch prediction, etc. Does anyone know the answer to this conundrum?
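One way to check whether a given GCC treats these -march values differently at all is to compare the preprocessor macros it predefines for each. Identical macros do not prove identical code generation (internal tuning may still differ), so this is only a first check. A sketch assuming an x86-64 gcc on the PATH; the function name is hypothetical:

```shell
# Print the difference between the macros GCC predefines for two -march values.
# Empty output means the two macro sets are identical.
march_macro_diff() {
    local a b rc
    a=$(mktemp); b=$(mktemp)
    echo | gcc -march="$1" -dM -E - | sort > "$a"
    echo | gcc -march="$2" -dM -E - | sort > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

For example, `march_macro_diff k8 athlon64` or `march_macro_diff opteron athlon-fx`; the result depends on the GCC version you run it with.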

hvengel
Guru

Joined: 19 Sep 2004
Posts: 515

Posted: Sat Apr 30, 2005 3:01 am

The AMD document also recommends the use of their AMD Core Math Library (ACML). Just for giggles I tried emerge -pv acml and it is masked by ~amd64. I have some stuff I run that I think might benefit from faster math libraries. Has anyone unmasked this? Does it help? Do you need to set a USE flag?

ozbird
Apprentice

Joined: 21 Oct 2003
Posts: 185

Posted: Sat Apr 30, 2005 6:42 am

silverpig wrote:
-march=athlon64 -O2 -pipe -fomit-frame-pointer

I did the install initially with -O3 and almost everything worked, but I couldn't compile Firefox. I changed to -O2 and it emerged just fine.


I use CFLAGS="-march=athlon64 -O3 -ftracer -pipe" and everything, including Firefox, works fine.

If you're after a benchmark challenge, try the new Povray benchmark http://www.haveland.com/index.htm?povbench/index.php
There are some AMD64 users who claim times under 5 minutes; that's four times faster than I've been able to achieve (20m 46s)
Even when using the same optimisations that some provided, I haven't beaten that time. I suspect shenanigans...

hvengel
Guru

Joined: 19 Sep 2004
Posts: 515

Posted: Sat Apr 30, 2005 6:52 pm

I have one application, libpano12, that segfaults with anything other than -O0. But everything else on my systems has been built with -O2 without problems. Other amd64 users have also reported the same problem with libpano12 on Gentoo and other distros. Just an FYI that the correct optimizations depend more on the specific piece of software than on anything else.
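When a package only misbehaves once optimization is turned on, one way to narrow it down is to add candidate flags one at a time to a known-good baseline and rerun a check after each. A sketch; the function name is hypothetical, and the check command is something you supply: it must rebuild and test the program using the flags in $CFLAGS and return non-zero when the result is broken.

```shell
# Append each candidate flag to a known-good baseline and print the flags
# for which the check command fails. The check command sees the trial
# flags in $CFLAGS.
find_bad_flags() {
    local base=$1 check=$2 flag
    shift 2
    for flag in "$@"; do
        if ! CFLAGS="$base $flag" "$check"; then
            printf '%s\n' "$flag"
        fi
    done
}
```

Since GCC uses the last -O option it is given, a baseline of "-march=athlon64 -pipe -O0" with candidates -O1, -O2 and -O3 tests each optimization level in turn, e.g. `find_bad_flags "-march=athlon64 -pipe -O0" ./rebuild-and-test.sh -O1 -O2 -O3`, where `./rebuild-and-test.sh` stands for a script of your own.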

Joffer
Guru

Joined: 10 Sep 2002
Posts: 585
Location: Arendal, Norway

Posted: Sun Aug 07, 2005 10:24 pm

Reading this thread has made me realize I'm probably a lucky guy: I've compiled my entire system with an insane number of CFLAGS and it is _stable_ 8):
Code:
CFLAGS="-O3 -march=athlon64 -mtune=athlon64 -ftracer -fprefetch-loop-arrays -pipe -funroll-loops -mfpmath=sse -fweb -frename-registers -fmove-all-movables -fpeel-loops -freduce-all-givs -mno-align-stringops -minline-all-stringops -mno-push-args -momit-leaf-frame-pointer -fomit-frame-pointer"

I am now cleaning up this mess, however, and choosing saner CFLAGS; later I'll try more tweaking on single applications where I can actually measure a difference, like DivX/MP3/Ogg encoding, rather than on my entire system.

Your discussion made me think about using these CFLAGS:
Code:
CFLAGS="-march=athlon64 -mtune=athlon64 -O2 -pipe -fweb -frename-registers -ftracer"

_________________
As of April 2006 - Athlon64 X2 4200+ 1GB RAM - amd64-2006.0 profiled system with portage 2.1_preX, ck-sources-2.6.16, glibc-2.4-r1 (overlay w/-Bdirect&-hashvals), binutils-2.16.91.0.6 (overlay), gcc-4.1, Xorg 7

nxsty
Veteran

Joined: 23 Jun 2004
Posts: 1556
Location: .se

Posted: Mon Aug 08, 2005 7:53 am

hvengel wrote:
The AMD document also recommends the use of their AMD Core Math Library (ACML). Just for giggles I tried emerge -pv acml and it is masked by ~amd64. I have some stuff I run that I think might benefit from faster math libraries. Has anyone unmasked this? Does it help? Do you need to set a USE flag?


Check out this bug:
https://bugs.gentoo.org/show_bug.cgi?id=100289

lightvhawk0
Guru

Joined: 07 Nov 2003
Posts: 388

Posted: Sat Aug 20, 2005 7:31 am

I think the reason most people with crazy flags get away with them is that many packages filter out "unstable" CFLAGS.
Also, is there an overlay for the patched version of glibc?
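Ebuilds do this filtering through the flag-o-matic eclass (functions such as filter-flags and replace-flags). The sketch below is a simplified, hypothetical stand-in that only shows the idea of removing flags matching shell patterns from CFLAGS; it is not the eclass code:

```shell
# Simplified stand-in for flag-o-matic's filter-flags: drop every word in
# $CFLAGS that matches any of the given shell glob patterns.
my_filter_flags() {
    local kept="" flag pat drop
    set -f                  # stop words of $CFLAGS being expanded as filenames
    for flag in $CFLAGS; do
        drop=0
        for pat in "$@"; do
            case "$flag" in
                $pat) drop=1; break ;;
            esac
        done
        if [ "$drop" -eq 0 ]; then kept="$kept $flag"; fi
    done
    set +f
    CFLAGS=${kept# }
}
```

For example, with CFLAGS="-O3 -march=k8 -funroll-all-loops -fpeel-loops -pipe", running `my_filter_flags -funroll-all-loops '-fpeel-*'` leaves CFLAGS="-O3 -march=k8 -pipe". This is why a system built with "crazy" flags may never actually have used some of them for the packages that filter them.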
_________________
If God has made us in his image, we have returned him the favor. - Voltaire

revertex
l33t

Joined: 23 Apr 2003
Posts: 806

Posted: Fri Oct 21, 2005 9:03 am

@Joffer:

Doesn't "-march=athlon64" already include "-mfpmath=sse"?

And why do people still add "-fomit-frame-pointer" to their CFLAGS when they use -O, -O2, -O3 or -Os?

I found this on the GCC online docs page:

Quote:
-fomit-frame-pointer
Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.

On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro FRAME_POINTER_REQUIRED controls whether a target machine supports this flag. See Register Usage.

Enabled at levels -O, -O2, -O3, -Os.


It seems "-fomit-frame-pointer" is completely redundant if you use any of the "-O" levels.
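With a newer GCC (4.3 or later; the GCC versions discussed in this thread lack this option) you can ask the compiler directly which optimizations an -O level turns on, instead of relying on the documentation. A sketch; the function name is hypothetical:

```shell
# Print "[enabled]" or "[disabled]" for one optimization flag at a given
# -O level, using GCC's -Q --help=optimizers listing (GCC 4.3+).
flag_state() {
    gcc -Q "$1" --help=optimizers | awk -v f="$2" '$1 == f { print $2 }'
}
```

For example, `flag_state -O2 -fomit-frame-pointer` or `flag_state -O0 -fomit-frame-pointer`. The answer depends on the GCC version and target, so check with the compiler you actually build with.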

crazycat
l33t

Joined: 26 Aug 2003
Posts: 838
Location: Hamburg, Germany

Posted: Fri Oct 21, 2005 9:25 am

-march=athlon64 uses -mfpmath=sse by default (-mfpmath=387 generally performs better on my system), and -fomit-frame-pointer does nothing on amd64; people who use it just don't know much about CFLAGS. I don't know exactly about -pipe, but I think it makes gcc use pipes instead of temporary files.

All times are GMT
Page 2 of 5