gian Apprentice
Joined: 26 Jul 2004 Posts: 212 Location: Europe
|
Posted: Wed Dec 01, 2004 9:13 pm Post subject: fast math problems |
|
|
I got strange results using the -ffast-math option with our home-made circuit simulator... (we use some LAPACK, BLAS and NAG-like functions) |
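One plausible source of such differences: -ffast-math lets the compiler regroup floating-point operations, and floating-point addition is not associative. The sketch below does not reproduce the simulator above; it only shows, using awk's double-precision arithmetic, that two groupings of the same sum give different results. The function names are made up for illustration.

```shell
# Same three numbers, two groupings. With IEEE doubles the results
# differ in the last digits, which is the kind of change that
# reassociation under -ffast-math can introduce.
sum_left()  { awk 'BEGIN { printf "%.17g\n", (0.1 + 0.2) + 0.3 }'; }
sum_right() { awk 'BEGIN { printf "%.17g\n", 0.1 + (0.2 + 0.3) }'; }

sum_left
sum_right
```

In an iterative solver, differences this small can be amplified over many steps, so comparing results against a build without -ffast-math is a reasonable first check.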
|
toofastforyahuh Apprentice
Joined: 18 May 2004 Posts: 165
|
Posted: Tue Dec 07, 2004 8:39 pm Post subject: |
|
|
Just to be fair I just ran another datapoint. Since I also built my box for video, it's interesting to see what kind of performance I get.
I compiled mjpegtools 3 times and tested each with kino.
This is realistic since I actually use kino to do this stuff and building a DVD with mpeg2enc takes basically all day, so any performance difference is noticeable and welcome.
I took a 30.33 second DV file and ran it through kino's export MPEG tool.
I set it to type 8 (DVD) and used the following flags:
Code: |
mpeg2enc -v 0 -4 2 -2 1 -q 3 -D 10 -H -c -b 8000
mp2enc -v 0 -r 48000 -b 224
|
These are decent quality settings for video and passable settings for audio.
With mjpegtools compiled with the following flags:
Code: |
-march=k8 -O2 -pipe: 3:44 encode time
-march=k8 -O3 -pipe: 3:39 encode time
-march=k8 -O3 -funroll-all-loops -fpeel-loops -ftracer -pipe: 3:28 encode time
|
I repeated the last run and got exactly the same result, so I think the margin of error is low.
So here, too, my nitro CFLAGS actually do provide a benefit (about 7%). Again, since DVD encoding with mpeg2enc is an all-day affair, I think a 7% saving will quickly add up to a noticeable wall-clock difference.
But I still advise caution for the rest of the system. Don't go gung-ho on CFLAGS for everything. It can backfire if you aren't careful. |
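The "about 7%" figure can be checked from the encode times posted above. This is a minimal sketch; `percent_saved` is a hypothetical helper name, and it assumes times in m:ss form.

```shell
# Percentage of wall-clock time saved going from a baseline time
# to a new time, both given as m:ss.
percent_saved() {
  awk -v base="$1" -v new="$2" 'BEGIN {
    split(base, b, ":"); split(new, n, ":")
    base_s = b[1] * 60 + b[2]      # baseline in seconds
    new_s  = n[1] * 60 + n[2]      # new time in seconds
    printf "%.1f\n", (base_s - new_s) * 100 / base_s
  }'
}

percent_saved 3:44 3:28   # -O2 vs. the nitro flags: prints 7.1
percent_saved 3:44 3:39   # -O2 vs. -O3: prints 2.2
```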
|
toofastforyahuh Apprentice
Joined: 18 May 2004 Posts: 165
|
Posted: Wed Dec 08, 2004 8:29 am Post subject: |
|
|
Aha! And a good case where it seems to backfire is gzip.
For me gzip is consistently faster with -O2 than with -O3 or my nitro flags.
So there you have it. No universal answer except: stay sane. Build your system sanely and safely, then only focus on the few programs you really care about and do some careful benchmark profiling. Don't add random CFLAGS just because they sound good. You'll only give the "Gentoo is Rice" crowd a laugh if it blows up on you. |
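For the "careful benchmark profiling" part, a rough timing helper might look like the sketch below. It is an assumption-laden illustration: `best_of` is a hypothetical name, the resolution is whole seconds (from `date +%s`), and the binary names in the usage comment are placeholders for builds made with different flags.

```shell
# Run a command N times and print the fastest wall-clock time
# in whole seconds. Output of the command itself is discarded.
best_of() {
  runs=$1; shift
  best=
  i=0
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    elapsed=$(( end - start ))
    if [ -z "$best" ] || [ "$elapsed" -lt "$best" ]; then
      best=$elapsed
    fi
    i=$(( i + 1 ))
  done
  echo "$best"
}

# Hypothetical usage, one binary per set of flags:
#   best_of 3 ./gzip-O2 -9 -c sample.tar
#   best_of 3 ./gzip-O3 -9 -c sample.tar
```

With one-second resolution this only makes sense for workloads that run for a while; use a large enough input that the differences are not lost in rounding.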
|
barry Apprentice
Joined: 01 May 2002 Posts: 170 Location: UK
|
Posted: Wed Dec 08, 2004 12:51 pm Post subject: |
|
|
You should avoid -funroll-all-loops generally. The GCC documentation advises against using it. -funroll-loops has more chance of being beneficial, but shouldn't be used in CFLAGS because it can break packages and may slow down just as much software as it speeds up.
-O2 together with -fweb and -frename-registers is probably the safest bet. |
|
toofastforyahuh Apprentice
Joined: 18 May 2004 Posts: 165
|
Posted: Wed Dec 08, 2004 4:45 pm Post subject: |
|
|
Check the benchmarks above. So far, for the programs I've cared about, I've seen only a positive trend with unrolling loops, and with -O3, so it's unlikely -O2 -frename-registers is better still, but you're welcome to try xmame and mjpegtools to see if you can rise above my nitro flags. Again, I don't use crazy flags for the whole system. I refuse to compromise stability. Benchmarks on specific programs, though, definitely point toward some gravy to be had. So far that's all I can call it, though---gravy. The real meal is in going amd64 in the first place. |
|
barry Apprentice
Joined: 01 May 2002 Posts: 170 Location: UK
|
Posted: Wed Dec 08, 2004 10:27 pm Post subject: |
|
|
The only difference between -O2 and -O3 is the inclusion of -fweb, -frename-registers and -finline-functions. -funroll-loops can certainly help speed up certain applications, so it's a good idea to use it on those, but definitely not system-wide. The same goes for -ffast-math. |
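The set of flags that separates -O2 from -O3 varies between GCC releases, so it is worth checking on your own compiler. Newer GCC versions (later than the ones discussed in this thread) can print the state of every optimizer flag with `-Q --help=optimizers`. The sketch below compares two such listings; `changed_flags` is a hypothetical helper name.

```shell
# Print the flags whose state differs between two saved listings
# produced by "gcc -Q --help=optimizers".
changed_flags() {
  diff "$1" "$2" | awk '/^[<>]/ { print $2 }' | sort -u
}

# Usage (needs a GCC that supports -Q --help=optimizers):
#   gcc -Q --help=optimizers -O2 > O2.txt
#   gcc -Q --help=optimizers -O3 > O3.txt
#   changed_flags O2.txt O3.txt
```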
|
zinion Guru
Joined: 27 Oct 2004 Posts: 541 Location: Ruhgebietshausen
|
Posted: Fri Dec 10, 2004 10:37 am Post subject: |
|
|
What exactly can happen if I use them system-wide? I do, and my system has been running really well lately... _________________ It is nice and warm
here in Gentoo land |
|
toofastforyahuh Apprentice
Joined: 18 May 2004 Posts: 165
|
Posted: Fri Dec 10, 2004 10:59 am Post subject: |
|
|
Are you sure? How do you know? I could play tempest on xmame if I compiled it with -O3 on my old SGI but the vector coordinates and colors were corrupted! Just because you haven't had a failure yet doesn't mean you don't have one just waiting, lurking in the shadows.
Using wonky CFLAGS can lead to disaster if you aren't careful.
Besides, it isn't always a win. -O2 is sometimes better (as I just proved with gzip), and if you believe acovea sometimes -O1 with a few flags is better than anything else. If you just use some tricked out quirky flags blindly you're just gambling that all programs will 1. compile at all, 2. compile correctly, 3. work better. All programs do not behave the same.
It is much wiser to build a stable system with safer CFLAGS and then profile your usage.
It's not worth your time to build a 3% faster alsamixer and risk its stability, but it definitely is worth your time to understand how a bottleneck program behaves. For example, if you spend a day just waiting for mpeg2enc to finish encoding a DVD, a 7% speedup makes sense because you really need it. |
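One way to keep safe flags system-wide and give only a bottleneck package the aggressive ones is Portage's per-package environment files. As far as I know this mechanism postdates this thread, so treat the following as a sketch for a current Portage; the file name `nitro.conf` is made up, and the flags are simply the ones benchmarked earlier in the thread.

```
# /etc/portage/env/nitro.conf  (hypothetical file name)
CFLAGS="-march=k8 -O3 -funroll-all-loops -fpeel-loops -ftracer -pipe"
CXXFLAGS="${CFLAGS}"

# /etc/portage/package.env
# Apply nitro.conf only to the package that is actually a bottleneck.
media-video/mjpegtools nitro.conf
```

Everything else keeps the CFLAGS from make.conf, so a miscompile is confined to the one package you are benchmarking anyway.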
|
NismoC32 Apprentice
Joined: 07 Apr 2003 Posts: 214
|
Posted: Fri Dec 10, 2004 11:53 am Post subject: |
|
|
What CFLAGS should I use during installation of Gentoo x86_64?
I'm having problems getting it installed.
It stops on "gettext" if I use stage 1, on "cracklib" with stage 2 and on "ucl" with stage 3.
I don't know whether it has something to do with the CFLAGS setting or with something else, though.
I have used these settings:
"-02 -march=athlon64 -ftrace -fprefetch-loop-arrays -pipe". |
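Note that the settings as posted contain `-02` (digit zero), whereas GCC's optimization level is spelled `-O2` (letter O), and `-ftrace`, whereas the GCC option is, as far as I know, `-ftracer`. The thread does not establish whether these typos were also in the real make.conf. A minimal sketch for catching the first kind of typo, with a hypothetical function name:

```shell
# Warn about flags that start with "-0" (digit zero), which is
# usually a mistyped optimization level such as -O2 or -O3.
check_cflags() {
  for flag in $1; do
    case "$flag" in
      -0*) echo "suspicious: $flag (digit zero; did you mean -O${flag#-0}?)" ;;
    esac
  done
}

check_cflags "-02 -march=athlon64 -ftrace -fprefetch-loop-arrays -pipe"
```

This only catches the zero-for-O mix-up; a misspelled flag name such as `-ftrace` needs the compiler itself to reject it.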
|
toofastforyahuh Apprentice
Joined: 18 May 2004 Posts: 165
|
Posted: Fri Dec 10, 2004 5:30 pm Post subject: |
|
|
Start with -O2 -march=athlon64 -pipe and see if that helps. |
|
superwutze Tux's lil' helper
Joined: 09 Dec 2004 Posts: 137 Location: Europe/Vienna
|
Posted: Fri Dec 10, 2004 5:40 pm Post subject: |
|
|
I did a stage1 install with 2004.3 and used these settings from the start:
CFLAGS="-O3 -march=opteron -funroll-loops -pipe -ftracer"
There was no problem, and there still is none. _________________ bill who? micro what? |
|
borkdox Tux's lil' helper
Joined: 16 Jan 2004 Posts: 123
|
Posted: Sat Dec 11, 2004 5:03 pm Post subject: |
|
|
NismoC32 wrote: | What CFLAGS should I use during installation of Gentoo x86_64?
I'm having problems getting it installed.
It stops on "gettext" if I use stage 1, on "cracklib" with stage 2 and on "ucl" with stage 3.
I don't know whether it has something to do with the CFLAGS setting or with something else, though.
I have used these settings:
"-02 -march=athlon64 -ftrace -fprefetch-loop-arrays -pipe". |
I installed Gentoo with "-march=athlon64 -O2 -pipe -fweb -frename-registers" and it was a very smooth install overall. |
|
teilo Apprentice
Joined: 20 Jun 2003 Posts: 276 Location: Minneapolis, MN
|
Posted: Sat Dec 11, 2004 6:47 pm Post subject: |
|
|
elocal wrote: | NismoC32 wrote: | What CFLAGS should I use during installation of Gentoo x86_64?
I'm having problems getting it installed.
It stops on "gettext" if I use stage 1, on "cracklib" with stage 2 and on "ucl" with stage 3.
I don't know whether it has something to do with the CFLAGS setting or with something else, though.
I have used these settings:
"-02 -march=athlon64 -ftrace -fprefetch-loop-arrays -pipe". |
I installed Gentoo with "-march=athlon64 -O2 -pipe -fweb -frename-registers" and it was a very smooth install overall. |
I use the same, but with -ftracer also. Whole system is compiled with the same flags (except for when they are filtered, obviously). Everything runs smoothly. Everything is stable. _________________ Teilo who is called Teilo |
|
NismoC32 Apprentice
Joined: 07 Apr 2003 Posts: 214
|
Posted: Mon Dec 13, 2004 7:41 am Post subject: |
|
|
I had to remove all CFLAGS except -O2 and -pipe and skip emerge sync during installation to get it to work.
After the reboot I could put in more flags. I now have:
-O2 -march=athlon64 -fweb -funroll-loops -pipe and it works OK.
I have synced the portage tree and emerged all the new packages.
And I now have a smooth-working, fast, fine-tuned monster file server...
Thanks guys... |
|
cybrjackle Apprentice
Joined: 09 Jan 2003 Posts: 248 Location: USA
|
Posted: Wed Feb 02, 2005 3:09 am Post subject: |
|
|
Code: | CFLAGS="-march=athlon64 -O2 -fweb -frename-registers -ftracer -pipe" |
Been using those for awhile now and the box stays pretty sane |
|
silverpig Tux's lil' helper
Joined: 10 Dec 2003 Posts: 143 Location: Vancouver BC
|
Posted: Wed Feb 02, 2005 6:08 am Post subject: |
|
|
-march=athlon64 -O2 -pipe -fomit-frame-pointer
I did the install initially with -O3 and almost everything worked. I couldn't compile Firefox though. I changed to -O2 and it emerged just fine. _________________ 'Cause I can. |
|
WuppieCat n00b
Joined: 17 Oct 2002 Posts: 38 Location: Cheshire, UK
|
Posted: Thu Apr 28, 2005 2:08 pm Post subject: |
|
|
lavish wrote: | Trevoke wrote: | I have; if there are no differences, then why keep several separate options? |
U have? lol eheh
There will be some differences in gcc >=4.0 |
I thought I had read the same, i.e. that gcc-4.0 would differentiate between the four potential -march CFLAGS for AMD x86-64 chips. However, when I tried to find the quote I couldn't. In fact everything, including the release docs for gcc-4.0, points to there still being no difference between them. Is this something that has slipped to gcc-4.1, or have I just imagined it?
What also makes me curious about this (now that I have considered it) is that there is no difference in feature set between the various chips; they are differentiated by additional HT links, different cache sizes and so on. In fact there is more difference between chip revisions, e.g. the new Venice cores having SSE3 and enhanced branch prediction. Does anyone know the answer to this conundrum? |
|
hvengel Guru
Joined: 19 Sep 2004 Posts: 515
|
Posted: Sat Apr 30, 2005 3:01 am Post subject: |
|
|
The AMD document also recommends using their AMD Core Math Library (ACML). Just for giggles I tried emerge -pv acml, and it is masked by ~amd64. I have some stuff I run that I think might benefit from faster math libraries. Has anyone unmasked this? Does it help? Do you need to set a USE flag? |
|
ozbird Apprentice
Joined: 21 Oct 2003 Posts: 185
|
Posted: Sat Apr 30, 2005 6:42 am Post subject: |
|
|
silverpig wrote: | -march=athlon64 -O2 -pipe -fomit-frame-pointer
I did the install initially with -O3 and almost everything worked. I couldn't compile Firefox though. I changed to -O2 and it emerged just fine. |
I use CFLAGS="-march=athlon64 -O3 -ftracer -pipe" and everything, including Firefox, works fine.
If you're after a benchmark challenge, try the new Povray benchmark http://www.haveland.com/index.htm?povbench/index.php
There are some AMD64 users who claim times under 5 minutes; that's four times faster than I've been able to achieve (20m 46s)
Even when using the same optimisations that some provided, I haven't beaten that time. I suspect shenanigans... |
|
hvengel Guru
Joined: 19 Sep 2004 Posts: 515
|
Posted: Sat Apr 30, 2005 6:52 pm Post subject: |
|
|
I have one application, libpano12, that will segfault with anything other than -O0. But everything else on my systems has been built with -O2 without problems. Other amd64 users have reported the same problem with libpano12 on Gentoo and other distros. Just an FYI that the correct optimizations depend more on the specific piece of software than on anything else. |
|
Joffer Guru
Joined: 10 Sep 2002 Posts: 585 Location: Arendal, Norway
|
Posted: Sun Aug 07, 2005 10:24 pm Post subject: |
|
|
Reading this thread has made me see I'm probably a lucky guy. What I mean is that I've compiled my entire system with an insane number of CFLAGS and it is _stable_: Code: | CFLAGS="-O3 -march=athlon64 -mtune=athlon64 -ftracer -fprefetch-loop-arrays -pipe -funroll-loops -mfpmath=sse -fweb -frename-registers -fmove-all-movables -fpeel-loops -freduce-all-givs -mno-align-stringops -minline-all-stringops -mno-push-args -momit-leaf-frame-pointer -fomit-frame-pointer" |
I am, however, now cleaning up this mess: I will choose saner CFLAGS and later try more tweaking on single applications where I can actually measure a difference, like divx/mp3/ogg encoding, rather than on my entire system.
Your discussion made me think about using these CFLAGS: Code: | CFLAGS="-march=athlon64 -mtune=athlon64 -O2 -pipe -fweb -frename-registers -ftracer" |
_________________ As of April 2006 - Athlon64 X2 4200+ 1GB RAM - amd64-2006.0 profiled system with portage 2.1_preX, ck-sources-2.6.16, glibc-2.4-r1 (overlay w/-Bdirect&-hashvals), binutils-2.16.91.0.6 (overlay), gcc-4.1, Xorg 7 |
|
nxsty Veteran
Joined: 23 Jun 2004 Posts: 1556 Location: .se
|
Posted: Mon Aug 08, 2005 7:53 am Post subject: |
|
|
hvengel wrote: | The AMD document also recommends using their AMD Core Math Library (ACML). Just for giggles I tried emerge -pv acml, and it is masked by ~amd64. I have some stuff I run that I think might benefit from faster math libraries. Has anyone unmasked this? Does it help? Do you need to set a USE flag? |
Check out this bug:
https://bugs.gentoo.org/show_bug.cgi?id=100289 |
|
lightvhawk0 Guru
Joined: 07 Nov 2003 Posts: 388
|
Posted: Sat Aug 20, 2005 7:31 am Post subject: |
|
|
I think the reason most people with crazy flags get away with them is that many packages filter out "unstable" CFLAGS.
Also, is there an overlay for the patched version of glibc? _________________ If God has made us in his image, we have returned him the favor. - Voltaire |
|
revertex l33t
Joined: 23 Apr 2003 Posts: 806
|
Posted: Fri Oct 21, 2005 9:03 am Post subject: |
|
|
@Joffer:
Doesn't "-march=athlon64" already include "-mfpmath=sse"?
And why do people still add "-fomit-frame-pointer" to their CFLAGS when they use -O, -O2, -O3 or -Os?
I found this on the GCC online docs page:
Quote: | -fomit-frame-pointer
Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.
On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro FRAME_POINTER_REQUIRED controls whether a target machine supports this flag. See Register Usage.
Enabled at levels -O, -O2, -O3, -Os. |
It seems "-fomit-frame-pointer" is completely redundant if you use "-O?". |
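Rather than relying on the documentation for a different GCC version or target, you can ask the installed compiler whether a flag is on at a given level. This sketch assumes a GCC new enough to support `-Q --help=optimizers` (newer than the releases current when this thread was written); `flag_state` is a hypothetical helper that reads such a listing on standard input.

```shell
# Read a "gcc -Q --help=optimizers" listing on stdin and print
# the state column ([enabled] / [disabled]) for one flag.
flag_state() {
  awk -v flag="$1" '$1 == flag { print $2 }'
}

# Usage (needs a GCC that supports -Q --help=optimizers):
#   gcc -Q --help=optimizers -O2 | flag_state -fomit-frame-pointer
```

If the flag is already reported as enabled at your -O level on your target, adding it to CFLAGS changes nothing.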
|
crazycat l33t
Joined: 26 Aug 2003 Posts: 838 Location: Hamburg, Germany
|
Posted: Fri Oct 21, 2005 9:25 am Post subject: |
|
|
-march=athlon64 uses -mfpmath=sse by default (-mfpmath=387 generally performs better on my system); -fomit-frame-pointer does nothing on amd64, and people who use it just don't know much about CFLAGS. I'm not sure about -pipe, but I think it makes gcc use pipes instead of temporary files. |
|