Gentoo Forums
Acovea analysis results against real world programs

Twist
Guru


Joined: 03 Jan 2003
Posts: 414
Location: San Diego

Posted: Sun Dec 26, 2004 12:55 pm    Post subject: Acovea analysis results against real world programs

Well, it's not very good. I have been testing my Acovea flag results (posted here) against more traditional "optimized" CFLAGS, and the results do not argue strongly in favor of using Acovea-based recommendations.

My system is as follows:
Athlon64 3400+ w/1GB memory
Gentoo 2004.3 stable, with exceptions noted
gcc-3.4.3, glibc-2.3.4.20040808-r1


For each test, I ran the given app against sample data three times with my "normal" CFLAGS, then recompiled and ran it three times with the Acovea CFLAGS, averaging the results. No other significant load existed on the machine at the time. No window system was in use (GDM, and therefore Xorg, was running, as were my standard services like NFS and Samba, but none of them were actively doing anything). The actual tests were run from an SSH session on another machine.
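The run-three-times-and-average routine above is easy to script so that every build is timed identically. What follows is a minimal POSIX sh sketch, not what was actually used for these numbers; it assumes GNU date (for the %N nanosecond field) and awk, and `bench_avg` is a hypothetical helper name. It also reports a failed or crashing run instead of averaging it:

```shell
# bench_avg RUNS CMD [ARGS...]
# Runs CMD RUNS times with its output discarded and prints the mean
# wall-clock time in seconds. If any run exits non-zero (e.g. a segfault),
# prints FAILED and returns 1 instead of an average.
# Assumes GNU date (for +%N) and a POSIX awk.
bench_avg() {
    bench_runs=$1
    shift
    bench_total=0
    bench_failed=0
    bench_i=0
    while [ "$bench_i" -lt "$bench_runs" ]; do
        bench_start=$(date +%s.%N)
        "$@" >/dev/null 2>&1 || bench_failed=1
        bench_end=$(date +%s.%N)
        # Shell arithmetic is integer-only, so accumulate in awk.
        bench_total=$(awk -v t="$bench_total" -v s="$bench_start" -v e="$bench_end" \
            'BEGIN { printf "%.6f", t + (e - s) }')
        bench_i=$((bench_i + 1))
    done
    if [ "$bench_failed" -ne 0 ]; then
        echo FAILED
        return 1
    fi
    awk -v t="$bench_total" -v n="$bench_runs" 'BEGIN { printf "%.3f\n", t / n }'
}
```

For example, `bench_avg 3 flac --best -f -o /tmp/1812.flac 1812.wav` (file names are placeholders) would print the mean of three flac runs.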

My original Acovea results:
Code:

 Score |  So?  | Switch (annotation)
------------------------------------------------------------------------------
  35.8 |  Yes  | -minline-all-stringops
  32.6 |  Yes  | -mno-push-args
  31.8 | Maybe | -finline-functions (-O3)
  31.8 |  Yes  | -fexpensive-optimizations (-O2)
  30.4 | Maybe | -fschedule-insns (-O2)
  30.3 | Maybe | -fpeel-loops
  30.1 |  Yes  | -fno-if-conversion2 (! -O1)
  29.8 |  Yes  | -fno-defer-pop (! -O1)
  29.7 |  Yes  | -fcse-skip-blocks (-O2)
  29.1 | Maybe | -frerun-loop-opt (-O2)
  28.3 |  Yes  | -fsched-interblock (-O2 GCC 3.3)
  28.2 |  Yes  | -foptimize-sibling-calls (-O2)
  27.4 |  Yes  | -falign-jumps (-O2 GCC 3.3)
  27.4 | Maybe | -fstrict-aliasing (-O2)
  26.9 | Maybe | -fno-merge-constants (! -O1)
  26.5 | Maybe | -finline-limit
  26.1 | Maybe | -falign-functions
  25.7 | Maybe | -fno-delayed-branch (! -O1)
  25.4 | Maybe | -fpeephole2 (-O2)
  25.4 | Maybe | -freorder-functions (-O2 GCC 3.3)
  25.0 | Maybe | -fno-signaling-nans (fast math)
  25.0 | Maybe | -freorder-blocks (-O2)
  24.7 |   No  | -fstrength-reduce (-O2)
  24.4 | Maybe | -frerun-cse-after-loop (-O2)
  24.3 |  Yes  | -fmove-all-movables
  24.2 | Maybe | -fcse-follow-jumps (-O2)
  23.6 | Maybe | -fschedule-insns2 (-O2)
  23.2 | Maybe | -fno-math-errno (fast math)
  22.8 |  Yes  | -fsched-spec (-O2 GCC 3.3)
  22.8 | Maybe | -maccumulate-outgoing-args
  22.5 | Maybe | -fdelete-null-pointer-checks (-O2)
  22.5 | Maybe | -falign-labels (-O2 GCC 3.3)
  22.4 | Maybe | -fno-thread-jumps (! -O1)
  22.3 | Maybe | -mieee-fp
  22.2 | Maybe | -ftracer
  22.0 | Maybe | -mno-align-stringops
  21.4 | Maybe | -fno-crossjumping (! -O1)
  21.3 | Maybe | -fno-cprop-registers (! -O1)
  21.3 |  Yes  | -funit-at-a-time
  21.1 | Maybe | -frename-registers (-O3)
  20.9 | Maybe | -ffinite-math-only (fast math)
  20.8 | Maybe | -fno-trapping-math (fast math)
  20.6 | Maybe | -funswitch-loops
  20.4 |   No  | -fweb
  20.2 | Maybe | -fcaller-saves (-O2)
  20.1 |   No  | -falign-loops (-O2 GCC 3.3)
  19.9 |   No  | -fgcse (-O2)
  19.1 |   No  | -fno-omit-frame-pointer (! -O1)
  17.3 |   No  | -funsafe-math-optimizations (fast math)
  17.1 |   No  | -fno-if-conversion (! -O1)
  15.6 |   No  | -fregmove (-O2)
  15.4 | Maybe | -fbranch-target-load-optimize
  15.1 |   No  | -fprefetch-loop-arrays
  13.6 |   No  | -fnew-ra
  13.4 |   No  | -fno-inline
  12.2 |   No  | -freduce-all-givs
  12.2 |   No  | -funroll-all-loops
  11.5 |   No  | -fforce-mem (-O2)
   8.7 |   No  | -funroll-loops
   5.2 |   No  | -fno-loop-optimize (! -O1)
   4.6 |   No  | -ffloat-store
   0.0 |   No  | -fno-guess-branch-probability (! -O1)
   0.0 |   No  | -fbranch-target-load-optimize2
   0.0 |   No  | -mfpmath=387
   0.0 |   No  | -mfpmath=sse
   0.0 |   No  | -mfpmath=sse,387


My "normal" optimized CFLAGS:

Code:

CFLAGS="-O3 -march=athlon64 -mtune=athlon64 -ftracer -pipe"


CFLAGS recommended by Acovea (see note below):
Code:

CFLAGS="-O? -march=athlon64 -mtune=athlon64 -minline-all-stringops -mno-push-args -fexpensive-optimizations -fno-if-conversion2 -fno-defer-pop -fcse-skip-blocks -fsched-interblock -foptimize-sibling-calls -falign-jumps -fno-strength-reduce  -fmove-all-movables -fsched-spec -funit-at-a-time -fno-web -fno-align-loops -fno-gcse -fomit-frame-pointer -fno-unsafe-math-optimizations -fif-conversion -fno-regmove -fno-prefetch-loop-arrays -fno-new-ra -finline -fno-reduce-all-givs -fno-unroll-all-loops -fno-force-mem -fno-unroll-loops -floop-optimize -fno-float-store -fguess-branch-probability -fno-branch-target-load-optimize2"


Acovea "alt" set:
Code:

CFLAGS="-O3 -march=athlon64 -mtune=athlon64 -minline-all-stringops -mno-push-args -fno-if-conversion2 -fno-defer-pop -fno-strength-reduce -fmove-all-movables -funit-at-a-time -fno-align-loops -fno-gcse -fno-regmove -fno-force-mem -pipe"


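Note: I am aware -march normally implies -mtune. I leave -mtune in place in case -march is filtered for some reason. For the Acovea flags, I used the following methodology: I explicitly include all flags marked "Yes", explicitly negate all flags marked "No" (flags that take a value, such as -mfpmath=387, can't be negated and are left out), and then build with -O1, -O2 and finally -O3 in place of the "-O?". For the Acovea "alt" set I use -O3, add only the "Yes" flags that the annotations don't already list as -O2/-O3 defaults, and negate the "No" flags that are annotated as -O2 defaults (-fstrength-reduce, -falign-loops, -fgcse, -fregmove, -fforce-mem). It should be noted that some of the "Yes" flags are themselves negations of defaults (-fno-if-conversion2, -fno-defer-pop).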
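The "include Yes, negate No" rule is mechanical enough to apply with a script. A rough sh/awk sketch, assuming the table rows look exactly like the ones above (score | verdict | switch); `acovea_flags` is a hypothetical helper. Negation just adds or removes "no-" after the -f/-m prefix, and switches that take a value (such as -mfpmath=387) are skipped because they have no -fno- form:

```shell
# acovea_flags < table
# Reads "Score | So? | Switch (annotation)" rows on stdin and prints a CFLAGS
# fragment: "Yes" switches as-is, "No" switches negated, "Maybe" ignored.
acovea_flags() {
    awk -F'|' '
        # Data rows have exactly three |-separated fields and a numeric score.
        NF == 3 && $1 ~ /[0-9]/ {
            verdict = $2
            gsub(/ /, "", verdict)
            split($3, words, " ")    # first word is the switch itself
            flag = words[1]
            if (verdict == "Yes") {
                out = out " " flag
            } else if (verdict == "No" && flag !~ /=/) {
                if (flag ~ /^-[fm]no-/)
                    sub(/no-/, "", flag)                              # -fno-X -> -fX
                else
                    flag = substr(flag, 1, 2) "no-" substr(flag, 3)   # -fX -> -fno-X
                out = out " " flag
            }
        }
        END { sub(/^ /, "", out); print out }'
}
```

Feeding it the table (for example `acovea_flags < acovea-results.txt`, a placeholder file name) should give the explicit part of the CFLAGS line; the -O level and -march/-mtune still have to be added by hand.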

TESTS

Test for flac-1.1.1

In this test I encoded Tchaikovsky's 1812 Overture using the "--best" flag option for flac.

Results:

Code:

ACOVEA -O1:
real    2m3.003s
user    2m2.616s
sys     0m0.313s

ACOVEA -O2:
real    2m4.853s
user    2m4.430s
sys     0m0.333s

ACOVEA -O3:
real    2m4.395s
user    2m3.971s
sys     0m0.348s

ACOVEA alt:
real    1m2.734s
user    1m2.348s
sys     0m0.323s

REGULAR:
real    1m9.937s
user    1m9.545s
sys     0m0.326s



Test for lame-3.96.1
In this test I encoded the above 1812 Overture from raw .wav to mp3 using no special options.

Code:

ACOVEA -O1:
real    1m12.179s
user    1m11.916s
sys     0m0.210s

ACOVEA -O2:
real    1m10.361s
user    1m10.109s
sys     0m0.203s

ACOVEA -O3:
FAILED - Segmentation fault (compiled twice to make sure)

ACOVEA alt:
FAILED - Segmentation fault (compiled twice to make sure)

REGULAR:
real    1m6.611s
user    1m6.354s
sys     0m0.189s



Test for bzip2-1.0.2-r3

In this test I compressed the raw .wav of the same Tchaikovsky 1812 Overture. The file is fairly large: 166,368,764 bytes (about 159 MiB). No options were passed to bzip2.

Results:

Code:

ACOVEA -O1:
real    0m50.877s
user    0m50.321s
sys     0m0.475s

ACOVEA -O2:
real    0m48.955s
user    0m48.435s
sys     0m0.447s

ACOVEA -O3:
real    0m46.516s
user    0m45.972s
sys     0m0.471s

ACOVEA alt:
real    0m42.366s
user    0m41.845s
sys     0m0.460s

REGULAR:
real    0m43.687s
user    0m43.162s
sys     0m0.450s


Conclusions
I am aware my test cases are drawn from a specific class of programs, namely encode/decode-style logic. This is the easiest case in which to get reproducible results; if others want to try more complex types of programs with 100% reproducible data sets, by all means please do!

In the examples given, Acovea-based results can't really be recommended. It's true that the "alt" set gave an approximately 11% performance increase for flac encoding (and a smaller, roughly 3% gain for bzip2), but otherwise the Acovea builds performed worse, much worse, or failed to execute compared to my "normal" optimizing CFLAGS. The interaction of the recommended flags appears highly situational, and largely just noise compared with GCC's "meta" -O settings.
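For reference, the percentages can be computed directly from the `time` output above. A small sh/awk sketch with hypothetical helper names; it reports the reduction in wall time, so the flac "alt" result comes out at about 10.3%, which corresponds to roughly 11.5% higher throughput (69.937 / 62.734 ≈ 1.115):

```shell
# to_seconds "1m9.937s" -> 69.937   (the format printed by the shell's `time`)
to_seconds() {
    printf '%s\n' "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# time_saved BASELINE CANDIDATE -> percent of BASELINE's wall time saved by
# CANDIDATE (negative means CANDIDATE was slower).
time_saved() {
    ts_base=$(to_seconds "$1")
    ts_cand=$(to_seconds "$2")
    awk -v b="$ts_base" -v c="$ts_cand" 'BEGIN { printf "%.1f\n", (b - c) / b * 100 }'
}
```

With the figures above, `time_saved 0m43.687s 0m42.366s` prints 3.0 for the bzip2 "alt" build, and `time_saved 1m9.937s 2m3.003s` prints -75.9 for the flac -O1 build.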

I would hazard a guess that Acovea's default benchmarks are simply not representative of the programs I used to test, and therefore made little if any headway in optimizing them. Short of running an Acovea-style analysis on each program individually, I'm not sure how this could be fixed.

In the meantime, I'm sticking with my default CFLAGS =)

-Twist
ebrostig
Bodhisattva


Joined: 20 Jul 2002
Posts: 3152
Location: Orlando, Fl

Posted: Mon Dec 27, 2004 1:11 am

It is difficult to set individual flags that will give an overall improvement in speed. It all depends on what the program you want to run does and how it does it internally. In order to optimize a specific program you will have to perform the type of tests that you have done and adjust flags individually. That is not desirable in general.

GCC internally sets many flags based on the -O? level; they are all documented in the gcc man page.

I have done numerous tests myself on my AMD64 3200+ and have come up with a set of flags that overall gives the most optimal performance and stability. The latter is by no means the least important, as you found out with some programs that segfaulted when run.

In general, it is best to stick with a minimal amount of flags and use the ones recommended for each platform.

I think you have done a great job and I applaud you for your persistence in testing the various combinations. Great write-up!

Erik
_________________
'Yes, Firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.'
georgz
Tux's lil' helper


Joined: 06 Dec 2002
Posts: 137
Location: Munich, Germany

Posted: Wed Dec 29, 2004 12:30 pm

Quote:
I have done numerous tests myself on my AMD64 3200+ and have come up with a set of flags that overall gives the most optimal performance and stability.


Which flags do you use? Are different flags suggested/recommended for 64bit or 32bit installations with Athlon64?
smokeslikeapoet
Tux's lil' helper


Joined: 03 Apr 2003
Posts: 96
Location: Cordova, TN USA

Posted: Wed Dec 29, 2004 12:36 pm

Instead of using Acovea I benchmarked my system in much the same way. I used lame and some default optimizations, and md5summed all of the resulting mp3s. -O3 gave me the best time. Then I started adding other combinations of CFLAGS until I started noticing speed improvements, again md5summing the resulting mp3s. I threw out the CFLAGS that gave me different md5 sums, most notably -ffast-math. Then I started removing the CFLAGS that gave me no significant improvement in encoding time, until I was left with the minimal set that reduced my encoding time by 40%. In case you were wondering, here are my CFLAGS for my Athlon 1800+ on an Epox Via 8HKA+.
Code:
CFLAGS="-march=athlon-xp -mtune=athlon-xp -O3 -pipe -fomit-frame-pointer -fforce-addr -falign-functions=16 -falign-jumps=16 -falign-loops=16 -falign-labels=1 -fprefetch-loop-arrays -maccumulate-outgoing-args"

I doubt acovea would give me any significant improvement.
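The md5sum procedure described above (throw out any flag that changes the encoded output) can be wrapped in a small helper. A minimal sketch, assuming GNU coreutils md5sum; `same_output` and the file names are hypothetical. Matching output on one sample file is evidence, not proof, that a flag is safe:

```shell
# same_output REFERENCE CANDIDATE
# Compares the MD5 digests of two output files (e.g. mp3s encoded by a
# reference build and by a build with one extra flag).
# Prints "same" (status 0) or "differs" (status 1); status 2 if a file is missing.
same_output() {
    if [ ! -f "$1" ] || [ ! -f "$2" ]; then
        echo "missing file" >&2
        return 2
    fi
    so_ref=$(md5sum < "$1" | cut -d' ' -f1)
    so_new=$(md5sum < "$2" | cut -d' ' -f1)
    if [ "$so_ref" = "$so_new" ]; then
        echo same
    else
        echo differs
        return 1
    fi
}

# Hypothetical use after rebuilding lame with a candidate flag:
#   lame 1812.wav candidate.mp3
#   same_output reference.mp3 candidate.mp3 || echo "drop this flag"
```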
_________________
-SmokesLikeaPoet

Folding@Home
MighMoS
Guru


Joined: 24 Apr 2003
Posts: 416
Location: @ ~

Posted: Wed Dec 29, 2004 5:15 pm

I'm curious as to people using -O3, due to the fact that most tests agree that inlining functions slow down code on modern processors. As well as redundant CFLAGS such as specifying -fomit-frame-pointer on -O2 and above, because the GCC man page states that this is already implied.

Not to start another rant again, but actually reading the man (or info :P ) pages can help a lot too, and save time.
_________________
jabber: MighMoS@jabber.org

localhost # export HOME=`which heart`
Twist
Guru


Joined: 03 Jan 2003
Posts: 414
Location: San Diego

Posted: Wed Dec 29, 2004 8:48 pm

Quote:
I'm curious as to people using -O3, due to the fact that most tests agree that inlining functions slow down code on modern processors.


Qualify "most test results". I think that's probably "some test results I read", as I find that is most often the case and then people generalize. Not trying to knock against you, it's just been my very common experience.

The answer is I don't trust any of them as a generalization and try to test it myself to see. GCC has evolved recently at a very fast pace and its level of support for different processors varies considerably. What is true for one class of processor with a specific cycle rate, cache, and instruction set may be completely different for another. Thus, I test it myself.

Quote:
As well as redundant CFLAGS such as specifying -fomit-frame-pointer on -O2 and above,


For a very simple reason, and yes, many of them have RTFM'd. If you RTFM the Portage manual, you'll realize that Portage will occasionally filter some flags at the ebuild level without telling you. It's therefore valid to string individual flags after your "meta" optimization flag, in the hope that if an ebuild filters, say, -O3, you will still retain some of its optimization behaviors. In fairness, though, anything that filters -O2 would most likely filter all flags, so there's not much point there.

The specific combination you point out, "-O2 -fomit-frame-pointer", is not the default behavior for Intel class processors. From the gcc man page:

Quote:
-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.


Since omitting the frame pointer is destructive to rewinding on Intel class processors, GCC does not do this until explicitly indicated on those systems. So hopefully you didn't give your pet peeve advice to anybody running an Intel class system =)

-Twist
MighMoS
Guru


Joined: 24 Apr 2003
Posts: 416
Location: @ ~

Posted: Wed Dec 29, 2004 9:37 pm

Twist wrote:
Since omitting the frame pointer is destructive to rewinding on Intel class processors, GCC does not do this until explicitly indicated on those systems. So hopefully you didn't give your pet peeve advice to anybody running an Intel class system =)

-Twist

Actually, I haven't, because I just read up on it the other day. Sorry about the small rant there, and you are right about "most test results". *backs away slowly*
ciaranm
Retired Dev


Joined: 19 Jul 2003
Posts: 1719
Location: In Hiding

Posted: Wed Dec 29, 2004 10:29 pm

MighMoS wrote:
I'm curious as to people using -O3, due to the fact that most tests agree that inlining functions slow down code on modern processors.

Because most of the people you see who post their CFLAGS are the sort who don't have a clue what they're doing, and who just assume that bigger numbers and longer CFLAGS lines equates to faster code.
rhill
Developer


Joined: 22 Oct 2004
Posts: 1629
Location: sk.ca

Posted: Wed Dec 29, 2004 11:21 pm

thanks twist, i was getting all set to go into MythBusters mode, but you ranted for me. :lol:

seriously there needs to be a GCC Myths FAQ

Quote:
-O2 does not include -fomit-frame-pointer on intel archs


Quote:
-mfpmath=sse,387 is BROKEN in any current release and will eat your children


Quote:
-mmmx and -msse -msse2 are a waste of time and also BROKEN


stuff like that, but written by someone who knows what they are talking about.

--de.
_________________
by design, by neglect
for a fact or just for effect
ciaranm
Retired Dev


Joined: 19 Jul 2003
Posts: 1719
Location: In Hiding

Posted: Wed Dec 29, 2004 11:29 pm

dirtyepic wrote:
stuff like that, but written by someone who knows what they are talking about.

I used to have one of those, but I got too much abuse from lovech^W clueless ricers over it, so I got rid of it.

Seriously though, I'm trying to get the following in as official policy on how we handle CFLAGS:

Quote:

Guidelines for Flag Filtering

If a package breaks with any reasonable CFLAGS, it is best to filter the problematic flag if a bug report is received. Reasonable CFLAGS are -march=, -mcpu=, -mtune= (depending upon arch), -O2, -Os and -fomit-frame-pointer. Note that -Os should usually be replaced with -O2 rather than being stripped entirely. The -fstack-protector flag should probably be in this group too, although our hardened team claim that this flag never ever breaks anything...

If a package breaks with other CFLAGS, it is perfectly ok to close the bug with a WONTFIX suggesting that the user picks more sensible global CFLAGS. Similarly, if a bug report is received and is determined or suspected to be caused by daft CFLAGS, an INVALID resolution is appropriate.


Take from that what you will about what you should have in make.conf...
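To check ahead of time whether your own CFLAGS fall outside that "reasonable" set, something like the following would do. It is a plain sh sketch with a hypothetical helper name; it also treats -pipe as harmless, since -pipe only changes how the compiler stages pass data to each other and not the generated code:

```shell
# unreasonable_flags "FLAGS"
# Prints the flags in FLAGS that fall outside the "reasonable" set from the
# guideline above (-march=, -mcpu=, -mtune=, -O2, -Os, -fomit-frame-pointer),
# plus -pipe, which does not change the generated code.
unreasonable_flags() (
    set -f    # stop patterns such as -O? from globbing against file names
    uf_out=""
    for uf_flag in $1; do
        case "$uf_flag" in
            -march=*|-mcpu=*|-mtune=*|-O2|-Os|-fomit-frame-pointer|-pipe) ;;
            *) uf_out="$uf_out $uf_flag" ;;
        esac
    done
    printf '%s\n' "${uf_out# }"
)
```

`unreasonable_flags "$CFLAGS"` prints an empty line if every flag is on the list; for the "normal" CFLAGS at the top of this thread it would flag -O3 and -ftracer.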
Twist
Guru


Joined: 03 Jan 2003
Posts: 414
Location: San Diego

Posted: Thu Dec 30, 2004 12:06 am

Quote:
Actually, I haven't, because I just read up on it the other day. Sorry about the small rant there, and you are right about "most test results". *backs away slowly*


LOL ok I guess I came across a bit too strong there. I was honestly just trying to convey the idea that -fomit-frame-pointer was not automatic with -O or above on Intel arch machines.

As for the 'most test results' thing, it's a common trap that I fall into myself, even as a coder who is very conversant with compilers and their behavior. This is why I like Acovea in concept. It seems, though, that either it's somewhat flawed in implementation (not enough breadth in the example benchmark code), or GCC is simply prone to many contradictory behaviors that can't be generalized across an architecture, but must be taken in the context of a specific set of code. I tend to favor the latter myself, but again it means nothing without more extensive testing =)

Quote:
I used to have one of those, but I got too much abuse from lovech^W clueless ricers over it, so I got rid of it.

Seriously though, I'm trying to get the following in as official policy on how we handle CFLAGS:


I think that is an OK set of rules for the general case, sure. While it's annoying to get non-bugs submitted by Gentoo users who are doing unreasonable things with the compiler, it sort of comes with the territory and is part of the Gentoo flexibility/experience, so I would urge you not to turn to the dark side of bitterness on this issue =). I think the "stable" keyword ebuilds should all be responsible for handling any set of input CFLAGS while retaining stable behavior (note that this most likely means rejecting almost all of them), and your proposed policy would get us there.

If wishes were fishes though...I'd love to use the participatory nature of the Gentoo community to get definitive on some of this stuff. For instance, while we can label -fomit-frame-pointer as "safe" in that it doesn't break any known ebuilds, it would be great if we had a bug-buddy like facility to actually KNOW that for sure as part of the base install. Except maybe not as cumbersome and ugly as bug-buddy =). Something like -ftracer with the newer GCC releases, which (according to the GCC mailing list) should be entirely safe and improve the ability of other optimizations. -funit-at-a-time should also be safe, short of consuming extra memory for compiles, but I honestly don't have a feel at all for whether it breaks anything as I don't use it. It would be great if we could poll and consolidate results with some of these flag variants automatically.

Ah well. In the meantime, don't try this at home! Experienced coder here attempting compilations on a closed course with appropriate safety gear. The sponsors remind you not to exceed your ability or that of your gear: stick with stable keywords and don't override ebuild behavior. Thank you, drive through.

-Twist
ciaranm
Retired Dev


Joined: 19 Jul 2003
Posts: 1719
Location: In Hiding

Posted: Thu Dec 30, 2004 12:12 am

If you want stable, don't set CFLAGS at all in make.conf. Just rely upon the profile-provided settings. Gentoo developers are not here to correct every single possible stupid thing you can do with make.conf.
rhill
Developer


Joined: 22 Oct 2004
Posts: 1629
Location: sk.ca

Posted: Thu Dec 30, 2004 1:19 am

that kinda throws the whole 'freedom of choice' philosophy out the window though. sorry, just poking your buttons. :lol: i do appreciate all the work you do here for us and gentoo in general.

seriously though, i was surprised that "-pipe" isn't on that whitelist. are there actually situations where -pipe needs to be filtered or has caused problems (just curious).


Last edited by rhill on Thu Dec 30, 2004 1:22 am; edited 1 time in total
ciaranm
Retired Dev


Joined: 19 Jul 2003
Posts: 1719
Location: In Hiding

Posted: Thu Dec 30, 2004 1:21 am

dirtyepic wrote:
that kinda throws the whole 'freedom of choice' philosophy out the window though. sorry, just poking your buttons. :lol:

Oh, you're free to use other flags, and developers are free to ignore any bugs you submit if you do.

Quote:
seriously though, i was surprised that "-pipe" isn't on that whitelist. are there actually situations where -pipe needs to be filtered or has caused problems (just curious).

-pipe doesn't count, it's not an optimisation flag and it doesn't alter the code produced. No problems with it though, guess I could explicitly say so...
rhill
Developer


Joined: 22 Oct 2004
Posts: 1629
Location: sk.ca

Posted: Thu Dec 30, 2004 1:25 am

ciaranm wrote:
Oh, you're free to use other flags, and developers are free to ignore any bugs you submit if you do.


yeah, definitely. no argument there.

Quote:
-pipe doesn't count, it's not an optimisation flag and it doesn't alter the code produced. No problems with it though, guess I could explicitly say so...


oh ok. it is a CFLAG however, and the guideline didn't mention optimization flags only. i'm unfamiliar with how the filtering works of course, so perhaps the mistake was mine.

cheers.
Hypnos
Advocate


Joined: 18 Jul 2002
Posts: 2868
Location: Omnipresent

Posted: Thu Dec 30, 2004 4:54 am

Twist,

Thanks for your work -- I'm glad someone has done something useful with my reporting scripts.

Comments:

* It seems that, apart from compilation problems, your Acovea "alt" CFLAGS did pretty well. This suggests that Acovea, for the algorithms you have chosen, has more reliably found negatives than affirmatives (apparently, the "Maybe"s enabled by -O3 provided a big performance boost).

* The algorithms you have chosen are far more complex and heuristic than those employed by Acovea as benchmarks. On the former, this means that memory-intensive optimizations might be beneficial since you are moving a lot of data and burning a lot of cycles anyway. On the latter, I'm not knowledgeable enough to impute how this would affect the performance of specific switches ....

* Is not GCC optimization for AMD notoriously bad? As you say in another post, the cross-dependencies of the various switches might be too extensive for even Acovea to dissect with its evolution.

* Here are my CFLAGS for my P4-Mobile:

Code:
CFLAGS="-pipe -Wall -O2 -march=pentium4 -mcpu=pentium4 -maccumulate-outgoing-args -minline-all-stringops -fmove-all-movables -fno-if-conversion2 -fno-crossjumping -fno-delayed-branch -fno-omit-frame-pointer -fno-merge-constants -fno-thread-jumps"


I can't say one way or the other on performance movements (apart from placebo), but these flags have been prodigiously stable.
_________________
Personal overlay | Simple backup scheme
Twist
Guru


Joined: 03 Jan 2003
Posts: 414
Location: San Diego

Posted: Thu Dec 30, 2004 6:01 am

Hypnos,

BTW, before anything else, wanted to thank you for your ebuild and test scripts for Acovea. Fine work that I was too lazy to do myself.

Quote:
It seems that, apart from compilation problems, your Acovea "alt" CFLAGS did pretty well.


Yes. I would hazard a guess that GCC is decent at deciding on its own when an optimization is a net negative (probably based on total instruction/tick count) and simply doesn't use it there. So although those options came out as "No" according to Acovea, in real use GCC might occasionally benefit from them.

Quote:
The algorithms you have chosen are far more complex and heuristic than those employed by Acovea as benchmarks.


The biggest fault I can find with my "real world" examples is that they are all memory intensive. They all pump a lot of data in total, they all do lots of fairly wide address-space lookups and compares, etc. However, it's the nature of the beast that these types of apps are not only good demonstrations but also where I tend to spend a lot of wait time in real life. For purely algorithmic benchmarks I could have used nbench or the like, and for heavy mathematics, xfractint or celestia on a complex solution, I suppose. I might still go back and do that.

Quote:
Is not GCC optimization for AMD notoriously bad?


AMD themselves are actively helping the GCC crew get their instruction scheduling up to par, and it is reportedly vastly improved in later versions. Since I tested with 3.4.3, I figured that was good enough. It's definitely true that the GCC 2.9 series was simply awful with AMD processors, and the early 3 series (aside from general brokenness and stability issues) wasn't renowned either. I could, and probably will, run the same kind of comparison on one of my P4 machines; I just haven't gotten around to it yet.

-Twist
moocha
Watchman


Joined: 21 Oct 2003
Posts: 5722
Location: Cluj-Napoca, Romania

Posted: Thu Dec 30, 2004 6:24 am

Twist wrote:
Something like -ftracer with the newer GCC releases, which (according to the GCC mailing list) should be entirely safe and improve the ability of other optimizations.


Which only goes to show that the GCC mailing list can't be entirely trusted, since -ftracer breaks teTeX in a very weird fashion (executables don't crash but weirdly duplicate the file name they get passed, which of course causes the file not to be found). For details see http://bugs.gentoo.org/show_bug.cgi?id=50417 (ebuild *still* doesn't filter that flag, and I'm pretty peeved about it.. I even begged nicely :D)

As far as I'm aware, teTeX is the only package broken by -ftracer though. I use a bashrc-based filtering so teTeX doesn't get passed -ftracer but the rest do.
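A sketch of that kind of per-package filtering, assuming Portage sources /etc/portage/bashrc in each ebuild phase with CATEGORY and PN set, and that teTeX is packaged as app-text/tetex. `drop_flag` is a hypothetical helper, not a Portage function; inside an ebuild the equivalent job is done by filter-flags from the flag-o-matic eclass:

```shell
# drop_flag FLAG "FLAGS" -> FLAGS with every word equal to FLAG removed
drop_flag() (
    set -f    # treat each flag as a literal word, never a glob
    df_out=""
    for df_word in $2; do
        [ "$df_word" = "$1" ] || df_out="$df_out $df_word"
    done
    printf '%s\n' "${df_out# }"
)

# Hypothetical /etc/portage/bashrc fragment: strip -ftracer for teTeX only.
case "${CATEGORY}/${PN}" in
    app-text/tetex)
        CFLAGS=$(drop_flag -ftracer "$CFLAGS")
        CXXFLAGS=$(drop_flag -ftracer "$CXXFLAGS")
        ;;
esac
```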

My own flags (development desktop, dual P3, lots of L2 cache):
Code:
CFLAGS="-march=pentium3 -mtune=pentium3 -O2 -pipe \
-fno-ident -fomit-frame-pointer -momit-leaf-frame-pointer -ftracer \
-fweb -frename-registers -finline-functions -finline-limit=280"


The last line actually takes -O2 to -O3 - it's there because many ebuilds filter -O3. I chose to ignore that, but then that's my choice, and I wholeheartedly agree with the default restrictive filtering.

As to your Acovea findings - it's hardly surprising. The best optimizations for any software are, in this order:
(a) Having a good design from the start and not as an afterthought
(b) Using algorithms that are best suited for the task
(c) Using the compiler's profiling facilities to identify bottlenecks
.
.
.
(somewhere around letter m) Compiler flags

;-)
_________________
Military Commissions Act of 2006: http://tinyurl.com/jrcto

"Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety."
-- attributed to Benjamin Franklin
Hypnos
Advocate


Joined: 18 Jul 2002
Posts: 2868
Location: Omnipresent

Posted: Thu Dec 30, 2004 7:00 am

moocha wrote:
As to your Acovea findings - it's hardly surprising. The best optimizations for any software are, in this order:
(a) Having a good design from the start and not as an afterthought
(b) Using algorithms that are best suited for the task
(c) Using the compiler's profiling facilities to identify bottlenecks
.
.
.
(somewhere around letter m) Compiler flags

;-)

Ah, but as Twist shows above, compiler flags can certainly be deleterious! :D
dberkholz
Developer


Joined: 18 Mar 2003
Posts: 1008
Location: Minneapolis, MN, USA

Posted: Thu Dec 30, 2004 9:17 pm

moocha wrote:
As far as I'm aware, teTeX is the only package broken by -ftracer though. I use a bashrc-based filtering so teTeX doesn't get passed -ftracer but the rest do.

-ftracer also broke gtk+ last time I tried it. That was a lot of fun to track down, since the problem resulted in a mysterious collection of broken apps that used gtk+.
mbalino
n00b


Joined: 09 Aug 2004
Posts: 30
Location: Edmonton

Posted: Thu Dec 30, 2004 10:30 pm

"-march=athlon-xp -m3dnow -msse -mfpmath=sse -mmmx -O3 -pipe -fforce-addr -fomit-frame-pointer -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -falign-functions=4 -maccumulate-outgoing-args -ffast-math -fprefetch-loop-arrays"

These are my flags for a Barton 3000+ with 1024 MB DDR400, SATA150, 80 GB disk,
KT600 / VT8237.

The whole system has been working without any problems since 15/11/2004.

Kernels 2.6.9-ac12 and 2.6.10-ck1 have been tested.
hq4ever
Apprentice


Joined: 15 Aug 2004
Posts: 167

Posted: Fri Dec 31, 2004 4:28 pm

mbalino wrote:
"-m3dnow -msse -mfpmath=sse -mmmx


I'm sorry for this newb question, but where does the "m" in front of these flags come from?

Shouldn't it be "-3dnow -sse -fpmath=sse -mmx", like here: http://gentoo-portage.com/USE ?
_________________
"God doesn't play dice with the universe",
Albert Einstein.

sig: http://www.jr.co.il/humor/signatur.txt
avatar: david lanham, http://www.dlanham.com/goodies.htm
Twist
Guru


Joined: 03 Jan 2003
Posts: 414
Location: San Diego

Posted: Fri Dec 31, 2004 6:31 pm

Quote:
I'm sorry for this newb question, but where does the "m" in front of these flags come from?

Shouldn't it be "-3dnow -sse -fpmath=sse -mmx", like here: http://gentoo-portage.com/USE ?


USE flags are specific to Gentoo and indicate a system-level interest (or not) in the application/feature indicated by the flag.

Compiler flags are switches that tell GCC how to generate code. Here, -f introduces an "option" (a general code-generation switch), whereas -m introduces a "machine option", which is usually specific to the processor you are compiling for.

-m is therefore correct for the fpmath, sse, 3dnow and mmx switches: all of them are particular to the processor, not to code generation in general.
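As a rough illustration of the prefix convention described above, this Python sketch sorts a CFLAGS string into -m (machine), -f (option) and -O (optimization level) groups. The function name and group labels are hypothetical, and a plain prefix test is a simplification; the GCC manual remains the authority on which family a switch belongs to. Note that -march=athlon-xp also lands in the machine group, since GCC documents -march among its machine-dependent options.

```python
def classify_flags(cflags: str) -> dict[str, list[str]]:
    """Group GCC switches by their leading prefix (simplified, illustrative only)."""
    groups: dict[str, list[str]] = {"machine": [], "option": [], "level": [], "other": []}
    for flag in cflags.split():
        if flag.startswith("-m"):        # machine option, e.g. -msse, -march=...
            groups["machine"].append(flag)
        elif flag.startswith("-f"):      # code-generation option, e.g. -fomit-frame-pointer
            groups["option"].append(flag)
        elif flag.startswith("-O"):      # optimization level, e.g. -O2, -O3
            groups["level"].append(flag)
        else:                            # anything else, e.g. -pipe
            groups["other"].append(flag)
    return groups


if __name__ == "__main__":
    # A subset of mbalino's CFLAGS from earlier in the thread.
    print(classify_flags("-march=athlon-xp -m3dnow -msse -mfpmath=sse -O3 -pipe -fomit-frame-pointer"))
```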

-Twist
procyon112
n00b


Joined: 28 Apr 2005
Posts: 16
Location: Seattle, Washington, USA

Posted: Sat Apr 30, 2005 1:34 am    Post subject: Invalid test

This test is invalid. Because you are evolving compiler flags independently for each test and then accepting the ones that give the best performance on average, the test is not even as good as this:
1) start with no optimizations and run each program, taking a reading.
2) turn on an optimization, test, take a reading.
3) turn on a different optimization and test.
4) Use the optimizations that give a benefit; drop the others.
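A minimal Python sketch of these four steps, under one reading of them (each flag is tested on its own against the unoptimized baseline). `measure` is a hypothetical callable that would build and time the benchmarks for a given flag set; here it is stood in for by an invented cost model (`TOY_EFFECT`), not measured data, so the sketch runs.

```python
from collections.abc import Callable, Sequence


def one_at_a_time(candidates: Sequence[str],
                  measure: Callable[[frozenset[str]], float]) -> frozenset[str]:
    """Keep every flag that, enabled alone, beats the unoptimized baseline.

    `measure` returns a runtime, so lower is better.
    """
    baseline = measure(frozenset())                 # step 1: no optimizations
    kept = set()
    for flag in candidates:                         # steps 2-3: one flag at a time
        if measure(frozenset({flag})) < baseline:   # step 4: keep what helps
            kept.add(flag)
    return frozenset(kept)


# Invented per-flag effects (fraction of runtime saved); NOT real measurements.
TOY_EFFECT = {"-fpeel-loops": 0.05, "-fno-defer-pop": -0.02, "-minline-all-stringops": 0.08}


def toy_measure(flags: frozenset[str]) -> float:
    """Hypothetical stand-in for compiling and timing the benchmarks."""
    return 1.0 - sum(TOY_EFFECT[f] for f in flags)


if __name__ == "__main__":
    print(sorted(one_at_a_time(list(TOY_EFFECT), toy_measure)))
```

Because every flag is only ever measured alone, this procedure never sees how flags behave in combination.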

The genetic algorithm is probably worse, because it does not do a comprehensive test and takes MUCH longer. The GA test is supposed to show which flags work best IN TANDEM, so taking the best average results will probably give worse performance than -O2 or -O3, which the GCC team has probably already tested independently for the best average performance. What you need to do is:
1) Only include in the list of flags to test those which you would have no qualms using in your final system build, i.e., leave out -malign-double
2) For each generation of the GA, *ALL* benchmarks are run and a rating is given to that "set" of flags as the GA fitness function
3) Run the GA until you are satisfied with the overall results (since the set of flags is rather small as far as GAs are concerned, 20 generations with a population of 50-100 should be enough).
4) Use ALL the flags of the GA's winner on your system, because what you are testing is not "flag -fomg-fast is beneficial" but rather "flags -fsometimes-good -falmost-never -fduh-use-me-always and -mim-a-typewriter, when used in tandem, beat -O3 on average"
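The procedure above can be sketched in Python as a small GA whose fitness is the total runtime over all benchmarks (step 2), with the 20-generation, 50-member defaults from step 3, returning every flag of the winner (step 4). This is not Acovea's implementation: truncation selection, one-point crossover, bit-flip mutation and the two toy benchmark functions are assumptions chosen for illustration, and the toy numbers are invented.

```python
import random
from collections.abc import Callable, Sequence

Benchmark = Callable[[frozenset[str]], float]   # returns a runtime; lower is better
Genome = tuple[bool, ...]                        # one on/off bit per candidate flag


def enabled(genome: Genome, flags: Sequence[str]) -> frozenset[str]:
    return frozenset(f for f, on in zip(flags, genome) if on)


def fitness(genome: Genome, flags: Sequence[str], benchmarks: Sequence[Benchmark]) -> float:
    """Step 2: run ALL benchmarks with this flag set and total the runtimes."""
    chosen = enabled(genome, flags)
    return sum(bench(chosen) for bench in benchmarks)


def evolve(flags: Sequence[str], benchmarks: Sequence[Benchmark],
           generations: int = 20, pop_size: int = 50,
           mutation_rate: float = 0.05, seed: int = 0) -> frozenset[str]:
    """Tiny GA over flag sets; pop_size must be at least 2."""
    rng = random.Random(seed)
    n = len(flags)
    population = [tuple(rng.random() < 0.5 for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):                           # step 3: fixed budget
        population.sort(key=lambda g: fitness(g, flags, benchmarks))
        parents = population[: max(2, pop_size // 2)]      # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0      # one-point crossover
            child = tuple(bit != (rng.random() < mutation_rate)  # bit-flip mutation
                          for bit in a[:cut] + b[cut:])
            children.append(child)
        population = parents + children
    best = min(population, key=lambda g: fitness(g, flags, benchmarks))
    return enabled(best, flags)                            # step 4: use ALL its flags


# Hypothetical benchmarks with invented costs, standing in for real timed programs.
TOY_FLAGS = ["-fpeel-loops", "-fno-defer-pop", "-minline-all-stringops"]


def toy_bench_a(flags: frozenset[str]) -> float:
    return 1.0 - 0.05 * ("-fpeel-loops" in flags) + 0.02 * ("-fno-defer-pop" in flags)


def toy_bench_b(flags: frozenset[str]) -> float:
    return 2.0 - 0.10 * ("-minline-all-stringops" in flags) + 0.01 * ("-fno-defer-pop" in flags)


if __name__ == "__main__":
    print(sorted(evolve(TOY_FLAGS, [toy_bench_a, toy_bench_b])))
```

Keeping the fitter half unchanged each generation means the best flag set found so far is never lost, which makes a fixed generation budget a reasonable stopping rule for a search space this small.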

Basically, what I am saying is that if you run 6 independent GAs and then average the results, your data is completely meaningless, and you're better off sticking with the tried-and-true "-O2 -pipe". Rewrite this GA if you want to get real data out of it.
Hypnos
Advocate


Joined: 18 Jul 2002
Posts: 2868
Location: Omnipresent

Posted: Sat Apr 30, 2005 4:16 am    Post subject: Re: Invalid test

procyon112 wrote:
This test is invalid. Because you are evolving compiler flags independently for each test and then accepting the ones that give the best performance on average, the test is not even as good as this:
1) start with no optimizations and run each program, taking a reading.
2) turn on an optimization, test, take a reading.
3) turn on a different optimization and test.
4) Use the optimizations that give a benefit; drop the others.

Yes, except that you lose information about poor interactions altogether. By picking out the best average flags, you are not just extracting the switches which are beneficial over a variety of algorithms, but also those that "play nice" with others. This varies from machine to machine, it seems.
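To make the "poor interactions" point concrete, here is a toy Python example. The cost model is entirely invented (the flag names are procyon112's joke flags, not real GCC switches): each flag helps when enabled alone, so one-at-a-time testing would keep both, yet the pair is slower together. Testing pairs, not just single flags, exposes that.

```python
from collections.abc import Callable, Sequence
from itertools import combinations


def toy_runtime(flags: frozenset[str]) -> float:
    """Invented cost model: each flag helps alone, but the pair clashes."""
    t = 1.0
    if "-fsometimes-good" in flags:
        t -= 0.05
    if "-falmost-never" in flags:
        t -= 0.04
    if {"-fsometimes-good", "-falmost-never"} <= flags:
        t += 0.15                      # hypothetical bad interaction
    return t


def clashing_pairs(flags: Sequence[str],
                   measure: Callable[[frozenset[str]], float]) -> list[tuple[str, str]]:
    """Pairs that run slower together than the better of the two flags alone."""
    clashes = []
    for a, b in combinations(flags, 2):
        together = measure(frozenset({a, b}))
        best_alone = min(measure(frozenset({a})), measure(frozenset({b})))
        if together > best_alone:
            clashes.append((a, b))
    return clashes


if __name__ == "__main__":
    print(clashing_pairs(["-fsometimes-good", "-falmost-never"], toy_runtime))
```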

Quote:
What you need to do is:
1) Only include in the list of flags to test those which you would have no qualms using in your final system build, i.e., leave out -malign-double
2) For each generation of the GA, *ALL* benchmarks are run and a rating is given to that "set" of flags as the GA fitness function
3) Run the GA until you are satisfied with the overall results (since the set of flags is rather small as far as GAs are concerned, 20 generations with a population of 50-100 should be enough).
4) Use ALL the flags of the GA's winner on your system, because what you are testing is not "flag -fomg-fast is beneficial" but rather "flags -fsometimes-good -falmost-never -fduh-use-me-always and -mim-a-typewriter, when used in tandem, beat -O3 on average"

This is not too different from what we do now, except for step 3. The danger here is that you over-optimize for this particular aggregate situation, which is only a rough mapping to the space of all the apps you will be compiling. By testing each algorithm separately, you have a larger base of variegated populations whose best traits you can extract statistically.
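One simple way to "extract best traits statistically" from separate per-benchmark runs is to pool the winning flag sets and keep the flags that a majority of winners share. The Python sketch below does that with a plain threshold; the function name, the threshold and the sample winners are hypothetical, and this is neither Hypnos's scripts nor Acovea's own scoring.

```python
from collections import Counter
from collections.abc import Iterable


def consensus_flags(winners_by_benchmark: dict[str, Iterable[frozenset[str]]],
                    threshold: float = 0.5) -> list[str]:
    """Keep flags enabled in more than `threshold` of all winners, pooled over benchmarks."""
    counts: Counter[str] = Counter()
    total = 0
    for winners in winners_by_benchmark.values():
        for flags in winners:
            counts.update(flags)       # one vote per flag per winning flag set
            total += 1
    if total == 0:
        return []
    return sorted(f for f, c in counts.items() if c / total > threshold)


if __name__ == "__main__":
    # Made-up winners from two hypothetical per-benchmark runs.
    sample = {
        "bench_a": [frozenset({"-fpeel-loops", "-mno-push-args"}), frozenset({"-fpeel-loops"})],
        "bench_b": [frozenset({"-fpeel-loops", "-fno-defer-pop"}), frozenset({"-mno-push-args"})],
    }
    print(consensus_flags(sample))
```

A flag that only wins on one benchmark gets voted out, which matches the aim of favouring switches that are broadly beneficial over ones tuned to a single workload.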

The bottom line is that I'm testing for "nice" flags, while you are trying to find an optimum. If interactions are very important to performance (i.e., strongly correlated), as you contend, there's no way that the small Acovea tests can predict the performance of real-world apps, so the discussion is moot: every app would have to be optimized separately anyway. If the optimizing interactions are weak but the interactions that cause breakage are strong (as I contend), then you want to draw "valuable" traits from a broad base of organisms. (*)

This is all borne out by the reports on the old thread (mostly anecdotal): programs aren't any faster, but programs build more reliably and execute with far more stability than the canonical -O2 or -O3.

One good suggestion you make is to diligently weed out flags that you would never use anyway, like "-malign-double", from the set of available flags -- they might cause bad interactions with certain flags that are otherwise valuable.


(*) It should be noted that the intended purpose of Acovea is to test compilers against the different supplied benchmarks, or a specific algorithm against a specific compiler. My scripts generate the inference I describe.
All times are GMT