Gentoo Forums

Posted: **Thu Apr 22, 2004 8:41 pm**

Ack, this is *slow*: 3 hours to do the alma test, and there are 6 more to go. I guess the problem is that I have an SMP system and it's only using one CPU. At least that means I have 1 CPU free to do other stuff without a noticeable performance hit. Still, it's gonna take forever at this rate.

Vag

Posted: **Thu Apr 22, 2004 8:52 pm**

Of course it's slow, that's the point - it's very thorough...

Posted: **Fri Apr 23, 2004 2:14 pm**

So after ~150 total hours of compiling, I have results from the acovea script run under two different conditions. These were done on a 700 MHz P3 laptop with 512 MB RAM. To get to the point, because this will be a long post: the conditions under which the acovea scripts are run influence the output.

My first run was done in console mode. I just left my laptop on my desk, and did not do anything with it other than let it generate heat while running the acovea script.

 Score |  So?  | Switch (annotation)
------------------------------------------------------------------------------
  31.7 |  Yes  | -malign-double
  31.5 |  Yes  | -fcaller-saves (-O2)
  31.2 |  Yes  | -foptimize-sibling-calls (-O2)
  30.9 |  Yes  | -freorder-blocks (-O2)
  30.4 |  Yes  | -fsched-interblock (-O2 GCC 3.3)
  29.8 | Maybe | -ftracer
  29.2 |  Yes  | -fdelete-null-pointer-checks (-O2)
  29.1 | Maybe | -funsafe-math-optimizations (fast math)
  29.1 |  Yes  | -fmove-all-movables
  29.0 |  Yes  | -fno-if-conversion2 (! -O1)
  28.6 | Maybe | -fgcse (-O2)
  27.5 | Maybe | -finline-limit
  27.1 |  Yes  | -fno-thread-jumps (! -O1)
  27.1 | Maybe | -finline-functions (-O3)
  26.1 |  Yes  | -fno-defer-pop (! -O1)
  26.1 |  Yes  | -fsched-spec (-O2 GCC 3.3)
  26.0 | Maybe | -fstrict-aliasing (-O2)
  25.7 |  Yes  | -ffinite-math-only (fast math)
  25.6 | Maybe | -fexpensive-optimizations (-O2)
  25.6 | Maybe | -fno-math-errno (fast math)
  25.0 | Maybe | -fno-trapping-math (fast math)
  24.9 |  Yes  | -fpeephole2 (-O2)
  24.8 | Maybe | -fschedule-insns2 (-O2)
  24.8 |  Yes  | -falign-jumps (-O2 GCC 3.3)
  24.6 |  Yes  | -falign-labels (-O2 GCC 3.3)
  24.4 | Maybe | -fprefetch-loop-arrays
  24.3 | Maybe | -mno-align-stringops
  23.7 | Maybe | -freorder-functions (-O2 GCC 3.3)
  23.6 | Maybe | -frename-registers (-O3)
  23.2 | Maybe | -falign-loops (-O2 GCC 3.3)
  22.5 | Maybe | -fcse-follow-jumps (-O2)
  21.8 | Maybe | -fno-delayed-branch (! -O1)
  21.8 | Maybe | -fno-omit-frame-pointer (! -O1)
  21.6 | Maybe | -fno-crossjumping (! -O1)
  21.6 | Maybe | -frerun-cse-after-loop (-O2)
  21.4 | Maybe | -fcse-skip-blocks (-O2)
  20.9 | Maybe | -mieee-fp
  20.9 | Maybe | -frerun-loop-opt (-O2)
  20.7 | Maybe | -fno-cprop-registers (! -O1)
  20.6 | Maybe | -maccumulate-outgoing-args
  20.2 | Maybe | -fno-signaling-nans (fast math)
  19.6 | Maybe | -fno-merge-constants (! -O1)
  19.3 |   No  | -fforce-mem (-O2)
  19.0 | Maybe | -mno-push-args
  18.2 |   No  | -fno-if-conversion (! -O1)
  18.2 | Maybe | -minline-all-stringops
  15.6 |   No  | -freduce-all-givs
  15.1 |   No  | -fstrength-reduce (-O2)
  11.8 |   No  | -fnew-ra
  11.6 |   No  | -fno-guess-branch-probability (! -O1)
  11.1 |   No  | -fschedule-insns (-O2)
  10.4 |   No  | -ffloat-store
  10.4 |   No  | -fregmove (-O2)
   9.6 |   No  | -fno-inline
   9.4 |   No  | -funroll-all-loops
   8.5 |   No  | -fomit-frame-pointer
   7.2 |   No  | -funroll-loops
   0.0 |   No  | -fno-loop-optimize (! -O1)
   0.0 |   No  | -mfpmath=387
   0.0 |   No  | -mfpmath=sse
   0.0 |   No  | -mfpmath=sse,387
   0.0 |   No  | -momit-leaf-frame-pointer

My second run was done from within a Konsole terminal while running KDE. During the 84 hours it took to do this run, I was using my computer for desktop-type work: email, surfing, word processing, etc. I did not do any other emerging/compiling while this was going on.

 Score |  So?  | Switch (annotation)
------------------------------------------------------------------------------
  35.4 |  Yes  | -maccumulate-outgoing-args
  32.5 | Maybe | -fstrict-aliasing (-O2)
  32.3 | Maybe | -fgcse (-O2)
  31.8 |  Yes  | -fno-cprop-registers (! -O1)
  31.6 |  Yes  | -fno-trapping-math (fast math)
  30.6 |  Yes  | -fexpensive-optimizations (-O2)
  30.4 |  Yes  | -fno-delayed-branch (! -O1)
  30.0 |  Yes  | -falign-jumps (-O2 GCC 3.3)
  29.9 |  Yes  | -frerun-loop-opt (-O2)
  29.8 |  Yes  | -minline-all-stringops
  29.8 |  Yes  | -mieee-fp
  29.7 |  Yes  | -fmove-all-movables
  28.9 |  Yes  | -fno-omit-frame-pointer (! -O1)
  28.8 |  Yes  | -fsched-interblock (-O2 GCC 3.3)
  28.8 | Maybe | -freorder-blocks (-O2)
  28.5 |  Yes  | -freorder-functions (-O2 GCC 3.3)
  28.0 |  Yes  | -fno-merge-constants (! -O1)
  27.8 |  Yes  | -frerun-cse-after-loop (-O2)
  27.7 |  Yes  | -fschedule-insns2 (-O2)
  27.4 |  Yes  | -fdelete-null-pointer-checks (-O2)
  27.4 |  Yes  | -ffinite-math-only (fast math)
  27.2 |  Yes  | -finline-functions (-O3)
  26.3 |  Yes  | -fcse-skip-blocks (-O2)
  26.2 |  Yes  | -falign-labels (-O2 GCC 3.3)
  26.0 | Maybe | -falign-loops (-O2 GCC 3.3)
  25.9 |  Yes  | -fno-if-conversion2 (! -O1)
  25.7 | Maybe | -fcse-follow-jumps (-O2)
  25.5 |  Yes  | -fcaller-saves (-O2)
  25.3 | Maybe | -fno-thread-jumps (! -O1)
  25.1 |  Yes  | -fpeephole2 (-O2)
  24.6 | Maybe | -fforce-mem (-O2)
  24.5 |  Yes  | -fprefetch-loop-arrays
  24.3 | Maybe | -frename-registers (-O3)
  24.2 | Maybe | -funsafe-math-optimizations (fast math)
  23.8 |  Yes  | -foptimize-sibling-calls (-O2)
  23.6 | Maybe | -fno-defer-pop (! -O1)
  23.3 |  Yes  | -fstrength-reduce (-O2)
  23.3 |  Yes  | -fsched-spec (-O2 GCC 3.3)
  23.2 | Maybe | -mno-push-args
  23.0 | Maybe | -ftracer
  22.7 |  Yes  | -fregmove (-O2)
  22.0 | Maybe | -fno-crossjumping (! -O1)
  21.8 | Maybe | -malign-double
  21.4 | Maybe | -freduce-all-givs
  20.6 | Maybe | -finline-limit
  19.7 | Maybe | -fno-math-errno (fast math)
  19.4 | Maybe | -fno-signaling-nans (fast math)
  18.3 |   No  | -fschedule-insns (-O2)
  17.1 | Maybe | -mno-align-stringops
  16.7 |   No  | -fno-if-conversion (! -O1)
  16.5 |   No  | -fno-inline
  12.4 |   No  | -ffloat-store
  11.6 |   No  | -fno-guess-branch-probability (! -O1)
  11.0 | Maybe | -mfpmath=sse
  10.2 |   No  | -funroll-loops
   9.8 |   No  | -funroll-all-loops
   9.5 |   No  | -fnew-ra
   9.3 |   No  | -fno-loop-optimize (! -O1)
   8.9 |   No  | -fomit-frame-pointer
   0.0 |   No  | -mfpmath=387
   0.0 |   No  | -mfpmath=sse,387
   0.0 |   No  | -momit-leaf-frame-pointer

As you can see, the results are quite different. I don't know why the running environment should affect the acovea results, and I'm not sure which set of recommendations I should use.

Any comments would be welcome.
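One way to see how far the two runs disagree is to line them up switch by switch. The following is a minimal sketch, assuming the report keeps the `score | verdict | switch` layout shown above; `parse_report` and `changed_verdicts` are hypothetical helpers written for this illustration, not part of acovea, and the sample rows are a small subset copied from the two tables.

```python
# Compare the verdicts that two acovea report tables give for the same switch.
# parse_report and changed_verdicts are hypothetical helpers, not acovea tools.

def parse_report(text):
    """Map each switch to (score, verdict) from 'score | verdict | switch' rows."""
    rows = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue  # blank line or the dashed separator
        try:
            score = float(parts[0])
        except ValueError:
            continue  # header row
        switch = parts[2].split()[0]  # drop annotations such as "(-O2)"
        rows[switch] = (score, parts[1])
    return rows


def changed_verdicts(first, second):
    """Return switches found in both reports whose verdict differs."""
    return {
        switch: (first[switch][1], second[switch][1])
        for switch in sorted(first)
        if switch in second and first[switch][1] != second[switch][1]
    }


# A few rows copied from the console run and the KDE run above.
console_run = """
 Score |  So?  | Switch (annotation)
  31.7 |  Yes  | -malign-double
  31.5 |  Yes  | -fcaller-saves (-O2)
  20.6 | Maybe | -maccumulate-outgoing-args
  19.3 |   No  | -fforce-mem (-O2)
"""
kde_run = """
 Score |  So?  | Switch (annotation)
  35.4 |  Yes  | -maccumulate-outgoing-args
  25.5 |  Yes  | -fcaller-saves (-O2)
  24.6 | Maybe | -fforce-mem (-O2)
  21.8 | Maybe | -malign-double
"""

if __name__ == "__main__":
    diff = changed_verdicts(parse_report(console_run), parse_report(kde_run))
    for switch, (before, after) in diff.items():
        print(f"{switch}: {before} -> {after}")
```

Feeding it the full tables instead of the four sample rows would list every switch whose recommendation flipped between the idle run and the desktop run.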

Posted: **Fri Apr 23, 2004 6:36 pm**

This all sounds like great geeky fun, but I'm curious if there is a way to do the tests in steps (some sort of pause/restart feature). My old Athlon Tbird 933 MHz can't devote the 48-72 hours it would take in a single sitting, and it would be nice to let it simply run overnight and be able to stop it when necessary and restart the next night...

Posted: **Fri Apr 23, 2004 6:41 pm**

You can do the tests one by one; the longest I've had a test take so far (I'm down to the last 2) is 6 hours. The tests each generate a .run file that the perl script in this thread uses to generate its recommendations.

Vag

Posted: **Fri Apr 23, 2004 7:04 pm**

Daagar wrote:This all sounds like great geeky fun, but I'm curious if there is a way to do the tests in steps (some sort of pause/restart feature). My old Athlon Tbird 933 MHz can't devote the 48-72 hours it would take in a single sitting, and it would be nice to let it simply run overnight and be able to stop it when necessary and restart the next night...

CTRL-Z the same way you can pause any process ;)

Then later when you want it to run again, "fg" to resume in the foreground, or "bg" to resume in the background.
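Ctrl-Z works by sending the foreground job a stop signal (SIGTSTP), and "fg"/"bg" continue it with SIGCONT. The same pause and resume can be done from a script. This is a minimal sketch, assuming a POSIX system; `pause` and `resume` are hypothetical helper names, and `sleep 30` merely stands in for a long benchmark run.

```python
import os
import signal
import subprocess
import time


def pause(pid):
    """Stop a process without killing it (SIGSTOP cannot be caught or ignored)."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Let a stopped process continue where it left off."""
    os.kill(pid, signal.SIGCONT)


if __name__ == "__main__":
    # "sleep 30" is a stand-in for a long-running benchmark.
    child = subprocess.Popen(["sleep", "30"])
    pause(child.pid)
    time.sleep(0.2)      # the child uses no CPU while it is stopped
    resume(child.pid)
    child.terminate()    # clean up the stand-in process
    child.wait()
```

A stopped process only exists in memory, so this kind of pause does not survive a reboot.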

Posted: **Fri Apr 23, 2004 9:06 pm**

wilburpan wrote:As you can see, the results are quite different. I don't know why the running environment should affect the acovea results, and I'm not sure which set of recommendations I should use.

Use the former set. As previously stated, Acovea uses real time, not CPU time. Thus, the latter set are meaningless (sorry!). Acovea should be run with as little overhead as is possible. This includes stopping any distributed computing project clients, such as SETI@home or mprime, whilst the run is in progress.
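The difference between real (wall-clock) time and CPU time is easy to demonstrate. The sketch below is an illustration only, not how Acovea itself measures anything: `time_command` is a hypothetical helper that runs a child process and reports both figures, and the child simply sleeps, which costs real time but almost no CPU time, much like a benchmark waiting its turn behind other processes.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (real, user, system) seconds spent by the child."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    real = time.perf_counter() - start
    after = os.times()
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system


if __name__ == "__main__":
    # The child sleeps for half a second: lots of real time, little CPU time.
    real, user, system = time_command(
        [sys.executable, "-c", "import time; time.sleep(0.5)"]
    )
    print(f"real={real:.2f}s user={user:.2f}s sys={system:.2f}s")
```

On a busy desktop the real figure grows with every other process competing for the CPU, while the user figure counts only the cycles the benchmark itself consumed.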

Posted: **Sat Apr 24, 2004 11:00 am**

I wonder what it'll take to rewrite acovea to use user time instead of real time, so it won't be necessary to run the tests while the system is otherwise left idle.

Posted: **Sat Apr 24, 2004 5:14 pm**

darkless wrote:I wonder what it'll take to rewrite acovea to use user time instead of real time, so it won't be necessary to run the tests while the system is otherwise left idle.

I have tried benchmarking many programs (mainly floating point) in working environments. Both system & user time are NOT accurate. I think it's a scheduler-related problem (not only on Linux, but also on Cray, SP4, Compaq). The only way to get good values from a benchmark is to run it on a system in single-user mode.

Obviously, considering user time instead of system time is better.

I'm running acovea on my new dual Opteron 242. Stay tuned.

Posted: **Sat Apr 24, 2004 5:50 pm**

Vagabond, thanks. Didn't realize that each test was separate, so I can run one per night or something. As for the suggestion to just Ctrl-Z the process, yes, of course I realized that would work, but I usually have to reboot back to Windows for the family during the day, which makes that solution impractical.

Posted: **Sun Apr 25, 2004 9:04 pm**

As promised, these are the results of acovea on my system
(Dual Opteron 242, 2x512MB, MSI mobo)

 Score |  So?  | Switch (annotation)
------------------------------------------------------------------------------
  45.1 |  Yes  | -funsafe-math-optimizations (fast math)
  43.1 |  Yes  | -ftracer
  36.2 |  Yes  | -fcaller-saves (-O2)
  36.1 |  Yes  | -fforce-mem (-O2)
  35.6 | Maybe | -mieee-fp
  34.5 |  Yes  | -fno-defer-pop (! -O1)
  34.0 |  Yes  | -falign-jumps (-O2 GCC 3.3)
  33.5 | Maybe | -fschedule-insns (-O2)
  33.2 |  Yes  | -fdelete-null-pointer-checks (-O2)
  33.1 |  Yes  | -fpeephole2 (-O2)
  32.7 | Maybe | -fregmove (-O2)
  32.7 |  Yes  | -finline-limit
  32.3 |  Yes  | -falign-labels (-O2 GCC 3.3)
  32.1 |  Yes  | -fcse-skip-blocks (-O2)
  32.0 | Maybe | -fgcse (-O2)
  31.6 |  Yes  | -freorder-blocks (-O2)
  30.7 |  Yes  | -fcse-follow-jumps (-O2)
  30.7 |  Yes  | -frename-registers (-O3)
  30.6 |  Yes  | -mno-align-stringops
  30.3 |  Yes  | -fno-if-conversion2 (! -O1)
  29.9 | Maybe | -fno-thread-jumps (! -O1)
  29.5 | Maybe | -fstrict-aliasing (-O2)
  29.3 |  Yes  | -maccumulate-outgoing-args
  28.9 | Maybe | -finline-functions (-O3)
  28.7 |  Yes  | -minline-all-stringops
  28.5 | Maybe | -fno-crossjumping (! -O1)
  28.4 |  Yes  | -fno-cprop-registers (! -O1)
  27.7 |  Yes  | -fsched-interblock (-O2 GCC 3.3)
  27.6 | Maybe | -fstrength-reduce (-O2)
  26.7 | Maybe | -fno-delayed-branch (! -O1)
  26.4 |  Yes  | -freorder-functions (-O2 GCC 3.3)
  26.3 | Maybe | -fno-omit-frame-pointer (! -O1)
  25.8 |  Yes  | -fmove-all-movables
  25.5 | Maybe | -fschedule-insns2 (-O2)
  25.4 | Maybe | -falign-loops (-O2 GCC 3.3)
  25.0 | Maybe | -fsched-spec (-O2 GCC 3.3)
  24.9 |   No  | -fprefetch-loop-arrays
  24.6 |  Yes  | -fexpensive-optimizations (-O2)
  24.0 |  Yes  | -ffinite-math-only (fast math)
  22.5 | Maybe | -fno-inline
  21.8 | Maybe | -mno-push-args
  21.4 | Maybe | -fno-signaling-nans (fast math)
  20.9 | Maybe | -funroll-loops
  20.8 | Maybe | -fno-merge-constants (! -O1)
  19.8 | Maybe | -freduce-all-givs
  19.4 | Maybe | -fno-math-errno (fast math)
  19.2 |   No  | -funroll-all-loops
  19.0 | Maybe | -foptimize-sibling-calls (-O2)
  18.7 |   No  | -fnew-ra
  18.5 |   No  | -mfpmath=387
  15.8 | Maybe | -fno-trapping-math (fast math)
  14.8 |   No  | -fno-if-conversion (! -O1)
  14.6 |   No  | -ffloat-store
  14.3 | Maybe | -frerun-cse-after-loop (-O2)
  12.3 |   No  | -frerun-loop-opt (-O2)
  11.3 |   No  | -mfpmath=sse,387
  10.3 |   No  | -fno-guess-branch-probability (! -O1)
   0.0 |   No  | -fno-loop-optimize (! -O1)
   0.0 |   No  | -mfpmath=sse

Then I tried "nbench" with standard optimization:

CFLAGS=-s -static -Wall -O2
CPU                 : Dual AuthenticAMD AMD Opteron(tm) Processor 242 1604MHz
L2 Cache            : 1024 KB
OS                  : Linux 2.6.5-gentoo-r1
C compiler          : 3.3.3
MEMORY INDEX        : 11.142
INTEGER INDEX       : 10.406
FLOATING-POINT INDEX: 15.911

and with acovea optimization

CFLAGS = -s -static -Wall -O1 -funsafe-math-optimizations -ftracer -fcaller-saves -fforce-mem -fno-defer-pop -falign-jumps -fdelete-null-pointer-checks -fpeephole2 -finline-limit=600 -falign-labels -fcse-skip-blocks -freorder-blocks -fcse-follow-jumps -frename-registers -mno-align-stringops -fno-if-conversion2 -maccumulate-outgoing-args -minline-all-stringops -fno-cprop-registers -fsched-interblock -freorder-functions -fmove-all-movables -fexpensive-optimizations -ffinite-math-only

CPU                 : Dual AuthenticAMD AMD Opteron(tm) Processor 242 1604MHz
L2 Cache            : 1024 KB
OS                  : Linux 2.6.5-gentoo-r1
C compiler          : 3.3.3 
MEMORY INDEX        : 10.553
INTEGER INDEX       : 9.486
FLOATING-POINT INDEX: 17.037

As you can see, floating-point is 7% better, but memory (-5%) and integer (-9%) suggest that acovea flags are not useful for workstation use.

Also, -funsafe-math-optimizations gives the boost, but it's deprecated.
I think -O2 is the best choice for my Gentoo installation, but I will try other combinations, removing deprecated or conflicting flags suggested by acovea.
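The percentages quoted for the two nbench runs can be rechecked from the index values in the two listings. A small sketch (the dictionary names and `percent_change` are made up for this example):

```python
def percent_change(baseline, tuned):
    """Relative change of tuned versus baseline, in percent."""
    return (tuned - baseline) / baseline * 100.0


# Index values copied from the two nbench listings above.
o2_build = {"MEMORY": 11.142, "INTEGER": 10.406, "FLOATING-POINT": 15.911}
acovea_build = {"MEMORY": 10.553, "INTEGER": 9.486, "FLOATING-POINT": 17.037}

if __name__ == "__main__":
    for name, base in o2_build.items():
        change = percent_change(base, acovea_build[name])
        print(f"{name} INDEX: {change:+.1f}%")
```

This gives roughly -5.3% for memory, -8.8% for integer and +7.1% for floating point, in line with the rounded figures in the post.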

Posted: **Sun Apr 25, 2004 10:19 pm**

I don't know if nbench is the best choice...

These are Native Mode (a.k.a. Algorithm Level) tests; benchmarks designed to expose the capabilities of a system's CPU, FPU, and memory system.

I still don't know what the best benchmark would be though... I asked Scott in another thread, still waiting to hear back from him.

Posted: **Sun Apr 25, 2004 10:49 pm**

aethyr wrote:I don't know if nbench is the best choice...
These are Native Mode (a.k.a. Algorithm Level) tests; benchmarks designed to expose the capabilities of a system's CPU, FPU, and memory system.
I still don't know what the best benchmark would be though... I asked Scott in another thread, still waiting to hear back from him.

Acovea uses similar algorithms. It is good in an "ideal" world, where the machine is dedicated to number-crunching.

For multi-purpose machines (i.e. workstations), there are a lot of parameters, and I think the best benchmark is X/KDE/GNOME/OpenOffice startup.

Posted: **Sun Apr 25, 2004 11:03 pm**

poisson wrote:As you can see, floating-point is 7% better, but memory (-5%) and integer (-9%) suggest that acovea flags are not useful for workstation use.

Also, -funsafe-math-optimizations gives the boost, but it's deprecated.
I think -O2 is the best choice for my Gentoo installation, but I will try other combinations, removing deprecated or conflicting flags suggested by acovea.

The above combination requires that you use -funsafe-math-optimizations, otherwise you're breaking Acovea's method. Removing it from your profile will give you an entirely different set of results.

Posted: **Mon Apr 26, 2004 7:37 am**

To put that in other words: If there are certain flags known to break things for you (or you just don't feel like using them) then prevent Acovea from using them in the first place.

This can be done by modifying e.g. /usr/share/acovea/config/gcc34_pentium4.acovea so that it does not include specific flags.

Personally, I don't feel like doing an "emerge -e world" right now, so I'd like to prevent acovea from using -malign-double. Also, the -funit-at-a-time flag has been known to break a few apps, and it somewhat increases compile time as well, so that might be another candidate for removal, until GCC-3.4 gets more widely adopted by software developers and/or gets more mature.

Posted: **Mon Apr 26, 2004 7:59 am**

The above combination requires that you use -funsafe-math-optimizations, otherwise you're breaking Acovea's method. Removing it from your profile will give you an entirely different set of results.

The Acovea method works fine for specific problems; the profiles are always kept separate. IMHO putting everything together in make.conf will slow down the whole system.

Other tests I made indicate that -O3 optimization is generally better. But gcc people warned about -O3 and x86-64 ... so I use -O2 for the moment.

I found another interesting Acovea application: what are the best optimizations for the Pentium M? That processor is a hybrid between the Pentium 3 and the Pentium 4, with 1 MB of L2 cache. I started with "alma", but I don't like to stress my laptop.

Posted: **Mon Apr 26, 2004 9:13 am**

It might be possible to construct a benchmark that puts the entire X/glib/GTK+/GNOME (or X/Qt/KDE) code stack through the wringer (maybe based on an automated gtk-demo), and returns a single fitness number to Acovea. That would likely be more realistic (w.r.t. desktop performance) than running through tight computational loops as with the current benchmarks bundled with Acovea.

Of course, the compile-run cycle for the entire stack would be rather time-consuming; even limiting to just glib/GTK+, you're looking at ~8MB of source code and ~4.5MB of machine code.

Perhaps to start with one could restrict to glib, and run the test routines that come with it as a composite Acovea bench (with per test weights), but that would have no X dependence whatsoever.

Thoughts?

--------------

On the issue of running the tests one at a time, you could always just hit Ctrl-Z to freeze the script when you wake up and then run fg to start it up again when you go to bed.

Posted: **Mon Apr 26, 2004 11:49 am**

Could one safely use cflags suggested by Acovea when bootstrapping?

Posted: **Mon Apr 26, 2004 12:08 pm**

ett_gramse_nap wrote:Could one safely use cflags suggested by Acovea when bootstrapping?

I don't know if I would risk binutils/glibc on such flags ....

Posted: **Mon Apr 26, 2004 3:54 pm**

Hi all, I've run acovea in console without other processes running and after ~29 hours it finished and here is the result.

[System: Athlon XP 2100+ @ 1916 MHz, 512 MB DDR Corsair, Asus A7V8X Motherboard]

 Score |  So?  | Switch (annotation)
------------------------------------------------------------------------------
  36.2 |  Yes  | -fno-delayed-branch (! -O1)
  33.1 | Maybe | -fprefetch-loop-arrays
  32.9 | Maybe | -funsafe-math-optimizations (fast math)
  32.9 | Maybe | -fstrict-aliasing (-O2)
  31.2 |  Yes  | -fno-signaling-nans (fast math)
  30.4 |  Yes  | -falign-labels (-O2 GCC 3.3)
  29.7 | Maybe | -minline-all-stringops
  29.5 | Maybe | -ftracer
  27.8 |  Yes  | -fno-cprop-registers (! -O1)
  27.5 |  Yes  | -frerun-cse-after-loop (-O2)
  27.1 |   No  | -fforce-mem (-O2)
  27.1 |  Yes  | -fsched-interblock (-O2 GCC 3.3)
  27.0 |  Yes  | -fno-defer-pop (! -O1)
  26.7 |  Yes  | -mno-align-stringops
  26.7 | Maybe | -fcse-follow-jumps (-O2)
  26.3 |  Yes  | -fsched-spec (-O2 GCC 3.3)
  26.2 | Maybe | -finline-functions (-O3)
  26.0 |  Yes  | -fpeephole2 (-O2)
  26.0 |  Yes  | -fno-math-errno (fast math)
  25.8 |  Yes  | -freorder-functions (-O2 GCC 3.3)
  25.7 | Maybe | -fcse-skip-blocks (-O2)
  25.0 | Maybe | -falign-jumps (-O2 GCC 3.3)
  24.6 | Maybe | -fno-trapping-math (fast math)
  23.8 |   No  | -fstrength-reduce (-O2)
  23.7 |  Yes  | -fno-crossjumping (! -O1)
  23.4 | Maybe | -fno-if-conversion2 (! -O1)
  23.3 | Maybe | -mieee-fp
  22.7 | Maybe | -ffinite-math-only (fast math)
  22.7 | Maybe | -fno-merge-constants (! -O1)
  21.9 | Maybe | -frename-registers (-O3)
  21.5 | Maybe | -fregmove (-O2)
  20.8 |   No  | -fgcse (-O2)
  20.7 |   No  | -fcaller-saves (-O2)
  20.3 |   No  | -fschedule-insns2 (-O2)
  19.5 |   No  | -falign-loops (-O2 GCC 3.3)
  19.4 | Maybe | -freorder-blocks (-O2)
  19.3 | Maybe | -fno-thread-jumps (! -O1)
  18.0 |   No  | -fno-if-conversion (! -O1)
  17.9 | Maybe | -finline-limit
  17.9 | Maybe | -fno-omit-frame-pointer (! -O1)
  16.6 |   No  | -maccumulate-outgoing-args
  16.3 |   No  | -mno-push-args
  15.2 |   No  | -foptimize-sibling-calls (-O2)
  14.9 |   No  | -fno-inline
  14.7 |   No  | -fdelete-null-pointer-checks (-O2)
  14.6 |   No  | -frerun-loop-opt (-O2)
  13.9 |   No  | -fexpensive-optimizations (-O2)
  12.7 |   No  | -freduce-all-givs
  12.5 |   No  | -fmove-all-movables
  10.9 | Maybe | -mfpmath=sse,387
  10.9 |   No  | -fnew-ra
   8.3 |   No  | -fschedule-insns (-O2)
   7.6 |   No  | -fno-guess-branch-probability (! -O1)
   6.5 |   No  | -ffloat-store
   6.4 |   No  | -funroll-all-loops
   4.6 |   No  | -funroll-loops
   3.7 |   No  | -fno-loop-optimize (! -O1)
   0.0 |   No  | -mfpmath=387
   0.0 |   No  | -mfpmath=sse

In particular:

  36.2 |  Yes  | -fno-delayed-branch (! -O1)
  31.2 |  Yes  | -fno-signaling-nans (fast math)
  30.4 |  Yes  | -falign-labels (-O2 GCC 3.3)
  27.8 |  Yes  | -fno-cprop-registers (! -O1)
  27.5 |  Yes  | -frerun-cse-after-loop (-O2)
  27.1 |  Yes  | -fsched-interblock (-O2 GCC 3.3)
  27.0 |  Yes  | -fno-defer-pop (! -O1)
  26.7 |  Yes  | -mno-align-stringops
  26.3 |  Yes  | -fsched-spec (-O2 GCC 3.3)
  26.0 |  Yes  | -fpeephole2 (-O2)
  26.0 |  Yes  | -fno-math-errno (fast math)
  25.8 |  Yes  | -freorder-functions (-O2 GCC 3.3)
  23.7 |  Yes  | -fno-crossjumping (! -O1)

Can I safely put these settings in my CFLAGS? Are there any other flags that acovea doesn't show but that are worth adding (-Wall, -pipe)?
And what's the meaning of the bang before -O1?
Many thanks for any answers.
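Turning the "Yes" rows of a report into a single flag string is mechanical. This is a minimal sketch, assuming the `score | verdict | switch` layout shown in the tables; `yes_switches` is a hypothetical helper, the rows are a subset of the Athlon XP results above, and the `-O1` base is an assumption borrowed from the flag line posted earlier in the thread. It says nothing about whether the resulting flags are safe system-wide.

```python
def yes_switches(report):
    """Return the switches marked 'Yes', in report order, without annotations."""
    picked = []
    for line in report.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and parts[1] == "Yes":
            picked.append(parts[2].split()[0])  # drop "(! -O1)" and similar
    return picked


# A few rows copied from the Athlon XP report above.
report = """\
  36.2 |  Yes  | -fno-delayed-branch (! -O1)
  32.9 | Maybe | -funsafe-math-optimizations (fast math)
  30.4 |  Yes  | -falign-labels (-O2 GCC 3.3)
  27.1 |   No  | -fforce-mem (-O2)
  26.7 |  Yes  | -mno-align-stringops
"""

if __name__ == "__main__":
    # Assumption: start from -O1, as in the flag line posted earlier in the thread.
    cflags = " ".join(["-O1"] + yes_switches(report))
    print(f'CFLAGS="{cflags}"')
```

Switches that take a value, such as -finline-limit (written as -finline-limit=600 earlier in the thread), would still need to be filled in by hand.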

Posted: **Mon Apr 26, 2004 4:04 pm**

! -O1 (read as 'not -O1') means it is explicitly turning off an option that is normally enabled when you specify -O1.

Posted: **Mon Apr 26, 2004 4:19 pm**

So if I put -O2 in my CFLAGS, would it also include the flags marked ! -O1, or do I have to add them anyway?

Posted: **Mon Apr 26, 2004 4:19 pm**

poisson wrote:
The above combination requires that you use -funsafe-math-optimizations, otherwise you're breaking Acovea's method. Removing it from your profile will give you an entirely different set of results.
The Acovea method works fine for specific problems; the profiles are always kept separate. IMHO putting everything together in make.conf will slow down the whole system.

So based on these findings, are we basically saying that Acovea does the job it set out to do, but that the current set of benchmarks is not really appropriate for what gentoo'ers are trying to use it for (getting a set of CFLAGS for their make.conf)? Or is this something specific to the amd64 architecture, since others such as Hypnos have claimed general improvements to their systems since implementing acovea-suggested flags?

Basically, for the benefit of others reading this thread, is it currently worth the 30-72 hours necessary to run acovea to generate system-wide CFLAGS?

Posted: **Mon Apr 26, 2004 9:20 pm**

Daagar wrote:Basically, for the benefit of others reading this thread, is it currently worth the 30-72 hours necessary to run acovea to generate system-wide CFLAGS?

Depends -- would you be happy with a 0-3% performance improvement for that time invested?

Posted: **Mon Apr 26, 2004 11:40 pm**

Hypnos wrote:
Daagar wrote:Basically, for the benefit of others reading this thread, is it currently worth the 30-72 hours necessary to run acovea to generate system-wide CFLAGS?
Depends -- would you be happy with a 0-3% performance improvement for that time invested?

Heheh... for me personally, sure. I'm twisted like that. However, as the previous poster had found, there are instances where the performance goes _backwards_. I guess the question is whether the performance gains will in general outweigh the losses on an average gentoo'er's system (based on the assumption that most gentoo'ers are in a workstation environment, and not doing 24/7 number crunching).

Acovea-4.0.0 : Try out my ebuilds (and scripts)