
Code:
Score | So? | Switch (annotation)
------------------------------------------------------------------------------
31.7 | Yes | -malign-double
31.5 | Yes | -fcaller-saves (-O2)
31.2 | Yes | -foptimize-sibling-calls (-O2)
30.9 | Yes | -freorder-blocks (-O2)
30.4 | Yes | -fsched-interblock (-O2 GCC 3.3)
29.8 | Maybe | -ftracer
29.2 | Yes | -fdelete-null-pointer-checks (-O2)
29.1 | Maybe | -funsafe-math-optimizations (fast math)
29.1 | Yes | -fmove-all-movables
29.0 | Yes | -fno-if-conversion2 (! -O1)
28.6 | Maybe | -fgcse (-O2)
27.5 | Maybe | -finline-limit
27.1 | Yes | -fno-thread-jumps (! -O1)
27.1 | Maybe | -finline-functions (-O3)
26.1 | Yes | -fno-defer-pop (! -O1)
26.1 | Yes | -fsched-spec (-O2 GCC 3.3)
26.0 | Maybe | -fstrict-aliasing (-O2)
25.7 | Yes | -ffinite-math-only (fast math)
25.6 | Maybe | -fexpensive-optimizations (-O2)
25.6 | Maybe | -fno-math-errno (fast math)
25.0 | Maybe | -fno-trapping-math (fast math)
24.9 | Yes | -fpeephole2 (-O2)
24.8 | Maybe | -fschedule-insns2 (-O2)
24.8 | Yes | -falign-jumps (-O2 GCC 3.3)
24.6 | Yes | -falign-labels (-O2 GCC 3.3)
24.4 | Maybe | -fprefetch-loop-arrays
24.3 | Maybe | -mno-align-stringops
23.7 | Maybe | -freorder-functions (-O2 GCC 3.3)
23.6 | Maybe | -frename-registers (-O3)
23.2 | Maybe | -falign-loops (-O2 GCC 3.3)
22.5 | Maybe | -fcse-follow-jumps (-O2)
21.8 | Maybe | -fno-delayed-branch (! -O1)
21.8 | Maybe | -fno-omit-frame-pointer (! -O1)
21.6 | Maybe | -fno-crossjumping (! -O1)
21.6 | Maybe | -frerun-cse-after-loop (-O2)
21.4 | Maybe | -fcse-skip-blocks (-O2)
20.9 | Maybe | -mieee-fp
20.9 | Maybe | -frerun-loop-opt (-O2)
20.7 | Maybe | -fno-cprop-registers (! -O1)
20.6 | Maybe | -maccumulate-outgoing-args
20.2 | Maybe | -fno-signaling-nans (fast math)
19.6 | Maybe | -fno-merge-constants (! -O1)
19.3 | No | -fforce-mem (-O2)
19.0 | Maybe | -mno-push-args
18.2 | No | -fno-if-conversion (! -O1)
18.2 | Maybe | -minline-all-stringops
15.6 | No | -freduce-all-givs
15.1 | No | -fstrength-reduce (-O2)
11.8 | No | -fnew-ra
11.6 | No | -fno-guess-branch-probability (! -O1)
11.1 | No | -fschedule-insns (-O2)
10.4 | No | -ffloat-store
10.4 | No | -fregmove (-O2)
9.6 | No | -fno-inline
9.4 | No | -funroll-all-loops
8.5 | No | -fomit-frame-pointer
7.2 | No | -funroll-loops
0.0 | No | -fno-loop-optimize (! -O1)
0.0 | No | -mfpmath=387
0.0 | No | -mfpmath=sse
0.0 | No | -mfpmath=sse,387
0.0 | No | -momit-leaf-frame-pointer

Code:
Score | So? | Switch (annotation)
------------------------------------------------------------------------------
35.4 | Yes | -maccumulate-outgoing-args
32.5 | Maybe | -fstrict-aliasing (-O2)
32.3 | Maybe | -fgcse (-O2)
31.8 | Yes | -fno-cprop-registers (! -O1)
31.6 | Yes | -fno-trapping-math (fast math)
30.6 | Yes | -fexpensive-optimizations (-O2)
30.4 | Yes | -fno-delayed-branch (! -O1)
30.0 | Yes | -falign-jumps (-O2 GCC 3.3)
29.9 | Yes | -frerun-loop-opt (-O2)
29.8 | Yes | -minline-all-stringops
29.8 | Yes | -mieee-fp
29.7 | Yes | -fmove-all-movables
28.9 | Yes | -fno-omit-frame-pointer (! -O1)
28.8 | Yes | -fsched-interblock (-O2 GCC 3.3)
28.8 | Maybe | -freorder-blocks (-O2)
28.5 | Yes | -freorder-functions (-O2 GCC 3.3)
28.0 | Yes | -fno-merge-constants (! -O1)
27.8 | Yes | -frerun-cse-after-loop (-O2)
27.7 | Yes | -fschedule-insns2 (-O2)
27.4 | Yes | -fdelete-null-pointer-checks (-O2)
27.4 | Yes | -ffinite-math-only (fast math)
27.2 | Yes | -finline-functions (-O3)
26.3 | Yes | -fcse-skip-blocks (-O2)
26.2 | Yes | -falign-labels (-O2 GCC 3.3)
26.0 | Maybe | -falign-loops (-O2 GCC 3.3)
25.9 | Yes | -fno-if-conversion2 (! -O1)
25.7 | Maybe | -fcse-follow-jumps (-O2)
25.5 | Yes | -fcaller-saves (-O2)
25.3 | Maybe | -fno-thread-jumps (! -O1)
25.1 | Yes | -fpeephole2 (-O2)
24.6 | Maybe | -fforce-mem (-O2)
24.5 | Yes | -fprefetch-loop-arrays
24.3 | Maybe | -frename-registers (-O3)
24.2 | Maybe | -funsafe-math-optimizations (fast math)
23.8 | Yes | -foptimize-sibling-calls (-O2)
23.6 | Maybe | -fno-defer-pop (! -O1)
23.3 | Yes | -fstrength-reduce (-O2)
23.3 | Yes | -fsched-spec (-O2 GCC 3.3)
23.2 | Maybe | -mno-push-args
23.0 | Maybe | -ftracer
22.7 | Yes | -fregmove (-O2)
22.0 | Maybe | -fno-crossjumping (! -O1)
21.8 | Maybe | -malign-double
21.4 | Maybe | -freduce-all-givs
20.6 | Maybe | -finline-limit
19.7 | Maybe | -fno-math-errno (fast math)
19.4 | Maybe | -fno-signaling-nans (fast math)
18.3 | No | -fschedule-insns (-O2)
17.1 | Maybe | -mno-align-stringops
16.7 | No | -fno-if-conversion (! -O1)
16.5 | No | -fno-inline
12.4 | No | -ffloat-store
11.6 | No | -fno-guess-branch-probability (! -O1)
11.0 | Maybe | -mfpmath=sse
10.2 | No | -funroll-loops
9.8 | No | -funroll-all-loops
9.5 | No | -fnew-ra
9.3 | No | -fno-loop-optimize (! -O1)
8.9 | No | -fomit-frame-pointer
0.0 | No | -mfpmath=387
0.0 | No | -mfpmath=sse,387
0.0 | No | -momit-leaf-frame-pointer

Daagar wrote: This all sounds like great geeky fun, but I'm curious if there is a way to do the tests in steps (some sort of pause/restart feature). My old Athlon Tbird 933MHz can't devote the 48-72 hours it would take in a single sitting, and it would be nice to let it simply run overnight and be able to stop it when necessary and restart the next night...

CTRL-Z, the same way you can pause any process ;)
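For anyone who wants to script the overnight stop/start instead of sitting at the terminal: CTRL-Z makes the shell send SIGTSTP to the foreground job, and `fg` continues it. A rough Python sketch of the same idea, assuming a Unix-like system, uses SIGSTOP and SIGCONT on a known PID. The function names are made up for illustration and are not part of Acovea. Since Acovea times runs in real time, whichever test happens to be in flight when you suspend it will presumably get a skewed timing.

```python
import os
import signal


def pause_process(pid):
    """Suspend a running process; it keeps its memory and state."""
    os.kill(pid, signal.SIGSTOP)


def resume_process(pid):
    """Let a previously suspended process continue where it stopped."""
    os.kill(pid, signal.SIGCONT)
```

You would call `pause_process(pid)` in the morning and `resume_process(pid)` the next night, with the PID taken from `ps` or similar.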

wilburpan wrote: As you can see, the results are quite different. I don't know why the running environment should affect the acovea results, and I'm not sure which set of recommendations I should use.

Use the former set. As previously stated, Acovea uses real time, not CPU time, so the latter set is meaningless (sorry!). Acovea should be run with as little overhead as possible. This includes stopping any distributed computing project clients, such as SETI@home or mprime, whilst the run is in progress.
darkless wrote: I wonder what it'll take to rewrite acovea to use user time instead of real time, so it won't be necessary to run the tests while the system is otherwise left idle.

I tried to benchmark many programs (mainly floating point) in working environments. Both system and user time are NOT accurate. I think it's a scheduler-related problem (not only on Linux, but also on Cray, SP4, and Compaq systems). The only way to get good values from a benchmark is to run it on a system in single-user mode.
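To make the real-time versus user-time distinction concrete, here is a small Python sketch (Unix only) that runs a command once and reports both. It is only an illustration, not how Acovea does its timing, and `time_command` is a made-up name. On a busy machine the wall-clock figure grows with whatever else is running, while the user and system figures count only the child's own CPU time, though as noted above even those can be off under load.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once; return (wall, user_cpu, system_cpu) in seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # blocks until the child has exited
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN accumulates over all waited-for children,
    # so the difference isolates the command we just ran.
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return wall, user, system


if __name__ == "__main__":
    wall, user, system = time_command(
        [sys.executable, "-c", "sum(range(2000000))"]
    )
    print(f"real {wall:.3f}s  user {user:.3f}s  sys {system:.3f}s")
```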
Code:
Score | So? | Switch (annotation)
------------------------------------------------------------------------------
45.1 | Yes | -funsafe-math-optimizations (fast math)
43.1 | Yes | -ftracer
36.2 | Yes | -fcaller-saves (-O2)
36.1 | Yes | -fforce-mem (-O2)
35.6 | Maybe | -mieee-fp
34.5 | Yes | -fno-defer-pop (! -O1)
34.0 | Yes | -falign-jumps (-O2 GCC 3.3)
33.5 | Maybe | -fschedule-insns (-O2)
33.2 | Yes | -fdelete-null-pointer-checks (-O2)
33.1 | Yes | -fpeephole2 (-O2)
32.7 | Maybe | -fregmove (-O2)
32.7 | Yes | -finline-limit
32.3 | Yes | -falign-labels (-O2 GCC 3.3)
32.1 | Yes | -fcse-skip-blocks (-O2)
32.0 | Maybe | -fgcse (-O2)
31.6 | Yes | -freorder-blocks (-O2)
30.7 | Yes | -fcse-follow-jumps (-O2)
30.7 | Yes | -frename-registers (-O3)
30.6 | Yes | -mno-align-stringops
30.3 | Yes | -fno-if-conversion2 (! -O1)
29.9 | Maybe | -fno-thread-jumps (! -O1)
29.5 | Maybe | -fstrict-aliasing (-O2)
29.3 | Yes | -maccumulate-outgoing-args
28.9 | Maybe | -finline-functions (-O3)
28.7 | Yes | -minline-all-stringops
28.5 | Maybe | -fno-crossjumping (! -O1)
28.4 | Yes | -fno-cprop-registers (! -O1)
27.7 | Yes | -fsched-interblock (-O2 GCC 3.3)
27.6 | Maybe | -fstrength-reduce (-O2)
26.7 | Maybe | -fno-delayed-branch (! -O1)
26.4 | Yes | -freorder-functions (-O2 GCC 3.3)
26.3 | Maybe | -fno-omit-frame-pointer (! -O1)
25.8 | Yes | -fmove-all-movables
25.5 | Maybe | -fschedule-insns2 (-O2)
25.4 | Maybe | -falign-loops (-O2 GCC 3.3)
25.0 | Maybe | -fsched-spec (-O2 GCC 3.3)
24.9 | No | -fprefetch-loop-arrays
24.6 | Yes | -fexpensive-optimizations (-O2)
24.0 | Yes | -ffinite-math-only (fast math)
22.5 | Maybe | -fno-inline
21.8 | Maybe | -mno-push-args
21.4 | Maybe | -fno-signaling-nans (fast math)
20.9 | Maybe | -funroll-loops
20.8 | Maybe | -fno-merge-constants (! -O1)
19.8 | Maybe | -freduce-all-givs
19.4 | Maybe | -fno-math-errno (fast math)
19.2 | No | -funroll-all-loops
19.0 | Maybe | -foptimize-sibling-calls (-O2)
18.7 | No | -fnew-ra
18.5 | No | -mfpmath=387
15.8 | Maybe | -fno-trapping-math (fast math)
14.8 | No | -fno-if-conversion (! -O1)
14.6 | No | -ffloat-store
14.3 | Maybe | -frerun-cse-after-loop (-O2)
12.3 | No | -frerun-loop-opt (-O2)
11.3 | No | -mfpmath=sse,387
10.3 | No | -fno-guess-branch-probability (! -O1)
0.0 | No | -fno-loop-optimize (! -O1)
0.0 | No | -mfpmath=sse
Code:
CFLAGS=-s -static -Wall -O2
CPU : Dual AuthenticAMD AMD Opteron(tm) Processor 242 1604MHz
L2 Cache : 1024 KB
OS : Linux 2.6.5-gentoo-r1
C compiler : 3.3.3
MEMORY INDEX : 11.142
INTEGER INDEX : 10.406
FLOATING-POINT INDEX: 15.911
Code:
CFLAGS = -s -static -Wall -O1 -funsafe-math-optimizations -ftracer -fcaller-saves -fforce-mem -fno-defer-pop -falign-jumps -fdelete-null-pointer-checks -fpeephole2 -finline-limit=600 -falign-labels -fcse-skip-blocks -freorder-blocks -fcse-follow-jumps -frename-registers -mno-align-stringops -fno-if-conversion2 -maccumulate-outgoing-args -minline-all-stringops -fno-cprop-registers -fsched-interblock -freorder-functions -fmove-all-movables -fexpensive-optimizations -ffinite-math-only
CPU : Dual AuthenticAMD AMD Opteron(tm) Processor 242 1604MHz
L2 Cache : 1024 KB
OS : Linux 2.6.5-gentoo-r1
C compiler : 3.3.3
MEMORY INDEX : 10.553
INTEGER INDEX : 9.486
FLOATING-POINT INDEX: 17.037
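
The two nbench runs above are easier to compare as percentage changes. A quick sketch, with the index values copied from the two outputs (plain -O2 as the baseline, the Acovea-derived flags as the candidate); the dictionary and function names are just for illustration:

```python
# nbench indices copied from the two runs above.
baseline = {"MEMORY": 11.142, "INTEGER": 10.406, "FLOATING-POINT": 15.911}
acovea = {"MEMORY": 10.553, "INTEGER": 9.486, "FLOATING-POINT": 17.037}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


for name in baseline:
    change = percent_change(baseline[name], acovea[name])
    print(f"{name:15s} {change:+.1f}%")
# MEMORY          -5.3%
# INTEGER         -8.8%
# FLOATING-POINT  +7.1%
```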
aethyr wrote: I don't know if nbench is the best choice... "These are Native Mode (a.k.a. Algorithm Level) tests; benchmarks designed to expose the capabilities of a system's CPU, FPU, and memory system." I still don't know what the best benchmark would be though... I asked Scott in another thread, still waiting to hear back from him.

Acovea uses similar algorithms. It is good in an "ideal" world, where the machine is dedicated to number-crunching.

poisson wrote: As you can see, floating-point is 7% better, but memory (-5%) and integer (-9%) suggest that acovea flags are not useful for workstation use.
Also, -funsafe-math-optimizations gives the boost, but it's deprecated.
I think -O2 is the best choice for my gentoo installation, but I will try other combinations, removing deprecated or conflicting flags suggested by acovea.

The above combination requires that you use -funsafe-math-optimizations; otherwise you're breaking Acovea's method. Removing it from your profile will give you an entirely different set of results.

The Acovea method works fine for specific problems; the profiles are always kept separate. IMHO putting everything together in make.conf will slow down the whole system.

Code:
Score | So? | Switch (annotation)
------------------------------------------------------------------------------
36.2 | Yes | -fno-delayed-branch (! -O1)
33.1 | Maybe | -fprefetch-loop-arrays
32.9 | Maybe | -funsafe-math-optimizations (fast math)
32.9 | Maybe | -fstrict-aliasing (-O2)
31.2 | Yes | -fno-signaling-nans (fast math)
30.4 | Yes | -falign-labels (-O2 GCC 3.3)
29.7 | Maybe | -minline-all-stringops
29.5 | Maybe | -ftracer
27.8 | Yes | -fno-cprop-registers (! -O1)
27.5 | Yes | -frerun-cse-after-loop (-O2)
27.1 | No | -fforce-mem (-O2)
27.1 | Yes | -fsched-interblock (-O2 GCC 3.3)
27.0 | Yes | -fno-defer-pop (! -O1)
26.7 | Yes | -mno-align-stringops
26.7 | Maybe | -fcse-follow-jumps (-O2)
26.3 | Yes | -fsched-spec (-O2 GCC 3.3)
26.2 | Maybe | -finline-functions (-O3)
26.0 | Yes | -fpeephole2 (-O2)
26.0 | Yes | -fno-math-errno (fast math)
25.8 | Yes | -freorder-functions (-O2 GCC 3.3)
25.7 | Maybe | -fcse-skip-blocks (-O2)
25.0 | Maybe | -falign-jumps (-O2 GCC 3.3)
24.6 | Maybe | -fno-trapping-math (fast math)
23.8 | No | -fstrength-reduce (-O2)
23.7 | Yes | -fno-crossjumping (! -O1)
23.4 | Maybe | -fno-if-conversion2 (! -O1)
23.3 | Maybe | -mieee-fp
22.7 | Maybe | -ffinite-math-only (fast math)
22.7 | Maybe | -fno-merge-constants (! -O1)
21.9 | Maybe | -frename-registers (-O3)
21.5 | Maybe | -fregmove (-O2)
20.8 | No | -fgcse (-O2)
20.7 | No | -fcaller-saves (-O2)
20.3 | No | -fschedule-insns2 (-O2)
19.5 | No | -falign-loops (-O2 GCC 3.3)
19.4 | Maybe | -freorder-blocks (-O2)
19.3 | Maybe | -fno-thread-jumps (! -O1)
18.0 | No | -fno-if-conversion (! -O1)
17.9 | Maybe | -finline-limit
17.9 | Maybe | -fno-omit-frame-pointer (! -O1)
16.6 | No | -maccumulate-outgoing-args
16.3 | No | -mno-push-args
15.2 | No | -foptimize-sibling-calls (-O2)
14.9 | No | -fno-inline
14.7 | No | -fdelete-null-pointer-checks (-O2)
14.6 | No | -frerun-loop-opt (-O2)
13.9 | No | -fexpensive-optimizations (-O2)
12.7 | No | -freduce-all-givs
12.5 | No | -fmove-all-movables
10.9 | Maybe | -mfpmath=sse,387
10.9 | No | -fnew-ra
8.3 | No | -fschedule-insns (-O2)
7.6 | No | -fno-guess-branch-probability (! -O1)
6.5 | No | -ffloat-store
6.4 | No | -funroll-all-loops
4.6 | No | -funroll-loops
3.7 | No | -fno-loop-optimize (! -O1)
0.0 | No | -mfpmath=387
0.0 | No | -mfpmath=sse
Code:
36.2 | Yes | -fno-delayed-branch (! -O1)
31.2 | Yes | -fno-signaling-nans (fast math)
30.4 | Yes | -falign-labels (-O2 GCC 3.3)
27.8 | Yes | -fno-cprop-registers (! -O1)
27.5 | Yes | -frerun-cse-after-loop (-O2)
27.1 | Yes | -fsched-interblock (-O2 GCC 3.3)
27.0 | Yes | -fno-defer-pop (! -O1)
26.7 | Yes | -mno-align-stringops
26.3 | Yes | -fsched-spec (-O2 GCC 3.3)
26.0 | Yes | -fpeephole2 (-O2)
26.0 | Yes | -fno-math-errno (fast math)
25.8 | Yes | -freorder-functions (-O2 GCC 3.3)
23.7 | Yes | -fno-crossjumping (! -O1)
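
Picking the "Yes" rows out of a report by hand gets tedious, so here is a rough Python sketch that does the filtering shown above. It assumes the three-column `Score | So? | Switch (annotation)` layout used in these tables; `recommended_switches` is a made-up helper, not something Acovea ships. Note that a switch like -finline-limit still needs a value filled in by hand.

```python
import re

# One data row: score, verdict, then the switch (any annotation is ignored).
ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*\|\s*(Yes|Maybe|No)\s*\|\s*(\S+)")


def recommended_switches(report, verdict="Yes"):
    """Return the switches whose 'So?' column equals verdict, in table order."""
    switches = []
    for line in report.splitlines():
        match = ROW.match(line)
        if match and match.group(2) == verdict:
            switches.append(match.group(3))
    return switches


sample = """\
Score | So? | Switch (annotation)
------------------------------------------------------------------------------
36.2 | Yes | -fno-delayed-branch (! -O1)
33.1 | Maybe | -fprefetch-loop-arrays
31.2 | Yes | -fno-signaling-nans (fast math)
0.0 | No | -mfpmath=sse
"""

print(" ".join(recommended_switches(sample)))
# -fno-delayed-branch -fno-signaling-nans
```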
poisson wrote: Acovea method works fine for specific problems, the profiles are always kept separate. IMHO putting all together in make.conf will slow down the whole system.

So based on these findings, are we basically saying that while Acovea does the job it set out to do, the current set of benchmarks is not really appropriate for the use gentoo'ers are trying to put it to (getting a set of CFLAGS for their make.conf)? Or is this something specific to the amd64 architecture, since others such as Hypnos have claimed general improvements to their system since implementing acovea-suggested flags?
Daagar wrote: Basically, for the benefit of others reading this thread, is it currently worth the 30-72 hours necessary to run acovea to generate system-wide CFLAGS?

Hypnos wrote: Depends -- would you be happy with a 0-3% performance improvement for that time invested?

Heheh... for me personally, sure. I'm twisted like that. However, as the previous poster found, there are instances where the performance goes _backwards_. I guess the question is whether the performance gains will in general outweigh the losses for an average gentoo'er's system (based on the assumption that most gentoo'ers are in a workstation environment, and not doing 24/7 number crunching).