Posted: Thu Apr 22, 2004 8:41 pm
by Vagabond
Ack, this is *slow*: 3 hours to do the alma test, and there are 6 more to go. I guess the problem is that I have an SMP system and it's only using one CPU. At least that means I have 1 CPU free to do other stuff without a noticeable performance hit; still, it's gonna take forever at this rate.
Vag
Posted: Thu Apr 22, 2004 8:52 pm
by robmoss
Of course it's slow, that's the point - it's very thorough...
Posted: Fri Apr 23, 2004 2:14 pm
by wilburpan
So after ~150 total hours of compiling, I have results from the acovea script run under two different conditions. These were done on a 700 MHz P3 laptop with 512 MB RAM. To get straight to the point, since this will be a long post: the conditions under which the acovea scripts are run influence the output.
My first run was done in console mode. I just left my laptop on my desk, and did not do anything with it other than let it generate heat while running the acovea script.
Score | So? | Switch (annotation)
------------------------------------------------------------------------------
31.7 | Yes | -malign-double
31.5 | Yes | -fcaller-saves (-O2)
31.2 | Yes | -foptimize-sibling-calls (-O2)
30.9 | Yes | -freorder-blocks (-O2)
30.4 | Yes | -fsched-interblock (-O2 GCC 3.3)
29.8 | Maybe | -ftracer
29.2 | Yes | -fdelete-null-pointer-checks (-O2)
29.1 | Maybe | -funsafe-math-optimizations (fast math)
29.1 | Yes | -fmove-all-movables
29.0 | Yes | -fno-if-conversion2 (! -O1)
28.6 | Maybe | -fgcse (-O2)
27.5 | Maybe | -finline-limit
27.1 | Yes | -fno-thread-jumps (! -O1)
27.1 | Maybe | -finline-functions (-O3)
26.1 | Yes | -fno-defer-pop (! -O1)
26.1 | Yes | -fsched-spec (-O2 GCC 3.3)
26.0 | Maybe | -fstrict-aliasing (-O2)
25.7 | Yes | -ffinite-math-only (fast math)
25.6 | Maybe | -fexpensive-optimizations (-O2)
25.6 | Maybe | -fno-math-errno (fast math)
25.0 | Maybe | -fno-trapping-math (fast math)
24.9 | Yes | -fpeephole2 (-O2)
24.8 | Maybe | -fschedule-insns2 (-O2)
24.8 | Yes | -falign-jumps (-O2 GCC 3.3)
24.6 | Yes | -falign-labels (-O2 GCC 3.3)
24.4 | Maybe | -fprefetch-loop-arrays
24.3 | Maybe | -mno-align-stringops
23.7 | Maybe | -freorder-functions (-O2 GCC 3.3)
23.6 | Maybe | -frename-registers (-O3)
23.2 | Maybe | -falign-loops (-O2 GCC 3.3)
22.5 | Maybe | -fcse-follow-jumps (-O2)
21.8 | Maybe | -fno-delayed-branch (! -O1)
21.8 | Maybe | -fno-omit-frame-pointer (! -O1)
21.6 | Maybe | -fno-crossjumping (! -O1)
21.6 | Maybe | -frerun-cse-after-loop (-O2)
21.4 | Maybe | -fcse-skip-blocks (-O2)
20.9 | Maybe | -mieee-fp
20.9 | Maybe | -frerun-loop-opt (-O2)
20.7 | Maybe | -fno-cprop-registers (! -O1)
20.6 | Maybe | -maccumulate-outgoing-args
20.2 | Maybe | -fno-signaling-nans (fast math)
19.6 | Maybe | -fno-merge-constants (! -O1)
19.3 | No | -fforce-mem (-O2)
19.0 | Maybe | -mno-push-args
18.2 | No | -fno-if-conversion (! -O1)
18.2 | Maybe | -minline-all-stringops
15.6 | No | -freduce-all-givs
15.1 | No | -fstrength-reduce (-O2)
11.8 | No | -fnew-ra
11.6 | No | -fno-guess-branch-probability (! -O1)
11.1 | No | -fschedule-insns (-O2)
10.4 | No | -ffloat-store
10.4 | No | -fregmove (-O2)
9.6 | No | -fno-inline
9.4 | No | -funroll-all-loops
8.5 | No | -fomit-frame-pointer
7.2 | No | -funroll-loops
0.0 | No | -fno-loop-optimize (! -O1)
0.0 | No | -mfpmath=387
0.0 | No | -mfpmath=sse
0.0 | No | -mfpmath=sse,387
0.0 | No | -momit-leaf-frame-pointer
My second run was done from within a Konsole terminal while running KDE. During the 84 hours it took to do this run, I was using my computer for desktop-type work (email, surfing, word processing, etc.). I did not do any other emerging/compiling while this was going on.
Score | So? | Switch (annotation)
------------------------------------------------------------------------------
35.4 | Yes | -maccumulate-outgoing-args
32.5 | Maybe | -fstrict-aliasing (-O2)
32.3 | Maybe | -fgcse (-O2)
31.8 | Yes | -fno-cprop-registers (! -O1)
31.6 | Yes | -fno-trapping-math (fast math)
30.6 | Yes | -fexpensive-optimizations (-O2)
30.4 | Yes | -fno-delayed-branch (! -O1)
30.0 | Yes | -falign-jumps (-O2 GCC 3.3)
29.9 | Yes | -frerun-loop-opt (-O2)
29.8 | Yes | -minline-all-stringops
29.8 | Yes | -mieee-fp
29.7 | Yes | -fmove-all-movables
28.9 | Yes | -fno-omit-frame-pointer (! -O1)
28.8 | Yes | -fsched-interblock (-O2 GCC 3.3)
28.8 | Maybe | -freorder-blocks (-O2)
28.5 | Yes | -freorder-functions (-O2 GCC 3.3)
28.0 | Yes | -fno-merge-constants (! -O1)
27.8 | Yes | -frerun-cse-after-loop (-O2)
27.7 | Yes | -fschedule-insns2 (-O2)
27.4 | Yes | -fdelete-null-pointer-checks (-O2)
27.4 | Yes | -ffinite-math-only (fast math)
27.2 | Yes | -finline-functions (-O3)
26.3 | Yes | -fcse-skip-blocks (-O2)
26.2 | Yes | -falign-labels (-O2 GCC 3.3)
26.0 | Maybe | -falign-loops (-O2 GCC 3.3)
25.9 | Yes | -fno-if-conversion2 (! -O1)
25.7 | Maybe | -fcse-follow-jumps (-O2)
25.5 | Yes | -fcaller-saves (-O2)
25.3 | Maybe | -fno-thread-jumps (! -O1)
25.1 | Yes | -fpeephole2 (-O2)
24.6 | Maybe | -fforce-mem (-O2)
24.5 | Yes | -fprefetch-loop-arrays
24.3 | Maybe | -frename-registers (-O3)
24.2 | Maybe | -funsafe-math-optimizations (fast math)
23.8 | Yes | -foptimize-sibling-calls (-O2)
23.6 | Maybe | -fno-defer-pop (! -O1)
23.3 | Yes | -fstrength-reduce (-O2)
23.3 | Yes | -fsched-spec (-O2 GCC 3.3)
23.2 | Maybe | -mno-push-args
23.0 | Maybe | -ftracer
22.7 | Yes | -fregmove (-O2)
22.0 | Maybe | -fno-crossjumping (! -O1)
21.8 | Maybe | -malign-double
21.4 | Maybe | -freduce-all-givs
20.6 | Maybe | -finline-limit
19.7 | Maybe | -fno-math-errno (fast math)
19.4 | Maybe | -fno-signaling-nans (fast math)
18.3 | No | -fschedule-insns (-O2)
17.1 | Maybe | -mno-align-stringops
16.7 | No | -fno-if-conversion (! -O1)
16.5 | No | -fno-inline
12.4 | No | -ffloat-store
11.6 | No | -fno-guess-branch-probability (! -O1)
11.0 | Maybe | -mfpmath=sse
10.2 | No | -funroll-loops
9.8 | No | -funroll-all-loops
9.5 | No | -fnew-ra
9.3 | No | -fno-loop-optimize (! -O1)
8.9 | No | -fomit-frame-pointer
0.0 | No | -mfpmath=387
0.0 | No | -mfpmath=sse,387
0.0 | No | -momit-leaf-frame-pointer
As you can see, the results are quite different. I don't know why the running environment should affect the acovea results, and I'm not sure which set of recommendations I should use.
Any comments would be welcome.
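One rough way to put a number on "quite different" is to list the flags whose So? verdict changed between the two runs. This is an editor's illustration in Python, not something Acovea provides; `changed_verdicts` is a hypothetical helper, and the five sample entries are copied from the two tables above.

```python
def changed_verdicts(run_a, run_b):
    """Return {flag: (verdict_a, verdict_b)} for flags whose So? verdict differs.

    Only flags present in both runs are compared.
    """
    return {
        flag: (run_a[flag], run_b[flag])
        for flag in sorted(run_a.keys() & run_b.keys())
        if run_a[flag] != run_b[flag]
    }


# A few rows from each table: console run vs. KDE desktop run.
console = {
    "-malign-double": "Yes",
    "-maccumulate-outgoing-args": "Maybe",
    "-fstrength-reduce": "No",
    "-fregmove": "No",
    "-fcaller-saves": "Yes",
}
desktop = {
    "-malign-double": "Maybe",
    "-maccumulate-outgoing-args": "Yes",
    "-fstrength-reduce": "Yes",
    "-fregmove": "Yes",
    "-fcaller-saves": "Yes",
}

for flag, (a, b) in changed_verdicts(console, desktop).items():
    print(f"{flag}: {a} -> {b}")
```

Four of these five flags change verdict, and two of them go all the way from No to Yes.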
Posted: Fri Apr 23, 2004 6:36 pm
by Daagar
This all sounds like great geeky fun, but I'm curious if there is a way to do the tests in steps (some sort of pause/restart feature). My old Athlon Tbird 933MHz can't devote the 48-72 hours it would take in a single sitting, and it would be nice to let it simply run overnight and be able to stop it when necessary and restart the next night...
Posted: Fri Apr 23, 2004 6:41 pm
by Vagabond
You can do the tests one by one; the longest any single test has taken me so far (with the last 2 still to go) is 6 hours. Each test generates a .run file that the perl script in this thread uses to generate its recommendations.
Vag
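If you want a script to pick up where it left off (say, one benchmark per night), a small helper can skip the benchmarks that already have a result file. This is a hypothetical sketch: it assumes each finished test leaves a file named `<benchmark>.run` in one directory, which may not match what your Acovea version actually writes, and it does not launch anything itself.

```python
from pathlib import Path


def pending_benchmarks(benchmarks, results_dir):
    """Return the benchmarks that have no <name>.run file yet, in the given order.

    The <name>.run naming is an assumption; adjust it to whatever your runs produce.
    """
    done = {path.stem for path in Path(results_dir).glob("*.run")}
    return [name for name in benchmarks if name not in done]


# Hypothetical usage: decide what tonight's run should do.
# todo = pending_benchmarks(["almabench", "fftbench", "huffbench"], "results")
# if todo:
#     print("next up:", todo[0])
```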
Posted: Fri Apr 23, 2004 7:04 pm
by aethyr
Daagar wrote:This all sounds like great geeky fun, but I'm curious if there is a way to do the tests in steps (some sort of pause/restart feature). My old Athlon Tbird 933MHz can't devote the 48-72 hours it would take in a single sitting, and it would be nice to let it simply run overnight and be able to stop it when necessary and restart the next night...
Press CTRL-Z, the same way you pause any process ;)
Then later when you want it to run again, "fg" to resume in the foreground, or "bg" to resume in the background.
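From a script, the same stop/continue mechanism is available through signals. A minimal POSIX-only Python sketch; `pause` and `resume` are hypothetical helper names. Ctrl-Z actually sends SIGTSTP, which a program may catch, whereas this sketch uses SIGSTOP, which cannot be caught or ignored. A stopped process still lives in memory, so neither approach survives a reboot.

```python
import os
import signal
import subprocess


def pause(proc: subprocess.Popen) -> None:
    """Stop the child process, much like Ctrl-Z does to a foreground job."""
    os.kill(proc.pid, signal.SIGSTOP)


def resume(proc: subprocess.Popen) -> None:
    """Let a stopped child continue, much like fg or bg does."""
    os.kill(proc.pid, signal.SIGCONT)
```

Pass in the `Popen` object of whatever long-running command you started.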
Posted: Fri Apr 23, 2004 9:06 pm
by robmoss
wilburpan wrote:As you can see, the results are quite different. I don't know why the running environment should affect the acovea results, and I'm not sure which set of recommendations I should use.
Use the former set. As previously stated, Acovea uses real time, not CPU time. Thus, the latter set are meaningless (sorry!). Acovea should be run with as little overhead as is possible. This includes stopping any distributed computing project clients, such as SETI@home or mprime, whilst the run is in progress.
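To see why real time versus CPU time matters, here is a small Python illustration (not Acovea's code). `time.perf_counter()` measures elapsed wall-clock time and `time.process_time()` measures CPU time used by the current process. The sleep stands in for the periods when the benchmark is waiting while other programs have the CPU: those count toward wall time but not toward CPU time.

```python
import time


def timed(work):
    """Run work() and return (wall_seconds, cpu_seconds) for this process."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    work()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


# The sleep takes wall-clock time but almost no CPU time.
wall, cpu = timed(lambda: time.sleep(0.2))
print(f"wall {wall:.3f} s, cpu {cpu:.3f} s")
```

A benchmark timed by the wall clock therefore looks slower whenever anything else competes for the CPU, which is the effect robmoss describes.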
Posted: Sat Apr 24, 2004 11:00 am
by darkless
I wonder what it'll take to rewrite acovea to use user time instead of real time, so it won't be necessary to run the tests while the system is otherwise left idle.
Posted: Sat Apr 24, 2004 5:14 pm
by poisson
darkless wrote:I wonder what it'll take to rewrite acovea to use user time instead of real time, so it won't be necessary to run the tests while the system is otherwise left idle.
I tried to benchmark many programs (mainly floating-point) in working environments. Both system and user time are NOT accurate. I think it's a scheduler-related problem (not only on Linux, but also on Cray, SP4 and Compaq systems). The only way to get good values from a benchmark is to run it on a system in single-user mode.
Still, measuring user time rather than system time is obviously better.
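As a sketch of what measuring user time per benchmark run could look like (an illustration, not a patch to Acovea): Python's `resource.getrusage(resource.RUSAGE_CHILDREN)` reports CPU time accumulated by terminated, waited-for child processes, so taking the difference around one child run gives that child's user and system time alongside its wall time. It is Unix-only, assumes no other children are reaped concurrently, and, as noted above, even user time can be perturbed on a busy system.

```python
import resource
import subprocess
import time


def time_child(cmd):
    """Run cmd to completion; return (wall, user, sys) seconds for that run.

    user/sys come from RUSAGE_CHILDREN deltas, so other children reaped
    at the same time (e.g. from another thread) would be counted too.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (
        wall,
        after.ru_utime - before.ru_utime,
        after.ru_stime - before.ru_stime,
    )
```

For example, `time_child(["./almabench"])` would time a compiled benchmark binary (the path is hypothetical); under load, `wall` grows while `user` should stay closer to constant.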
I'm running acovea on my new dual Opteron 242. Stay tuned.

Posted: Sat Apr 24, 2004 5:50 pm
by Daagar
Vagabond, thanks. I didn't realize that each test was separate, so I can run one per night or something. As for the suggestion to just Ctrl-Z the process: yes, of course I realized that would work, but I usually have to reboot back into Windows for the family during the day, which makes that solution impractical.
Posted: Sun Apr 25, 2004 9:04 pm
by poisson
As promised, these are the results of acovea on my system
(Dual Opteron 242, 2x512MB, MSI mobo)
Score | So? | Switch (annotation)
------------------------------------------------------------------------------
45.1 | Yes | -funsafe-math-optimizations (fast math)
43.1 | Yes | -ftracer
36.2 | Yes | -fcaller-saves (-O2)
36.1 | Yes | -fforce-mem (-O2)
35.6 | Maybe | -mieee-fp
34.5 | Yes | -fno-defer-pop (! -O1)
34.0 | Yes | -falign-jumps (-O2 GCC 3.3)
33.5 | Maybe | -fschedule-insns (-O2)
33.2 | Yes | -fdelete-null-pointer-checks (-O2)
33.1 | Yes | -fpeephole2 (-O2)
32.7 | Maybe | -fregmove (-O2)
32.7 | Yes | -finline-limit
32.3 | Yes | -falign-labels (-O2 GCC 3.3)
32.1 | Yes | -fcse-skip-blocks (-O2)
32.0 | Maybe | -fgcse (-O2)
31.6 | Yes | -freorder-blocks (-O2)
30.7 | Yes | -fcse-follow-jumps (-O2)
30.7 | Yes | -frename-registers (-O3)
30.6 | Yes | -mno-align-stringops
30.3 | Yes | -fno-if-conversion2 (! -O1)
29.9 | Maybe | -fno-thread-jumps (! -O1)
29.5 | Maybe | -fstrict-aliasing (-O2)
29.3 | Yes | -maccumulate-outgoing-args
28.9 | Maybe | -finline-functions (-O3)
28.7 | Yes | -minline-all-stringops
28.5 | Maybe | -fno-crossjumping (! -O1)
28.4 | Yes | -fno-cprop-registers (! -O1)
27.7 | Yes | -fsched-interblock (-O2 GCC 3.3)
27.6 | Maybe | -fstrength-reduce (-O2)
26.7 | Maybe | -fno-delayed-branch (! -O1)
26.4 | Yes | -freorder-functions (-O2 GCC 3.3)
26.3 | Maybe | -fno-omit-frame-pointer (! -O1)
25.8 | Yes | -fmove-all-movables
25.5 | Maybe | -fschedule-insns2 (-O2)
25.4 | Maybe | -falign-loops (-O2 GCC 3.3)
25.0 | Maybe | -fsched-spec (-O2 GCC 3.3)
24.9 | No | -fprefetch-loop-arrays
24.6 | Yes | -fexpensive-optimizations (-O2)
24.0 | Yes | -ffinite-math-only (fast math)
22.5 | Maybe | -fno-inline
21.8 | Maybe | -mno-push-args
21.4 | Maybe | -fno-signaling-nans (fast math)
20.9 | Maybe | -funroll-loops
20.8 | Maybe | -fno-merge-constants (! -O1)
19.8 | Maybe | -freduce-all-givs
19.4 | Maybe | -fno-math-errno (fast math)
19.2 | No | -funroll-all-loops
19.0 | Maybe | -foptimize-sibling-calls (-O2)
18.7 | No | -fnew-ra
18.5 | No | -mfpmath=387
15.8 | Maybe | -fno-trapping-math (fast math)
14.8 | No | -fno-if-conversion (! -O1)
14.6 | No | -ffloat-store
14.3 | Maybe | -frerun-cse-after-loop (-O2)
12.3 | No | -frerun-loop-opt (-O2)
11.3 | No | -mfpmath=sse,387
10.3 | No | -fno-guess-branch-probability (! -O1)
0.0 | No | -fno-loop-optimize (! -O1)
0.0 | No | -mfpmath=sse
Then I tried "nbench" with standard optimization:
CFLAGS=-s -static -Wall -O2
CPU : Dual AuthenticAMD AMD Opteron(tm) Processor 242 1604MHz
L2 Cache : 1024 KB
OS : Linux 2.6.5-gentoo-r1
C compiler : 3.3.3
MEMORY INDEX : 11.142
INTEGER INDEX : 10.406
FLOATING-POINT INDEX: 15.911
And with the acovea optimization:
CFLAGS = -s -static -Wall -O1 -funsafe-math-optimizations -ftracer -fcaller-saves -fforce-mem -fno-defer-pop -falign-jumps -fdelete-null-pointer-checks -fpeephole2 -finline-limit=600 -falign-labels -fcse-skip-blocks -freorder-blocks -fcse-follow-jumps -frename-registers -mno-align-stringops -fno-if-conversion2 -maccumulate-outgoing-args -minline-all-stringops -fno-cprop-registers -fsched-interblock -freorder-functions -fmove-all-movables -fexpensive-optimizations -ffinite-math-only
CPU : Dual AuthenticAMD AMD Opteron(tm) Processor 242 1604MHz
L2 Cache : 1024 KB
OS : Linux 2.6.5-gentoo-r1
C compiler : 3.3.3
MEMORY INDEX : 10.553
INTEGER INDEX : 9.486
FLOATING-POINT INDEX: 17.037
As you can see, floating-point is 7% better, but memory (-5%) and integer (-9%) suggest that acovea flags are not useful for workstation use.
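The percentages follow directly from the two nbench runs above (a higher index is better). Worked out in Python, with the numbers copied from the posted output:

```python
# nbench indices from the -O2 run and the acovea-flags run above.
baseline = {"memory": 11.142, "integer": 10.406, "floating-point": 15.911}
acovea = {"memory": 10.553, "integer": 9.486, "floating-point": 17.037}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


changes = {name: percent_change(baseline[name], acovea[name]) for name in baseline}
for name, pct in changes.items():
    print(f"{name:15s} {pct:+.1f}%")
```

This prints -5.3% for memory, -8.8% for integer and +7.1% for floating-point, matching the rounded figures quoted.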
Also, -funsafe-math-optimizations gives the boost, but it's deprecated.
I think -O2 is the best choice for my gentoo installation, but I will try other combinations, removing deprecated or conflicting flags suggested by acovea.
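Part of why -funsafe-math-optimizations is "unsafe": it lets GCC apply transformations that may violate IEEE or ISO floating-point rules, and in newer GCC releases it implies -fassociative-math, which permits regrouping additions. Floating-point addition is not associative, so regrouping can change results. Python floats are IEEE 754 doubles, which makes the effect easy to show; this illustrates the arithmetic only, not what GCC 3.3 specifically does to any given loop.

```python
# The same three terms, grouped two different ways.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left, right, left == right)  # 0.6000000000000001 0.6 False
```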
Posted: Sun Apr 25, 2004 10:19 pm
by aethyr
I don't know if nbench is the best choice...
These are Native Mode (a.k.a. Algorithm Level) tests; benchmarks designed to expose the capabilities of a system's CPU, FPU, and memory system.
I still don't know what the best benchmark would be though... I asked Scott in another thread, still waiting to hear back from him.
Posted: Sun Apr 25, 2004 10:49 pm
by poisson
aethyr wrote:I don't know if nbench is the best choice...
These are Native Mode (a.k.a. Algorithm Level) tests; benchmarks designed to expose the capabilities of a system's CPU, FPU, and memory system.
I still don't know what the best benchmark would be though... I asked Scott in another thread, still waiting to hear back from him.
Acovea uses similar algorithms. It is good in an "ideal" world, where the machine is dedicated to number-crunching.
For multi-purpose machines (i.e. workstations) there are a lot of parameters, and I think the best benchmark is X/KDE/GNOME/OpenOffice startup time.

Posted: Sun Apr 25, 2004 11:03 pm
by robmoss
poisson wrote:As you can see, floating-point is 7% better, but memory (-5%) and integer (-9%) suggest that acovea flags are not useful for workstation use.
Also, -funsafe-math-optimizations gives the boost, but it's deprecated.
I think -O2 is the best choice for my gentoo installation, but I will try other combinations, removing deprecated or conflicting flags suggested by acovea.
The above combination requires that you use -funsafe-math-optimizations, otherwise you're breaking Acovea's method. Removing it from your profile will give you an entirely different set of results.
Posted: Mon Apr 26, 2004 7:37 am
by darkless
To put that in other words: if there are certain flags known to break things for you (or you just don't feel like using them), then prevent Acovea from using them in the first place.
This can be done by modifying eg. /usr/share/acovea/config/gcc34_pentium4.acovea to not include specific flags.
Personally, I don't feel like doing an "emerge -e world" right now, so I'd like to prevent acovea from using -malign-double. Also, the -funit-at-a-time flag has been known to break a few apps, and it somewhat increases compile time as well, so that might be another candidate for removal, until GCC-3.4 gets more widely adopted by software developers and/or gets more mature.
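Stripping flags from a config file can also be scripted. This is a hedged sketch only: it assumes the flags appear as XML elements like `<flag type="simple" value="-malign-double" />`, which is an assumption about the Acovea config format rather than something confirmed in this thread, so check your own .acovea file first. Flags declared inside a combined value (for example an enumerated choice) would not be matched, and `drop_flags` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def drop_flags(xml_text, excluded):
    """Return xml_text with every <flag value="..."> listed in `excluded` removed.

    Assumes flags are <flag> elements carrying the switch in a `value`
    attribute; anything else in the file is left untouched.
    """
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so the tree isn't modified mid-iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "flag" and child.get("value") in excluded:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Writing the result to a copy, rather than editing the installed config in place, keeps the original available for comparison.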
Posted: Mon Apr 26, 2004 7:59 am
by poisson
robmoss wrote:The above combination requires that you use -funsafe-math-optimizations, otherwise you're breaking Acovea's method. Removing it from your profile will give you an entirely different set of results.
The Acovea method works fine for specific problems, as long as each profile is kept separate. IMHO putting them all together in make.conf will slow down the whole system.
Other tests I made indicate that -O3 optimization is generally better. But the gcc people have warned about -O3 on x86-64, so I use -O2 for the moment.
I found another interesting Acovea application: what are the best optimizations for the Pentium M? You know, this processor is a hybrid between the pentium3 and the pentium4, with 1 MB of L2 cache. I started with "alma", but I don't like to stress my laptop.

Posted: Mon Apr 26, 2004 9:13 am
by Hypnos
It might be possible to construct a benchmark that puts the entire X/glib/GTK+/GNOME (or X/Qt/KDE) code stack through the wringer (maybe based on an automated gtk-demo), and returns a single fitness number to Acovea. That would likely be more realistic (w.r.t. desktop performance) than running through tight computational loops as with the current benchmarks bundled with Acovea.
Of course, the compile-run cycle for the entire stack would be rather time-consuming; even limiting to just glib/GTK+, you're looking at ~8MB of source code and ~4.5MB of machine code.
Perhaps to start with one could restrict to glib, and run the test routines that come with it as a composite Acovea bench (with per test weights), but that would have no X dependence whatsoever.
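One way a composite bench with per-test weights could reduce several test timings to a single fitness number is a weighted geometric mean of each test's time relative to a baseline build. That scoring choice is an assumption, not something Acovea prescribes: dividing by a baseline keeps a long-running test from dominating just because it is long, and the weights then say how much each test should matter. Test names are hypothetical, and how Acovea would consume the number is left open.

```python
import math


def composite_score(times, baseline, weights):
    """Weighted geometric mean of times[name] / baseline[name].

    1.0 means "same as baseline"; below 1.0 is faster. All three dicts are
    keyed by (hypothetical) test name, with times in seconds.
    """
    total_weight = sum(weights.values())
    log_sum = sum(
        weight * math.log(times[name] / baseline[name])
        for name, weight in weights.items()
    )
    return math.exp(log_sum / total_weight)
```

For instance, with weights 1 and 3, halving both test times gives a score of 0.5, and a build identical to the baseline scores 1.0.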
Thoughts?
--------------
On the issue of running the tests one at a time, you could always just hit Ctrl-Z to freeze the script when you wake up, and then resume it with fg when you go to bed.
Posted: Mon Apr 26, 2004 11:49 am
by ett_gramse_nap
Could one safely use cflags suggested by Acovea when bootstrapping?
Posted: Mon Apr 26, 2004 12:08 pm
by Hypnos
ett_gramse_nap wrote:Could one safely use cflags suggested by Acovea when bootstrapping?
I don't know if I would risk binutils/glibc on such flags ....
Posted: Mon Apr 26, 2004 3:54 pm
by solka
Hi all, I ran acovea in a console with no other processes running; it finished after ~29 hours, and here is the result.
[System: Athlon XP 2100+ @ 1916 MHz, 512 MB Corsair DDR, Asus A7V8X motherboard]
Score | So? | Switch (annotation)
------------------------------------------------------------------------------
36.2 | Yes | -fno-delayed-branch (! -O1)
33.1 | Maybe | -fprefetch-loop-arrays
32.9 | Maybe | -funsafe-math-optimizations (fast math)
32.9 | Maybe | -fstrict-aliasing (-O2)
31.2 | Yes | -fno-signaling-nans (fast math)
30.4 | Yes | -falign-labels (-O2 GCC 3.3)
29.7 | Maybe | -minline-all-stringops
29.5 | Maybe | -ftracer
27.8 | Yes | -fno-cprop-registers (! -O1)
27.5 | Yes | -frerun-cse-after-loop (-O2)
27.1 | No | -fforce-mem (-O2)
27.1 | Yes | -fsched-interblock (-O2 GCC 3.3)
27.0 | Yes | -fno-defer-pop (! -O1)
26.7 | Yes | -mno-align-stringops
26.7 | Maybe | -fcse-follow-jumps (-O2)
26.3 | Yes | -fsched-spec (-O2 GCC 3.3)
26.2 | Maybe | -finline-functions (-O3)
26.0 | Yes | -fpeephole2 (-O2)
26.0 | Yes | -fno-math-errno (fast math)
25.8 | Yes | -freorder-functions (-O2 GCC 3.3)
25.7 | Maybe | -fcse-skip-blocks (-O2)
25.0 | Maybe | -falign-jumps (-O2 GCC 3.3)
24.6 | Maybe | -fno-trapping-math (fast math)
23.8 | No | -fstrength-reduce (-O2)
23.7 | Yes | -fno-crossjumping (! -O1)
23.4 | Maybe | -fno-if-conversion2 (! -O1)
23.3 | Maybe | -mieee-fp
22.7 | Maybe | -ffinite-math-only (fast math)
22.7 | Maybe | -fno-merge-constants (! -O1)
21.9 | Maybe | -frename-registers (-O3)
21.5 | Maybe | -fregmove (-O2)
20.8 | No | -fgcse (-O2)
20.7 | No | -fcaller-saves (-O2)
20.3 | No | -fschedule-insns2 (-O2)
19.5 | No | -falign-loops (-O2 GCC 3.3)
19.4 | Maybe | -freorder-blocks (-O2)
19.3 | Maybe | -fno-thread-jumps (! -O1)
18.0 | No | -fno-if-conversion (! -O1)
17.9 | Maybe | -finline-limit
17.9 | Maybe | -fno-omit-frame-pointer (! -O1)
16.6 | No | -maccumulate-outgoing-args
16.3 | No | -mno-push-args
15.2 | No | -foptimize-sibling-calls (-O2)
14.9 | No | -fno-inline
14.7 | No | -fdelete-null-pointer-checks (-O2)
14.6 | No | -frerun-loop-opt (-O2)
13.9 | No | -fexpensive-optimizations (-O2)
12.7 | No | -freduce-all-givs
12.5 | No | -fmove-all-movables
10.9 | Maybe | -mfpmath=sse,387
10.9 | No | -fnew-ra
8.3 | No | -fschedule-insns (-O2)
7.6 | No | -fno-guess-branch-probability (! -O1)
6.5 | No | -ffloat-store
6.4 | No | -funroll-all-loops
4.6 | No | -funroll-loops
3.7 | No | -fno-loop-optimize (! -O1)
0.0 | No | -mfpmath=387
0.0 | No | -mfpmath=sse
In particular, the ones marked Yes:
36.2 | Yes | -fno-delayed-branch (! -O1)
31.2 | Yes | -fno-signaling-nans (fast math)
30.4 | Yes | -falign-labels (-O2 GCC 3.3)
27.8 | Yes | -fno-cprop-registers (! -O1)
27.5 | Yes | -frerun-cse-after-loop (-O2)
27.1 | Yes | -fsched-interblock (-O2 GCC 3.3)
27.0 | Yes | -fno-defer-pop (! -O1)
26.7 | Yes | -mno-align-stringops
26.3 | Yes | -fsched-spec (-O2 GCC 3.3)
26.0 | Yes | -fpeephole2 (-O2)
26.0 | Yes | -fno-math-errno (fast math)
25.8 | Yes | -freorder-functions (-O2 GCC 3.3)
23.7 | Yes | -fno-crossjumping (! -O1)
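Pulling the Yes rows out of a long result table is easy to script. A minimal Python sketch, assuming the table is pasted in exactly the "Score | So? | Switch (annotation)" layout shown in this thread; `flags_with_verdict` is a hypothetical helper. It only filters text: a Yes means the flag helped on these benchmarks, not that it is safe system-wide.

```python
import re

# score | verdict | switch, with an optional "(annotation)" after the switch.
ROW = re.compile(r"^\s*([\d.]+)\s*\|\s*(Yes|Maybe|No)\s*\|\s*(\S+)(?:\s+\((.*)\))?\s*$")


def flags_with_verdict(table_text, verdict="Yes"):
    """Return the switches whose So? column equals `verdict`, in table order."""
    flags = []
    for line in table_text.splitlines():
        match = ROW.match(line)
        if match and match.group(2) == verdict:
            flags.append(match.group(3))
    return flags
```

Joining the result with spaces, e.g. `" ".join(flags_with_verdict(table))`, gives a string ready to paste after your base flags.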
Can I safely put these settings in my CFLAGS? Are there any other flags that acovea doesn't show but that are worth adding (e.g. -Wall, -pipe)?
And what does the bang (!) before -O1 mean?
Many thanks for any answers.
Posted: Mon Apr 26, 2004 4:04 pm
by Daagar
! -O1 (read as 'not -O1') means it is explicitly turning off an option that is normally enabled when you specify -O1.
Posted: Mon Apr 26, 2004 4:19 pm
by solka
So if I put -O2 in my CFLAGS, would that also cover the flags marked ! -O1, or do I have to add them anyway?
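Going by the annotations alone: -O2 turns on everything -O1 does, so a flag marked (-O2) is already active at -O2. A "! -O1" flag switches off something that -O1 (and therefore -O2) enables, so no -O level implies it and it has to be listed explicitly. Flags annotated (-O3) or (fast math), or with no annotation, are not implied by -O2 either. A small sketch encoding that reading; the helper names are hypothetical and the logic only interprets Acovea's annotation strings.

```python
import re


def implied_by_level(annotation, level):
    """True if -O<level> already gives this flag's effect, per the annotation."""
    if annotation is None or annotation.startswith("!"):
        # No annotation: not part of any -O level.
        # "! -O1": the flag turns OFF an -O1 optimization, so no level implies it.
        return False
    match = re.match(r"-O(\d)", annotation)
    # "fast math" and other non -O annotations don't match and return False.
    return bool(match) and level >= int(match.group(1))


def flags_to_add(annotated_flags, level):
    """Keep the (flag, annotation) pairs that -O<level> does not already imply."""
    return [flag for flag, note in annotated_flags if not implied_by_level(note, level)]
```

With -O2, the ! -O1 entries from the list above still need to be added explicitly, while the (-O2 ...) ones are already on.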
Posted: Mon Apr 26, 2004 4:19 pm
by Daagar
poisson wrote:The above combination requires that you use -funsafe-math-optimizations, otherwise you're breaking Acovea's method. Removing it from your profile will give you an entirely different set of results.
Acovea method works fine for specific problems, the profiles are always kept separate. IMHO putting all together in make.conf will slow down the whole system.
So based on these findings, are we basically saying that Acovea does the job it was designed to do, but that the current set of benchmarks is not really appropriate for what gentoo'ers are trying to use it for (getting a set of CFLAGS for their make.conf)? Or is this something specific to the amd64 architecture, since others such as Hypnos have claimed general improvements to their system since implementing acovea-suggested flags?
Basically, for the benefit of others reading this thread, is it currently worth the 30-72 hours necessary to run acovea to generate system-wide CFLAGS?
Posted: Mon Apr 26, 2004 9:20 pm
by Hypnos
Daagar wrote:Basically, for the benefit of others reading this thread, is it currently worth the 30-72 hours necessary to run acovea to generate system-wide CFLAGS?
Depends -- would you be happy with a 0-3% performance improvement for that time invested?
Posted: Mon Apr 26, 2004 11:40 pm
by Daagar
Hypnos wrote:Daagar wrote:Basically, for the benefit of others reading this thread, is it currently worth the 30-72 hours necessary to run acovea to generate system-wide CFLAGS?
Depends -- would you be happy with a 0-3% performance improvement for that time invested?
Heheh... for me personally, sure. I'm twisted like that. However, as the previous poster found, there are instances where the performance goes _backwards_. I guess the question is whether the performance gains will in general outweigh the losses on an average gentoo'er's system (based on the assumption that most gentoo'ers are in a workstation environment, not doing 24/7 number crunching).