Hypnos Advocate
Joined: 18 Jul 2002 Posts: 2889 Location: Omnipresent
|
Posted: Sat Sep 11, 2004 3:12 am Post subject: |
|
|
taskara wrote: | you'll want something like:
Code: | -march=pentium4 -O1 -minline-all-stringops -frename-registers -finline-functions -mno-align-stringops -fsched-spec -fno-delayed-branch -mno-push-args -fcse-follow-jumps -fdelete-null-pointer-checks -fno-omit-frame-pointer -fno-if-conversion2 -falign-jumps -falign-loops -fno-math-errno -fcaller-saves -frerun-loop-opt -maccumulate-outgoing-args -fno-defer-pop -fno-trapping-math -fpeephole2 -fsched-interblock -foptimize-sibling-calls -falign-labels -fno-signaling-nans -fprefetch-loop-arrays -fstrength-reduce -freduce-all-givs |
take out -malign-double as it breaks binaries, and there might be some others that will kill your code, but I'm no expert. |
Don't use anything noted as "fast math" -- you're asking for trouble.
Also, -frename-registers might break the odd emerge, likely the same ones that fail on -O3.
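One way to apply the "no fast math" advice mechanically is to filter a candidate flag list before it goes into CFLAGS. This is only a sketch: the five switches below are the ones the Acovea reporting script in this thread annotates as "(fast math)", and `strip_fast_math` is a made-up helper name, not part of Acovea or GCC.

```shell
#!/bin/sh
# Switches annotated "(fast math)" in the Acovea report posted in this thread.
FAST_MATH="-funsafe-math-optimizations -fno-trapping-math -fno-math-errno -fno-signaling-nans -ffinite-math-only"

# strip_fast_math FLAGS...: print the given flags with the fast-math ones removed.
# (Hypothetical helper, for illustration only.)
strip_fast_math() {
    out=""
    for flag in "$@"; do
        case " $FAST_MATH " in
            *" $flag "*) ;;                   # drop: listed as fast math
            *) out="${out:+$out }$flag" ;;    # keep everything else
        esac
    done
    printf '%s\n' "$out"
}

# Example: a few flags taken from the list quoted above.
strip_fast_math -march=pentium4 -O1 -fno-math-errno -fpeephole2 -fno-trapping-math
```

The example call keeps -march=pentium4, -O1 and -fpeephole2 and drops the two fast-math switches.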
Quote: | I would run Acovea with nothing much running in the background. If there are processes starting and stopping while you're running these tests (and therefore using your cpu / mem) you may not get an accurate result. |
Well, in principle, if you run Acovea with nothing else running, you're optimizing towards a dedicated system. You are more likely to optimize for smaller code size in a "dirtier" testing environment.
Quote: | Then again there's not much to suggest the results are accurate anyway. |
Accurate about what? How fast your desktop user experience will be? That problem is not even properly posed, because the parameters are different for everybody. At a minimum, Acovea seems to report flags that yield very stable binaries (buggy binaries can't pass on their flag "genes"). _________________ Personal overlay | Simple backup scheme |
|
c0balt Guru
Joined: 04 Jul 2004 Posts: 441 Location: Germany
|
Posted: Sat Sep 11, 2004 12:36 pm Post subject: |
|
|
Code: |
[mybox acovea]# ./outputscript
Score | So? | Switch (annotation)
------------------------------------------------------------------------------
41.8 | No | -finline-functions (-O3)
39.3 | No | -funsafe-math-optimizations (fast math)
35.6 | No | -fgcse (-O2)
32.5 | No | -freorder-functions (-O2 GCC 3.3)
32.2 | No | -fno-thread-jumps (! -O1)
31.6 | No | -fsched-spec (-O2 GCC 3.3)
31.4 | No | -malign-double
31.2 | No | -fpeephole2 (-O2)
31.1 | No | -fno-merge-constants (! -O1)
30.9 | No | -fschedule-insns2 (-O2)
30.8 | No | -fno-trapping-math (fast math)
30.7 | No | -fexpensive-optimizations (-O2)
30.2 | No | -freorder-blocks (-O2)
29.8 | No | -fno-if-conversion2 (! -O1)
29.6 | No | -fno-math-errno (fast math)
29.5 | No | -fpeel-loops
29.1 | No | -fno-cprop-registers (! -O1)
28.9 | No | -fprefetch-loop-arrays
28.8 | No | -frename-registers (-O3)
28.7 | No | -fcse-skip-blocks (-O2)
28.2 | No | -maccumulate-outgoing-args
28.1 | No | -funswitch-loops
28.0 | No | -frerun-cse-after-loop (-O2)
27.4 | No | -fcse-follow-jumps (-O2)
27.0 | No | -fno-crossjumping (! -O1)
26.8 | No | -fsched-interblock (-O2 GCC 3.3)
26.8 | No | -fweb
26.7 | No | -falign-labels (-O2 GCC 3.3)
26.6 | No | -fno-omit-frame-pointer (! -O1)
26.4 | No | -fstrict-aliasing (-O2)
26.3 | No | -fno-defer-pop (! -O1)
26.1 | No | -fno-delayed-branch (! -O1)
26.0 | No | -mno-push-args
25.8 | No | -freduce-all-givs
25.7 | No | -fregmove (-O2)
25.7 | No | -ftracer
25.6 | No | -frerun-loop-opt (-O2)
25.4 | No | -fcaller-saves (-O2)
25.2 | No | -fstrength-reduce (-O2)
25.1 | No | -fmove-all-movables
24.8 | No | -falign-functions
24.6 | No | -fno-signaling-nans (fast math)
24.3 | No | -funit-at-a-time
24.2 | No | -mno-align-stringops
23.8 | No | -falign-loops (-O2 GCC 3.3)
23.7 | No | -falign-jumps (-O2 GCC 3.3)
23.7 | No | -ffinite-math-only (fast math)
23.6 | No | -foptimize-sibling-calls (-O2)
23.5 | No | -fdelete-null-pointer-checks (-O2)
23.4 | No | -fnew-ra
23.4 | No | -mieee-fp
23.3 | No | -fforce-mem (-O2)
23.3 | No | -minline-all-stringops
22.6 | No | -finline-limit
21.1 | No | -fno-if-conversion (! -O1)
18.9 | No | -fno-inline
18.2 | No | -funroll-all-loops
16.7 | No | -fschedule-insns (-O2)
15.3 | No | -fbranch-target-load-optimize2
15.3 | No | -fno-loop-optimize (! -O1)
13.6 | No | -funroll-loops
13.5 | No | -fbranch-target-load-optimize
13.5 | No | -fno-guess-branch-probability (! -O1)
12.8 | No | -momit-leaf-frame-pointer
12.6 | No | -ffloat-store
12.2 | No | -mfpmath=sse
10.9 | No | -fomit-frame-pointer
9.1 | No | -mfpmath=387
5.8 | No | -mfpmath=sse,387
|
wtf, something's gotta be wrong!? |
|
Hypnos Advocate
Joined: 18 Jul 2002 Posts: 2889 Location: Omnipresent
|
Posted: Sat Sep 11, 2004 7:53 pm Post subject: |
|
|
c0balt wrote: | wtf, something's gotta be wrong!? |
Umm, do you have enough statistics? It takes 1-2 days for all the tests to finish from the first script, then you run the second reporting script. _________________ Personal overlay | Simple backup scheme |
|
c0balt Guru
Joined: 04 Jul 2004 Posts: 441 Location: Germany
|
Posted: Sun Sep 12, 2004 12:10 am Post subject: |
|
|
Hypnos wrote: | c0balt wrote: | wtf, something's gotta be wrong!? |
Umm, do you have enough statistics? It takes 1-2 days for all the tests to finish from the first script, then you run the second reporting script. |
Yes, I've run the runscript.sh from page 2,
though I added "-g 16" to runacovea and changed that in outputscript too.
16 generations take less time and should be sufficient.
All the files are there:
Code: |
[mybox acovea]# ll
total 1696
-rw-r--r-- 1 root root 100919 Sep 10 11:07 alma.err
-rw-r--r-- 1 root root 147296 Sep 10 11:08 alma.run
-rw-r--r-- 1 root root 0 Sep 10 11:08 evo.err
-rw-r--r-- 1 root root 151896 Sep 10 15:48 evo.run
-rw-r--r-- 1 root root 592181 Sep 11 06:16 fft.err
-rw-r--r-- 1 root root 144936 Sep 11 06:16 fft.run
-rw-r--r-- 1 root root 0 Sep 11 06:16 huff.err
-rw-r--r-- 1 root root 147724 Sep 11 08:40 huff.run
-rw-r--r-- 1 root root 133415 Sep 11 08:44 lin.err
-rw-r--r-- 1 root root 838 Sep 11 08:40 lin.run
-rw-r--r-- 1 root root 0 Sep 11 08:44 mat1.err
-rw-r--r-- 1 root root 144813 Sep 11 10:52 mat1.run
-rwxr-xr-x 1 root root 6333 Sep 11 14:27 outputscript
-rw-r--r-- 1 root root 299 Sep 10 15:51 runscript.sh
-rw-r--r-- 1 root root 0 Sep 11 10:52 tree.err
-rw-r--r-- 1 root root 146138 Sep 11 14:26 tree.run
|
|
|
Hypnos Advocate
Joined: 18 Jul 2002 Posts: 2889 Location: Omnipresent
|
Posted: Sun Sep 12, 2004 2:57 am Post subject: |
|
|
c0balt wrote: | Hypnos wrote: | c0balt wrote: | wtf, something's gotta be wrong!? |
Umm, do you have enough statistics? It takes 1-2 days for all the tests to finish from the first script, then you run the second reporting script. |
Yes, I've run the runscript.sh from page 2,
though I added "-g 16" to runacovea and changed that in outputscript too.
16 generations take less time and should be sufficient.
All the files are there: |
In the same place where you changed the generation count in outputscript, switch off the suppression of "statistical mumbo jumbo" and report the results here. I think the flag counts aren't building up enough to reach the Gaussian limit of Poisson statistics, so the confidence is being set to 0. _________________ Personal overlay | Simple backup scheme |
|
c0balt Guru
Joined: 04 Jul 2004 Posts: 441 Location: Germany
|
Posted: Sun Sep 12, 2004 11:11 am Post subject: |
|
|
Code: |
[mybox acovea]# ./outputscript
Mean | Std. Dev. | Conf. | Score | So? | Switch (annotation)
------------------------------------------------------------------------------
0.910 | 1.493 | 0.459 | 41.8 | No | -finline-functions (-O3)
0.860 | 1.417 | 0.458 | 39.3 | No | -funsafe-math-optimizations (fast math)
0.787 | 1.314 | 0.453 | 35.6 | No | -fgcse (-O2)
0.705 | 1.152 | 0.461 | 32.5 | No | -freorder-functions (-O2 GCC 3.3)
0.700 | 1.145 | 0.461 | 32.2 | No | -fno-thread-jumps (! -O1)
0.688 | 1.128 | 0.459 | 31.6 | No | -fsched-spec (-O2 GCC 3.3)
0.682 | 1.118 | 0.460 | 31.4 | No | -malign-double
0.682 | 1.126 | 0.457 | 31.2 | No | -fpeephole2 (-O2)
0.682 | 1.129 | 0.456 | 31.1 | No | -fno-merge-constants (! -O1)
0.675 | 1.114 | 0.457 | 30.9 | No | -fschedule-insns2 (-O2)
0.672 | 1.109 | 0.457 | 30.8 | No | -fno-trapping-math (fast math)
0.682 | 1.149 | 0.449 | 30.7 | No | -fexpensive-optimizations (-O2)
0.667 | 1.116 | 0.452 | 30.2 | No | -freorder-blocks (-O2)
0.647 | 1.061 | 0.460 | 29.8 | No | -fno-if-conversion2 (! -O1)
0.653 | 1.084 | 0.454 | 29.6 | No | -fno-math-errno (fast math)
0.642 | 1.054 | 0.459 | 29.5 | No | -fpeel-loops
0.635 | 1.045 | 0.458 | 29.1 | No | -fno-cprop-registers (! -O1)
0.642 | 1.079 | 0.450 | 28.9 | No | -fprefetch-loop-arrays
0.628 | 1.031 | 0.459 | 28.8 | No | -frename-registers (-O3)
0.630 | 1.045 | 0.455 | 28.7 | No | -fcse-skip-blocks (-O2)
0.615 | 1.012 | 0.458 | 28.2 | No | -maccumulate-outgoing-args
0.613 | 1.005 | 0.459 | 28.1 | No | -funswitch-loops
0.610 | 1.001 | 0.459 | 28.0 | No | -frerun-cse-after-loop (-O2)
0.597 | 0.980 | 0.459 | 27.4 | No | -fcse-follow-jumps (-O2)
0.590 | 0.973 | 0.457 | 27.0 | No | -fno-crossjumping (! -O1)
0.588 | 0.970 | 0.457 | 26.8 | No | -fsched-interblock (-O2 GCC 3.3)
0.588 | 0.970 | 0.457 | 26.8 | No | -fweb
0.585 | 0.965 | 0.457 | 26.7 | No | -falign-labels (-O2 GCC 3.3)
0.585 | 0.972 | 0.454 | 26.6 | No | -fno-omit-frame-pointer (! -O1)
0.597 | 1.015 | 0.442 | 26.4 | No | -fstrict-aliasing (-O2)
0.575 | 0.950 | 0.457 | 26.3 | No | -fno-defer-pop (! -O1)
0.573 | 0.947 | 0.456 | 26.1 | No | -fno-delayed-branch (! -O1)
0.568 | 0.934 | 0.458 | 26.0 | No | -mno-push-args
0.568 | 0.942 | 0.455 | 25.8 | No | -freduce-all-givs
0.560 | 0.918 | 0.460 | 25.7 | No | -fregmove (-O2)
0.568 | 0.948 | 0.452 | 25.7 | No | -ftracer
0.557 | 0.916 | 0.459 | 25.6 | No | -frerun-loop-opt (-O2)
0.555 | 0.914 | 0.458 | 25.4 | No | -fcaller-saves (-O2)
0.555 | 0.924 | 0.454 | 25.2 | No | -fstrength-reduce (-O2)
0.547 | 0.902 | 0.458 | 25.1 | No | -fmove-all-movables
0.540 | 0.887 | 0.459 | 24.8 | No | -falign-functions
0.535 | 0.876 | 0.460 | 24.6 | No | -fno-signaling-nans (fast math)
0.532 | 0.882 | 0.455 | 24.3 | No | -funit-at-a-time
0.530 | 0.875 | 0.457 | 24.2 | No | -mno-align-stringops
0.517 | 0.849 | 0.459 | 23.8 | No | -falign-loops (-O2 GCC 3.3)
0.522 | 0.869 | 0.454 | 23.7 | No | -falign-jumps (-O2 GCC 3.3)
0.522 | 0.871 | 0.453 | 23.7 | No | -ffinite-math-only (fast math)
0.512 | 0.838 | 0.461 | 23.6 | No | -foptimize-sibling-calls (-O2)
0.520 | 0.867 | 0.453 | 23.5 | No | -fdelete-null-pointer-checks (-O2)
0.545 | 0.956 | 0.430 | 23.4 | No | -fnew-ra
0.517 | 0.864 | 0.452 | 23.4 | No | -mieee-fp
0.528 | 0.898 | 0.442 | 23.3 | No | -fforce-mem (-O2)
0.515 | 0.862 | 0.452 | 23.3 | No | -minline-all-stringops
0.495 | 0.816 | 0.458 | 22.6 | No | -finline-limit
0.480 | 0.823 | 0.439 | 21.1 | No | -fno-if-conversion (! -O1)
0.435 | 0.754 | 0.434 | 18.9 | No | -fno-inline
0.420 | 0.728 | 0.434 | 18.2 | No | -funroll-all-loops
0.383 | 0.658 | 0.437 | 16.7 | No | -fschedule-insns (-O2)
0.338 | 0.563 | 0.453 | 15.3 | No | -fbranch-target-load-optimize2
0.358 | 0.631 | 0.427 | 15.3 | No | -fno-loop-optimize (! -O1)
0.310 | 0.531 | 0.439 | 13.6 | No | -funroll-loops
0.295 | 0.485 | 0.459 | 13.5 | No | -fbranch-target-load-optimize
0.330 | 0.612 | 0.408 | 13.5 | No | -fno-guess-branch-probability (! -O1)
0.295 | 0.511 | 0.435 | 12.8 | No | -momit-leaf-frame-pointer
0.295 | 0.528 | 0.426 | 12.6 | No | -ffloat-store
0.283 | 0.495 | 0.430 | 12.2 | No | -mfpmath=sse
0.258 | 0.463 | 0.424 | 10.9 | No | -fomit-frame-pointer
0.200 | 0.333 | 0.454 | 9.1 | No | -mfpmath=387
0.135 | 0.236 | 0.431 | 5.8 | No | -mfpmath=sse,387
|
Don't tell me I have to run it again with 20 generations -_-;; |
|
Hypnos Advocate
Joined: 18 Jul 2002 Posts: 2889 Location: Omnipresent
|
Posted: Sun Sep 12, 2004 11:25 am Post subject: |
|
|
c0balt wrote: | Don't tell me I have to run it again with 20 generations -_-;; |
I think so -- the fluctuations are way bigger than the mean, so none of the flags have any confidence of benefit.
That being said, I don't know what magical thing happens between 16 and 20 generations that would cause the "genes" to settle down ... maybe your system is b0rked. _________________ Personal overlay | Simple backup scheme |
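The "fluctuations bigger than the mean" observation can be checked directly against the report rows. A rough sketch, assuming the column layout shown in the report above (Mean | Std. Dev. | Conf. | Score | So? | Switch); `noisy_flags` is a made-up helper name and is not part of the Acovea scripts.

```shell
#!/bin/sh
# Three rows copied from the report above.
report='0.910 | 1.493 | 0.459 | 41.8 | No | -finline-functions (-O3)
0.283 | 0.495 | 0.430 | 12.2 | No | -mfpmath=sse
0.135 | 0.236 | 0.431 |  5.8 | No | -mfpmath=sse,387'

# noisy_flags: read report rows on stdin and print every switch whose
# standard deviation exceeds its mean, together with the mean/sd ratio.
# (Hypothetical helper, for illustration only.)
noisy_flags() {
    awk -F'|' '
        NF >= 6 && $1 + 0 > 0 {          # skip the header and separator lines
            mean = $1 + 0
            sd = $2 + 0
            split($6, words, " ")        # words[1] is the switch name
            if (sd > mean)
                printf "%s mean/sd=%.2f\n", words[1], mean / sd
        }'
}

printf '%s\n' "$report" | noisy_flags
```

All three sample rows are printed, since each has a standard deviation well above its mean; a row with a mean comfortably larger than its spread would be left out.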
|
c0balt Guru
Joined: 04 Jul 2004 Posts: 441 Location: Germany
|
Posted: Sun Sep 12, 2004 11:47 am Post subject: |
|
|
Hypnos wrote: | ... maybe your system is b0rked. |
Huh? Any way to find that out?
I ran Acovea some time ago and the results were fine then. Since then I've clocked down my CPU a bit, so the system should be more stable. |
|
count_zero Guru
Joined: 17 May 2004 Posts: 460 Location: Little Rock, Arkansas, USA
|
Posted: Sun Sep 12, 2004 7:02 pm Post subject: |
|
|
Several people seem to be having the same problem as me, but I haven't seen the solution posted:
I run the script on page two, and I get a bunch of errors stating that acovea could not find the benchmark. If I run acovea according to the README, it works fine. Has someone figured out what is going wrong? _________________ "We must all hang together, or assuredly we shall all hang separately."
-Ben Franklin |
|
Hypnos Advocate
Joined: 18 Jul 2002 Posts: 2889 Location: Omnipresent
|
Posted: Sun Sep 12, 2004 8:51 pm Post subject: |
|
|
c0balt wrote: | Hypnos wrote: | ... maybe your system is b0rked. |
Huh? Any way to find that out?
I ran Acovea some time ago and the results were fine then. Since then I've clocked down my CPU a bit, so the system should be more stable. |
*Shrug* You've exceeded my expertise. You'll know for sure something is wrong if you run the optimization again for 20 generations; also, make sure you didn't inadvertently change any of the statistical math in the reporting script when you changed the options at the top. _________________ Personal overlay | Simple backup scheme |
|
Hypnos Advocate
Joined: 18 Jul 2002 Posts: 2889 Location: Omnipresent
|
Posted: Sun Sep 12, 2004 8:52 pm Post subject: |
|
|
count_zero wrote: | Several people seem to be having the same problem as me, but I haven't seen the solution posted:
I run the script on page two, and I get a bunch of errors stating that acovea could not find the benchmark. If I run acovea according to the README, it works fine. Has someone figured out what is going wrong? |
You didn't give the run script the name "runacovea", did you? _________________ Personal overlay | Simple backup scheme |
|
c0balt Guru
Joined: 04 Jul 2004 Posts: 441 Location: Germany
|
Posted: Sun Sep 12, 2004 8:58 pm Post subject: |
|
|
Damn, I don't get it. I just copied the script from page 2 again, changed the generations to 16, and now it works fine.
Dunno, maybe I really did change something by mistake. Sorry for the trouble ^^ |
|
count_zero Guru
Joined: 17 May 2004 Posts: 460 Location: Little Rock, Arkansas, USA
|
Posted: Mon Sep 13, 2004 1:29 am Post subject: |
|
|
Hypnos wrote: | count_zero wrote: | Several people seem to be having the same problem as me, but I haven't seen the solution posted:
I run the script on page two, and I get a bunch of errors stating that acovea could not find the benchmark. If I run acovea according to the README, it works fine. Has someone figured out what is going wrong? |
You didn't give the run script the name "runacovea", did you? |
No, I called the script "acovearun". _________________ "We must all hang together, or assuredly we shall all hang separately."
-Ben Franklin |
|
count_zero Guru
Joined: 17 May 2004 Posts: 460 Location: Little Rock, Arkansas, USA
|
Posted: Mon Sep 13, 2004 1:53 am Post subject: |
|
|
Aha, I think I found what was wrong. I changed the script on page 2 to look like this
Code: | #!/bin/sh
BENCHES="alma evo fft huff lin mat1 tree"
for bench in $BENCHES; do
echo ""
echo "*** $bench ***"
time runacovea -config gcc33_pentium4.acovea -bench ${bench}bench.c #removed backslash
1> ${bench}.run 2> ${bench}.err
done |
There was a backslash at the end of the line that begins "time runacovea" which I removed.
I'm not much of a programmer, is this bad? After I changed the script, it works (or at least appears to). _________________ "We must all hang together, or assuredly we shall all hang separately."
-Ben Franklin |
|
Hypnos Advocate
Joined: 18 Jul 2002 Posts: 2889 Location: Omnipresent
|
Posted: Mon Sep 13, 2004 3:14 am Post subject: |
|
|
count_zero wrote: | Aha, I think I found what was wrong. I changed the script on page 2 to look like this
Code: | #!/bin/sh
BENCHES="alma evo fft huff lin mat1 tree"
for bench in $BENCHES; do
echo ""
echo "*** $bench ***"
time runacovea -config gcc33_pentium4.acovea -bench ${bench}bench.c #removed backslash
1> ${bench}.run 2> ${bench}.err
done |
There was a backslash at the end of the line that begins "time runacovea" which I removed.
I'm not much of a programmer, is this bad? After I changed the script, it works (or at least appears to). |
No, the backslash is essential -- it joins to the next line, which redirects the output into files which will be analyzed later.
There must've been some other error. _________________ Personal overlay | Simple backup scheme |
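To see what the trailing backslash does, here is a small stand-alone demo that uses echo in place of runacovea (the file names and the scratch directory are made up for the demo). Without the backslash, a standard shell treats the redirection line as a separate, empty command, so the script still "runs" but the output file stays empty. That is probably why the edited script only appeared to work.

```shell
#!/bin/sh
# Work in a scratch directory so the demo leaves nothing behind.
tmp=$(mktemp -d)

# With the backslash: the two lines are one command, stdout goes to the file.
echo "benchmark output" \
    1> "$tmp/joined.run"

# Without the backslash: echo prints to the terminal, and the next line
# is a separate empty command that only creates the (empty) file.
echo "benchmark output"
    1> "$tmp/split.run"

joined_size=$(wc -c < "$tmp/joined.run" | tr -d ' ')
split_size=$(wc -c < "$tmp/split.run" | tr -d ' ')
echo "joined.run: $joined_size bytes, split.run: $split_size bytes"
rm -rf "$tmp"
```

Here joined.run ends up with 17 bytes and split.run with 0, so a reporting script reading the second kind of file would have nothing to analyze.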
|
count_zero Guru
Joined: 17 May 2004 Posts: 460 Location: Little Rock, Arkansas, USA
|
Posted: Mon Sep 13, 2004 4:12 am Post subject: |
|
|
Well, I put the backslash back in and it's decided to work this time. Talk about weird. Thanks for the help! _________________ "We must all hang together, or assuredly we shall all hang separately."
-Ben Franklin |
|
ZeNTuRe n00b
Joined: 24 Jan 2004 Posts: 69
|
Posted: Tue Sep 14, 2004 5:37 pm Post subject: |
|
|
Is it normal to get segfaults when running the first script?
Code: | amsterdam bin # ./testacovea
*** alma ***
./testacovea: line 10: 19670 Segmentation fault runacovea -config gcc33_pentium4.acovea -bench ${bench}bench.c >${bench}.run 2>${bench}.err
real 2m2.820s
user 1m48.556s
sys 0m4.082s
*** evo *** |
_________________ Did they touch God or did they touch the Sun? |
|
Hypnos Advocate
Joined: 18 Jul 2002 Posts: 2889 Location: Omnipresent
|
Posted: Tue Sep 14, 2004 6:02 pm Post subject: |
|
|
ZeNTuRe wrote: | Is it normal to get segfaults when running the first script?
Code: | amsterdam bin # ./testacovea
*** alma ***
./testacovea: line 10: 19670 Segmentation fault runacovea -config gcc33_pentium4.acovea -bench ${bench}bench.c >${bench}.run 2>${bench}.err
real 2m2.820s
user 1m48.556s
sys 0m4.082s
*** evo *** |
|
No -- your computer is broken. Or, possibly, you emerged Acovea itself with too-aggressive CFLAGS. _________________ Personal overlay | Simple backup scheme |
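If the run script should flag crashes like this on its own rather than relying on the shell's message, the exit status can be inspected after each benchmark. A minimal sketch, assuming the usual shell convention that a status above 128 means "killed by signal (status - 128)" and that SIGSEGV is signal 11, as on Linux; `describe_status` is a hypothetical helper, not something the Acovea scripts provide.

```shell
#!/bin/sh
# describe_status STATUS: turn a shell exit status into a short message.
# (Hypothetical helper, for illustration only.)
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))            # 128 + N means killed by signal N
        if [ "$sig" -eq 11 ]; then
            echo "killed by signal 11 (SIGSEGV on Linux)"
        else
            echo "killed by signal $sig"
        fi
    elif [ "$status" -eq 0 ]; then
        echo "ok"
    else
        echo "exited with status $status"
    fi
}

# In a run script this would be called as: some_command; describe_status $?
describe_status 139
```

A status of 139 is what a shell typically reports for a segfaulting child, which matches the "Segmentation fault" line in the output quoted above.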
|
ZeNTuRe n00b
Joined: 24 Jan 2004 Posts: 69
|
Posted: Thu Sep 16, 2004 6:40 am Post subject: |
|
|
Hypnos wrote: | No -- your computer is broken. Or, possibly, you emerged Acovea itself with too-aggressive CFLAGS. |
Nah, I had only pushed my Athlon 2200+ up to 2800+. At the nominal clock it is working fine now. _________________ Did they touch God or did they touch the Sun? |
|
taskara Advocate
Joined: 10 Apr 2002 Posts: 3763 Location: Australia
|
Posted: Thu Sep 23, 2004 10:42 pm Post subject: |
|
|
Firstly, does anyone include -pipe in their CFLAGS?
Secondly (and this is slightly confusing at the moment), I was always under the impression (as are perhaps 90% of all Gentoo users) that -fomit-frame-pointer increased speed because it left out debugging information from the binary or some such (well, "omits frame pointers", as the name suggests).
-fomit-frame-pointer is also in the Gentoo docs as a suggested CFLAG (i.e. -march=pentium4 -pipe -O3 -fomit-frame-pointer).
However, on all the tests I have run with Acovea, "-fomit-frame-pointer" gets an avid "NO".
Further to this, Acovea seems to suggest -fno-omit-frame-pointer! Which suggests to me to leave in ALL frame pointers in each and every binary.
Does anyone have any opinions on this?
And lastly, flags like -falign-loops, -falign-functions, and -falign-jumps can take a value (e.g. -falign-functions=32).
Does anyone use such values in their Acovea-generated CFLAGS, or do you leave them off and use the default?
Are there better values for certain CPUs? And if so, are these listed somewhere?
Cheers!! _________________ Kororaa install method - have Gentoo up and running quickly and easily, fully automated with an installer! |
|
Hypnos Advocate
Joined: 18 Jul 2002 Posts: 2889 Location: Omnipresent
|
Posted: Fri Sep 24, 2004 1:41 am Post subject: |
|
|
taskara wrote: | Firstly, does anyone include -pipe in their CFLAGS? |
Yes. It makes GCC pass data between compilation stages through pipes rather than temporary files, which reduces disk activity and is nice during long emerges.
Quote: | Secondly (and this is slightly confusing at the moment), I was always under the impression (as are perhaps 90% of all Gentoo users) that -fomit-frame-pointer increased speed because it left out debugging information from the binary or some such (well, "omits frame pointers", as the name suggests).
-fomit-frame-pointer is also in the Gentoo docs as a suggested CFLAG (i.e. -march=pentium4 -pipe -O3 -fomit-frame-pointer).
However, on all the tests I have run with Acovea, "-fomit-frame-pointer" gets an avid "NO".
Further to this, Acovea seems to suggest -fno-omit-frame-pointer! Which suggests to me to leave in ALL frame pointers in each and every binary.
Does anyone have any opinions on this? |
Omitting frame pointers frees up a register (EBP on x86), which is only useful if your CPU is register starved. This is not an issue unless you are running a simulation or something similarly nasty with hand-coded platform-specific optimizations.
Quote: | And lastly, flags like -falign-loops, -falign-functions, and -falign-jumps can take a value (e.g. -falign-functions=32).
Does anyone use such values in their Acovea-generated CFLAGS, or do you leave them off and use the default?
Are there better values for certain CPUs? And if so, are these listed somewhere?
Cheers!! |
I, and it seems everyone else, just use the default values. Figuring out the optimal values for each CPU would take an enormous amount of regression testing using tools like Acovea. _________________ Personal overlay | Simple backup scheme |
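To get a feel for how quickly that search space grows, the sketch below just enumerates candidate combinations for the three alignment flags mentioned. The candidate values 4, 8, 16 and 32 are an assumption picked for illustration, and `align_candidates` is a made-up helper; nothing here benchmarks anything.

```shell
#!/bin/sh
# Candidate alignment values; the choice of 4 8 16 32 is an assumption.
VALUES="4 8 16 32"

# align_candidates: print one candidate flag set per line.
# (Hypothetical helper, for illustration only.)
align_candidates() {
    for f in $VALUES; do
        for l in $VALUES; do
            for j in $VALUES; do
                printf '%s\n' "-falign-functions=$f -falign-loops=$l -falign-jumps=$j"
            done
        done
    done
}

echo "candidate sets: $(align_candidates | wc -l | tr -d ' ')"
```

Even with only four values per flag this already gives 64 flag sets, each of which would need its own multi-hour benchmark run, before any of the other switches are varied.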
|
taskara Advocate
Joined: 10 Apr 2002 Posts: 3763 Location: Australia
|
Posted: Fri Sep 24, 2004 2:17 am Post subject: |
|
|
Thanks Hypnos,
Can I ask, do you use -fomit-frame-pointer? or -fno-omit-frame-pointer? What results did you get from your Acovea tests?
Cheers! _________________ Kororaa install method - have Gentoo up and running quickly and easily, fully automated with an installer! |
|
Hypnos Advocate
Joined: 18 Jul 2002 Posts: 2889 Location: Omnipresent
|
Posted: Fri Sep 24, 2004 2:19 am Post subject: |
|
|
taskara wrote: | Thanks Hypnos,
Can I ask, do you use -fomit-frame-pointer? or -fno-omit-frame-pointer? What results did you get from your Acovea tests?
Cheers! |
My Acovea CFLAGS, with the dangerous switches removed:
Code: | CFLAGS="-pipe -Wall -O2 -march=pentium4 -mcpu=pentium4 -maccumulate-outgoing-args -minline-all-stringops -fmove-all-movables -fno-if-conversion2 -fno-crossjumping -fno-delayed-branch -fno-omit-frame-pointer -fno-merge-constants -fno-thread-jumps" |
This is for a Mobile P4 1.6 GHz (_not_ Pentium-M). _________________ Personal overlay | Simple backup scheme |
|
taskara Advocate
Joined: 10 Apr 2002 Posts: 3763 Location: Australia
|
Posted: Fri Sep 24, 2004 2:43 am Post subject: |
|
|
thanks,
so you also use -fno-omit-frame-pointer.
I see you used -O2 rather than -O1, so I assume that in your Acovea results there were not many -O2 optimisations that created slower code for you? And if there were, then you countered them if they had a "-fno-[flag]" option.
Cheers. _________________ Kororaa install method - have Gentoo up and running quickly and easily, fully automated with an installer! |
|
Hypnos Advocate
Joined: 18 Jul 2002 Posts: 2889 Location: Omnipresent
|
Posted: Fri Sep 24, 2004 2:45 am Post subject: |
|
|
taskara wrote: | thanks,
so you also use -fno-omit-frame-pointer.
I see you used -O2 rather than -O1, so I assume that in your Acovea results there were not many -O2 optimisations that created slower code for you? And if there were, then you countered them if they had a "-fno-[flag]" option. |
Precisely. _________________ Personal overlay | Simple backup scheme |
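The "countering" relies on GCC's convention that most -f switches also have a negative -fno- form. A minimal sketch of that mapping, assuming plain on/off switches; `negate_flag` is a hypothetical helper name, and switches that take a value (such as -finline-limit) or -m/-O options are simply passed through or not handled specially.

```shell
#!/bin/sh
# negate_flag FLAG: print the opposite form of a plain on/off -f switch.
# (Hypothetical helper, for illustration only.)
negate_flag() {
    case "$1" in
        -fno-*) printf '%s\n' "-f${1#-fno-}" ;;    # -fno-foo -> -ffoo
        -f*)    printf '%s\n' "-fno-${1#-f}" ;;    # -ffoo    -> -fno-foo
        *)      printf '%s\n' "$1" ;;              # leave -O2, -march=... alone
    esac
}

negate_flag -fomit-frame-pointer
negate_flag -fno-thread-jumps
```

So a base of -O2 plus the negated forms of whichever -O2 switches scored badly gives a flag line of the kind posted above.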
|