Q: What is the best CFLAGS setting?*
A: CFLAGS="-mtune=i386 -O2"
* -- assuming emerge is a good indicator of overall performance
I tested various gcc optimizations using emerge (i.e., Python code), and it turns out that -mtune=i386 produces the fastest code. I also ran some X benchmarks, and -mtune=i386 came out on top there as well. I don't have hard data on -Os for the X benchmarks, but some quick tests indicate that it really sucks.
Software: gcc-4.2.4 glibc-2.7-r2 vanilla-sources 2.6.25.13 x86_64
Hardware: Core2 Duo 6400 @ 3.2 GHz, 5 GB RAM @ 800 MHz
I AM NOT TIMING COMPILE TIME!
1. recompile python and portage with new CFLAGS
2. emerge -pevt world (dry run to get stuff into memory)
3. 20 times: time emerge -pevt world &> /dev/null
4. goto 1

The surprising results, fastest first (the chart, not reproduced here, had 1-standard-deviation error bars):
0 -mtune=i386 -O2
1 -mtune=generic -O2
2 -march=nocona -O2 -ftree-loop-im -funswitch-loops
3 -march=nocona -O2
4 -march=nocona -O2 -ftree-loop-linear -funroll-loops -ftree-loop-ivcanon
5 -mtune=i686 -O2
6 -march=nocona -O2 -ftree-loop-linear -ftree-loop-im -funswitch-loops
7 -march=nocona -O3
8 -march=nocona -O2 -ftree-loop-ivcanon -funroll-loops
9 -march=nocona -O2 -fvariable-expansion-in-unroller -funroll-loops
10 -march=nocona -Os
11 -march=nocona -O2 -ftree-loop-linear -funroll-loops -fvariable-expansion-in-unroller
12 -march=nocona -O2 -ftree-loop-linear
13 -march=nocona -O2 -funroll-loops
14 -march=nocona -O2 -ftree-loop-linear -funroll-loops
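For anyone who wants to repeat this, here is a minimal Python sketch of steps 2-3 and of the mean +/- 1 standard deviation used for the error bars. It assumes wall-clock time is the number that matters (the shell's `time` also reports user and sys time, which this ignores) and that step 1, rebuilding python and portage with each CFLAGS set, is still done by hand. The function names `time_runs`, `summarize`, and `rank` are hypothetical helpers, not part of portage.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, runs=20):
    """Run `cmd` `runs` times and return the wall-clock seconds of each run.

    Like step 3 (`time emerge -pevt world &> /dev/null`), stdout and stderr
    are discarded so terminal output does not skew the timings.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (mean, sample standard deviation): a 1-sigma error bar."""
    return statistics.mean(samples), statistics.stdev(samples)


def rank(results):
    """Sort {cflags_label: samples} by mean run time, fastest first.

    Each row is (label, mean, stdev).
    """
    rows = [(label, *summarize(samples)) for label, samples in results.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Command to benchmark; defaults to the dry-run emerge from the post.
    cmd = sys.argv[1:] or ["emerge", "-pevt", "world"]
    # Step 2: one untimed run to get everything into memory.
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    # Step 3: 20 timed runs, reported as mean +/- 1 standard deviation.
    mean, sd = summarize(time_runs(cmd))
    print(f"{mean:.3f} s +/- {sd:.3f} s")
```

After each rebuild, collect one list of samples per CFLAGS set, store them in a dict keyed by the CFLAGS string, and pass that dict to `rank` to get an ordering comparable to the list above.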