darkbasic Tux's lil' helper

Joined: 06 Sep 2006 Posts: 133
|
Posted: Mon Mar 01, 2010 2:05 pm Post subject: (not so) safe cflags and per package cflags |
|
|
I was experimenting a bit on my laptop and wondering if there is any way to gain some extra performance, without putting ugly cflags in make.conf of course.
What I aim to do is tweak make.conf's cflags a bit, but only with optimizations which do not usually hurt performance (so nothing like -O3) and which build the vast majority of packages flawlessly.
The second step is adding per-package cflags for the few remaining packages which do not build with the tweaked global cflags.
Last but not least, build some packages which benefit from it and are known to be safe with the icc compiler (for example, python gets an extra 15% boost and the kernel is _much_ quicker at context switching, although maybe it's not so safe to compile it with icc), and tweak the gcc flags for some key packages.
I was wondering if someone has already run some benchmarks and which flags they use. I played mainly with icc, and it is usually as fast as gcc-4.4 on the amd64 architecture (on a 45nm Core 2 Duo), with a few noticeable exceptions.
It would be interesting to make a database with the best flags for each package and keep make.conf as clean and safe as possible. _________________ Computers are like air conditioners:
they stop working properly when you open Windows...
Coltiva Linux, Windows si pianta da solo.
http://www.linuxsystems.it/ |
|
nikaya Veteran


Joined: 13 May 2006 Posts: 1471 Location: Germany
|
|
darkbasic Tux's lil' helper

Joined: 06 Sep 2006 Posts: 133
|
Posted: Tue Mar 02, 2010 11:57 am Post subject: |
|
|
Thank you for the link, but I already use /etc/portage/env and I have already modified my bashrc to add custom flags for icc.
The hard part is finding which flags and which compiler work best for a given package.
Something like:
Code: | ICC:
dev-lang/python -O3 -ipo -xSSE4.1 -gcc
media-sound/lame -O2 -ip -xSSE4.1 -gcc |
Code: | pybench
Test minimum run-time (this, other, diff) average run-time (this, other, diff)
-------------------------------------------------------------------------------
BuiltinFunctionCalls: 112ms 133ms -15.9% 119ms 135ms -11.7%
BuiltinMethodLookup: 92ms 108ms -15.3% 95ms 110ms -13.5%
CompareFloats: 119ms 138ms -13.5% 122ms 139ms -12.1%
CompareFloatsIntegers: 110ms 125ms -12.0% 113ms 126ms -10.5%
CompareIntegers: 90ms 120ms -25.3% 91ms 121ms -24.8%
CompareInternedStrings: 122ms 110ms +10.9% 124ms 112ms +10.0%
CompareLongs: 86ms 99ms -13.2% 87ms 100ms -13.3%
CompareStrings: 85ms 127ms -33.3% 89ms 129ms -31.3%
CompareUnicode: 115ms 102ms +13.3% 117ms 103ms +14.3%
ComplexPythonFunctionCalls: 121ms 138ms -11.9% 125ms 140ms -10.8%
ConcatStrings: 131ms 148ms -11.5% 149ms 170ms -12.4%
ConcatUnicode: 124ms 192ms -35.5% 128ms 211ms -39.2%
CreateInstances: 125ms 130ms -3.9% 129ms 132ms -2.3%
CreateNewInstances: 101ms 96ms +4.7% 104ms 98ms +6.1%
CreateStringsWithConcat: 107ms 150ms -28.7% 109ms 153ms -28.6%
CreateUnicodeWithConcat: 112ms 116ms -3.1% 120ms 121ms -0.8%
DictCreation: 91ms 101ms -9.4% 94ms 102ms -7.7%
DictWithFloatKeys: 102ms 123ms -17.0% 105ms 123ms -15.1%
DictWithIntegerKeys: 101ms 118ms -14.1% 105ms 120ms -12.5%
DictWithStringKeys: 94ms 107ms -12.8% 96ms 108ms -11.3%
ForLoops: 68ms 78ms -12.3% 71ms 79ms -9.7%
IfThenElse: 106ms 105ms +1.1% 108ms 106ms +2.3%
ListSlicing: 107ms 112ms -4.9% 125ms 113ms +10.8%
NestedForLoops: 92ms 113ms -18.0% 96ms 113ms -15.8%
NestedListComprehensions: 116ms 145ms -20.1% 120ms 148ms -19.1%
NormalClassAttribute: 105ms 112ms -6.2% 107ms 113ms -4.9%
NormalInstanceAttribute: 104ms 99ms +5.1% 107ms 101ms +6.7%
PythonFunctionCalls: 116ms 126ms -7.6% 118ms 127ms -6.6%
PythonMethodCalls: 137ms 148ms -7.2% 141ms 150ms -5.6%
Recursion: 166ms 163ms +1.9% 171ms 163ms +4.6%
SecondImport: 94ms 103ms -8.6% 99ms 104ms -5.4%
SecondPackageImport: 97ms 110ms -11.5% 101ms 111ms -8.8%
SecondSubmoduleImport: 126ms 142ms -10.8% 130ms 143ms -9.0%
SimpleComplexArithmetic: 121ms 127ms -4.9% 123ms 128ms -3.4%
SimpleDictManipulation: 106ms 117ms -9.6% 111ms 121ms -8.5%
SimpleFloatArithmetic: 116ms 133ms -12.2% 121ms 135ms -10.6%
SimpleIntFloatArithmetic: 79ms 92ms -14.0% 81ms 93ms -12.7%
SimpleIntegerArithmetic: 80ms 94ms -14.7% 81ms 94ms -13.7%
SimpleListComprehensions: 97ms 121ms -19.6% 102ms 123ms -17.4%
SimpleListManipulation: 84ms 98ms -13.8% 87ms 98ms -10.9%
SimpleLongArithmetic: 106ms 108ms -1.8% 107ms 109ms -1.3%
SmallLists: 112ms 118ms -5.4% 114ms 120ms -5.2%
SmallTuples: 100ms 114ms -12.4% 101ms 116ms -12.8%
SpecialClassAttribute: 105ms 110ms -4.7% 107ms 112ms -5.2%
SpecialInstanceAttribute: 125ms 208ms -39.9% 127ms 211ms -39.9%
StringMappings: 118ms 107ms +10.2% 119ms 107ms +10.5%
StringPredicates: 110ms 135ms -18.6% 112ms 136ms -17.7%
StringSlicing: 111ms 117ms -5.2% 120ms 125ms -3.5%
TryExcept: 68ms 90ms -24.4% 69ms 90ms -24.0%
TryFinally: 97ms 106ms -8.3% 99ms 107ms -7.2%
TryRaiseExcept: 104ms 111ms -6.6% 105ms 113ms -6.8%
TupleSlicing: 114ms 133ms -14.7% 119ms 142ms -16.2%
UnicodeMappings: 109ms 147ms -26.2% 111ms 148ms -25.2%
UnicodePredicates: 117ms 128ms -8.7% 119ms 132ms -10.5%
UnicodeProperties: 116ms 117ms -1.3% 120ms 123ms -3.0%
UnicodeSlicing: 125ms 128ms -2.9% 128ms 134ms -4.6%
WithFinally: 135ms 150ms -9.6% 141ms 152ms -6.7%
WithRaiseExcept: 117ms 124ms -5.4% 122ms 125ms -2.5%
-------------------------------------------------------------------------------
Totals: 6247ms 7070ms -11.6% 6459ms 7217ms -10.5%
(this=iccO3.pybench, other=gccO2.pybench) |
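As a reading aid for the table: the diff columns appear to be computed as (this - other) / other, which reproduces the Totals row. A quick check with awk, assuming that formula (the function name `pct_diff` is only illustrative):

```shell
#!/bin/sh
# pct_diff THIS OTHER: relative difference of THIS versus OTHER, in percent.
# Assumption: pybench's "diff" column is (this - other) / other.
pct_diff() {
    awk -v t="$1" -v o="$2" \
        'BEGIN { printf "%+.1f%%\n", (t - o) / o * 100 }'
}

pct_diff 6247 7070   # Totals row, minimum run-time: prints -11.6%
pct_diff 6459 7217   # Totals row, average run-time: prints -10.5%
```

A negative value means "this" (the icc -O3 build) took less time than "other" (the gcc -O2 build).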
If someone uses a package a lot and has found the best compiler/flags for it, they can share them, and we can easily build a little database of all the packages which benefit greatly from some optimizations.
Once they are added to /etc/portage/packages.gcc-cflags or packages.icc-cflags, portage will automatically use the better flags for any known package.
I'm pretty sure I'm not the only one who has already experimented with custom cflags for some packages. |
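A minimal sketch of how such a lookup could work, assuming a plain-text file in the "category/package flags" layout shown in the ICC example of this post. The function name `lookup_flags` and the temporary file are made up for illustration; this is not an existing Portage feature.

```shell
#!/bin/sh
# lookup_flags DBFILE CATEGORY/PACKAGE
# Prints the flags recorded for the package, or nothing if there is no entry.
# Assumed file format: one "category/package flag flag ..." entry per line.
lookup_flags() {
    awk -v pkg="$2" '$1 == pkg { $1 = ""; sub(/^ +/, ""); print; exit }' "$1"
}

# Example database holding the two entries from the post (scratch file).
db="$(mktemp)"
cat > "$db" <<'EOF'
dev-lang/python -O3 -ipo -xSSE4.1 -gcc
media-sound/lame -O2 -ip -xSSE4.1 -gcc
EOF

lookup_flags "$db" dev-lang/python
lookup_flags "$db" media-sound/lame
```

Inside a Portage bashrc, the lookup key could be built from the ebuild variables `CATEGORY` and `PN`, for example `lookup_flags "$db" "${CATEGORY}/${PN}"`; how the result is then merged into CFLAGS is left open here.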
|
Spaulding Apprentice


Joined: 16 Apr 2006 Posts: 159 Location: /dev/vagina
|
Posted: Tue Mar 02, 2010 6:10 pm Post subject: |
|
|
We could create a mailing list or a web interface where users can add their options and results. But I have one question: is it worth it? |
|
darkbasic Tux's lil' helper

Joined: 06 Sep 2006 Posts: 133
|
Posted: Tue Mar 02, 2010 7:20 pm Post subject: |
|
|
It depends... gains are usually in the range of 1-15%, which is quite enough in my opinion, and sometimes even greater (for example, Sun Studio's auto-parallelization technology doubles (2x!) the performance in SPEC CPU2006).
We don't have to find the best flags for every package; if someone finds a better flag or compiler for a package and shares it, it is worth it. |
|
robnotts Guru


Joined: 15 Mar 2004 Posts: 405 Location: Nottingham, UK
|
Posted: Wed Mar 03, 2010 5:58 am Post subject: |
|
|
For info, I have been successfully running my laptop, which seems stable and fast, with these...
Code: | CFLAGS="-O2 -march=native -ftree-vectorize -fomit-frame-pointer -pipe"
CXXFLAGS="${CFLAGS}"
CPPFLAGS="${CFLAGS}"
LDFLAGS="-Wl,-O1 -Wl,--as-needed" |
...with overrides for...
Code: |
/etc/portage/env/app-office:
  openoffice
    CFLAGS="-O2 -march=native -fomit-frame-pointer -pipe"
    CXXFLAGS="-pipe"
    CPPFLAGS="-pipe"
/etc/portage/env/dev-db:
  mysql
    CFLAGS="-O2 -march=native -ftree-vectorize -fomit-frame-pointer -fno-strict-aliasing -pipe"
    CXXFLAGS="-O2 -march=native -ftree-vectorize -fomit-frame-pointer -fno-strict-aliasing -pipe"
    CPPFLAGS="-O2 -march=native -ftree-vectorize -fomit-frame-pointer -fno-strict-aliasing -pipe"
    LDFLAGS="-Wl,-O1 -Wl,--as-needed"
/etc/portage/env/sys-libs:
  libstdc++-v3
    CFLAGS="-O2 -fomit-frame-pointer -pipe"
    CXXFLAGS="-O2 -fomit-frame-pointer -pipe"
    CPPFLAGS="-O2 -fomit-frame-pointer -pipe" |
... the openoffice and libstdc++-v3 flags are there to allow them to compile at all, and the mysql flags work around a bug (listed somewhere) which was causing database corruption.
I specifically went for the vectorisation flags because I use this laptop as my main work machine for browsing, photo editing, etc., so I wanted it to be as fast as possible and to use as few CPU resources as possible. It runs in 64-bit mode.
I guess the next step would be to try some of the graphite loop manipulation flags on some of the media-libs to see if they make any difference.
Rob. _________________ ---
Gentoo Phenom][ X4 955 on AMD790 + Geforce 220GT 8GB/1.75TB (Desktop)
+ MythTV (3xFreeview,1xFreesat HD) on 1080p
Gentoo Turion64 X2 Geforce 6150 2GB/120GB (Laptop) |
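For comparison, newer Portage versions can map packages to flag sets through /etc/portage/package.env, which points package atoms at files under /etc/portage/env/. The sketch below is only an illustration of that layout, not the setup used in the post above: it writes into a scratch directory instead of /etc, the file name `no-vectorize.conf` is made up, and the flags are the openoffice ones quoted above.

```shell
#!/bin/sh
# Build an example package.env layout under a scratch directory.
# Review the result before copying anything to the real /etc/portage.
root="$(mktemp -d)"
mkdir -p "$root/etc/portage/env"

# Flag set without -ftree-vectorize (file name is hypothetical).
cat > "$root/etc/portage/env/no-vectorize.conf" <<'EOF'
CFLAGS="-O2 -march=native -fomit-frame-pointer -pipe"
CXXFLAGS="-pipe"
EOF

# Map a package atom to the env file above.
cat > "$root/etc/portage/package.env" <<'EOF'
app-office/openoffice no-vectorize.conf
EOF

cat "$root/etc/portage/package.env"
```

Keeping one env file per flag set, rather than one per package, means several packages that need the same workaround can share a single file.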
|
Yamakuzure Advocate


Joined: 21 Jun 2006 Posts: 2321 Location: Adendorf, Germany
|
Posted: Wed Mar 03, 2010 11:09 am Post subject: |
|
|
My make.conf uses this, and my laptop works flawlessly with it:
Code: | ## CFLAGS:
#-------------#
CFLAGS="-march=native -O2 -pipe -mssse3" ## Default and safe flags
CFLAGS="${CFLAGS} -ftree-vectorize" ## For non tool chain
CFLAGS="${CFLAGS} -mno-push-args" ## Should not be added unless safety is known
## LDFLAGS:
#-------------#
LDFLAGS="${LDFLAGS} -Wl,--sort-common -Wl,--hash-style=gnu" ## Default and safe flags
LDFLAGS="${LDFLAGS} -Wl,--as-needed" ## Optimization - if merges break due to unknown symbols, disable this!
LDFLAGS="${LDFLAGS} -Wl,-O1 -s" ## Flags for stripping and optimizing binaries |
Note: The comments are for me, and not added for this post. So I do not know whether my "commented in" thoughts are entirely correct.
Note 2: I comment lines for individual packages as I see fit. _________________ Edited 220,176 times by Yamakuzure |
|
darkbasic Tux's lil' helper

Joined: 06 Sep 2006 Posts: 133
|
Posted: Wed Mar 03, 2010 11:39 am Post subject: |
|
|
robnotts wrote: | I guess the next step would be to try some of the graphite loop manipulation flags on some of the media-libs to see if they make any difference. |
I'm experimenting with "-floop-interchange -floop-strip-mine -floop-block" but I still have to benchmark it. It seems to be quite safe.
Maybe something more aggressive like "-floop-parallelize-all -ftree-parallelize-loops=4" is worth trying too... |
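Before benchmarking flags like these, it can help to check whether the installed compiler accepts them at all, since support depends on the GCC version and on how it was built. A rough sketch, assuming a gcc-compatible driver; the function name `probe_flag` is made up for illustration:

```shell
#!/bin/sh
# probe_flag FLAG...: try to compile an empty program with the given flags.
# Prints "accepted", "rejected", or "no-compiler".
# CC defaults to gcc; override it to test another compiler.
probe_flag() {
    cc="${CC:-gcc}"
    command -v "$cc" >/dev/null 2>&1 || { echo "no-compiler"; return 0; }
    obj="$(mktemp)"   # scratch object file, removed below
    if printf 'int main(void){return 0;}\n' |
        "$cc" -Werror "$@" -x c -c -o "$obj" - >/dev/null 2>&1; then
        echo "accepted"
    else
        echo "rejected"
    fi
    rm -f "$obj"
}

for flag in -floop-interchange -floop-strip-mine -floop-block; do
    printf '%s: %s\n' "$flag" "$(probe_flag -O2 "$flag")"
done
```

"accepted" only means the compiler took the flag without complaint; it says nothing about whether the flag is safe or faster for a given package, which still needs a benchmark.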
|
SithMaddox Tux's lil' helper


Joined: 02 Jul 2004 Posts: 149
|
Posted: Sun Mar 14, 2010 8:25 pm Post subject: |
|
|
Yamakuzure wrote: | Code: | CFLAGS="-march=native -O2 -pipe -mssse3" ## Default and safe flags | |
Doesn't -march=native imply -mssse3? |
|
darkbasic Tux's lil' helper

Joined: 06 Sep 2006 Posts: 133
|
Posted: Mon Mar 15, 2010 11:47 am Post subject: |
|
|
SithMaddox wrote: | Doesn't -march=native imply -mssse3? |
Uhm... I think so but I'm not sure... |
|
amade n00b

Joined: 30 Mar 2009 Posts: 8
|
Posted: Mon Mar 15, 2010 12:22 pm Post subject: |
|
|
Code: |
# gcc -march=native -Q --help=target
...
-msseregparm [disabled]
-mssse3 [disabled]
-mstack-arg-probe [disabled]
...
# gcc -march=native -Q --help=target -mssse3
...
-msseregparm [disabled]
-mssse3 [enabled]
-mstack-arg-probe [disabled]
...
|
|
|
loftwyr l33t


Joined: 29 Dec 2004 Posts: 970 Location: 43°38'23.62"N 79°27'8.60"W
|
Posted: Mon Mar 15, 2010 1:05 pm Post subject: |
|
|
That output only shows which flags are explicitly set. It doesn't show what is implied by the march/mtune settings that native generates.
If you have a CPU that supports the SSE sets, they are enabled, but -msseX will still show as disabled. _________________ My emerge --info
Have you run revdep-rebuild lately? It's in gentoolkit and it's worth a shot if things don't work well.
Celebrating 5 years of Gentoo-ing. |
|
mv Watchman


Joined: 20 Apr 2005 Posts: 6781
|
Posted: Mon Mar 15, 2010 7:08 pm Post subject: |
|
|
-mssse3 is implied by -march=native (if supported by the CPU, of course). Here is how to find out whether your CPU supports it:
Code: | gcc -v -c -Q -march=native -O2 -o /dev/null -x c - 2>&1 <<PROG
int main(){return 0;}
PROG |
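Another way to look at the same question is to dump the macros the preprocessor predefines under -march=native and check for `__SSSE3__`. A small sketch, assuming a gcc-compatible compiler; the function name `native_has_macro` is made up for illustration:

```shell
#!/bin/sh
# native_has_macro MACRO: does -march=native predefine MACRO (e.g. __SSSE3__)?
# Prints "yes", "no", or "unknown" (no compiler, or -march=native unsupported).
native_has_macro() {
    cc="${CC:-gcc}"
    macros="$("$cc" -march=native -dM -E - </dev/null 2>/dev/null)" || {
        echo "unknown"
        return 0
    }
    if printf '%s\n' "$macros" | grep -q "^#define $1 "; then
        echo "yes"
    else
        echo "no"
    fi
}

native_has_macro __SSSE3__
```

If this prints "yes", adding -mssse3 next to -march=native is redundant on that machine.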
|
|