nxsty Veteran
Joined: 23 Jun 2004 Posts: 1556 Location: .se
Posted: Sat Jan 14, 2006 9:55 pm
sigmalll: True, but the problem is that there are no magic CFLAGS that will make your system faster. So global CFLAGS should always be "-O2 -march= -pipe", and perhaps also -fomit-frame-pointer if you're on x86. It doesn't matter if -O3 is shown to be faster in some benchmarks; it shouldn't be used globally anyway, because 99% of packages won't benefit from the higher optimizations and will in fact often run slower because of the extra bloat it causes.

sigmalll Guru
Joined: 24 Aug 2003 Posts: 332
Posted: Sat Jan 14, 2006 11:12 pm
nxsty wrote: | sigmalll: True, but the problem is that there are no magic CFLAGS that will make your system faster. So global CFLAGS should always be "-O2 -march= -pipe", and perhaps also -fomit-frame-pointer if you're on x86. It doesn't matter if -O3 is shown to be faster in some benchmarks; it shouldn't be used globally anyway, because 99% of packages won't benefit from the higher optimizations and will in fact often run slower because of the extra bloat it causes. |
I understand the argument that -O3 -fblah flags are a bit pointless for a lot of software; who gives a monkey's if 'less' is 7% faster (likewise, does 7% slower really matter either?). But setting a high level of optimisation globally does guarantee that all applications that may benefit, do. I don't think anybody in their right mind would want to set optimisations on an application-by-application basis, especially if they're running a media-intensive desktop (and I don't expect the devs to start adding performance-based flags to ebuilds).
But the compiler is where the most performance gains can be obtained, and good CFLAGS really do make parts of your system run faster.
(I do have to add that this is only really an issue because GCC's focus in the past has always been portability rather than performance. In some cases this actually makes Linux applications slower than their Windows counterparts.)

cybrjackle Apprentice
Joined: 09 Jan 2003 Posts: 248 Location: USA
Posted: Sun Jan 15, 2006 3:54 am
I use hardcore ricing flags
Code: |
# grep CFLAGS /etc/make.conf
CFLAGS="-O2 -march=k8 -pipe"
CXXFLAGS="${CFLAGS}"
|

LoSeR_5150 Guru
Joined: 20 Mar 2005 Posts: 455 Location: San Francisco, CA
Posted: Sun Jan 15, 2006 11:46 am
I am fairly new to Gentoo, but within my 10 months I have wasted tons of time playing with CFLAGS. Well, I guess not wasted, because I think I learned from the experience. My CFLAGS used to look like this:
Code: | CFLAGS="-march=athlon64 -O2 -ffast-math -fforce-addr -fmove-all-movables -fno-ident -fomit-frame-pointer -fpeel-loops -fprefetch-loop-arrays -frename-registers -ftracer -funroll-loops -funswitch-loops -fweb -pipe" |
What I have learned is that while some app, say nbench, might see decent gains (I mean like 5-7%, woo), the same flags make just as many apps slower by the same percentage if not more, resulting in an overall slower system. So it has been my experience that the fewer ricey CFLAGS, the better. My current system flags are:
Code: | CFLAGS="-march=athlon64 -O2 -fforce-addr -fno-ident -ftracer -fweb -pipe" |
And I can say that my system runs much better (faster compile times, better app start times, better stability) now than when I had all my ricey CFLAGS. I hate to say it, but unless you are trying to speed up a certain app, maybe some type of media-intensive app, it seems that it isn't worth the time to mess with your CFLAGS extensively. Just my .02
_________________
Opteron 1356@2.4Ghz
6GB DDR2 800Mhz
128MB Quadro NVS 210S
640GB Western Digital HD
*Gentoo-x86_64-2.6.30-r1
Opteron175@2.2GHz
2GB DDR 400MHz
256MB Quadro 1400 Go
(2) 80GB Segate HDs: RAID0
*Gentoo-x86_64-2.6.30-r1

MorLipf Apprentice
Joined: 09 Nov 2004 Posts: 226 Location: Solingen, Germany
Posted: Sun Jan 15, 2006 11:51 am
My current Cflags are:
Code: | CFLAGS="-march=k8 -O3 -pipe -fomit-frame-pointer" |
Should I optimize them?

nxsty Veteran
Joined: 23 Jun 2004 Posts: 1556 Location: .se
Posted: Sun Jan 15, 2006 12:31 pm
sigmalll wrote: | I understand the argument that -O3 -fblah flags are a bit pointless for a lot of software; who gives a monkey's if 'less' is 7% faster (likewise, does 7% slower really matter either?). But setting a high level of optimisation globally does guarantee that all applications that may benefit, do. I don't think anybody in their right mind would want to set optimisations on an application-by-application basis, especially if they're running a media-intensive desktop (and I don't expect the devs to start adding performance-based flags to ebuilds).
But the compiler is where the most performance gains can be obtained, and good CFLAGS really do make parts of your system run faster.
(I do have to add that this is only really an issue because GCC's focus in the past has always been portability rather than performance. In some cases this actually makes Linux applications slower than their Windows counterparts.) |
There is always a tradeoff between speed and size when the compiler is optimizing. -O2 is usually a good balance: it turns on most optimizations but still doesn't bloat the code much. Higher optimizations like -O3, -funroll-loops and friends have the side effect of making binaries larger. Larger binaries mean more disk reads, more memory usage, slower execution and a larger chance of cache misses. This is acceptable for the specific applications that actually benefit from the higher optimizations, but for most things it's just unnecessary bloat that is bad for performance. In fact -Os is usually the best option, since most applications don't benefit from the extra optimizations but do benefit from the smaller binary size. Compiler optimization is only a small part of performance; most of it comes from well-written code.
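A rough way to check the size side of this tradeoff on your own machine is to build the same program at two -O levels and compare the resulting files. A minimal sketch in shell: the helper name growth_pct and the file names in the usage line are made up for illustration, and file size is only a proxy for the cache and memory effects described above.

```sh
#!/bin/sh
# growth_pct BASE OTHER
# Prints how much larger OTHER is than BASE, in whole percent
# (negative if OTHER is smaller). Both arguments are sizes in bytes,
# and BASE must not be zero.
growth_pct() {
    base=$1
    other=$2
    echo $(( ($other - $base) * 100 / $base ))
}
```

Usage, assuming you have built two hypothetical binaries prog-O2 and prog-O3 from the same source: growth_pct "$(wc -c < prog-O2)" "$(wc -c < prog-O3)". The size tool from binutils gives a closer look at the code (text) segment alone.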

robak Apprentice
Joined: 14 Jan 2004 Posts: 209 Location: Germany
Posted: Wed Jan 18, 2006 4:41 am
I just tried a few CFLAGS to optimize POVray, but the best result I could get is 28 min 34 sec on this hardware:
AMD Athlon64 3000+ 1.8 GHz, Venice core
2*512 MB RAM in dual-channel mode
Can someone tell me how to optimize the system to get better results?
I have been compiling world for about 3 days now (I have "only" 135 packages to compile, so I could test a lot of flag combinations) and I am a little bit tired.

mbar Veteran
Joined: 19 Jan 2005 Posts: 1990 Location: Poland
Posted: Thu Jan 19, 2006 8:46 pm
Or you could just overclock your CPU by a mere 200 MHz and make it really fly; no ricer CFLAGS will do that for you.
I settled on "-Os -march=k8 -pipe -fomit-frame-pointer -falign-functions=5"

alexlm78 Veteran
Joined: 08 Dec 2003 Posts: 1265 Location: Guatemala,Guatemala
Posted: Wed Feb 08, 2006 5:50 pm
Interesting, I should try it!
_________________
"This is a different kind of world, you need a different kind of software"
Linux User# 315201
100% Chapin, made in Guatemala

HacTek n00b
Joined: 31 Jul 2005 Posts: 7 Location: New Zealand
Posted: Thu Feb 09, 2006 1:07 am
Forgive my naivety, but is there a way to specify CFLAGS on a package-by-package basis?
Something similar to the way USE flags can be set with the package.use file.
If not, then why not?
It seems from this debate that a simple solution would be to set some safe and sensible CFLAGS for the system,
perhaps Code: | -O2 -march=XX -pipe |
and then for a package which would benefit, add say
Code: | category/packagename -fwhatever-you-want |
into a file called package.cflags.
I reckon this could keep both sides of the fence happy.
You would get overall system stability, and the ricers can have fun optimising an app without breaking other packages as easily.
It might even take the pressure off the developers having to strip flags out of the ebuilds.
Any reason why this wouldn't work?
_________________
SELECT * FROM Managers WHERE Clue > 0
0 Rows Returned
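The lookup proposed in this post can be prototyped in a few lines of shell. This is only a sketch of the idea, not a Portage feature: the file name package.cflags and its "category/packagename flags" format come from the post, while the function name flags_for and the GLOBAL_CFLAGS default are assumptions made for the example.

```sh
#!/bin/sh
# Assumed safe system-wide default; add your own -march here.
GLOBAL_CFLAGS="-O2 -pipe"

# flags_for PACKAGE [FILE]
# Prints the global flags followed by any extra flags listed for PACKAGE
# in FILE (default: ./package.cflags). Each line of FILE has the form
#   category/packagename -fflag ...
# Blank lines and lines starting with '#' are ignored. If a package is
# listed more than once, the last entry wins.
flags_for() {
    pkg=$1
    file=${2:-package.cflags}
    extra=""
    if [ -r "$file" ]; then
        while read -r name flags; do
            case $name in
                ''|'#'*) continue ;;
            esac
            if [ "$name" = "$pkg" ]; then
                extra=$flags
            fi
        done < "$file"
    fi
    echo "$GLOBAL_CFLAGS${extra:+ $extra}"
}
```

With a line such as "media-video/mplayer -O3 -funroll-loops" in the file, flags_for media-video/mplayer prints "-O2 -pipe -O3 -funroll-loops". Appending rather than replacing works because GCC honours the last -O option on the command line.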

barry Apprentice
Joined: 01 May 2002 Posts: 170 Location: UK
Posted: Thu Feb 09, 2006 1:09 am
It's important to remember that a lot of these high-performance optimisation flags are designed for developers to compile and test their own software to get the best performance out of it. They were never intended to be used blindly to compile an entire production system.
As others have said, most of the packages that will receive massive performance gains with flags like -funroll-loops and -ffast-math already have these included in the ebuild, so you don't need to enable them yourself.
"-march=k8 -O2 -pipe" should produce excellent optimised and rock-solid code for everybody, and builds like mplayer, lame and so on will already have safe higher-performing flags applied.
"-march=k8 -O2 -pipe -frename-registers -fweb" is the same as -O3 minus a single flag that bloats code and quite often causes a slowdown, so these flags will generally improve performance over a plain -O2 without breaking anything.
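Which individual flags separate -O2 from -O3 depends on the GCC version, so rather than taking any list on trust you can ask the compiler itself. A minimal sketch, assuming a GCC that supports -Q --help=optimizers (that option is newer than the compilers discussed in this thread); the function name opt_diff is made up for the example.

```sh
#!/bin/sh
# opt_diff LEVEL_A LEVEL_B [COMPILER]
# Prints the optimizer settings that differ between two -O levels,
# as reported by the compiler (default: gcc).
opt_diff() {
    a=$1
    b=$2
    cc=${3:-gcc}
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || return 1
    "$cc" -Q --help=optimizers "$a" > "$tmp_a"
    "$cc" -Q --help=optimizers "$b" > "$tmp_b"
    # diff exits 1 when the files differ, which is the expected case here
    diff "$tmp_a" "$tmp_b" || :
    rm -f "$tmp_a" "$tmp_b"
}
```

Usage: opt_diff -O2 -O3. Lines starting with "<" show the setting at the first level and lines starting with ">" show it at the second.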

SoTired Apprentice
Joined: 19 May 2004 Posts: 174

HacTek n00b
Joined: 31 Jul 2005 Posts: 7 Location: New Zealand
Posted: Thu Feb 09, 2006 2:42 am
It looks like a pretty good script to me.
Not that I have any experience with bash scripting, but I've got to learn sometime.
Is there any active development happening with this approach?
It looks like a good candidate for moving towards an officially supported feature.
_________________
SELECT * FROM Managers WHERE Clue > 0
0 Rows Returned

pacho2 Developer
Joined: 04 Mar 2005 Posts: 2599 Location: Oviedo, Spain
Posted: Thu Aug 31, 2006 5:05 pm
energyman76b wrote: | tnt wrote: | energyman76b wrote: | -msse3 is only 'safe' if you know for sure that your CPU supports it (Venice AMD64). |
It's a Sempron 2800+ 'BX' Palermo core and it has the 'PNI' flag, so it should have SSE3.
Anyway, thank you for the '-funroll-all-loops' tip, a very useful one! |
I read some weeks ago that some CPUs report the PNI flag without having SSE3.
Try to run this:
cat test_pni.c
#include <stdint.h>
#include <stdio.h>   /* printf */
#include <string.h>  /* memset */

uint8_t __attribute__((aligned(64))) current[64];
uint8_t previous[64];

int main()
{
    int i;
    uint64_t result;
    uint32_t _eax, _ebx, _ecx, _edx;
    uint8_t _cpuid[13];
    uint32_t *_cpuid0 = (uint32_t*) _cpuid;
    uint32_t *_cpuid1 = (uint32_t*) ( _cpuid + 4 );
    uint32_t *_cpuid2 = (uint32_t*) ( _cpuid + 8 );
    uint8_t *ptr0 = current;
    uint8_t *ptr1 = previous;

    /* cpuid(0): highest supported leaf in eax, vendor string in ebx, edx, ecx */
    __asm__ __volatile__ (
        "cpuid\n"
        : "=a" (_eax),
          "=b" (*_cpuid0), "=d" (*_cpuid1), "=c" (*_cpuid2)
        : "a" (0) );
    _cpuid[12] = 0;
    printf( "cpuid(0) returns %u (%s)\n", _eax, _cpuid );

    /* cpuid(1): feature bits; SSE3 (PNI) is reported in bit 0 of ecx */
    __asm__ __volatile__ (
        "cpuid\n"
        : "=a" (_eax), "=b" (_ebx), "=c" (_ecx), "=d" (_edx)
        : "a" (1) );
    printf( "cpuid(1) returns %08x %08x %08x %08x\n",
            _eax, _ebx, _ecx, _edx );

    memset( current, 0xaa, 64 );
    memset( previous, 0x55, 64 );

    /* haddps is an SSE3 instruction; it faults on a CPU without SSE3 */
    for( i = 0; i < 4; i ++ ) {
        __asm__ __volatile__ (
            "movdqa %0, %%xmm0\n"
            "movdqu %1, %%xmm1\n"
            "psadbw %%xmm1, %%xmm0\n"
            "paddw %%xmm0, %%xmm2\n"
            "haddps %%xmm2, %%xmm2\n"
            "haddps %%xmm2, %%xmm2\n"
            : : "m" (*ptr0),
                "m" (*ptr1) : "xmm0", "xmm1", "xmm2" );
        ptr0 += 16;
        ptr1 += 16;
    }
    __asm__ __volatile__ (
        "movq %%xmm2, %0\n"
        : "=m" (result) );
    printf( "Result is %llu\n", (unsigned long long) result );
    return 0;
}
Save it as test_pni.c, compile and run it.
If it dies with an error (typically "Illegal instruction"), you do not have SSE3.
If not, you have SSE3 and everything is fine. |
I have an Athlon 3200+ Winchester, I have compiled it and I get this output:
Code: | ./test_pni
cpuid(0) returns 1 (AuthenticAMD)
cpuid(1) returns 00020ff0 00000800 00000001 078bfbff
Result is 496498219533200
|
So, does it support SSE3?
Thanks for the information.
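For reference, the third number on the cpuid(1) line printed by test_pni is ECX, and bit 0 of ECX is the SSE3 (PNI) feature bit. The sketch below decodes that value and also looks for the pni flag that Linux lists in /proc/cpuinfo. Both function names are hypothetical, and both checks only tell you what the CPU reports, which is what the test program above double-checks by actually executing haddps.

```sh
#!/bin/sh
# ecx_reports_sse3 HEX
# HEX is the ECX value from cpuid(1) without a 0x prefix, e.g. 00000001.
# Succeeds (exit status 0) if bit 0, the SSE3/PNI feature bit, is set.
ecx_reports_sse3() {
    [ $(( 0x$1 & 1 )) -eq 1 ]
}

# cpuinfo_has_flag FLAG [FILE]
# Succeeds if FLAG appears as a whole word on the first "flags" line
# of FILE (default: /proc/cpuinfo).
cpuinfo_has_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}
```

Usage with the ECX value from the output above: ecx_reports_sse3 00000001 && echo "CPUID reports SSE3". On Linux, cpuinfo_has_flag pni performs the equivalent check against the kernel's flag list.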

loftwyr l33t
Joined: 29 Dec 2004 Posts: 970 Location: 43°38'23.62"N 79°27'8.60"W
Posted: Thu Aug 31, 2006 5:46 pm
If you didn't have SSE3, it would have given an error instead of a result.
Your processor has SSE3 _________________ My emerge --info
Have you run revdep-rebuild lately? It's in gentoolkit and it's worth a shot if things don't work well.
Celebrating 5 years of Gentoo-ing.

pacho2 Developer
Joined: 04 Mar 2005 Posts: 2599 Location: Oviedo, Spain
Posted: Thu Aug 31, 2006 6:26 pm
Thanks

clytle374 Apprentice
Joined: 01 Aug 2006 Posts: 221
Posted: Fri Sep 01, 2006 5:38 am
I have decided that some here work for MS: either trying to make Linux as slow as Windows, or as unstable as Windows.
Now I will have to break it to find out who.