[gcc 3.4] AMD's Recommended CFLAGS


nxsty · Post by **nxsty** » Sat Jan 14, 2006 9:55 pm

sigmalll: True, but the problem is that there are no magic CFLAGS that will make your system faster. So global CFLAGS should always be "-O2 -march= -pipe", perhaps plus -fomit-frame-pointer if you're on x86. It doesn't matter if -O3 is shown to be faster in some benchmarks; it shouldn't be used globally anyway, because 99% of packages won't benefit from the higher optimizations and will in fact often run slower because of the extra bloat they cause.

sigmalll · Post by **sigmalll** » Sat Jan 14, 2006 11:12 pm

nxsty wrote:sigmalll: True, but the problem is that there are no magic CFLAGS that will make your system faster. So global CFLAGS should always be "-O2 -march= -pipe", perhaps plus -fomit-frame-pointer if you're on x86. It doesn't matter if -O3 is shown to be faster in some benchmarks; it shouldn't be used globally anyway, because 99% of packages won't benefit from the higher optimizations and will in fact often run slower because of the extra bloat they cause.

I understand the argument that -O3 -fblah flags are a bit pointless for a lot of software; who gives a monkey's if 'less' is 7% faster (likewise, does 7% slower really matter either?). But setting a high level of optimisation globally does guarantee that all applications that may benefit, do. I don't think anybody in their right mind would want to manage optimisations on an application-by-application basis, especially if they're running a media-intensive desktop (and I don't expect the devs to start adding performance-based flags to ebuilds).

But the compiler is where the most performance gains can be obtained, and good CFLAGS really do make parts of your system run faster.

(I do have to add that this is only really an issue because GCC's focus in the past has always been portability rather than performance. In some cases this actually makes Linux applications slower than their Windows counterparts.)

cybrjackle · Post by **cybrjackle** » Sun Jan 15, 2006 3:54 am

I use hardcore ricing flags

Code:

# grep CFLAGS /etc/make.conf
CFLAGS="-O2 -march=k8 -pipe"
CXXFLAGS="${CFLAGS}"

LoSeR_5150 · Post by **LoSeR_5150** » Sun Jan 15, 2006 11:46 am

I am fairly new to Gentoo, but within my 10 months I have wasted tons of time playing with CFLAGS. Well, I guess not wasted, because I think I learned from the experience. My CFLAGS used to look like this:

Code:

CFLAGS="-march=athlon64 -O2 -ffast-math -fforce-addr -fmove-all-movables -fno-ident -fomit-frame-pointer -fpeel-loops -fprefetch-loop-arrays -frename-registers -ftracer -funroll-loops -funswitch-loops -fweb -pipe"

What I have learned is that while some app, say nbench, might see decent gains (I mean like 5-7%, woo), it makes just as many apps slower by the same percentage if not more, resulting in an overall slower system. So it has been my experience that the fewer ricey CFLAGS the better. My current system flags are:

Code:

CFLAGS="-march=athlon64 -O2 -fforce-addr -fno-ident -ftracer -fweb -pipe"

And I can say that my system runs much better (faster compile times, better app start times, better stability) now than when I had all my ricey CFLAGS. I hate to say it, but unless you are trying to speed up a certain app, maybe some type of media-intensive app, it seems that it isn't worth the time to mess with your CFLAGS extensively. Just my .02

MorLipf · Post by **MorLipf** » Sun Jan 15, 2006 11:51 am

My current Cflags are:

Code:

CFLAGS="-march=k8 -O3 -pipe -fomit-frame-pointer"

Should I optimize them?

nxsty · Post by **nxsty** » Sun Jan 15, 2006 12:31 pm

sigmalll wrote:I understand the argument that -O3 -fblah flags are a bit pointless for a lot of software; who gives a monkey's if 'less' is 7% faster (likewise, does 7% slower really matter either?). But setting a high level of optimisation globally does guarantee that all applications that may benefit, do. I don't think anybody in their right mind would want to manage optimisations on an application-by-application basis, especially if they're running a media-intensive desktop (and I don't expect the devs to start adding performance-based flags to ebuilds).

But the compiler is where the most performance gains can be obtained, and good CFLAGS really do make parts of your system run faster.

(I do have to add that this is only really an issue because GCC's focus in the past has always been portability rather than performance. In some cases this actually makes Linux applications slower than their Windows counterparts.)

There is always a tradeoff between speed and size when the compiler is optimizing. -O2 is usually a good balance: it turns on most optimizations but still doesn't bloat code much. Higher optimizations like -O3, -funroll-loops and friends have the side effect of making binaries larger. Larger binaries mean more disk reads, more memory usage and a larger chance of cache misses, all of which can slow execution. This is acceptable for the specific applications that actually benefit from the higher optimizations, but for most things it's just unnecessary bloat that is bad for performance. In fact -Os is usually the best option, since most applications don't benefit from the extra optimizations but they do benefit from the smaller binary size. Compiler optimizations are only a small part of performance; most of it comes from well-written code.
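
To make the size/speed tradeoff a bit more concrete, here is a rough sketch in C of what loop unrolling does conceptually. The function names are made up for illustration, and this is hand-written unrolling, not the actual output of -funroll-loops; what the compiler really emits depends on the GCC version and target.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain loop: small code, one add and one loop test per element. */
uint32_t sum_rolled(const uint8_t *buf, size_t n)
{
    uint32_t total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += buf[i];
    return total;
}

/* Unrolled by four (illustrative): same result, fewer loop tests,
 * but more instructions in the binary. */
uint32_t sum_unrolled4(const uint8_t *buf, size_t n)
{
    uint32_t total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        total += buf[i];
        total += buf[i + 1];
        total += buf[i + 2];
        total += buf[i + 3];
    }
    for (; i < n; i++)  /* leftover elements when n is not a multiple of 4 */
        total += buf[i];
    return total;
}
```

Both functions return the same sum for any input. The second one trades code size for fewer branches, which only pays off when the loop is actually hot.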

robak · Post by **robak** » Wed Jan 18, 2006 4:41 am

I just tried a few CFLAGS to optimize POVray, but the best result I could get is 28 min 34 sec on this hardware:

AMD Athlon64 3000+ 1.8 GHz Venice core
2*512 MB RAM in dual-channel mode

Can someone tell me how to optimize the system to get better results?
I have been compiling world for about 3 days now (I have "only" 135 packages to compile, so I could test a lot of flag combinations) and I am a little bit tired.

mbar · Post by **mbar** » Thu Jan 19, 2006 8:46 pm

Or you could just overclock your CPU by a mere 200 MHz and make it really fly; no ricer CFLAGS would do that.

I settled on "-Os -march=k8 -pipe -fomit-frame-pointer -falign-functions=5"

alexlm78 · Post by **alexlm78** » Wed Feb 08, 2006 5:50 pm

Interesting, I should try it!

HacTek · Post by **HacTek** » Thu Feb 09, 2006 1:07 am

forgive my naivety, but is there a way to specify cflags on a package-by-package basis?
something similar to the way use flags can with the package.use file.

if not then why not?

seems like from this debate that a simple solution would be to set some safe and sensible cflags for the system.
perhaps

Code:

 -O2 -march=XX -pipe

and then for a package which would benefit add say

Code:

category/packagename -fwhatever-you-want

into a file called package.cflags

i reckon this could keep both sides of the fence happy.
you would get overall system stability and the ricers can have fun optimising an app without breaking other packages as easily.

might even take the pressure off the developers having to strip flags out of the ebuilds.

any reason why this wouldn't work?
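
A minimal sketch of the lookup such a scheme would need, written in C purely for illustration. Everything here is hypothetical: package.cflags is only the file proposed in this post, not an existing Portage feature, and the function name and the "first matching line wins" rule are assumptions.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given package.cflags-style lines of the form
 * "category/packagename extra-flags", write the effective flags for
 * `pkg` into `out`. Returns 1 if a per-package line matched, 0 if
 * only the global flags apply.
 */
int effective_cflags(const char *const *lines, size_t nlines,
                     const char *global, const char *pkg,
                     char *out, size_t outlen)
{
    size_t len = strlen(pkg);
    size_t i;

    for (i = 0; i < nlines; i++) {
        const char *line = lines[i];

        /* Match the full package name followed by a space. */
        if (strncmp(line, pkg, len) == 0 && line[len] == ' ') {
            snprintf(out, outlen, "%s %s", global, line + len + 1);
            return 1;
        }
    }
    snprintf(out, outlen, "%s", global);  /* no override: global flags only */
    return 0;
}
```

With lines such as "media-sound/lame -ffast-math" (an example entry, not a recommendation), a lookup for media-sound/lame would append -ffast-math to the global flags, while any package without an entry would get the global flags unchanged.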

barry · Post by **barry** » Thu Feb 09, 2006 1:09 am

It's important to remember that a lot of these high-performance optimisation flags are designed for developers to compile and test their own software to get the best performance out of it. They were never intended to be used blindly to compile an entire production system.

As others have said, most of the packages that will see massive performance gains with flags like -funroll-loops and -ffast-math already have these included in the ebuild, so you don't need to enable them yourself.

"-march=k8 -O2 -pipe" should produce excellently optimised and rock-solid code for everybody, and builds like mplayer, lame and so on will already have safe higher-performing flags applied.

"-march=k8 -O2 -pipe -frename-registers -fweb" is the same as -O3 minus a single flag (-finline-functions in gcc 3.4) that bloats code and quite often causes a slowdown, so these flags will generally improve performance over a plain -O2 without breaking anything.

SoTired · Post by **SoTired** » Thu Feb 09, 2006 2:30 am

HacTek wrote:forgive my naivety, but is there a way to specify cflags on a package-by-package basis?
something similar to the way use flags can with the package.use file.

There is a way, it's just not officially endorsed, see http://forums.gentoo.org/viewtopic-t-28 ... art-0.html

HacTek · Post by **HacTek** » Thu Feb 09, 2006 2:42 am

it looks like a pretty good script to me.
not that i have any experience with bash scripting but i gotta learn sometime.

is there any active development happening with this approach?
looks like a good candidate for moving towards an officially supported feature.

Post by **pacho2** » Thu Aug 31, 2006 5:05 pm

energyman76b wrote:-msse3 is only 'safe' if you know for sure that your CPU supports it (Venice AMD64).

tnt wrote:It's a Sempron 2800+ 'BX' Palermo core and it has the 'PNI' flag, so it should have SSE3.
Anyway, thank you for the '-funroll-all-loops' tip - very useful one!

energyman76b wrote:I read some weeks ago that some CPUs report the PNI flag without having SSE3.
Try to run this:
cat test_pni.c
#include <stdint.h>
#include <stdio.h>   /* printf */
#include <string.h>  /* memset */

uint8_t __attribute__((aligned(64))) current[64];
uint8_t previous[64];

int main()
{
    int i;
    uint64_t result;
    uint32_t _eax, _ebx, _ecx, _edx;
    uint8_t _cpuid[13];
    uint32_t *_cpuid0 = (uint32_t*) _cpuid;
    uint32_t *_cpuid1 = (uint32_t*) ( _cpuid + 4 );
    uint32_t *_cpuid2 = (uint32_t*) ( _cpuid + 8 );
    uint8_t *ptr0 = current;
    uint8_t *ptr1 = previous;

    /* cpuid(0): the vendor string comes back in ebx, edx, ecx */
    __asm__ __volatile__ (
        "cpuid\n"
        : "=a" (_eax),
          "=b" (*_cpuid0), "=d" (*_cpuid1), "=c" (*_cpuid2)
        : "a" (0) );
    _cpuid[12] = 0;
    printf( "cpuid(0) returns %u (%s)\n", _eax, _cpuid );
    /* cpuid(1): feature flags in ecx and edx */
    __asm__ __volatile__ (
        "cpuid\n"
        : "=a" (_eax), "=b" (_ebx), "=c" (_ecx), "=d" (_edx)
        : "a" (1) );
    printf( "cpuid(1) returns %08x %08x %08x %08x\n",
            _eax, _ebx, _ecx, _edx );
    memset( current, 0xaa, 64 );
    memset( previous, 0x55, 64 );
    /* haddps is the SSE3 instruction under test. xmm2 is never zeroed,
       so the printed value is arbitrary; only whether it runs matters. */
    for( i = 0; i < 4; i ++ ) {
        __asm__ __volatile__ (
            "movdqa %0, %%xmm0\n"
            "movdqu %1, %%xmm1\n"
            "psadbw %%xmm1, %%xmm0\n"
            "paddw %%xmm0, %%xmm2\n"
            "haddps %%xmm2, %%xmm2\n"
            "haddps %%xmm2, %%xmm2\n"
            : : "m" (*ptr0),
                "m" (*ptr1) : "xmm0", "xmm1", "xmm2" );
        ptr0 += 16;
        ptr1 += 16;
    }
    __asm__ __volatile__ (
        "movq %%xmm2, %0\n"
        : "=m" (result) );
    printf( "Result is %llu\n", (unsigned long long) result );
    return 0;
}

save it as test_pni.c, compile and run it.
If it throws errors, you do not have sse3.
If not, you have SSE3 and everything is fine.

I have an Athlon 3200+ Winchester, I have compiled it and I get this output:

Code:

./test_pni
cpuid(0) returns 1 (AuthenticAMD)
cpuid(1) returns 00020ff0 00000800 00000001 078bfbff
Result is 496498219533200

So, does it support SSE3?

Thanks for the information

loftwyr · Post by **loftwyr** » Thu Aug 31, 2006 5:46 pm

If you didn't have SSE3, it would have given an error instead of a result.

Your processor has SSE3
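
The cpuid(1) line in the output above can also be read directly. As a small sketch (the helper names are made up for illustration), the feature bits can be decoded from the third and fourth printed values, which are ECX and EDX: ECX bit 0 is SSE3 (the 'PNI' flag), EDX bit 25 is SSE and EDX bit 26 is SSE2. Note that this only reads the reported flag; it does not execute an SSE3 instruction the way the test program does, so it cannot catch a CPU that reports the flag wrongly.

```c
#include <stdint.h>

/* CPUID function 1, ECX bit 0: SSE3 (shown as "pni" in /proc/cpuinfo). */
int has_sse3(uint32_t ecx)
{
    return (int)(ecx & 1u);
}

/* CPUID function 1, EDX bit 25: SSE. */
int has_sse(uint32_t edx)
{
    return (int)((edx >> 25) & 1u);
}

/* CPUID function 1, EDX bit 26: SSE2. */
int has_sse2(uint32_t edx)
{
    return (int)((edx >> 26) & 1u);
}
```

For the posted values, ECX is 00000001 and EDX is 078bfbff, so has_sse3(0x00000001), has_sse(0x078bfbff) and has_sse2(0x078bfbff) all return 1.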

Post by **pacho2** » Thu Aug 31, 2006 6:26 pm

Thanks

clytle374 · Post by **clytle374** » Fri Sep 01, 2006 5:38 am

I have decided that some here work for MS: either trying to make linux as slow as windows, or as unstable as windows.

Now i will have to break it to find out who.