Gentoo Forums
[gcc 3.4] AMD's Recommended CFLAGS
nxsty
Veteran


Joined: 23 Jun 2004
Posts: 1556
Location: .se

Posted: Sat Jan 14, 2006 9:55 pm

sigmalll: True, but the problem is that there are no magic CFLAGS that will make your system faster. So global CFLAGS should always be "-O2 -march= -pipe", and perhaps also -fomit-frame-pointer if you're on x86. It doesn't matter if -O3 is shown to be faster in some benchmarks; it shouldn't be used globally anyway, because 99% of packages won't benefit from the higher optimizations and will in fact often run slower because of the extra bloat it causes.
sigmalll
Guru


Joined: 24 Aug 2003
Posts: 332

Posted: Sat Jan 14, 2006 11:12 pm

nxsty wrote:
sigmalll: True, but the problem is that there are no magic CFLAGS that will make your system faster. So global CFLAGS should always be "-O2 -march= -pipe", and perhaps also -fomit-frame-pointer if you're on x86. It doesn't matter if -O3 is shown to be faster in some benchmarks; it shouldn't be used globally anyway, because 99% of packages won't benefit from the higher optimizations and will in fact often run slower because of the extra bloat it causes.


I understand the argument that -O3 -fblah flags are a bit pointless for a lot of software; who gives a monkey's if 'less' is 7% faster (likewise, does 7% slower really matter either?). But setting a high level of optimisation globally does guarantee that all applications that may benefit, do. I don't think anybody in their right mind would want to set optimisations on an application-by-application basis, especially if they're running a media-intensive desktop (and I don't expect the devs to start adding performance-based flags to ebuilds).

But the compiler is where the most performance gains can be obtained, and good CFLAGS really do make parts of your system run faster.

(I do have to add that this is only really an issue because GCC's focus in the past has always been portability rather than performance. In some cases this actually makes Linux applications slower than their Windows counterparts.)
cybrjackle
Apprentice


Joined: 09 Jan 2003
Posts: 248
Location: USA

Posted: Sun Jan 15, 2006 3:54 am

I use hardcore ricing flags

Code:

# grep CFLAGS /etc/make.conf
CFLAGS="-O2 -march=k8 -pipe"
CXXFLAGS="${CFLAGS}"


:roll:
LoSeR_5150
Guru


Joined: 20 Mar 2005
Posts: 455
Location: San Francisco, CA

Posted: Sun Jan 15, 2006 11:46 am

I am fairly new to Gentoo, but in my 10 months I have wasted tons of time playing with CFLAGS. Well, I guess not wasted, because I think I learned from the experience. My CFLAGS used to look like this:

Code:
CFLAGS="-march=athlon64 -O2 -ffast-math -fforce-addr -fmove-all-movables -fno-ident -fomit-frame-pointer -fpeel-loops -fprefetch-loop-arrays -frename-registers -ftracer -funroll-loops -funswitch-loops -fweb -pipe"


What I have learned is that while some app, say nbench, might see decent gains (I mean like 5-7%, woo :roll: ), it makes just as many apps slower by the same percentage if not more, resulting in an overall slower system. So in my experience, the fewer ricey CFLAGS the better. My current system flags are:

Code:
CFLAGS="-march=athlon64 -O2 -fforce-addr -fno-ident -ftracer -fweb -pipe"


And I can say that my system runs much better (faster compile times, better app start times, better stability) now than when I had all my ricey CFLAGS. I hate to say it, but unless you are trying to speed up a certain app, maybe some type of media-intensive app, it seems that it isn't worth the time to mess with your CFLAGS extensively. Just my .02
_________________
Opteron 1356@2.4Ghz
6GB DDR2 800Mhz
128MB Quadro NVS 210S
640GB Western Digital HD
*Gentoo-x86_64-2.6.30-r1

Opteron175@2.2GHz
2GB DDR 400MHz
256MB Quadro 1400 Go
(2) 80GB Segate HDs: RAID0
*Gentoo-x86_64-2.6.30-r1
MorLipf
Apprentice


Joined: 09 Nov 2004
Posts: 226
Location: Solingen, Germany

Posted: Sun Jan 15, 2006 11:51 am

My current Cflags are:

Code:
CFLAGS="-march=k8 -O3 -pipe -fomit-frame-pointer"


Should I optimize them?
nxsty
Veteran


Joined: 23 Jun 2004
Posts: 1556
Location: .se

Posted: Sun Jan 15, 2006 12:31 pm

sigmalll wrote:
I understand the argument that -O3 -fblah flags are a bit pointless for a lot of software; who gives a monkey's if 'less' is 7% faster (likewise, does 7% slower really matter either?). But setting a high level of optimisation globally does guarantee that all applications that may benefit, do. I don't think anybody in their right mind would want to set optimisations on an application-by-application basis, especially if they're running a media-intensive desktop (and I don't expect the devs to start adding performance-based flags to ebuilds).

But the compiler is where the most performance gains can be obtained, and good CFLAGS really do make parts of your system run faster.

(I do have to add that this is only really an issue because GCC's focus in the past has always been portability rather than performance. In some cases this actually makes Linux applications slower than their Windows counterparts.)


There is always a tradeoff between speed and size when the compiler is optimizing. -O2 is usually a good balance: it turns on most optimizations but still doesn't bloat code much. Higher optimizations like -O3, -funroll-loops and friends have the side effect of making binaries larger. Larger binaries mean more disk reads, more memory usage, slower execution and a larger chance of cache misses. This is acceptable for the specific applications that actually benefit from the higher optimizations, but for most things it's just unnecessary bloat that is bad for performance. In fact, -Os is usually the best option, since most applications don't benefit from the extra optimizations but do benefit from the smaller binary size. Compiler optimizations are only a small part of performance; most of it comes from well-written code.
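
The size side of this tradeoff is easy to check for a given source file. As a rough sketch (assuming gcc and mktemp are installed; compare_sizes is a helper name made up for this example, and object file size is only a proxy for how much code ends up competing for cache), something like this prints the size of the same file built at each level:

```shell
#!/bin/sh
# Sketch: build one C source file at several optimization levels and
# report the resulting object file size in bytes.
# Assumes gcc and mktemp are available; compare_sizes is a hypothetical helper.

# compare_sizes FILE.c: print "<level> <bytes>" for -O2, -O3 and -Os.
compare_sizes() {
    for level in -O2 -O3 -Os; do
        obj=$(mktemp) || return 1
        # -c compiles without linking, so only this file's code is measured.
        if gcc "$level" -c "$1" -o "$obj"; then
            printf '%s %s\n' "$level" "$(wc -c < "$obj" | tr -d ' ')"
        fi
        rm -f "$obj"
    done
}

# Run on the file given as the first argument, if any.
if [ $# -gt 0 ]; then
    compare_sizes "$1"
fi
```

Run it as "sh compare_sizes.sh foo.c". Whether the larger build is also slower depends on the program, so timing the real workload is still needed; this only shows the size difference.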
robak
Apprentice


Joined: 14 Jan 2004
Posts: 209
Location: Germany

Posted: Wed Jan 18, 2006 4:41 am

I just tried a few CFLAGS to optimize POVray, but the best result I could get is 28 min 34 sec on this hardware:

AMD Athlon64 3000+ 1.8 GHz Venice core
2*512 MB RAM in dual-channel mode

Can someone tell me how to optimize the system to get better results?
I have been compiling world for about 3 days now (I have "only" 135 packages to compile, so I could test a lot of flag combinations) and I am a little bit tired ;)
mbar
Veteran


Joined: 19 Jan 2005
Posts: 1990
Location: Poland

Posted: Thu Jan 19, 2006 8:46 pm

Or you could just overclock your CPU by a mere 200 MHz and make it really fly; no ricer CFLAGS will do that for you.

I settled on "-Os -march=k8 -pipe -fomit-frame-pointer -falign-functions=5"
alexlm78
Veteran


Joined: 08 Dec 2003
Posts: 1265
Location: Guatemala,Guatemala

Posted: Wed Feb 08, 2006 5:50 pm

Interesting, I should try it! :twisted:
_________________
"This is a different kind of world, you need a different kind of software"

Linux User# 315201
100% Chapin hecho en Guatemala
HacTek
n00b


Joined: 31 Jul 2005
Posts: 7
Location: New Zealand

Posted: Thu Feb 09, 2006 1:07 am

Forgive my naivety, but is there a way to specify CFLAGS on a package-by-package basis?
Something similar to the way USE flags can be set with the package.use file.

If not, then why not?

It seems from this debate that a simple solution would be to set some safe and sensible CFLAGS for the system,
perhaps
Code:
 -O2 -march=XX -pipe


and then for a package which would benefit, add say
Code:
category/packagename -fwhatever-you-want

into a file called package.cflags.

I reckon this could keep both sides of the fence happy.
You would get overall system stability, and the ricers can have fun optimising an app without breaking other packages as easily.

It might even take the pressure off the developers having to strip flags out of the ebuilds.

Any reason why this wouldn't work?
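
For illustration only, here is roughly what such a lookup could look like in shell. The package.cflags file, its one-line-per-package format, and the flags_for helper are all hypothetical; Portage does not read such a file, and -march=k8 merely stands in for the XX above.

```shell
#!/bin/sh
# Hypothetical sketch of the package.cflags idea: global flags plus
# optional per-package extras. Neither the file nor its format is a
# Portage feature; both are assumptions taken from the proposal.

BASE_CFLAGS="-O2 -march=k8 -pipe"

# flags_for FILE CATEGORY/PACKAGE: print the base flags, followed by any
# extra flags listed for that package. Lines look like:
#   category/packagename -fwhatever-you-want
flags_for() {
    extra=""
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r atom rest || [ -n "$atom" ]; do
        case "$atom" in
            ""|\#*) continue ;;   # skip blank lines and comments
        esac
        if [ "$atom" = "$2" ]; then
            extra="$rest"
        fi
    done < "$1"
    printf '%s\n' "$BASE_CFLAGS${extra:+ $extra}"
}

# Demo with a throwaway file.
demo=$(mktemp) || exit 1
printf '%s\n' '# extra flags per package' \
              'media-video/mplayer -O3 -funroll-loops' > "$demo"
flags_for "$demo" media-video/mplayer   # -O2 -march=k8 -pipe -O3 -funroll-loops
flags_for "$demo" sys-apps/less         # -O2 -march=k8 -pipe
rm -f "$demo"
```

Packages without an entry simply get the global flags, which is the "safe by default" behaviour described above.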
_________________
SELECT * FROM Managers WHERE Clue > 0
0 Rows Returned
barry
Apprentice


Joined: 01 May 2002
Posts: 170
Location: UK

Posted: Thu Feb 09, 2006 1:09 am

It's important to remember that a lot of these high-performance optimisation flags are designed for developers to compile and test their own software and get the best performance out of it. They were never intended to be used blindly to compile an entire production system.

As others have said, most of the packages that will see massive performance gains from flags like -funroll-loops and -ffast-math already have these included in the ebuild, so you don't need to enable them yourself.

"-march=k8 -O2 -pipe" should produce excellently optimised and rock-solid code for everybody, and builds like mplayer, lame and so on will already have safe higher-performing flags applied.

"-march=k8 -O2 -pipe -frename-registers -fweb" is the same as -O3 in gcc 3.4 except that it leaves out a single flag (-finline-functions) that bloats code and quite often causes a slowdown, so these flags will generally improve performance over a plain -O2 without breaking anything.
SoTired
Apprentice


Joined: 19 May 2004
Posts: 174

Posted: Thu Feb 09, 2006 2:30 am

HacTek wrote:
Forgive my naivety, but is there a way to specify CFLAGS on a package-by-package basis?
Something similar to the way USE flags can be set with the package.use file.


There is a way; it's just not officially endorsed. See https://forums.gentoo.org/viewtopic-t-280748-postdays-0-postorder-asc-start-0.html
HacTek
n00b


Joined: 31 Jul 2005
Posts: 7
Location: New Zealand

Posted: Thu Feb 09, 2006 2:42 am

It looks like a pretty good script to me.
Not that I have any experience with bash scripting, but I've got to learn sometime.

Is there any active development happening with this approach?
It looks like a good candidate for moving towards an officially supported feature.
_________________
SELECT * FROM Managers WHERE Clue > 0
0 Rows Returned
pacho2
Developer


Joined: 04 Mar 2005
Posts: 2599
Location: Oviedo, Spain

Posted: Thu Aug 31, 2006 5:05 pm

energyman76b wrote:
tnt wrote:
energyman76b wrote:
-msse3 is only 'safe' if you know for sure that your CPU supports it (Venice AMD64).


It's Sempron 2800+ 'BX' Palermo core and it has 'PNI' flag so it should have SSE3.

Anyway, thank you for the '-funroll-all-loops' tip - a very useful one!


I read some weeks ago that some CPUs report the PNI flag without having SSE3.
Try to run this:
cat test_pni.c
#include <stdint.h>
#include <stdio.h>   /* printf */
#include <string.h>  /* memset */

uint8_t __attribute__((aligned(64))) current[64];
uint8_t previous[64];

int main(void)
{
    int i;
    uint64_t result;
    uint32_t _eax, _ebx, _ecx, _edx;
    uint8_t _cpuid[13];
    uint32_t *_cpuid0 = (uint32_t*) _cpuid;
    uint32_t *_cpuid1 = (uint32_t*) ( _cpuid + 4 );
    uint32_t *_cpuid2 = (uint32_t*) ( _cpuid + 8 );
    uint8_t *ptr0 = current;
    uint8_t *ptr1 = previous;

    /* cpuid leaf 0: highest leaf in eax, vendor string in ebx, edx, ecx */
    __asm__ __volatile__ (
        "cpuid\n"
        : "=a" (_eax),
          "=b" (*_cpuid0), "=d" (*_cpuid1), "=c" (*_cpuid2)
        : "a" (0) );
    _cpuid[12] = 0;
    printf( "cpuid(0) returns %u (%s)\n", _eax, _cpuid );

    /* cpuid leaf 1: feature flags in ecx and edx */
    __asm__ __volatile__ (
        "cpuid\n"
        : "=a" (_eax), "=b" (_ebx), "=c" (_ecx), "=d" (_edx)
        : "a" (1) );
    printf( "cpuid(1) returns %08x %08x %08x %08x\n",
            _eax, _ebx, _ecx, _edx );

    memset( current, 0xaa, 64 );
    memset( previous, 0x55, 64 );
    for( i = 0; i < 4; i ++ ) {
        /* haddps is the SSE3 instruction under test. xmm2 is never
           initialized, so the value printed below is arbitrary; the test
           only checks that the instruction executes. */
        __asm__ __volatile__ (
            "movdqa %0, %%xmm0\n"
            "movdqu %1, %%xmm1\n"
            "psadbw %%xmm1, %%xmm0\n"
            "paddw %%xmm0, %%xmm2\n"
            "haddps %%xmm2, %%xmm2\n"
            "haddps %%xmm2, %%xmm2\n"
            : : "m" (*ptr0),
                "m" (*ptr1) : "xmm0", "xmm1", "xmm2" );
        ptr0 += 16;
        ptr1 += 16;
    }
    __asm__ __volatile__ (
        "movq %%xmm2, %0\n"
        : "=m" (result) );
    printf( "Result is %llu\n", (unsigned long long) result );
    return 0;
}

Save it as test_pni.c, compile and run it.
If it throws errors, you do not have SSE3.
If not, you have SSE3 and everything is fine.


I have an Athlon 3200+ Winchester. I compiled it and got this output:
Code:
./test_pni
cpuid(0) returns 1 (AuthenticAMD)
cpuid(1) returns 00020ff0 00000800 00000001 078bfbff
Result is 496498219533200


So, does it support SSE3? :?: :?: 8O

Thanks for the information :)
loftwyr
l33t


Joined: 29 Dec 2004
Posts: 970
Location: 43°38'23.62"N 79°27'8.60"W

Posted: Thu Aug 31, 2006 5:46 pm

If you didn't have SSE3, it would have given an error instead of a result.

Your processor has SSE3.
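
For reference, the cpuid(1) line of that output can also be decoded by hand. The four values are printed as eax ebx ecx edx; bit 0 of ecx is the SSE3 (PNI) feature flag, and bits 25 and 26 of edx are SSE and SSE2. A small sketch using the values posted above (has_bit is a helper name made up for this example):

```shell
#!/bin/sh
# Decode feature bits from the "cpuid(1) returns ..." line of test_pni.
# Bit positions: ECX bit 0 = SSE3 (PNI), EDX bit 25 = SSE, EDX bit 26 = SSE2.
# has_bit is a hypothetical helper, not part of any existing tool.

# has_bit HEXVALUE BIT: succeeds if bit BIT of HEXVALUE (no 0x prefix) is set.
has_bit() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Values copied from the posted output: 00020ff0 00000800 00000001 078bfbff
ecx=00000001
edx=078bfbff

if has_bit "$ecx" 0;  then echo "SSE3 (PNI): yes"; else echo "SSE3 (PNI): no"; fi
if has_bit "$edx" 25; then echo "SSE: yes";        else echo "SSE: no";        fi
if has_bit "$edx" 26; then echo "SSE2: yes";       else echo "SSE2: no";       fi
```

For the posted values this prints "yes" three times. Note that it only reads the reported flag; actually executing haddps, as the test program does, is the stronger check if you suspect the flag is wrong.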
_________________
My emerge --info
Have you run revdep-rebuild lately? It's in gentoolkit and it's worth a shot if things don't work well.
Celebrating 5 years of Gentoo-ing.
pacho2
Developer


Joined: 04 Mar 2005
Posts: 2599
Location: Oviedo, Spain

Posted: Thu Aug 31, 2006 6:26 pm

Thanks :)
clytle374
Apprentice


Joined: 01 Aug 2006
Posts: 221

Posted: Fri Sep 01, 2006 5:38 am

I have decided that some here work for MS: either trying to make Linux as slow as Windows, or as unstable as Windows. :P

Now I will have to break it to find out who. :lol:
All times are GMT