Gentoo Forums
[solved] Additional cflags or tweaks for piledriver cpu's?
vexatious
Tux's lil' helper

Joined: 24 Aug 2010
Posts: 85

Posted: Thu Jan 23, 2014 2:53 am    Post subject: [solved] Additional cflags or tweaks for piledriver cpu's?

I've been using the following CFLAGS for my Piledriver CPU, as recommended by AMD's official GCC optimization guide (plus a couple of my own: -mfpmath=sse and -msseregparm):
Code:
export CFLAGS="-O2 -pipe -march=native -msse -msse2 -msse3 -msse4a -mno-3dnow -msseregparm -mfpmath=sse -fomit-frame-pointer -fopenmp -mprefer-avx128 -minline-all-stringops -fno-tree-pre -ftree-vectorize -funroll-all-loops -fprefetch-loop-arrays -mtune=bdver2"
# This flag gives "cannot compile executables" error with gcc-4.7.3.
# Didn't happen with gcc-4.7.1=WTH...
#--param prefetch-latency=300


Does anyone know of additional tweaks?

I'm asking because I noticed claims of up to 3x greater performance from merely using the bdver flag, according to this: http://phoronix.com/forums/showthread.php?64665-GCC-4-6-LLVM-Clang-3-0-Open64-Benchmarks

Intel also reports performance gains from other kinds of flags here: http://software.intel.com/en-us/blogs/2012/09/26/gcc-x86-performance-hints. Could any of those CFLAGS be applied for additional gains on Piledriver?

So, are there any more tried-and-true GCC optimizations for Bulldozer/Piledriver CPUs?

Regards
_________________
Gentoo
Slackware


Last edited by vexatious on Sat Jan 25, 2014 4:57 am; edited 1 time in total

Jaglover
Watchman

Joined: 29 May 2005
Posts: 8291
Location: Saint Amant, Acadiana

Posted: Thu Jan 23, 2014 3:08 am

-omg-optimized never failed on me.
_________________
My Gentoo installation notes.
Please learn how to denote units correctly!

Maitreya
Guru

Joined: 11 Jan 2006
Posts: 441

Posted: Thu Jan 23, 2014 7:53 am

Quote:

HOLY COW I'M TOTALLY GOING SO FAST OH F***


Just stick with -march=native for automagic.

shazeal
Apprentice

Joined: 03 May 2006
Posts: 206
Location: New Zealand

Posted: Thu Jan 23, 2014 7:13 pm

Jaglover wrote:
-omg-optimized never failed on me.


I lost 10 kilos following this guide, the extra pulling of hair and screaming definitely helps! Highly recommended AAAA++++

vexatious wrote:
Does anyone know of additional tweaks?


export CFLAGS="-O2 -pipe -march=native"

There, fully optimized.
_________________
CFLAGS="-OmgWTFR1CE --fun-lol-loops --march=asmx86go"

Roman_Gruber
Advocate

Joined: 03 Oct 2006
Posts: 3846
Location: Austro Bavaria

Posted: Thu Jan 23, 2014 9:51 pm

shazeal wrote:
Jaglover wrote:
-omg-optimized never failed on me.


I lost 10 kilos following this guide, the extra pulling of hair and screaming definitely helps! Highly recommended AAAA++++

vexatious wrote:
Does anyone know of additional tweaks?


export CFLAGS="-O2 -pipe -march=native"

There, fully optimized.


+1


Careful with too many compiler flags. I used to use a few and got random compile errors, meaning the emerge command would suddenly stop with an error. If you can solve such issues, go ahead. But I think most bug reports get flagged invalid because of too many compiler flags. That happened to mine a few years back.

So in short: if you can handle such random errors, go ahead; if not, stick to plain and simple -march=native.

I highly recommend using -march=native and the stable compiler in Portage if you want a fuss-free life.

Usually those flags do not really give a big boost. I tried several times on my T9600 over a few years and gained nearly nothing. You gain a lot of bugs, that's it.

For those AMD CPUs I have read that they need especially fast memory. So better buy the fastest DRAM with those double memory banks, or whatever they are called. That is a much better investment on any OS.

vexatious
Tux's lil' helper

Joined: 24 Aug 2010
Posts: 85

Posted: Fri Jan 24, 2014 3:00 am

No knowledge of piledriver I guess...

This isn't really a joke. I'm building Slackware packages that are a lot faster than vanilla Slackware's (gcc-4.8.2) with the CFLAGS I mentioned (packages get compressed with xz at -4e). Only a couple of packages didn't like unrolled loops and/or required -fPIC, and unrolling loops is a popular tweak despite giving extremely small gains in most cases (why is that?). Machine is more stable and responsive as well.
_________________
Gentoo
Slackware


Last edited by vexatious on Fri Jan 24, 2014 3:10 am; edited 2 times in total

N8Fear
Tux's lil' helper

Joined: 15 Apr 2013
Posts: 140
Location: Berlin (Germany)

Posted: Fri Jan 24, 2014 3:06 am

vexatious wrote:
No knowledge of piledriver I guess...

No knowledge of compiler optimization I guess...

"-march=native" expands to the CPU-specific flags that your combination of CPU and compiler supports, and -O2 adds a sane set of generic optimizations. There is really no need to add anything else. You could try -O3, but it doesn't guarantee a faster binary: it mostly increases code size and compile time, and more aggressive optimization is more likely to expose latent bugs (undefined behaviour) in the source that was compiled.
One thing that you could add would be the graphite USE flag for gcc, and "-floop-interchange -floop-strip-mine -floop-block" to your CFLAGS after rebuilding gcc.
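On the -O3 point: where a higher optimization level does change what a program does, the usual cause is source that relies on undefined behaviour. A minimal sketch in C, with made-up function names (next_overflows_naive and next_overflows_safe exist only for illustration): the naive check depends on signed overflow wrapping around, which the C standard leaves undefined, so an optimizer may legally fold it to "false"; the safe check compares against INT_MAX instead.

```c
#include <limits.h>
#include <stdbool.h>

/* Naive check: relies on x + 1 wrapping around when x == INT_MAX.
 * Signed overflow is undefined behaviour in C, so an optimizing
 * compiler may assume "x + 1 < x" is never true and drop the test. */
bool next_overflows_naive(int x)
{
    return x + 1 < x;
}

/* Well-defined check: compares against the limit instead of overflowing. */
bool next_overflows_safe(int x)
{
    return x == INT_MAX;
}
```

Comparing the output of gcc -O0 -S and gcc -O2 -S for the naive version shows what a particular compiler makes of it; whether it gets folded is something to check rather than assume.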

shazeal
Apprentice

Joined: 03 May 2006
Posts: 206
Location: New Zealand

Posted: Fri Jan 24, 2014 6:49 am

N8Fear wrote:
vexatious wrote:
No knowledge of piledriver I guess...

No knowledge of compiler optimization I guess...


True, but ricers gotta rice yo! Don't be hatin on the segfaults yo! :lol:
_________________
CFLAGS="-OmgWTFR1CE --fun-lol-loops --march=asmx86go"

vexatious
Tux's lil' helper

Joined: 24 Aug 2010
Posts: 85

Posted: Fri Jan 24, 2014 2:24 pm

LOL @ shazeal.

N8Fear wrote:
vexatious wrote:
No knowledge of piledriver I guess...

No knowledge of compiler optimization I guess...

"-march=native" expands to the CPU-specific flags that your combination of CPU and compiler supports, and -O2 adds a sane set of generic optimizations. There is really no need to add anything else. You could try -O3, but it doesn't guarantee a faster binary: it mostly increases code size and compile time, and more aggressive optimization is more likely to expose latent bugs (undefined behaviour) in the source that was compiled.
One thing that you could add would be the graphite USE flag for gcc, and "-floop-interchange -floop-strip-mine -floop-block" to your CFLAGS after rebuilding gcc.


Right. I understand a compiler is simply the most generic way to build software from generic code (mostly C). If I really wanted speed I'd have to code in assembly AFAIK (instead of letting the compiler decide); that's a huge pain for large software packages, however.

That graphite USE flag and the other CFLAGS are something I have to try. I've heard the graphite USE flag can give some speed gains.

Really appreciate the responses!

Regards
_________________
Gentoo
Slackware

Roman_Gruber
Advocate

Joined: 03 Oct 2006
Posts: 3846
Location: Austro Bavaria

Posted: Fri Jan 24, 2014 4:17 pm

vexatious wrote:
No knowledge of piledriver I guess...
Machine is more stable and responsive as well.


May I ask why it is more stable and responsive?

I ask out of curiosity; I do not intend to offend or flame.

Just curious, maybe I'll learn something.

Any proof for your statement?

I used the graphite flag a year ago and had only hassle with ~amd64 on my notebook. I went back and the hassle was gone.

Isn't responsiveness about the kernel itself, how it handles tasks in the scheduler, and the amount of memory in your box, combined with the timeframe, or ticks, as the kernel devs call them?

Please clarify, thanks. I ask out of curiosity, that's it.

N8Fear
Tux's lil' helper

Joined: 15 Apr 2013
Posts: 140
Location: Berlin (Germany)

Posted: Fri Jan 24, 2014 5:42 pm

vexatious wrote:

Right. I understand a compiler is simply the most generic way to build software from generic code (mostly C). If I really wanted speed I'd have to code in assembly AFAIK (instead of letting the compiler decide); that's a huge pain for large software packages, however.

That graphite USE flag and the other CFLAGS are something I have to try. I've heard the graphite USE flag can give some speed gains.


A compiler is a program that translates a higher-level language to machine language (which can be represented by assembly, since assembly maps essentially 1:1 to machine code). Coding in assembly doesn't increase performance per se. You can easily write a program in assembly that is slow as hell. Writing in assembly allows you to optimize because you can tune your code to a specific processor (e.g. you can optimize the use of caches by avoiding cache-line thrashing, reorder independent instructions in a way that makes the most of pipelining, or optimize for the branch predictor).
The problem is that your piece of code would be optimized for one specific processor. It would run fast on e.g. an i7 but not necessarily on a Piledriver (due to architectural differences it would most likely not perform well). Because of that you would not only have to write the program in assembly, you would also have to create a different source for each processor it should run on (which requires intricate knowledge of the internals of each processor).
You can also optimize in C (e.g. you can walk through an array of arrays row by row rather than column by column, because that avoids cache-line thrashing). It is also important to recognize that most code doesn't really need to be optimized hard, because your processor tends to idle most of the time anyway (for big number-crunching calculations, e.g. in scientific applications, that would not necessarily be true).
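To make the array example concrete, here is a minimal sketch; the 512x512 size and the function names are arbitrary choices for illustration. Both functions compute the same sum, but the first walks memory sequentially while the second jumps a whole row ahead on every step.

```c
#include <stddef.h>

#define ROWS 512
#define COLS 512

/* Row by row: the inner loop touches adjacent addresses,
 * so every cache line that gets loaded is used completely. */
long sum_by_rows(int m[ROWS][COLS])
{
    long sum = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            sum += m[r][c];
    return sum;
}

/* Column by column: each inner step jumps COLS * sizeof(int) bytes,
 * so consecutive accesses land in different cache lines. */
long sum_by_cols(int m[ROWS][COLS])
{
    long sum = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            sum += m[r][c];
    return sum;
}
```

Timing the two on a matrix much larger than your cache is the easy way to see the difference; how big the gap is depends on the machine.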

The next step on the ladder is to realize that modern compilers optimize really well (that's what you are trying to achieve with your heap of CFLAGS). Compilers can, for example, unroll loops (to make better use of pipelining and branch prediction) or even eliminate loops completely if the result is static anyway (take a look at the assembly of a loop that does nothing but count to INT_MAX: with optimization enabled the compiler won't calculate anything, it will simply use INT_MAX). In fact, modern compilers optimize well enough that you need quite a bit of knowledge to beat them, even if you write the assembly yourself.
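A sketch of that counting loop, in case you want to look at the assembly yourself. count_to and count_to_int_max are made-up names; the loop has no side effects, so with optimization enabled the compiler is free to replace it with its result.

```c
#include <limits.h>

/* Counts from 0 up to limit one step at a time and returns the count.
 * For a non-negative limit the result is always limit itself, which
 * an optimizing compiler is allowed to work out without looping. */
int count_to(int limit)
{
    int n = 0;
    for (int i = 0; i < limit; i++)
        n++;
    return n;
}

/* The case from the post: count all the way to INT_MAX.
 * Without optimization this really runs about two billion iterations. */
int count_to_int_max(void)
{
    return count_to(INT_MAX);
}
```

Compare gcc -O0 -S with gcc -O2 -S on this file. Whether the loop disappears depends on compiler version and flags, so treat it as something to check, not a guarantee.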

This takes us to CFLAGS: you don't get the most out of your processor by taking a bunch of them and throwing them at your program, but by making a choice that actually fits your processor architecture. To achieve this, you can take a bunch of technical documentation and read enough to be able to write optimized C or assembly yourself, you can take some arcane ricer flags from a forum post by someone who has hopefully done the research for you, or you can trust people who know how things work. Selecting -march=native expands to a set of CFLAGS that some wise guys deem optimal for your architecture.
You can run:
Code:
gcc -march=native -E -v - </dev/null 2>&1 | sed -n 's/.* -v - //p'
to see how it expands on your machine. Mine, for example, is:
Code:
-march=corei7 -mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mno-avx -mno-avx2 -msse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c -mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mno-xsave -mno-xsaveopt --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=3072 -mtune=corei7 -fno-strict-overflow -fPIE -fstack-protector-all -fstack-check=specific
It doesn't serve any benefit to add these yourself, other than showing other people that you in reality have no idea what these flags really do....
This expansion shows just the CPU-specific stuff; which generic optimization options are used depends on your -Ox level. You'll find an overview of what each level selects here: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html.
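If you would rather check from inside a program which instruction sets a given -march setting switched on, the compiler's predefined macros tell you. A small sketch, assuming gcc or a compatible compiler; report_enabled_features is a made-up helper name and the five macros probed are just an arbitrary selection.

```c
#include <stdio.h>

/* Prints the name of each probed instruction set that the compiler
 * enabled for this build and returns how many were enabled. */
int report_enabled_features(FILE *out)
{
    int enabled = 0;
#ifdef __SSE2__
    fputs("SSE2\n", out);
    enabled++;
#endif
#ifdef __SSE4_2__
    fputs("SSE4.2\n", out);
    enabled++;
#endif
#ifdef __AVX__
    fputs("AVX\n", out);
    enabled++;
#endif
#ifdef __FMA4__
    fputs("FMA4\n", out);
    enabled++;
#endif
#ifdef __XOP__
    fputs("XOP\n", out);
    enabled++;
#endif
    return enabled;
}
```

Call it from main with stdout, build once with plain -O2 and once with -O2 -march=native, and compare the lists. gcc -march=native -dM -E - </dev/null dumps every predefined macro if you want the full picture.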

I hope that this helps you (and likely others) to see that, while there are benefits in using optimization, optimization is not achieved by throwing a random bunch of CFLAGS at your compiler, and that the best way to optimize is to trust the guys who know how to do it correctly (and who therefore choose sane but efficient defaults)....

John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10589
Location: Somewhere over Atlanta, Georgia

Posted: Sat Jan 25, 2014 1:19 am

Split off after this point for really egregiously obvious Forum Guidelines violations. Vexatious, please remain civil.

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.

Jaglover
Watchman

Joined: 29 May 2005
Posts: 8291
Location: Saint Amant, Acadiana

Posted: Sat Jan 25, 2014 1:23 am

These forums are getting much more liberal than they used to be. Once I got banned from these forums for much less. I stayed away for many years. Or is liberal a wrong term perhaps?
_________________
My Gentoo installation notes.
Please learn how to denote units correctly!

John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10589
Location: Somewhere over Atlanta, Georgia

Posted: Sat Jan 25, 2014 1:26 am

N8Fear wrote:
...I hope that this helps you (and likely others) to see that, while there are benefits in using optimization, optimization is not achieved by throwing a random bunch of CFLAGS at your compiler, and that the best way to optimize is to trust the guys who know how to do it correctly (and who therefore choose sane but efficient defaults)....
Alas, this isn't always the whole story. When a CPU is new, -march=native sometimes doesn't know about specific features that can be safely enabled. Now, I don't personally know whether contemporary gcc is Piledriver-aware. Do you?

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.

N8Fear
Tux's lil' helper

Joined: 15 Apr 2013
Posts: 140
Location: Berlin (Germany)

Posted: Sat Jan 25, 2014 1:30 am

Yeah - I do. Take a look at http://developer.amd.com/community/blog/2012/04/23/gcc-4-7-is-available-with-support-for-amd-opteron-6200-series-and-amd-fx-series-processors/.
So GCC 4.7.x should have piledriver support. Generally one can say: a new processor needs a new compiler version.

John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10589
Location: Somewhere over Atlanta, Georgia

Posted: Sat Jan 25, 2014 1:39 am

Thanks; good to know. However, it does not follow that all new compiler versions support all released CPUs. I think you were painting a slightly incomplete picture.

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.

N8Fear
Tux's lil' helper

Joined: 15 Apr 2013
Posts: 140
Location: Berlin (Germany)

Posted: Sat Jan 25, 2014 2:28 am

I think in most cases the CPU vendors try to get the optimizations into gcc before the actual release of the processor (at least for AMD and Intel this will nearly always be the case). You are likely correct that there may be times when the compiler doesn't correctly support the features; it'll normally "downgrade" to the highest supported architecture (e.g. core2 instead of corei7).

What should hold in any case is that choosing the correct CFLAGS (at least the more arcane ones) should be left to people who know what they are doing...

_______0
Guru

Joined: 15 Oct 2012
Posts: 521

Posted: Sat Jan 25, 2014 8:41 pm

Not only do extra CFLAGS not automagically translate into faster code, but the programmer often needs to change the code to take advantage of a specific new CPU feature.

One example is Unicode decoding with SSSE3, and I've read there's now a project doing the decoding on the GPU.

Ultimately it comes down to the programmer coding it in the most optimized way.
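
A minimal sketch of what that looks like in C, assuming an x86 compiler that ships emmintrin.h (gcc and clang do); add_u32 is a made-up example function. The SSE2 path is written by hand with intrinsics and only compiled in when the build enables SSE2, with a plain loop as fallback and for the leftover elements. For a loop this simple the auto-vectorizer can often do the same job on its own; the point is only to show the shape of hand-written, feature-specific code.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* out[i] = a[i] + b[i] for n unsigned 32-bit integers. */
void add_u32(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    /* Hand-written SSE2 path: four additions per instruction,
     * using unaligned loads and stores so any pointer works. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
#endif
    /* Scalar path: whole array without SSE2, leftover elements with it. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```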

John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10589
Location: Somewhere over Atlanta, Georgia

Posted: Sun Jan 26, 2014 1:39 pm

N8Fear wrote:
I think in most cases the CPU vendors try to get the optimizations into gcc before the actual release of the processor (at least for AMD and Intel this will nearly always be the case). ...
This emphatically did not happen with Atom. When Atom was released, -march=native did what you said, choosing a safe subset, but this ignored several Atom features (which were already supported because they were present in other architectures) which could be enabled via additional CFLAGS.
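
A rough way to spot this kind of gap is to compare what the CPU reports at run time with what the build actually enabled. The sketch below is only an illustration, assuming gcc 4.8 or later (or a compatible compiler) on x86; SSSE3 is picked arbitrarily as the feature to check, and the three function names are made up.

```c
#include <stdbool.h>

/* Run-time view: does the CPU this program is running on report SSSE3? */
bool cpu_has_ssse3(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("ssse3") != 0;
#else
    return false;  /* not x86, or no builtin available: treat as absent */
#endif
}

/* Compile-time view: did the flags used for this build enable SSSE3? */
bool build_uses_ssse3(void)
{
#ifdef __SSSE3__
    return true;
#else
    return false;
#endif
}

/* A feature is "left unused" when the CPU has it but the build does not use it. */
bool feature_left_unused(bool cpu_has, bool build_uses)
{
    return cpu_has && !build_uses;
}
```

If feature_left_unused(cpu_has_ssse3(), build_uses_ssse3()) comes back true for a binary built with -march=native, that would suggest the compiler did not enable the feature for that CPU.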

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.

vexatious
Tux's lil' helper

Joined: 24 Aug 2010
Posts: 85

Posted: Tue Jan 28, 2014 2:29 pm

Thanks for all the help and I'm really sorry for giving bad feedback (especially in my deleted post about something racial). I really do appreciate the help and think this is one of the best forums with some of the smartest people!

God bless!
_________________
Gentoo
Slackware
All times are GMT