Gentoo Forums
Making full use of cpu registers in CFLAGS

xedx
Tux's lil' helper

Joined: 23 May 2003
Posts: 93

Posted: Sat Jun 14, 2003 3:38 am

dgrant wrote:
from freehackers.org:

Quote:

Athlon-tbird, aka K7 (AMD)

CFLAGS="-march=athlon-tbird -O3 -pipe -fforce-addr -fomit-frame-pointer
-funroll-loops -falign-functions=4 -maccumulate-outgoing-args"
CXXFLAGS="${CFLAGS}"

note : -m3dnow and -mmmx optimisations are implied by march=athlon-tbird


It says that 3dnow and mmx optimization are implied by athlon-tbird?


that's why you don't need to add them (e.g. -m3dnow) when you already have -march={athlon,etc}
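If you'd rather check this than take the quote's word for it, you can ask the compiler which instruction-set macros a given -march switches on. This is only a minimal sketch: it assumes a GCC that accepts -dM -E and uses the usual __MMX__ / __SSE__ / __3dNOW__ predefined macro names, and the helper name isa_macros is made up for illustration.

```shell
# isa_macros: hypothetical helper. Reads a preprocessor macro dump on stdin
# and keeps only the MMX / SSE / 3DNow! related #defines.
isa_macros() {
    grep -E '^#define __(MMX|SSE[0-9]*|3dNOW(_A)?)__ '
}

# Intended use (assumes an x86 gcc; a 64-bit toolchain may also need -m32):
#   gcc -march=athlon-tbird -dM -E - </dev/null | isa_macros
```

If the 3DNow! and MMX macros show up without passing -m3dnow or -mmmx, then for that compiler version the -march setting already turns them on.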
_________________
--+//+

Radea
n00b

Joined: 08 May 2003
Posts: 59
Location: United States

Posted: Mon Jun 16, 2003 10:21 pm

Athlons have 128k L1 cache (I think it is Durons with 64), therefore would it not be better to use '-falign-functions=128' instead of '-falign-functions=64'

taskara
Advocate

Joined: 10 Apr 2002
Posts: 3763
Location: Australia

Posted: Mon Jun 16, 2003 10:40 pm

Radea wrote:
Athlons have 128k L1 cache (I think it is Durons with 64), therefore would it not be better to use '-falign-functions=128' instead of '-falign-functions=64'

athlons have 64k level1 cache, and 256mb level 2 cache (barton athlons have 512), whereas duron's have 64k level 1 cache, and 128k level 2 cache.

they both have the same level 1 cache size, so '-falign-functions=64' is correct for both cpus.
Image: http://www.penguinitis.com/images/athloncache.jpg
_________________
Kororaa install method - have Gentoo up and running quickly and easily, fully automated with an installer!

Radea
n00b

Joined: 08 May 2003
Posts: 59
Location: United States

Posted: Mon Jun 16, 2003 10:56 pm

taskara wrote:
Radea wrote:
Athlons have 128k L1 cache (I think it is Durons with 64), therefore would it not be better to use '-falign-functions=128' instead of '-falign-functions=64'

athlons have 64k level1 cache, and 256mb level 2 cache (barton athlons have 512), whereas duron's have 64k level 1 cache, and 128k level 2 cache.

they both have the same level 1 cache size, so '-falign-functions=64' is correct for both cpus.

Why am I thinking 128 then? :( I'm also thinking 384K total cache for non-Bartons; maybe AMD was adding the I-cache and D-cache together as a marketing number? :? Either that or I'm just going completely crazy :lol:

Edit
"All Athlon XP processors (including Barton) contain a 128K L1 cache, 64K for data, and 64K for instructions." |LINK|
That must be what's getting me. :P So it is 128K L1 cache, but split into two types, or? :oops:

taskara
Advocate

Joined: 10 Apr 2002
Posts: 3763
Location: Australia

Posted: Mon Jun 16, 2003 11:02 pm

hehe what's probably getting you is that athlons can COMBINE their level 1 and level 2 cache, whereas pentiums can't, theirs remain separate :)

pi-cubic
Tux's lil' helper

Joined: 25 May 2003
Posts: 143

Posted: Wed Jun 18, 2003 10:11 am

first: wow, there is much info in this thread and i feel quite overwhelmed by it...and i'm not even a native english speaker ;)

second: reading this thread and other sources, i finally have the following CFLAGS for my Intel Pentium 4M (laptop) machine:
Code:
CFLAGS="-march=pentium3 -mcpu=pentium4 -O3 -finline-functions -falign-jumps=5 -falign-loops=5 -falign-functions=64 -pipe"
it would be a great help for me, if anyone could tell me if i included a very stupid bug. thank you guys...


pi-cubiq

taskara
Advocate

Joined: 10 Apr 2002
Posts: 3763
Location: Australia

Posted: Wed Jun 18, 2003 10:23 am

pi-cubiq wrote:
first: wow, there is much info in this thread and i feel quite overwhelmed by it...and i'm not even a native english speaker ;)

second: reading this thread and other sources, i finally have the following CFLAGS for my Intel Pentium 4M (laptop) machine:
Code:
CFLAGS="-march=pentium3 -mcpu=pentium4 -O3 -finline-functions -falign-jumps=5 -falign-loops=5 -falign-functions=64 -pipe"
it would be a great help for me, if anyone could tell me if i included a very stupid bug. thank you guys...


pi-cubiq


the only problem I can see is that it should read
Quote:
-march=athlon-xp

:twisted:

pi-cubic
Tux's lil' helper

Joined: 25 May 2003
Posts: 143

Posted: Wed Jun 18, 2003 7:00 pm

taskara wrote:
the only problem I can see is that it should read
Quote:
-march=athlon-xp

:twisted:

i'm sorry, but i can't follow you :(. do you mean, that my cflags-settings would be for an athlon-xp? what do you mean by 'it should read'?

MOS-FET
Apprentice

Joined: 20 May 2003
Posts: 291
Location: Cologne, Germany

Posted: Fri Jun 20, 2003 9:01 pm    Post subject: my cflags

ok so i've read almost every post in this topic, and i finally chose these cflags/cxxflags for me (i've got an athlon-xp):

"-march=athlon-xp -O3 -pipe -m3dnow -mmmx -msse -mfpmath=sse,387 -finline-functions -fmerge-all-constants -fthread-jumps -fomit-frame-pointer -fexpensive-optimizations -ffast-math -fforce-addr -falign-functions=64 -falign-jumps=4 -falign-loops=4 -frerun-cse-after-loop -frerun-loop-opt -fprefetch-loop-arrays -maccumulate-outgoing-args"

i've just compiled a few packages with that, and everything seems to work fine. i'll do an emerge -e world this night and see what happens tomorrow :-) do i remember right that those cflags do NOT apply when making a new kernel?

do you have any suggestions about these cflags? did i miss something or should i remove something? this whole cflags thing is damn confusing, i mean, there must be someone out there who knows what they all do and whether it's a good idea to use them or not ...

tom

esapersona
n00b

Joined: 17 May 2003
Posts: 16
Location: Perth, Western Australia

Posted: Sat Jun 21, 2003 7:36 am

I have an athlon-xp and I had a few problems with fast-math...Mainly on the emerge system, so you may need to muddle around with those packages if you want to use fast-math...

MOS-FET
Apprentice

Joined: 20 May 2003
Posts: 291
Location: Cologne, Germany

Posted: Sat Jun 21, 2003 7:57 am

well my emerge -e world just finished, and everything is just working fine. no problems at all yet, neither at compilation nor when using the system.

esapersona
n00b

Joined: 17 May 2003
Posts: 16
Location: Perth, Western Australia

Posted: Sat Jun 21, 2003 11:05 am

Great! Perhaps I'll have to look into using those CFLAGS....Perhaps it was some combination that I used

MOS-FET
Apprentice

Joined: 20 May 2003
Posts: 291
Location: Cologne, Germany

Posted: Sat Jun 21, 2003 11:12 am

well -ffast-math and -mfpmath= seem like they have something to do with each other, i don't know.

MOS-FET
Apprentice

Joined: 20 May 2003
Posts: 291
Location: Cologne, Germany

Posted: Sat Jun 21, 2003 1:52 pm    Post subject: wow

ok, so i finished compiling all packages with the CFLAGS i posted earlier and i must say - my system is NOTICEABLY faster. everything runs really smooth, much smoother than before.

ok here's my hardware data:
athlon xp 2200+ (1800 mhz)
msi f41 mainboard
nvidia nforce2 chipset

my CFLAGS and CXXFLAGS are:

"-march=athlon-xp -O3 -pipe -m3dnow -mmmx -msse -mfpmath=sse,387 -finline-functions -fmerge-all-constants -fthread-jumps -fomit-frame-pointer -fexpensive-optimizations -ffast-math -fforce-addr -falign-functions=64 -falign-jumps=4 -falign-loops=4 -frerun-cse-after-loop -frerun-loop-opt -fprefetch-loop-arrays -maccumulate-outgoing-args"

as i said, i did an emerge -e world, and i had no problems compiling/running the system, kde, mozilla, k3b, mplayer, xmms, gaim, lmule and a few other apps so far - and it's stunning fast! i wish i had made a benchmark, but it feels much faster really.

tom

esapersona
n00b

Joined: 17 May 2003
Posts: 16
Location: Perth, Western Australia

Posted: Sun Jun 22, 2003 8:54 am

Okay - I've changed my CFLAGS to what you have. I have to try this :o
Seems to be going alright - *yay*

MOS-FET
Apprentice

Joined: 20 May 2003
Posts: 291
Location: Cologne, Germany

Posted: Sun Jun 22, 2003 9:32 am

hey i just even compiled openoffice 1.1beta2 with the above cflags. that's really surprising me because the ebuild tells you that openoffice is very fragile about aggressive cflags ... but openoffice is so stunning fast now!

drake51
n00b

Joined: 15 Jun 2003
Posts: 13

Posted: Sun Jun 22, 2003 11:25 am

I am doing an emerge -eUD world right now. I have modified my use/cflags as stated below.

If this were to fail for some non-critical package, what is the best way to handle it? Will adding --resume --skipfirst get past it without recompiling all the prior packages again?

As for the flags...I based them on the awesome details provided by defconfoo. Do they look complete? I have been compiling with them for the past 5 hrs (after I recompiled the kernel and rebooted).

Code:
snip from cpuinfo....
model name      : Intel(R) Pentium(R) 4 CPU 3.06GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm


Code:
USE="aaob acpi acpi4linux dvd emacs fbcon gtk2 jikes ofx pda radeon samba sse usb"
CFLAGS="-march=pentium4 -O3 -pipe -mfpmath=sse -fomit-frame-pointer -ffast-math -fprefetch-loop-arrays -fmerge-all-constants -mmmx -msse -msse2"

esapersona
n00b

Joined: 17 May 2003
Posts: 16
Location: Perth, Western Australia

Posted: Sun Jun 22, 2003 1:42 pm

drake51 wrote:
If this were to fail for some non-critical package, what is the best way to handle it? Will adding --resume --skipfirst get past it without recompiling all the prior packages again?


I had a package fail during my current emerge -e....How annoying. What I did:
Code:
emerge -ep world > foo
vi foo

Edit out the first few lines and the packages already emerged so that you're only left with the list of packages not yet updated (it's in order, so just remember the last one and delete up to that). Then type
Code:
:1,$s/\[ebuild\ \ N\ \ \ \]//g

That'll get rid of all the ebuild stuff...Then I went through deleting all the version numbers, so that I was left with only the package names. There is probably a way to do that automatically (perhaps something like this?):
EDIT: I just tried this line - Doesn't work...You need to replace the * with something that means any number of characters....
Code:
:1,$s/-[0987654321]*\n//g


Then write and quit with :wq, and on the command line type:
Code:
emerge -p `cat foo`
to make sure it's all good...Then do that line again without the -p. There is probably a better way, but I took the opportunity to familiarize myself with vim instead of finding it out :wink:
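For anyone who'd rather not do the stripping by hand in vim, the same cleanup can be sketched as a small sed pipeline. Treat it as a rough sketch, not a tested recipe: it assumes each line looks like "[ebuild  N    ] category/name-version", the helper name pkg_names is made up, and the version pattern is only an approximation of Gentoo's version syntax (a package whose name itself contains a "-digit" part could get cut short).

```shell
# pkg_names: hypothetical helper. Reads `emerge -p` style output on stdin and
# prints one category/name per line, dropping the "[ebuild ...]" tag and the
# trailing version. Lines that do not start with "[ebuild" are ignored.
pkg_names() {
    sed -n -E 's/^\[ebuild[^]]*\][[:space:]]*([^[:space:]]+).*$/\1/p' |
    sed -E 's/-[0-9][0-9.]*[a-z]?(_(alpha|beta|pre|rc|p)[0-9]*)*(-r[0-9]+)?$//'
}

# Intended use, mirroring the manual steps:
#   emerge -ep world | pkg_names > foo
#   (delete the packages that were already emerged from foo)
#   emerge -p `cat foo`
```

The list still has to be trimmed by hand down to the packages that are left, since the pipeline has no way of knowing where the earlier run stopped.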

TheCoop
Veteran

Joined: 15 Jun 2002
Posts: 1814
Location: Where you least expect it

Posted: Sun Jun 22, 2003 2:49 pm

or you could type
Code:
emerge -e --resume world

_________________
95% of all computer errors occur between chair and keyboard (TM)

"One World, One web, One program" - Microsoft Promo ad.
"Ein Volk, Ein Reich, Ein Führer" - Adolf Hitler

Change the world - move a rock

esapersona
n00b

Joined: 17 May 2003
Posts: 16
Location: Perth, Western Australia

Posted: Mon Jun 23, 2003 12:10 am

Bah - But what do you learn from that? =P

cchapman
Guru

Joined: 16 Jan 2003
Posts: 440
Location: Fremont, NE

Posted: Wed Jun 25, 2003 4:14 pm

Here are the optimizations per -O#

Code:
-O

          -fdefer-pop
          -fmerge-constants
          -fthread-jumps
          -floop-optimize
          -fcrossjumping
          -fif-conversion
          -fif-conversion2
          -fdelayed-branch
          -fguess-branch-probability
          -fcprop-registers
         
-O2

          -fdefer-pop
          -fmerge-constants
          -fthread-jumps
          -floop-optimize
          -fcrossjumping
          -fif-conversion
          -fif-conversion2
          -fdelayed-branch
          -fguess-branch-probability
          -fcprop-registers
          -fforce-mem
          -foptimize-sibling-calls
          -fstrength-reduce
          -fcse-follow-jumps 
          -fcse-skip-blocks
          -frerun-cse-after-loop 
          -frerun-loop-opt
          -fgcse   
          -fgcse-lm   
          -fgcse-sm
          -fdelete-null-pointer-checks
          -fexpensive-optimizations
          -fregmove
          -fschedule-insns 
          -fschedule-insns2
          -fsched-interblock 
          -fsched-spec
          -fcaller-saves
          -fpeephole2
          -freorder-blocks 
          -freorder-functions
          -fstrict-aliasing
          -falign-functions 
          -falign-jumps
          -falign-loops 
          -falign-labels

-O3
          -fdefer-pop
          -fmerge-constants
          -fthread-jumps
          -floop-optimize
          -fcrossjumping
          -fif-conversion
          -fif-conversion2
          -fdelayed-branch
          -fguess-branch-probability
          -fcprop-registers
          -fforce-mem
          -foptimize-sibling-calls
          -fstrength-reduce
          -fcse-follow-jumps 
          -fcse-skip-blocks
          -frerun-cse-after-loop
          -frerun-loop-opt
          -fgcse
          -fgcse-lm
          -fgcse-sm
          -fdelete-null-pointer-checks
          -fexpensive-optimizations
          -fregmove
          -fschedule-insns
          -fschedule-insns2
          -fsched-interblock
          -fsched-spec
          -fcaller-saves
          -fpeephole2
          -freorder-blocks
          -freorder-functions
          -fstrict-aliasing
          -falign-functions
          -falign-jumps
          -falign-loops
          -falign-labels
          -finline-functions
          -funit-at-a-time
          -frename-registers

invaderzim
Tux's lil' helper

Joined: 16 Aug 2002
Posts: 93
Location: Louisville, KY

Posted: Wed Jul 02, 2003 3:42 am    Post subject: Definitive answer.

Code:
-march=pentium3 -mmmx -msse -O2 -fomit-frame-pointer -pipe -mfpmath=sse,387 -mno-push-args -mno-align-stringops -frename-registers -ffast-math -fprefetch-loop-arrays -s


implies

Code:
options enabled:  -fdefer-pop -fomit-frame-pointer -foptimize-sibling-calls
 -fcse-follow-jumps -fcse-skip-blocks -fexpensive-optimizations
 -fthread-jumps -fstrength-reduce -fprefetch-loop-arrays -fpeephole
 -fforce-mem -ffunction-cse -fkeep-static-consts -fcaller-saves
 -fpcc-struct-return -fgcse -fgcse-lm -fgcse-sm -frerun-cse-after-loop
 -frerun-loop-opt -fdelete-null-pointer-checks -fschedule-insns2
 -fsched-interblock -fsched-spec -fbranch-count-reg -freorder-blocks
 -frename-registers -fcprop-registers -fcommon -fgnu-linker -fregmove
 -foptimize-register-move -fargument-alias -fstrict-aliasing
 -fmerge-constants -fident -fpeephole2 -fguess-branch-probability
 -funsafe-math-optimizations -m80387 -mhard-float -mno-soft-float
 -mfp-ret-in-387 -mno-align-stringops -mno-push-args -mmmx -mno-mmx -msse
 -mno-sse -mcpu=pentium3 -mfpmath=sse,387 -march=pentium3



These flags are all safe except -ffast-math but i have had NO problems with it yet on my old flags (-s -march=pentium3 -mmmx -msse -Os -fomit-frame-pointer -pipe -fforce-addr -ffast-math -mpush-args -mfpmath=sse,387 -fschedule-insns2 -fmerge-all-constants)

i emailed GNU about -mmmx -mno-mmx and the sse ones...i'll tell you what they say, i'm hoping it's just an error in the output.

@all: i still don't REALLY know if MMX and SSE are IMPLIED by -march=, because what's the point of -mmmx and -msse then?

defconfoo:
i'd like your input on these flags...i think they are the best flags possible... they use the per-cpu defaults for all the settings not specified. What do you think of -ffast-math? If you think it's okay because MANY MANY MANY use it for their whole system with no problems, then what do you think about
Code:
-fno-math-errno
    Do not set ERRNO after calling math functions that are executed with a single instruction, e.g., sqrt. A program that relies on IEEE exceptions for math error handling may want to use this flag for speed while maintaining IEEE arithmetic compatibility.

    This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.

    The default is -fmath-errno.


errm just answered my question so you all know..
-ffast-math
Sets -fno-math-errno, -funsafe-math-optimizations,
-fno-trapping-math, -ffinite-math-only and
-fno-signaling-nans.

Thanks LAta!

ph317
n00b

Joined: 02 Jun 2002
Posts: 43

Posted: Thu Jul 03, 2003 5:24 am    Post subject: cache and alignment

A few corrections to some misinfo above:

First off, L1 and L2 caches are separate, even on athlons.
Also, those caches, whether they're 64, 128, 256, etc... are in Kbytes, not megabytes.
Last but not least, please don't go doing "-falign-functions=64" or any other crazy value like that. Sane values are things like "4". It's just how many bytes to align the functions by so that jumps are efficient. Jumps, memory access, etc. on some processors are just more efficient when aligned on certain boundaries, usually something like 4 bytes, which has nothing to do with L1/L2 cache size. The only relation between -falign-XXXX=Y and cache sizes is that as you increase the alignment value for faster access, you leave gaps, which means the overall size of the code or data is larger and runs a greater statistical chance of causing cache misses by spreading things a little further apart.
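To put rough numbers on those gaps, here is a tiny arithmetic sketch. It only rounds a start offset up to the next multiple of the alignment value; the offset 70 is a made-up example, the helper names are invented, and any extra rounding the compiler applies to the value itself is left out.

```shell
# align_up ADDR N: round ADDR up to the next multiple of N (N > 0).
align_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# padding ADDR N: number of filler bytes skipped to reach that boundary.
padding() {
    echo $(( $(align_up "$1" "$2") - $1 ))
}

# A function that would otherwise start at byte offset 70 (made-up number):
padding 70 4     # filler needed with an alignment of 4
padding 70 64    # filler needed with an alignment of 64
```

With an alignment of 4 the filler can never exceed 3 bytes per function, while with 64 it can be up to 63, and that is the code growth being described.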

Kesereti
Guru

Joined: 07 Nov 2002
Posts: 520

Posted: Fri Jul 04, 2003 12:06 am    Post subject: This may be a silly question...

But which runs of Athlon chips are T-Birds? ^_^ I'm rather confused about the naming conventions of AMD chips =P

odegard
Guru

Joined: 08 Mar 2003
Posts: 324
Location: Trondheim, NO

Posted: Fri Jul 04, 2003 8:21 pm    Post subject: Re: cache and alignment

ph317 wrote:
A few corrections to some misinfo above:
First off, L1 and L2 caches are separate, even on athlons.


Actually, *only* on athlons.

However, I was thinking, what is the bottleneck on modern computers? I/O. So why don't we optimize the code for a smaller footprint rather than for faster execution? Let's be utterly simplistic and say that there are two variables: LOAD and EXECUTE. LOAD is far bigger than EXECUTE, so in order to get a total boost, get LOAD down, even though it may take longer to EXECUTE.

Agree/disagree?
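For anyone who wants to try the smaller-footprint idea, GCC's -Os level optimizes for size. A make.conf fragment along these lines would be the starting point; it is only a sketch, and the -march value is just an example taken from earlier posts, so use the one for your own CPU.

```shell
# /etc/make.conf fragment (sketch): favour code size over raw speed.
# -Os is GCC's "optimize for size" level; the -march value is only an example.
CFLAGS="-march=athlon-xp -Os -pipe"
CXXFLAGS="${CFLAGS}"
```

Whether this actually beats -O2 or -O3 depends on the workload, so it would need measuring on a real system.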

All times are GMT
Page 5 of 7