Author |
Message |
xedx Tux's lil' helper
Joined: 23 May 2003 Posts: 93
|
Posted: Sat Jun 14, 2003 3:38 am Post subject: |
|
|
dgrant wrote: | from freehackers.org:
Quote: |
Athlon-tbird, aka K7 (AMD)
CFLAGS="-march=athlon-tbird -O3 -pipe -fforce-addr -fomit-frame-pointer
-funroll-loops -falign-functions=4 -maccumulate-outgoing-args"
CXXFLAGS="${CFLAGS}"
note : -m3dnow and -mmmx optimisations are implied by march=athlon-tbird
|
It says that 3dnow and mmx optimization are implied by athlon-tbird? |
that's why you don't need to add them (e.g. -m3dnow) when you already have -march={athlon,etc} _________________ --+//+ |
|
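One way to check this claim rather than take the note on trust is to ask the compiler which instruction-set macros it predefines for a given -march. A minimal sketch, assuming an x86 GCC that defines __MMX__, __3dNOW__ and __SSE__ (recent releases do; very old ones may not). The names isa_macros and march_implies are made up here for illustration:

```shell
# Hypothetical helpers, not part of GCC or Portage.

# Read "gcc -dM -E" output on stdin and print only the instruction-set macros.
isa_macros() {
    awk '$1 == "#define" && $2 ~ /^__(MMX|3dNOW|3dNOW_A|SSE|SSE2)__$/ { print $2 }'
}

# Dump the macros predefined for the given compiler flags and filter them.
# Example (assumption: a 64-bit toolchain needs -m32 for 32-bit-only CPU names):
#   march_implies -m32 -march=athlon-tbird
march_implies() {
    gcc "$@" -dM -E - </dev/null | isa_macros
}
```

If __MMX__ and __3dNOW__ show up with only -march set, then adding -mmmx and -m3dnow on top should change nothing.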
Radea n00b
Joined: 08 May 2003 Posts: 59 Location: United States
|
Posted: Mon Jun 16, 2003 10:21 pm Post subject: |
|
|
Athlons have 128k L1 cache (I think it is Durons with 64), therefore would it not be better to use '-falign-functions=128' instead of '-falign-functions=64' |
|
taskara Advocate
Joined: 10 Apr 2002 Posts: 3763 Location: Australia
|
Posted: Mon Jun 16, 2003 10:40 pm Post subject: |
|
|
Radea wrote: | Athlons have 128k L1 cache (I think it is Durons with 64), therefore would it not be better to use '-falign-functions=128' instead of '-falign-functions=64' |
athlons have 64k level1 cache, and 256mb level 2 cache (barton athlons have 512), whereas duron's have 64k level 1 cache, and 128k level 2 cache.
they both have the same level 1 cache size, so '-falign-functions=64' is correct for both cpus.
Image: http://www.penguinitis.com/images/athloncache.jpg _________________ Kororaa install method - have Gentoo up and running quickly and easily, fully automated with an installer! |
|
Radea n00b
Joined: 08 May 2003 Posts: 59 Location: United States
|
Posted: Mon Jun 16, 2003 10:56 pm Post subject: |
|
|
taskara wrote: | Radea wrote: | Athlons have 128k L1 cache (I think it is Durons with 64), therefore would it not be better to use '-falign-functions=128' instead of '-falign-functions=64' |
athlons have 64k level1 cache, and 256mb level 2 cache (barton athlons have 512), whereas duron's have 64k level 1 cache, and 128k level 2 cache.
they both have the same level 1 cache size, so '-falign-functions=64' is correct for both cpus.
|
Why am I thinking 128 then? I'm also thinking 384K total cache for non-Bartons; maybe AMD was adding the I-cache and D-cache together as a marketing number? Either that or I'm just going completely crazy.
Edit:
"All Athlon XP processors (including Barton) contain a 128K L1 cache, 64K for data, and 64K for instructions." |LINK|
That must be what's getting me. So it is 128K of L1 cache, but split into two types? |
|
taskara Advocate
Joined: 10 Apr 2002 Posts: 3763 Location: Australia
|
Posted: Mon Jun 16, 2003 11:02 pm Post subject: |
|
|
hehe what's probably getting you is that athlons can COMBINE their level 1 and level 2 cache, whereas pentiums can't; they remain separate |
|
pi-cubic Tux's lil' helper
Joined: 25 May 2003 Posts: 143
|
Posted: Wed Jun 18, 2003 10:11 am Post subject: |
|
|
first: wow, there is much info in this thread and i feel quite overwhelmed by it...and i'm not even a native english speaker
second: reading this thread and other sources, i finally have the following CFLAGS for my Intel Pentium 4M (laptop) machine: Code: | CFLAGS="-march=pentium3 -mcpu=pentium4 -O3 -finline-functions -falign-jumps=5 -falign-loops=5 -falign-functions=64 -pipe" | it would be a great help for me, if anyone could tell me if i included a very stupid bug. thank you guys...
pi-cubiq |
|
taskara Advocate
Joined: 10 Apr 2002 Posts: 3763 Location: Australia
|
Posted: Wed Jun 18, 2003 10:23 am Post subject: |
|
|
pi-cubiq wrote: | first: wow, there is much info in this thread and i feel quite overwhelmed by it...and i'm not even a native english speaker
second: reading this thread and other sources, i finally have the following CFLAGS for my Intel Pentium 4M (laptop) machine: Code: | CFLAGS="-march=pentium3 -mcpu=pentium4 -O3 -finline-functions -falign-jumps=5 -falign-loops=5 -falign-functions=64 -pipe" | it would be a great help for me, if anyone could tell me if i included a very stupid bug. thank you guys...
pi-cubiq |
the only problem I can see is that it should read
|
pi-cubic Tux's lil' helper
Joined: 25 May 2003 Posts: 143
|
Posted: Wed Jun 18, 2003 7:00 pm Post subject: |
|
|
taskara wrote: | the only problem I can see is that it should read
|
i'm sorry, but i can't follow you. do you mean that my cflags settings would be for an athlon-xp? what do you mean by 'it should read'? |
|
MOS-FET Apprentice
Joined: 20 May 2003 Posts: 291 Location: Cologne, Germany
|
Posted: Fri Jun 20, 2003 9:01 pm Post subject: my cflags |
|
|
ok so i've read almost every post in this topic, and i finally chose these cflags/cxxflags for me (i've got an athlon-xp):
"-march=athlon-xp -O3 -pipe -m3dnow -mmmx -msse -mfpmath=sse,387 -finline-functions -fmerge-all-constants -fthread-jumps -fomit-frame-pointer -fexpensive-optimizations -ffast-math -fforce-addr -falign-functions=64 -falign-jumps=4 -falign-loops=4 -frerun-cse-after-loop -frerun-loop-opt -fprefetch-loop-arrays -maccumulate-outgoing-args"
i've just compiled a few packages with that, and everything seems to work fine. i'll do an emerge -e world tonight and see what happens tomorrow :-) do i remember right that those cflags do NOT apply when building a new kernel?
do you have any suggestions about these cflags? did i miss something or should i remove something? this whole cflags thing is damn confusing, i mean, there must be someone out there who knows what they all do and whether it's a good idea to use them or not ...
tom |
|
esapersona n00b
Joined: 17 May 2003 Posts: 16 Location: Perth, Western Australia
|
Posted: Sat Jun 21, 2003 7:36 am Post subject: |
|
|
I have an athlon-xp and I had a few problems with fast-math...Mainly on the emerge system, so you may need to muddle around with those packages if you want to use fast-math... |
|
MOS-FET Apprentice
Joined: 20 May 2003 Posts: 291 Location: Cologne, Germany
|
Posted: Sat Jun 21, 2003 7:57 am Post subject: |
|
|
well my emerge -e world just finished, and everything is just working fine. no problems at all yet, neither at compilation nor when using the system. |
|
esapersona n00b
Joined: 17 May 2003 Posts: 16 Location: Perth, Western Australia
|
Posted: Sat Jun 21, 2003 11:05 am Post subject: |
|
|
Great! Perhaps I'll have to look into using those CFLAGS....Perhaps it was some combination that I used |
|
MOS-FET Apprentice
Joined: 20 May 2003 Posts: 291 Location: Cologne, Germany
|
Posted: Sat Jun 21, 2003 11:12 am Post subject: |
|
|
well -ffast-math and -mfpmath= seem like they have something to do with each other, i don't know. |
|
MOS-FET Apprentice
Joined: 20 May 2003 Posts: 291 Location: Cologne, Germany
|
Posted: Sat Jun 21, 2003 1:52 pm Post subject: wow |
|
|
ok, so i finished compiling all packages with the CFLAGS i posted earlier and i must say - my system is NOTICEABLY faster. everything runs really smooth, much smoother than before.
ok here's my hardware data:
athlon xp 2200+ (1800 mhz)
msi f41 mainboard
nvidia nforce2 chipset
my CFLAGS and CXXFLAGS are:
"-march=athlon-xp -O3 -pipe -m3dnow -mmmx -msse -mfpmath=sse,387 -finline-functions -fmerge-all-constants -fthread-jumps -fomit-frame-pointer -fexpensive-optimizations -ffast-math -fforce-addr -falign-functions=64 -falign-jumps=4 -falign-loops=4 -frerun-cse-after-loop -frerun-loop-opt -fprefetch-loop-arrays -maccumulate-outgoing-args"
as i said, i did an emerge -e world, and i had no problems compiling/running the system, kde, mozilla, k3b, mplayer, xmms, gaim, lmule and a few other apps so far - and it's stunningly fast! i wish i had run a benchmark, but it really does feel much faster.
tom |
|
esapersona n00b
Joined: 17 May 2003 Posts: 16 Location: Perth, Western Australia
|
Posted: Sun Jun 22, 2003 8:54 am Post subject: |
|
|
Okay - I've changed my CFLAGS to what you have. I have to try this
Seems to be going alright - *yay* |
|
MOS-FET Apprentice
Joined: 20 May 2003 Posts: 291 Location: Cologne, Germany
|
Posted: Sun Jun 22, 2003 9:32 am Post subject: |
|
|
hey i even just compiled openoffice 1.1beta2 with the above cflags. that really surprises me, because the ebuild tells you that openoffice is very fragile with aggressive cflags ... but openoffice is stunningly fast now! |
|
drake51 n00b
Joined: 15 Jun 2003 Posts: 13
|
Posted: Sun Jun 22, 2003 11:25 am Post subject: |
|
|
I am doing an emerge -eUD world right now. I have modified my use/cflags as stated below.
If this were to fail for some non-critical package, what is the best way to handle it? Will adding --resume --skipfirst get past it without recompiling all the prior packages again?
As for the flags... I based them on the awesome details provided by defconfoo. Do they look complete? I have been compiling with them for the past 5 hrs (after I recompiled the kernel and rebooted).
Code: | snip from cpuinfo....
model name : Intel(R) Pentium(R) 4 CPU 3.06GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
|
Code: | USE="aaob acpi acpi4linux dvd emacs fbcon gtk2 jikes ofx pda radeon samba sse usb"
CFLAGS="-march=pentium4 -O3 -pipe -mfpmath=sse -fomit-frame-pointer -ffast-math -fprefetch-loop-arrays -fmerge-all-constants -mmmx -msse -msse2"
|
|
|
esapersona n00b
Joined: 17 May 2003 Posts: 16 Location: Perth, Western Australia
|
Posted: Sun Jun 22, 2003 1:42 pm Post subject: |
|
|
drake51 wrote: | If this were to fail for some non-critical package, what is the best way to handle it? Will adding --resume --skipfirst get past it without recompiling all the prior packages again? |
I had a package fail during my current emerge -e....How annoying. What I did:
Code: | emerge -ep world > foo
vi foo |
Edit out the first few lines and the packages already emerged so that you're only left with the list of packages not yet updated (it's in order, so just remember the last one and delete up to that)... Then type
Code: | :1,$s/\[ebuild\ \ N\ \ \ \]//g |
That'll get rid of all the ebuild stuff... Then I went through deleting all the version numbers (so that I was left with only the package names). There is probably a way to do that automatically (perhaps something like this?):
EDIT: I just tried this line - Doesn't work...You need to replace the * with something that means any number of characters....
Code: | :1,$s/-[0987654321]*\n//g |
Then write and quit with :wq, and on the command line type:
Code: | emerge -p `cat foo` | to make sure it's all good... Then do that line again without the -p. There is probably a better way, but I took the opportunity to familiarize myself with vim instead of finding it out |
|
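The manual vi steps above can probably be folded into one filter. A rough sketch using sed, assuming each package line looks like "[ebuild  N    ] category/name-version" (the column layout varies between Portage versions) and that no package name contains a hyphen followed by a digit. pkg_names is a made-up helper name:

```shell
# Hypothetical helper: reduce "emerge -ep world" output to category/name pairs.
pkg_names() {
    # 1) keep only "[ebuild ...]" lines and print the atom after the bracket
    # 2) cut everything from the first "-<digit>" in the package part onward,
    #    which is where the version starts
    sed -n -E 's/^\[ebuild[^]]*\][[:space:]]+([^[:space:]]+).*$/\1/p' |
        sed -E 's/-[0-9][^/]*$//'
}

# Example use (not run here):
#   emerge -ep world | pkg_names > foo
```

The resulting foo still needs trimming by hand to drop the packages that were already rebuilt before the failure.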
TheCoop Veteran
Joined: 15 Jun 2002 Posts: 1814 Location: Where you least expect it
|
Posted: Sun Jun 22, 2003 2:49 pm Post subject: |
|
|
or you could type
Code: | emerge -e --resume world |
_________________ 95% of all computer errors occur between chair and keyboard (TM)
"One World, One web, One program" - Microsoft Promo ad.
"Ein Volk, Ein Reich, Ein Führer" - Adolf Hitler
Change the world - move a rock |
|
esapersona n00b
Joined: 17 May 2003 Posts: 16 Location: Perth, Western Australia
|
Posted: Mon Jun 23, 2003 12:10 am Post subject: |
|
|
Bah - But what do you learn from that? =P |
|
cchapman Guru
Joined: 16 Jan 2003 Posts: 440 Location: Fremont, NE
|
Posted: Wed Jun 25, 2003 4:14 pm Post subject: |
|
|
Here are the optimizations per -O#
Code: | -O
-fdefer-pop
-fmerge-constants
-fthread-jumps
-floop-optimize
-fcrossjumping
-fif-conversion
-fif-conversion2
-fdelayed-branch
-fguess-branch-probability
-fcprop-registers
-O2
-fdefer-pop
-fmerge-constants
-fthread-jumps
-floop-optimize
-fcrossjumping
-fif-conversion
-fif-conversion2
-fdelayed-branch
-fguess-branch-probability
-fcprop-registers
-fforce-mem
-foptimize-sibling-calls
-fstrength-reduce
-fcse-follow-jumps
-fcse-skip-blocks
-frerun-cse-after-loop
-frerun-loop-opt
-fgcse
-fgcse-lm
-fgcse-sm
-fdelete-null-pointer-checks
-fexpensive-optimizations
-fregmove
-fschedule-insns
-fschedule-insns2
-fsched-interblock
-fsched-spec
-fcaller-saves
-fpeephole2
-freorder-blocks
-freorder-functions
-fstrict-aliasing
-falign-functions
-falign-jumps
-falign-loops
-falign-labels
-O3
-fdefer-pop
-fmerge-constants
-fthread-jumps
-floop-optimize
-fcrossjumping
-fif-conversion
-fif-conversion2
-fdelayed-branch
-fguess-branch-probability
-fcprop-registers
-fforce-mem
-foptimize-sibling-calls
-fstrength-reduce
-fcse-follow-jumps
-fcse-skip-blocks
-frerun-cse-after-loop
-frerun-loop-opt
-fgcse
-fgcse-lm
-fgcse-sm
-fdelete-null-pointer-checks
-fexpensive-optimizations
-fregmove
-fschedule-insns
-fschedule-insns2
-fsched-interblock
-fsched-spec
-fcaller-saves
-fpeephole2
-freorder-blocks
-freorder-functions
-fstrict-aliasing
-falign-functions
-falign-jumps
-falign-loops
-falign-labels
-finline-functions
-funit-at-a-time
-frename-registers |
|
|
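Going by the list above, several flags that people add by hand in this thread are already switched on by -O2 or -O3. A small sketch that picks them out of a CFLAGS string. O3_IMPLIED is only a hand-copied subset of the list above (so it assumes the same GCC version), and redundant_flags is a made-up helper name:

```shell
# Subset of the -O3 list above, copied by hand (assumption: same GCC version).
O3_IMPLIED="-fthread-jumps -fexpensive-optimizations -frerun-cse-after-loop -frerun-loop-opt -finline-functions -frename-registers"

# Hypothetical helper: print each flag in $1 that also appears in the list $2.
redundant_flags() {
    for flag in $1; do
        case " $2 " in
            *" $flag "*) echo "$flag" ;;
        esac
    done
}

# Example: check part of a CFLAGS line posted earlier in the thread.
redundant_flags "-march=athlon-xp -O3 -pipe -finline-functions -fthread-jumps -ffast-math" "$O3_IMPLIED"
```

For that example it prints -finline-functions and -fthread-jumps. Flags that carry a value, such as -falign-functions=64, are deliberately not matched, since the value may differ from the default that -O2 picks.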
invaderzim Tux's lil' helper
Joined: 16 Aug 2002 Posts: 93 Location: Louisville, KY
|
Posted: Wed Jul 02, 2003 3:42 am Post subject: Definitive answer. |
|
|
Code: | -march=pentium3 -mmmx -msse -O2 -fomit-frame-pointer -pipe -mfpmath=sse,387 -mno-push-args -mno-align-stringops -frename-registers -ffast-math -fprefetch-loop-arrays -s |
implies
Code: | options enabled: -fdefer-pop -fomit-frame-pointer -foptimize-sibling-calls
-fcse-follow-jumps -fcse-skip-blocks -fexpensive-optimizations
-fthread-jumps -fstrength-reduce -fprefetch-loop-arrays -fpeephole
-fforce-mem -ffunction-cse -fkeep-static-consts -fcaller-saves
-fpcc-struct-return -fgcse -fgcse-lm -fgcse-sm -frerun-cse-after-loop
-frerun-loop-opt -fdelete-null-pointer-checks -fschedule-insns2
-fsched-interblock -fsched-spec -fbranch-count-reg -freorder-blocks
-frename-registers -fcprop-registers -fcommon -fgnu-linker -fregmove
-foptimize-register-move -fargument-alias -fstrict-aliasing
-fmerge-constants -fident -fpeephole2 -fguess-branch-probability
-funsafe-math-optimizations -m80387 -mhard-float -mno-soft-float
-mfp-ret-in-387 -mno-align-stringops -mno-push-args -mmmx -mno-mmx -msse
-mno-sse -mcpu=pentium3 -mfpmath=sse,387 -march=pentium3 |
These flags are all safe except -ffast-math, but i have had NO problems with it yet on my old flags (-s -march=pentium3 -mmmx -msse -Os -fomit-frame-pointer -pipe -fforce-addr -ffast-math -mpush-args -mfpmath=sse,387 -fschedule-insns2 -fmerge-all-constants)
i emailed GNU about -mmmx -mno-mmx and the sse ones... i'll tell you what they say, i'm hoping it's just an error in the output.
@all: i still don't REALLY know if MMX and SSE really are IMPLIED by -march=, because what's the point of -mmmx and -msse then?
defconfoo:
i'd like your input on these flags... i think they are the best flags possible... they use defaults by cpu for all the settings not specified. What do you think of -ffast-math? If you think it's okay because MANY MANY MANY use it for their whole system with no problems, then what do you think about Code: | -fno-math-errno
Do not set ERRNO after calling math functions that are executed with a single instruction, e.g., sqrt. A program that relies on IEEE exceptions for math error handling may want to use this flag for speed while maintaining IEEE arithmetic compatibility.
This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
The default is -fmath-errno. |
errm, just answered my own question, so you all know:
-ffast-math
Sets -fno-math-errno, -funsafe-math-optimizations,
-fno-trapping-math, -ffinite-math-only and
-fno-signaling-nans.
Thanks LAta! |
|
ph317 n00b
Joined: 02 Jun 2002 Posts: 43
|
Posted: Thu Jul 03, 2003 5:24 am Post subject: cache and alignment |
|
|
A few corrections to some misinfo above:
First off, L1 and L2 caches are separate, even on athlons.
Also, those caches, whether they're 64, 128, 256, etc... are in Kbytes, not megabytes.
Last but not least, please don't go doing "-falign-functions=64" or any other crazy value like that. Sane values are things like "4". It's just how many bytes to align the functions by so that jumps are efficient. Jumps, memory accesses, etc. on some processors are just more efficient when aligned on certain boundaries, usually something like 4 bytes, which has nothing to do with L1/L2 cache size. The only relation between -falign-XXXX=Y and cache sizes is that as you increase the alignment value for faster access, you leave gaps, which means the overall size of the code or data is larger and runs a greater statistical chance of causing cache misses by making things a little further apart. |
|
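To put rough numbers on the "gaps" point, here is a toy model. It assumes the alignment value is a power of two and that functions are laid out back to back, each one padded up to the next multiple of the alignment. The function sizes are invented, and align_padding and total_padding are made-up helper names; the real layout is up to the compiler and linker.

```shell
# Bytes of padding needed so that offset $1 moves up to a multiple of $2.
align_padding() {
    echo $(( ($2 - $1 % $2) % $2 ))
}

# Total padding when functions of the given sizes (bytes) are placed one after
# another, each start rounded up to a multiple of the alignment in $1.
total_padding() {
    align=$1; shift
    offset=0; waste=0
    for size in "$@"; do
        pad=$(align_padding "$offset" "$align")
        waste=$((waste + pad))
        offset=$((offset + pad + size))
    done
    echo "$waste"
}

# Three invented function sizes, aligned to 4 and then to 64 bytes.
total_padding 4 10 23 7
total_padding 64 10 23 7
```

With these made-up sizes the 4-byte case wastes 3 bytes and the 64-byte case wastes 95, which is the kind of code-size growth described above.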
Kesereti Guru
Joined: 07 Nov 2002 Posts: 520
|
Posted: Fri Jul 04, 2003 12:06 am Post subject: This may be a silly question... |
|
|
But which runs of Athlon chips are T-Birds? ^_^ I'm rather confused about the naming conventions of AMD chips =P |
|
odegard Guru
Joined: 08 Mar 2003 Posts: 324 Location: Trondheim, NO
|
Posted: Fri Jul 04, 2003 8:21 pm Post subject: Re: cache and alignment |
|
|
ph317 wrote: | A few corrections to some misinfo above:
First off, L1 and L2 caches are seperate, even on athlons. |
Actually, *only* on athlons.
However, I was thinking, what is the bottleneck on modern computers? I/O. So why don't we optimize the code for a smaller footprint rather than for faster execution? Let's be utterly simplistic and say that there are two variables: LOAD and EXECUTE. LOAD is far bigger than EXECUTE, so in order to get a total boost, get LOAD down, even though it may take longer to EXECUTE.
Agree/disagree? |
|