Author |
Message |
ArchiCAT n00b
Joined: 07 Jan 2005 Posts: 14
|
Posted: Sun Apr 03, 2005 5:31 am Post subject: branch target load opt ...not intended to be run twice |
|
|
Hi, I hope this isn't a repost.
My CFLAGS:
Code: |
CFLAGS="-march=pentium3 -mtune=pentium3 -mfpmath=sse,387 -mmmx -msse -minline-all-stringops -pipe -O3 -fomit-frame-pointer -fforce-addr -finline-functions -finline-limit=800 -fmove-all-movables -freduce-all-givs -freorder-blocks -freorder-functions -fexpensive-optimizations -falign-functions -falign-labels -falign-loops -falign-jumps -frename-registers -fweb -funit-at-a-time -fbranch-target-load-optimize -fbranch-target-load-optimize2"
|
It's adapted from a wiki about cflags. I can compile with it, but with a lot of warnings:
"warning: branch target load optimization is not intended to be run twice"
I believe it's due to the last 2 options. Is it normal or not? |
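For anyone hitting the same warning: the two options name the same pass scheduled at two different points (GCC documents one as running before prologue/epilogue threading and the other as running after), so passing both is what triggers the message. A minimal sketch for checking a flag string, assuming a POSIX shell; `has_flag` and `check_btl` are hypothetical helper names, not part of Portage or GCC:

```shell
#!/bin/sh
# Sketch: report whether a flag string contains both branch-target-load options.
# has_flag and check_btl are hypothetical helper names.

has_flag() {
    # $1 = flag to look for, $2 = whitespace-separated flag string
    for f in $2; do
        [ "$f" = "$1" ] && return 0
    done
    return 1
}

check_btl() {
    # Prints "both" if the two variants are present together, "ok" otherwise.
    if has_flag -fbranch-target-load-optimize "$1" &&
       has_flag -fbranch-target-load-optimize2 "$1"; then
        echo "both"
    else
        echo "ok"
    fi
}

# Check whatever CFLAGS is set in the current environment (empty if unset).
check_btl "${CFLAGS:-}"
```

If this prints `both`, dropping either one of the two options should silence the warning.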
|
|
|
moocha Watchman
Joined: 21 Oct 2003 Posts: 5722
|
Posted: Sun Apr 03, 2005 6:05 am Post subject: |
|
|
Oh.
My.
Sweet.
Lord.
Please, for the love of $deity, don't do something like that to your system.
Use a sane set of CFLAGS. Don't believe stories about "h0w 1ncred1bly f4st my system 1s, d00d" with a kilometric insane set of CFLAGS unless you can reasonably trust the user to know what the hell (s)he is talking about. The more aggressive your CFLAGS, the more likely it is they will actually cause a slowdown due to code bloat, which leads to processor cache thrashing. Also, the more aggressive your CFLAGS, the more likely it is that your system will malfunction, since there are enough optimization flags that produce incorrect machine code on one or more packages. Also, more optimizations cause a sometimes dramatic increase in compiling time.
For example, -frename-registers is buggy (occasionally produces broken code), it's utterly pointless on 32-bit Intel and AMD architectures since those don't have enough registers to make it worth it anyway, and increases compile times by 10-15%. -fweb increases compile time and doesn't produce measurable improvements. A lot of the flags you listed there are already implied in -O2, and if they're filtered by an ebuild they're usually filtered for a damn good reason (they break things on that package). Etc etc.
For a P3 system I recommend these flags (and please trust me on this, I did own Pentium 3 systems, both single- and multiprocessor):
Code: |
CFLAGS="-march=pentium3 -mtune=pentium3 -O2 -fomit-frame-pointer -momit-leaf-frame-pointer -fno-ident -pipe"
CXXFLAGS="${CFLAGS} -fvisibility-inlines-hidden"
|
This will result in a nice, responsive, and stable system. Most of the people telling you that there's a huge difference between -O2 and -O3 are either bullshitting or are deluding themselves, perceiving a speed increase just because they expect to perceive one.
If you really, really, really just want to play around with the system, use -O3 instead of my recommended -O2 for an overall speed increase of somewhere around 3-4% (not noticeable by a human being, just by benchmarks) at the cost of increasing compile time by around 30%.
Optimization is a very subtle art, and it's extremely easy to shoot yourself in the foot and anti-optimize things. Not to mention that some flags that are good for a specific package can be extremely bad for another package. Not to mention that raw speed and system responsiveness are two very different things.
For a lot of nice flamage relating to CFLAGS, check the CFLAGS Central thread.
Edit: Added CXXFLAGS.
Please also note that -mtune and -fvisibility-inlines-hidden are only understood by GCC 3.4 and later, so if you should switch back to 3.3 you have to replace -mtune with -mcpu, and delete -fvisibility-inlines-hidden (leaving CXXFLAGS="${CFLAGS}").
_________________ Military Commissions Act of 2006: http://tinyurl.com/jrcto
"Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety."
-- attributed to Benjamin Franklin |
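The GCC 3.3 versus 3.4 note above can be sketched as a small version check. This is only an illustration, assuming a POSIX shell; `tuning_flag` is a hypothetical helper name, and the version string is whatever `gcc -dumpversion` prints:

```shell
#!/bin/sh
# Sketch: pick -mtune or -mcpu from a GCC version string.
# tuning_flag is a hypothetical helper name.

tuning_flag() {
    # $1 = version string such as 3.3.5 or 3.4.3
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    # Treat a missing or non-numeric minor version as 0.
    case $minor in ''|*[!0-9]*) minor=0 ;; esac
    if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 4 ]; }; then
        echo "-mtune"
    else
        echo "-mcpu"
    fi
}

# Print a candidate CFLAGS line for the installed compiler, if there is one.
if command -v gcc >/dev/null 2>&1; then
    echo "CFLAGS=\"-march=pentium3 $(tuning_flag "$(gcc -dumpversion)")=pentium3 -O2 -pipe\""
fi
```

The printed line is meant to be reviewed by hand before it goes anywhere near make.conf.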
|
|
|
rhill Retired Dev
Joined: 22 Oct 2004 Posts: 1629 Location: sk.ca
|
Posted: Sun Apr 03, 2005 6:21 am Post subject: |
|
|
i vote this post be stickied somewhere. _________________ by design, by neglect
for a fact or just for effect |
|
|
|
ArchiCAT n00b
Joined: 07 Jan 2005 Posts: 14
|
Posted: Sun Apr 03, 2005 6:24 am Post subject: |
|
|
Thanks for your information! I will take it into deep consideration.
I found no information on "-fno-ident", even in the gcc docs. But I googled it and found something.
Do I need to rebuild glibc with "-fno-ident" before rebuilding other packages? Thanks. |
|
|
|
moocha Watchman
Joined: 21 Oct 2003 Posts: 5722
|
Posted: Sun Apr 03, 2005 6:29 am Post subject: |
|
|
-fno-ident is the opposite of -fident (it turns it off). -fident is enabled by default. It's not a code optimization flag. -fno-ident just causes GCC not to insert those stupid useless repeated version strings caused by autogenerated #ident directives into the object files it produces. Doesn't affect the behavior of any program at all, but it saves a few hundred bytes for every executable and library on the system. It adds up to a few megabytes on a Gentoo desktop install, and it definitely doesn't hurt anything.
It's in the GCC docs (info pages, under Invocation -> Code Generation), or check here: http://gcc.gnu.org/onlinedocs/gcc-3.4.3/gcc/Code-Gen-Options.html
Edit: More explicit info here, I'm too lazy to type it all in: http://www.trilithium.com/johan/2004/12/gcc-ident-strings/
And no, absolutely no need to rebuild anything just because you added that one option. Add it now and forget about it, it will get phased into the system over time anyway, as packages get upgraded and rebuilt.
Last edited by moocha on Sun Apr 03, 2005 6:34 am; edited 1 time in total |
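One way to see what -fno-ident removes is to count the embedded version strings in a binary. A rough sketch, assuming a POSIX shell with `tr` and `grep`, and assuming the strings carry the usual `GCC: (` prefix; `count_ident_strings` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: count "GCC: (" version strings embedded in a file.
# count_ident_strings is a hypothetical helper name.

count_ident_strings() {
    # Turn every non-printable byte into a newline, then count matching lines.
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    LC_ALL=C tr -c '[:print:]' '\n' < "$1" | grep -cF 'GCC: (' || true
}
```

Running it on the same binary before and after a rebuild with -fno-ident should show fewer matches afterwards. On ELF systems, `readelf -p .comment <file>` shows the same strings directly.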
|
|
|
ArchiCAT n00b
Joined: 07 Jan 2005 Posts: 14
|
Posted: Sun Apr 03, 2005 6:34 am Post subject: |
|
|
moocha wrote: | -fno-ident is the opposite of -fident (it turns it off). -fident is enabled by default. It's not a code optimization flag. -fno-ident just causes GCC not to insert those stupid useless repeated version strings caused by autogenerated #ident directives into the object files it produces. Doesn't affect the behavior of any program at all, but it saves a few hundred bytes for every executable and library on the system. It adds up to a few megabytes on a Gentoo desktop install, and it definitely doesn't hurt anything.
It's in the GCC docs (info pages, under Invocation -> Code Generation), or check here: http://gcc.gnu.org/onlinedocs/gcc-3.4.3/gcc/Code-Gen-Options.html
Edit: More explicit info here, I'm too lazy to type it all in : http://www.trilithium.com/johan/2004/12/gcc-ident-strings/ |
Thanks a lot. I have already pressed Ctrl-C. |
|
|
|
moocha Watchman
Joined: 21 Oct 2003 Posts: 5722
|
Posted: Sun Apr 03, 2005 8:12 am Post subject: |
|
|
dirtyepic wrote: | i vote this post be stickied somewhere. |
Could probably be done but it's possible that it won't have the desired effect, sadly. Been wanting to write an article / FAQ debunking the common ricer myths for some time now (it would certainly save skin off my fingertips since I'm tired of retyping the same things all over every so often), but whenever I find some spare time and am about to do it I can't help thinking that it'll probably just start a flamewar and generate more support calls, so I give up before starting. |
|
|
|
rhill Retired Dev
Joined: 22 Oct 2004 Posts: 1629 Location: sk.ca
|
Posted: Sun Apr 03, 2005 8:31 am Post subject: |
|
|
heh, you're probably right. i had the exact same reaction as your first four lines when i came across this. i started to reply but just didn't know where to begin. but you covered it pretty thoroughly, and i even ended up learning about ident strings tonight, so thank you for a great post. |
|
|
|
moocha Watchman
Joined: 21 Oct 2003 Posts: 5722
|
Posted: Fri Apr 15, 2005 4:46 pm Post subject: |
|
|
Since I've been referring others to this thread and there have been requests, here are the C[XX]FLAGS I recommend for an Athlon XP system:
Code: |
CFLAGS="-march=athlon-xp -O2 -fomit-frame-pointer -momit-leaf-frame-pointer -fno-ident -pipe"
CXXFLAGS="${CFLAGS} -fvisibility-inlines-hidden"
|
Edit: -fvisibility-inlines-hidden only works on GCC 3.4! If you use GCC 3.3, you want
Code: |
CXXFLAGS="${CFLAGS}"
|
Last edited by moocha on Fri Apr 15, 2005 6:08 pm; edited 1 time in total |
|
|
|
Drooling Iguana Tux's lil' helper
Joined: 07 Apr 2004 Posts: 94 Location: Sector ZZ9 Plural Z Alpha
|
Posted: Fri Apr 15, 2005 5:51 pm Post subject: |
|
|
Crosspost from this thread (where it may not have been appropriate):
I wrote: | On another note, would "-O3 -march=athlon-xp -mcpu=athlon-xp -pipe -m3dnow -m128bit-long-double -mfpmath=sse -mmmx -msse -msse2" be a reasonable set of CFLAGS for an Athlon-XP system? |
|
|
|
|
moocha Watchman
Joined: 21 Oct 2003 Posts: 5722
|
Posted: Fri Apr 15, 2005 6:00 pm Post subject: |
|
|
Drooling Iguana wrote: | Crosspost from this thread (where it may not have been appropriate):
I wrote: | On another note, would "-O3 -march=athlon-xp -mcpu=athlon-xp -pipe -m3dnow -m128bit-long-double -mfpmath=sse -mmmx -msse -msse2" be a reasonable set of CFLAGS for an Athlon-XP system? |
|
-m3dnow -mmmx -msse are redundant (already implied by -march=athlon-xp), provide no benefit at all, and have the potential to cause trouble. Get rid of them.
-msse2 is plain wrong: the Athlon XP doesn't support SSE2, so binaries built with it will crash. Definitely get rid of it.
-m128bit-long-double will create a lot of trouble with binary only or precompiled software and will make no difference perceivable by a human being. Get rid of it.
-mfpmath=sse will most likely slow FPU calculations down, but your mileage may vary. If you're not sure that it will speed things up, get rid of it. You won't notice the difference unless you run some heavy-duty scientific software applications.
Generally speaking, unless I'd have a very very good reason, I'd stay away from all the -m options except -march, -mcpu/-mtune and -momit-leaf-frame-pointer. It's much easier to mess things up with them than to get positive results, and it's very unlikely you'll ever feel the difference anyway. So little to no benefit and moderate to high risks - I think that speaks for itself.
Add -fomit-frame-pointer -momit-leaf-frame-pointer. These aren't enabled by default on an Athlon XP, don't mess up anything, don't slow down compiling, and provide some speed boost (not very much, but still).
Add -fno-ident which will save some space in the resulting binaries and has no ill side effects whatsoever.
-O3 may or may not be a good idea. My recommendation is not to use -O3 globally, since the resulting speedup isn't that much, the increase in compile time is big, and it enables -frename-registers, which is pretty iffy. I'd stick with -O2.
The resulting set will be exactly what I recommended above (except for the added -mcpu, which is implied by -march but is safe to leave in).
Edit: Note: I recommend you get rid of the -mmmx -m3dnow -msse CFLAGS (and obviously -msse2, which isn't appropriate for an Athlon XP). However, please do enable the mmx 3dnow sse USE flags (in the USE variable). Those are an entirely different matter altogether and I heartily recommend them.
Edit 2: If you use GCC 3.4, add -fvisibility-inlines-hidden to your CXXFLAGS, otherwise (if you use GCC 3.3, which is the default in 2005.0) you want
Code: |
CXXFLAGS="${CFLAGS}"
|
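The -msse2 point above can be checked against what the CPU actually reports. A minimal sketch, assuming Linux with /proc/cpuinfo and a POSIX shell; `cpu_has` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: check a CPU feature list before adding an instruction-set flag.
# cpu_has is a hypothetical helper name.

cpu_has() {
    # $1 = feature name (e.g. sse2), $2 = space-separated feature list
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On Linux the feature list is the "flags" line of /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    features=$(sed -n 's/^flags[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo | head -n 1)
    if cpu_has sse2 "$features"; then
        echo "sse2 is reported by this CPU"
    else
        echo "no sse2 reported: -msse2 binaries would likely crash here"
    fi
fi
```

The match is on whole words, so a CPU that lists `sse` but not `sse2` is reported correctly.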
|
|
|
Jazz Guru
Joined: 16 Nov 2003 Posts: 543 Location: Melbourne, Australia
|
Posted: Thu Jun 02, 2005 9:02 am Post subject: |
|
|
Umm.. cool, so are :
CFLAGS="-Os -march=pentium4 -mtune=pentium4 -mfpmath=sse -mmmx -msse -msse2 -ftracer -fweb -ffast-math -fno-ident -pipe"
CXXFLAGS="${CFLAGS} -fvisibility-inlines-hidden"
a good set of flags for a P4 system (no Hyper-Threading) with 512 MB of RAM?
If you check http://gentoo-wiki.com/CFLAGS it mentions:
-fomit-frame-pointer
This flag is very good if you are concerned mainly with execution times. However, binary size may increase, sometimes by up to as much as 30%.
So is it still recommended on desktop systems? Also, I have no idea what -fweb does, or how it got in there in the first place, lol. Should I get rid of it?
Also, which major components should I recompile after heavily modifying my flags? I need to recompile all of KDE 3.4.1 for sure; anything else I should consider? On that note, what's the command to recompile only the installed KDE 3.4.1 packages?
Thanks heaps.
Jazz _________________ In 2010, M$ Windows will be a quantum processing emulation layer for a 128-bit mod of a 64-bit hack of a 32-bit patch to a 16-bit GUI for an 8-bit operating system written for a 4-bit processor from a 2-bit company that can't stand 1 bit of competition. |
|
|
|
Jazz Guru
Joined: 16 Nov 2003 Posts: 543 Location: Melbourne, Australia
|
Posted: Sat Jun 04, 2005 11:05 am Post subject: |
|
|
bump |
|
|
|
moocha Watchman
Joined: 21 Oct 2003 Posts: 5722
|
Posted: Sat Jun 04, 2005 3:14 pm Post subject: |
|
|
Have you even read the thread? |
|
|
|
nxsty Veteran
Joined: 23 Jun 2004 Posts: 1556 Location: .se
|
Posted: Sat Jun 04, 2005 3:32 pm Post subject: |
|
|
Jazz wrote: | Umm.. cool, so are :
CFLAGS="-Os -march=pentium4 -mtune=pentium4 -mfpmath=sse -mmmx -msse -msse2 -ftracer -fweb -ffast-math -fno-ident -pipe"
CXXFLAGS="${CFLAGS} -fvisibility-inlines-hidden"
a good set of flags for a P4 system (no Hyper-Threading) with 512 MB of RAM? |
Get rid of -mfpmath=sse, it's currently useless.
-ftracer bloats code a bit so I suggest that you remove it too.
-fweb makes compilation times longer and doesn't do much good, but at least it doesn't have any negative effects on the output binary except perhaps a slightly larger size. Remove it or keep it, decide for yourself.
-ffast-math can have a really good effect on audio/video decoders and such, but you shouldn't use it globally in the CFLAGS. Instead add it only when you compile something that might benefit from it.
Jazz wrote: | If you check http://gentoo-wiki.com/CFLAGS it mentions:
-fomit-frame-pointer
This flag is very good if you are concerned mainly with execution times. However, binary size may increase, sometimes by up to as much as 30%. |
I would really like to know what code it is that grows 30% with -fomit-frame-pointer. The binary might become slightly larger with it on x86 but the difference is usually small. |
|
|
|
Dr.Dran l33t
Joined: 08 Oct 2004 Posts: 766 Location: Imola - Italy
|
Posted: Sat Jun 04, 2005 9:49 pm Post subject: |
|
|
@moocha
Code: |
`-O' also turns on `-fomit-frame-pointer' on machines where doing so does not interfere with debugging.
|
I took that text from the gcc man page (version 3.3.5), and I'm a bit confused: I would think that adding -fomit-frame-pointer to CFLAGS is unnecessary if it is already turned on by -O, -O2, -O3, or -Os. Can you give me an answer? |
|
|
|
desertstalker Apprentice
Joined: 18 Sep 2004 Posts: 209
|
Posted: Sun Jun 05, 2005 2:50 am Post subject: |
|
|
On x86, -fomit-frame-pointer does interfere with debugging, so you need to add it explicitly. It is not implied by -Ox. |
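Newer compilers can report this directly. A sketch, assuming GCC 4.3 or later (the 3.x releases discussed in this thread do not have `--help=optimizers`) and a POSIX shell with `awk`; `flag_state` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: read the state of one optimization flag from the output of
# `gcc -Q --help=optimizers`. flag_state is a hypothetical helper name;
# it reads that output on stdin.

flag_state() {
    # Print the second column of the first line whose first column is the flag.
    awk -v flag="$1" '$1 == flag { print $2; exit }'
}

# Ask the installed compiler, if any, what -O2 does with the frame pointer.
if command -v gcc >/dev/null 2>&1; then
    gcc -Q -O2 --help=optimizers 2>/dev/null | flag_state -fomit-frame-pointer
fi
```

The second column is `[enabled]` or `[disabled]` for the given -O level on the target the compiler was built for, so the answer can differ from the 32-bit x86 case discussed here.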
|
|
|
moocha Watchman
Joined: 21 Oct 2003 Posts: 5722
|
Posted: Sun Jun 05, 2005 4:20 am Post subject: |
|
|
DranXXX wrote: | @moocha
Code: |
`-O' also turns on `-fomit-frame-pointer' on machines where doing so does not interfere with debugging.
|
I took that text from the gcc man page (version 3.3.5), and I'm a bit confused: I would think that adding -fomit-frame-pointer to CFLAGS is unnecessary if it is already turned on by -O, -O2, -O3, or -Os. Can you give me an answer? |
As desertstalker pointed out, it's not implied by any -O flag on x86. You can also add -momit-leaf-frame-pointer.
Nevertheless, please do not crosspost in multiple threads - that just confoozles the search function, which more or less sucks as it is anyway. |
|
|
|
Dr.Dran l33t
Joined: 08 Oct 2004 Posts: 766 Location: Imola - Italy
|
Posted: Sun Jun 05, 2005 7:58 am Post subject: |
|
|
@moocha
Sorry!
Excuse my crosspost and my bad English!
Now I have a very clear answer. In the Italian forum there is a very ambiguous point of view on that option: everyone puts it in their CFLAGS, but no one knows what the option really does. Like desertstalker, I searched the gcc man pages and found the description that confused me
By the way, here is the CFLAGS configuration I have in mind:
Code: |
CFLAGS="-Os -march=athlon-mp -fomit-frame-pointer -momit-leaf-frame-pointer -fno-ident -fforce-addr -funroll-loops -pipe"
|
In the gcc man page I found some other options that I think may be useful:
Code: | -Os = Optimize for size. `-Os' enables all `-O2' optimizations that do not typically increase code size.
It also performs further optimizations designed to reduce code size. -Os disables the following
optimization flags:
-falign-functions
-falign-jumps
-falign-loops
-falign-labels
-freorder-blocks
-fprefetch-loop-arrays
If you use multiple `-O' options, with or without level numbers, the last such option is the one that is effective.
-fforce-addr = Force memory address constants to be copied into registers before doing arithmetic on them.
This may produce better code just as -fforce-mem may.
-funroll-loops = Unroll loops whose number of iterations can be determined at compile time or upon entry
to the loop. -funroll-loops implies both -fstrength-reduce and -frerun-cse-after-loop.
This option makes code larger, and may or may not make it run faster. |
I think that -Os produces smaller files, and smaller files load into memory faster than larger ones
But I'm not sure about the -funroll-loops option... Do you know if that option produces any benefit?
In the manual "Securing & Optimizing Linux: The Hacking Solution", Gerhard Mourani writes:
"...The funroll-loops optimization option will perform the optimization of loop unrolling and will do it only for
loops whose number of iterations can be determined at compile time..."
Thank you very much for the answers, and by the way, happy hacking
P.S. Next time I will ask you about the CXXFLAGS and LDFLAGS options. Thanks |
|
|
|
nxsty Veteran
Joined: 23 Jun 2004 Posts: 1556 Location: .se
|
Posted: Sun Jun 05, 2005 11:16 am Post subject: |
|
|
DranXXX wrote: | I think that -Os produces smaller files, and smaller files load into memory faster than larger ones
But I'm not sure about the -funroll-loops option... Do you know if that option produces any benefit? |
That's correct about -Os but the combination of -Os and -funroll-loops is useless. -funroll-loops bloats your binaries a lot, so when they are compiled with -Os and -funroll-loops they will be both larger and slower than with just -O2. |
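A claim like this is easy to check on your own code by building the same source with each flag set and comparing sizes. A rough sketch, assuming a POSIX shell; `sizes_sorted` is a hypothetical helper name and `demo.c` is a placeholder:

```shell
#!/bin/sh
# Sketch: list files by size in bytes, smallest first, to compare builds
# of the same source. sizes_sorted is a hypothetical helper name.

sizes_sorted() {
    for f in "$@"; do
        # wc -c may pad its output with spaces on some systems; strip them.
        printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done | sort -n
}

# Hypothetical usage, assuming demo.c exists and gcc is installed:
#   gcc -O2 -c demo.c -o demo-O2.o
#   gcc -Os -c demo.c -o demo-Os.o
#   gcc -Os -funroll-loops -c demo.c -o demo-Os-unroll.o
#   sizes_sorted demo-O2.o demo-Os.o demo-Os-unroll.o
```

Raw file size includes headers and symbol tables; `size` from binutils gives a finer breakdown of the code section if you want to compare only the generated code.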
|
|
|
moocha Watchman
Joined: 21 Oct 2003 Posts: 5722
|
Posted: Sun Jun 05, 2005 12:22 pm Post subject: |
|
|
Also, I recommend people stay away from -Os. It's not appropriate for a normal desktop system. Stick with what I recommended above - there are very good reasons I recommend those values, as I pointed out... |
|
|
|
Dr.Dran l33t
Joined: 08 Oct 2004 Posts: 766 Location: Imola - Italy
|
Posted: Sun Jun 05, 2005 12:24 pm Post subject: |
|
|
Thanks nxsty!
If I drop the -funroll-loops option, I can compile my Gentoo without losing the benefit of the -Os optimization...
But do you think that staying with -O2 would give me more stable and faster code than -Os?
Thanks for the suggestions, and happy hacking to you too!!
EDIT: Thanks again moocha, eh eh eh, you have answered my question! But in which cases is the -Os option the best choice?
Excuse me, but I'm a noob at configuring CFLAGS, CXXFLAGS and LDFLAGS |
|
|
|
moocha Watchman
Joined: 21 Oct 2003 Posts: 5722
|
Posted: Sun Jun 05, 2005 4:06 pm Post subject: |
|
|
DranXXX wrote: | EDIT: Thanks again moocha, eh eh eh, you have answered my question! But in which cases is the -Os option the best choice? |
Embedded systems, old systems with little level 2 cache (such as Pentium, Pentium II, K6) and true SMP systems (multiprocessor and dual-core, but not the pseudo-multicore bastardization called HyperThreading). On normal, modern desktops, -Os will result in a slower and less responsive system than -O2. |
|
|
|
Dr.Dran l33t
Joined: 08 Oct 2004 Posts: 766 Location: Imola - Italy
|
Posted: Sun Jun 05, 2005 4:34 pm Post subject: |
|
|
WOW, in that case: I have a dual Athlon MP workstation, so I may use the -Os option
Thanks a lot for the suggestion |
|
|
|
Alejandro Nova n00b
Joined: 08 Sep 2004 Posts: 50
|
Posted: Mon Jun 06, 2005 2:46 pm Post subject: |
|
|
I have read all the posts about CFLAGS, and ended up with this (for an old Athlon Thunderbird running at 1.33 GHz)
Code: |
CFLAGS="-O2 -finline-functions -fweb -march=athlon-tbird -fomit-frame-pointer -momit-leaf-frame-pointer -fno-ident -pipe -fforce-addr"
|
The "-O2 -finline-functions -fweb" part stands in for -O3, to get rid of -frename-registers, which is a counter-optimisation from what I have read.
However, I don't know how good an idea -fforce-addr is. I have always included it, it has caused me no trouble, my system is rock-solid stable, and the GCC man page says that -fforce-addr may produce better code. Is it good? _________________ Becoming someone beautiful, through my music, my silent devotion...
Alejandro Nova™. |
|
|
|
|