CFLAGS Central (Part 2)

pv · Post by pv » Sat Dec 24, 2005 9:18 pm

geckosenator, I was greatly surprised by your run-time test results! When I ran some tests emerging bzip2 with different CFLAGS, I noticed that -Os gives the fastest run when compressing and an average one when decompressing. I also noticed that -O3 is the worst flag for bzip2.

It seems that any test behaves differently on different processors/architectures. For example, I have an Athlon XP. The AMD manual recommends aligning loops at the start of 16-byte blocks of code, which means executing padding instructions that do nothing but still consume processor resources. So, in my opinion, nobody should use -O2/-O3 without at least -fno-align-labels and -fno-align-loops on an Athlon XP. According to the standard GCC documentation, -Os does this and something more to decrease the executable size.

It seems that any test says NOTHING about REAL program performance. ALL the tests I've ever heard of recommend not switching between windows, moving the mouse, etc. while they run, so they don't take multitasking into account. But where can you see a single-tasking computer (server/desktop) today? I may run a test for a DAY AND A NIGHT, then build Gentoo with the best-performing CFLAGS (according to this test), and then find that the system was much faster and more responsive before I rebuilt it.

So, in my opinion, when giving test results, one should give:
1. The test itself, or a link to it, or some information on how to reproduce the results. Or you can just add something like 'I feel my computer is faster than before'. Sometimes the latter is more useful, sometimes the reverse.
2. The hardware the test was run on (the processor, RAM, maybe swap size).
3. The compiler, kernel version etc.

geckosenator · Post by **geckosenator** » Sun Dec 25, 2005 4:26 pm

I used the time command to measure performance. It measures the cpu time the program used. For compiling, I simply did:
make clean
edit Makefile to change cflags
time make

I also used time to measure the speed of runtime programs.
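The procedure above can be scripted so that every flag set is measured the same way. This is only a sketch under stated assumptions: `cpu_time` and `bench_build` are hypothetical helper names, the project's Makefile is assumed to honour `CFLAGS` and to have a `clean` target, and passing `CFLAGS=` on the make command line stands in for editing the Makefile by hand. It uses the POSIX `times` builtin rather than `time` so that it reports CPU time in any POSIX shell.

```shell
# cpu_time CMD [ARGS...]
# Run CMD in a subshell and print the CPU time its child processes used.
# The POSIX `times` builtin prints two lines; the second one is "user sys"
# for the children of the (sub)shell, i.e. CMD and everything it spawned.
cpu_time() {
    ( "$@" >/dev/null 2>&1; times ) | sed -n '2p'
}

# bench_build FLAGS...
# Rebuild the project in the current directory once per flag set.
# Assumes a Makefile that honours CFLAGS and has a "clean" target.
bench_build() {
    for flags in "$@"; do
        make clean >/dev/null 2>&1
        printf '%s\t%s\n' "$flags" "$(cpu_time make CFLAGS="$flags")"
    done
}

# Example (run inside a source tree):
#   bench_build "-O" "-O2" "-Os"
```

The same `cpu_time` helper can wrap a finished program (for example a bzip2 run on a fixed input file) to compare run-time rather than compile-time cost.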

I agree that every system is different, and every program is different, but the tests I did were on a common system, and I think they are applicable to a lot of people.

I personally don't care if my programs load 3% slower, because I won't be able to tell the difference. In a lot of cases, the bottleneck is not the cpu speed anyway. I might get some programs to run 3% faster if I tweak cflags individually for them, but I won't get this by putting -O2 in my make.conf.

I have found that the vast majority of programs are idle most of the time anyway, so optimizing them might just make them faster at doing nothing. If you are using -O2 instead of -O as a global cflag, I suggest you use -O and overclock your computer by 3% instead; this will definitely make things go faster!

carpman · Post by **carpman** » Fri Dec 30, 2005 12:19 pm

Hello, OK, I'm building a system with an Athlon-Tbird 1100 using GCC 3.4.5 and need a little help choosing some highly optimised (not insane) settings. I did find the following on a Romanian forum, but as I don't speak Romanian I couldn't work out if they worked.

Code: Select all

CFLAGS="-O2 -march=athlon-tbird -pipe -fomit-frame-pointer -ftracer -ffast-math -fforce-addr -fprefetch-loop-arrays -falign-functions=64 -momit-leaf-frame-pointer"

CXXFLAGS="-O2 -march=athlon-tbird -pipe -fomit-frame-pointer -ftracer -ffast-math -fforce-addr -fprefetch-loop-arrays -falign-functions=64 -momit-leaf-frame-pointer -fvisibility-inlines-hidden"

LDFLAGS="-Wl,-O1 -Wl,--enable-new-dtags -Wl,--sort-common -s"

I have also seen -mno-sse used with this CPU, but I'm not sure about it, as I believe this CPU only supports SSE prefetch and not full SSE!

pv · Post by pv » Fri Dec 30, 2005 10:07 pm

carpman wrote:highly optimised (not insane)

Highly optimized almost always means insane.

My own opinion is the following:
1) LDFLAGS must be "". But I've been using Gentoo built several months ago with LDFLAGS="-s". Things like -O1 and --sort-common are dangerous.
2) I don't know anything about -fvisibility-inlines-hidden and I don't use it (note, I have gcc-3.3.5).
3) -O2 -march=athlon-tbird -pipe -fomit-frame-pointer certainly.
4) -ftracer -fforce-addr. I don't think these would improve the performance at all; they may even decrease it.
5) -momit-leaf-frame-pointer. Why? Don't you use -fomit-frame-pointer, which is better?
6) -ffast-math. Any program that needs it usually appends it to CFLAGS itself. Moreover, this flag is known to break things.
7) -fprefetch-loop-arrays. I don't think it improves performance in all cases. Most probably it improves a few programs and just increases the code size in the others. I've tried compiling bzip2 using the option. Unpacking was as fast as with CFLAGS="-Os +flags from 3)" and packing was 5-10% SLOWER than with the options from 3).
8) -falign-functions=64.

You've read the AMD manual, haven't you? In spite of their recommendations to align code, don't do it. I've tried many flags, and the ones aligning code aren't the best of them. Moreover, I recommend you use at least '-fno-align-labels -fno-align-loops', maybe also '-fno-align-functions -fno-align-jumps'.

Another important thing is that you use GENTOO and PORTAGE. Portage automatically strips insane flags depending on the package being emerged. For example, the standard methods don't allow you to build GCC with -O3 or -Os, only -O2. The same goes for OpenOffice and Qt.

So it seems that even if you use insane CFLAGS, Portage strips those that would break the package being emerged.
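To make the idea of flag stripping concrete, here is a minimal stand-alone sketch in plain shell. In Portage itself this kind of filtering is, as far as I know, requested by individual ebuilds through helpers such as `filter-flags` from the flag-o-matic eclass; the `filter_flags` function below is a hypothetical imitation written for illustration, not Portage's code, and the sample CFLAGS are made up.

```shell
# filter_flags PATTERN...
# Remove every word of $CFLAGS that matches one of the glob PATTERNs.
# Hypothetical helper: it only imitates the idea of per-package flag filtering.
filter_flags() {
    kept=
    for flag in $CFLAGS; do
        drop=no
        for pat in "$@"; do
            case $flag in
                $pat) drop=yes ;;
            esac
        done
        if [ "$drop" = no ]; then
            kept="${kept:+$kept }$flag"
        fi
    done
    CFLAGS=$kept
}

# Sample global flags (invented for this example).
CFLAGS="-O2 -march=athlon-xp -pipe -ffast-math -falign-functions=64"

# A package that breaks with fast math or forced alignment would ask for:
filter_flags -ffast-math '-falign-*'
echo "$CFLAGS"   # prints: -O2 -march=athlon-xp -pipe
```

The point of the sketch is that the global CFLAGS in make.conf are only a starting point: whatever a package filters out never reaches the compiler for that package.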

See also my post at http://forums.gentoo.org/viewtopic-p-29 ... ml#2920459 and codergeek42's answer.

carpman · Post by **carpman** » Fri Dec 30, 2005 10:47 pm

Thanks for the reply; I am thinking I might try those below.

Note this is an Athlon Thunderbird, NOT an XP.

Code: Select all

CHOST="i686-pc-linux-gnu"
CFLAGS="-march=athlon-tbird"
#
CFLAGS="${CFLAGS} -O2"
CFLAGS="${CFLAGS} -pipe"
CFLAGS="${CFLAGS} -mno-sse"
CFLAGS="${CFLAGS} -frename-registers"
CFLAGS="${CFLAGS} -fforce-addr"
CFLAGS="${CFLAGS} -fomit-frame-pointer"
CFLAGS="${CFLAGS} -ftracer"
CFLAGS="${CFLAGS} -fprefetch-loop-arrays"
CFLAGS="${CFLAGS} -fno-align-labels"
CFLAGS="${CFLAGS} -fno-align-loops"
CFLAGS="${CFLAGS} -fno-align-functions"
CFLAGS="${CFLAGS} -fno-align-jumps"

#
CXXFLAGS="${CFLAGS} -fvisibility-inlines-hidden"
#
LDFLAGS="-Wl,-O1 -Wl"

cheers

carpman · Post by **carpman** » Fri Dec 30, 2005 11:00 pm

Think i am going to have to ease up a bit as glibc fails to build

carpman · Post by **carpman** » Fri Dec 30, 2005 11:40 pm

ok found the culprit

Code: Select all

LDFLAGS="-Wl,-O1 -Wl"

Can someone explain these a bit, and would they work on this CPU?

cheers

sylware · Post by **sylware** » Mon Jan 02, 2006 6:29 pm

Is there a better way to display the enabled optimization flags than "gcc -v -Q"?
gcc source code is currently too dark for me.

nxsty · Post by **nxsty** » Mon Jan 02, 2006 8:49 pm

sylware wrote:Is there a better way to display the enabled optimization flags than "gcc -v -Q"?
gcc source code is currently too dark for me.

You could read the manual.

http://gcc.gnu.org/onlinedocs/

nxsty · Post by **nxsty** » Mon Jan 02, 2006 8:50 pm

carpman wrote:ok found the culprit
Code: Select all
LDFLAGS="-Wl,-O1 -Wl"
Can someone explain these a bit, and would they work on this CPU?

cheers

Try with just LDFLAGS="-Wl,-O1". Why do you have that last -Wl?
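Some background on the question: GCC's `-Wl,option` syntax hands `option` to the linker, so `-Wl,-O1` passes `-O1` to ld, while a bare `-Wl` has nothing after the comma to pass along. A small check like the one below can catch such a leftover before a long build starts. It is a sketch only; `check_ldflags` is a hypothetical name, and the function looks at the spelling of the words, not at what the toolchain does with them.

```shell
# check_ldflags "LDFLAGS string"
# Report any -Wl word that carries no linker option after the comma.
# Returns 0 when the string looks fine, 1 when a dangling prefix was found.
check_ldflags() {
    status=0
    for word in $1; do
        case $word in
            -Wl|-Wl,)
                echo "dangling linker prefix: $word"
                status=1
                ;;
        esac
    done
    return $status
}

# The value from the post above is flagged; "-Wl,-O1" alone passes.
check_ldflags "-Wl,-O1 -Wl" || echo "fix LDFLAGS before building"
```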

sylware · Post by **sylware** » Mon Jan 02, 2006 10:10 pm

nxsty wrote:
sylware wrote:Is there a better way to display the enabled optimization flags than "gcc -v -Q"?
gcc source code is currently too dark for me.
You could read the manual.

http://gcc.gnu.org/onlinedocs/

-fomit-frame-pointer
Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.

On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro FRAME_POINTER_REQUIRED controls whether a target machine supports this flag. See Register Usage.

Enabled at levels -O, -O2, -O3, -Os.

This information is wrong: you'll see that this optimization flag has to be given explicitly on the command line, whatever the optimization level, in order to enable it (didn't check -O3 though). Moreover, it is much easier to have the list of enabled optimization techniques with all the rules already computed than to try to predict it from the doc without being able to check whether your prediction was correct.

Moreover, if you check the doc about the -v -Q options, there is no mention of optimization flags. I just inferred the info from the source code and got it from a mailing list archive from... the year 2000!!!

Since then, I have assumed a lot of distance between reality and the documentation. I don't ignore the doc; I just don't take it as the accurate Truth.

oggialli · Post by **oggialli** » Tue Jan 03, 2006 10:23 am

It should say "enabled at levels - - unless it interferes with debugging on the target architecture." And that's correct, and that's actually how it is (as for almost all architectures except ix86).

nxsty · Post by **nxsty** » Tue Jan 03, 2006 12:45 pm

sylware wrote:This information is wrong: you'll see that this optimization flag has to be given explicitly on the command line, whatever the optimization level, in order to enable it (didn't check -O3 though). Moreover, it is much easier to have the list of enabled optimization techniques with all the rules already computed than to try to predict it from the doc without being able to check whether your prediction was correct.
Moreover, if you check the doc about the -v -Q options, there is no mention of optimization flags. I just inferred the info from the source code and got it from a mailing list archive from... the year 2000!!!

Well it also says:

-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.

sylware · Post by **sylware** » Tue Jan 03, 2006 5:06 pm

oggialli wrote:It should say "enabled at levels - - unless it interferes with debugging on the target architecture." And that's correct, and that's actually how it is (as for almost all architectures except ix86).

Indeed... it should say...

nxsty wrote:Well it also says:
-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.

That's not the information we get in the paragraph dealing with the omit-frame-pointer optimization, as oggialli noted.

My point with those gcc options is to have a way to check which optimizations are enabled.
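For readers on a newer compiler than the ones discussed in this thread: GCC releases from 4.3 onward (an assumption about the reader's toolchain, not something available to the posters here) accept `-Q --help=optimizers`, which prints each optimization flag followed by `[enabled]` or `[disabled]` for the given command line. The filter below is a small sketch around that output; `list_enabled` is a hypothetical helper name.

```shell
# list_enabled
# Read the output of `gcc -Q --help=optimizers` on stdin and print only
# the names of the flags marked [enabled].
list_enabled() {
    awk '$2 == "[enabled]" { print $1 }'
}

# Usage, assuming a gcc new enough to support --help=optimizers:
#   gcc -Q -O2 --help=optimizers | list_enabled
#   gcc -Q -O2 -march=athlon-xp --help=optimizers | list_enabled | grep frame-pointer
```

Running it once per optimization level and comparing the results with `diff` shows which flags a level really switches on for a given target, instead of predicting it from the manual.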

rini17 · Post by **rini17** » Sun Jan 15, 2006 7:39 pm

My Almost Yet Unheard Of cflags for Athlon TBird 1400

Code: Select all

-O3 -march=athlon-tbird -finline-limit=300 -fomit-frame-pointer -fforce-addr -ftracer -momit-leaf-frame-pointer -pipe

I did some (almost statistically insignificant, however) research on "binary bloat" and it seems that -ftracer adds only ~1-2% to binary size. But -finline-functions (which is included in -O3 by default) adds ~10%. When one restricts inlining to functions of up to 300 pseudo-instructions with -finline-limit=300, the result seems to be only ~3% larger than with "-O3 -fno-inline-functions -fno-tracer". So hopefully the speed gain outweighs the size increase, yet it remains to be tested... But the original purpose was not primarily speed, but to make me feel better and to slow compiles for more convenient watching of emerge output, you know.
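A size comparison of this kind is easy to reproduce. The sketch below assumes two builds of the same program already exist as files; `size_delta_pct` is a hypothetical helper, and it compares plain file sizes, which is only a rough proxy for code size (unstripped binaries include symbols and debug data).

```shell
# size_delta_pct BASELINE OTHER
# Print how much larger (in percent, one decimal) OTHER is than BASELINE.
# A negative number means OTHER is smaller.
size_delta_pct() {
    base=$(wc -c < "$1")
    other=$(wc -c < "$2")
    awk -v b="$base" -v o="$other" 'BEGIN { printf "%.1f\n", (o - b) * 100 / b }'
}

# Example with hypothetical file names:
#   size_delta_pct prog-O3-noinline prog-O3-inline300
```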

pv · Post by pv » Sun Jan 15, 2006 8:08 pm

rini17 wrote:it seems that -ftracer adds only ~1-2% to binary size

My tests have also shown that it increases execution time by no more than ~5%, so you can use it without great performance loss.

enderandrew · Post by **enderandrew** » Fri Jan 27, 2006 3:22 pm

What do people think of the following flags used with GCC 4.x?

-fno-default-inline
Do not make member functions inline by default merely because they are defined inside the class scope (C++ only). Otherwise, when you specify -O, member functions defined inside class scope are compiled inline by default; i.e., you don't need to add `inline' in front of the member function name.

With all the buzz about inlines and removing them from the kernel, I wondered if this might be a good thing.

-fweb
Constructs webs as commonly used for register allocation purposes and assign each web individual pseudo register. This allows the register allocation pass to operate on pseudos directly, but also strengthens several other optimization passes, such as CSE, loop optimizer and trivial dead code remover. It can, however, make debugging impossible, since variables will no longer stay in a home register.

Enabled at levels -O2, -O3, -Os, on targets where the default format for debugging information supports variable tracking.

This is in the Conrad guide, but I've never used it. It seems similar to -fomit-frame-pointer: it is implied by -O2, yet only enabled when it won't cause a problem with debugging.

Most of the time, if it builds correctly, I don't care about debugging. If it doesn't build, then I disable any suspect flags anyway. So is it really a bad thing to enable it?

-frename-registers - That is in the Conrad guide, but not the GCC 4.x manual.

Thoughts?

oggialli · Post by **oggialli** » Sat Jan 28, 2006 9:12 am

Register renaming is pretty useless for x86 due to the terrible lack of HW registers (and the invisible renaming of them already occurring in the CPU). For amd64 it's somewhat more useful but it really becomes useful when the CPU has a LOT of available architectural registers.

enderandrew · Post by **enderandrew** » Wed Feb 01, 2006 4:05 pm

What about -fno-default-inline?

Anyone with thoughts on the issue?

taylorpendley · Post by **taylorpendley** » Thu Feb 02, 2006 11:09 pm

lololololol

fastest CFLAGS for MOST (<keyword) packages are plain and simply:

-O2 -march=YOUR_ARCH -fomit-frame-pointer -pipe

Some (few) packages benefit from the "other" CFLAGS but i guarantee all of you who have all these whack crazy long lists of CFLAGS are getting slower compile times, slower program load times, etc. (remember the previous "keyword")

if you read there are MANY MANY cases where -Os compiled progs are much quicker because they have a smaller footprint, and on processors with little cache -Os is going to be quicker in most cases (again MOST cases...remember keyword)

Why do i say this??? Because when i first hopped on the Gentoo Bandwagon i too was interested in CFLAGS and read and read and read about any and all CFLAGS i could find. I was a ricer like some of you and had a laundry list of these bad boys only to find out, one year later, that it was all for nothing because a simple, -march=k8 -O2 -fomit-frame-pointer -pipe, has been the fastest, most stable, least time-consuming CFLAG set i have used.............

sorry for mentioning keyword so much but i know there are people that are going to skim it and start yelling at me defending their sunuffabish -frename-registers (hah!) and -fweb (haha!) which is why i stressed MOST of the time.

Thanks

pv · Post by pv » Thu Feb 02, 2006 11:17 pm

enderandrew wrote:What about -fno-default-inline?
Anyone with thoughts on the issue?

I've read somewhere (maybe in Stroustrup's book) that 'default-inline' is in the ISO/IEC 14882 standard for C++, so by applying the option in question you break the 'standard' behaviour. I don't know how it influences performance, though.

randomeister · Post by **randomeister** » Sat Feb 04, 2006 12:49 pm

So what flags should I use for the new 64 bit Sempron? Does that depend on whether I'm in amd64 arch or the usual 32 bit x86?

Should I even consider using amd64? Perhaps the advantages are too small for the effort.

enderandrew · Post by **enderandrew** » Mon Feb 06, 2006 7:41 am

This is what I go with. It is fairly aggressive. You'll want to run with a script that lets you use a package.ldflags file to filter ldflags for certain packages.

-Bdirect requires a binutils patch, and --as-needed needs to be filtered for certain packages. I don't think this is crazy aggressive, but it puts out decent results.

CHOST="x86_64-pc-linux-gnu"
CFLAGS="-march=athlon64 -O2 -pipe -fomit-frame-pointer -ftracer"
CXXFLAGS="${CFLAGS}"
CXXFLAGS="${CXXFLAGS} -ffriend-injection"
CXXFLAGS="${CXXFLAGS} -fvisibility-inlines-hidden"
LDFLAGS="-Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,-Bdirect"

na641 · Post by **na641** » Mon Feb 06, 2006 9:40 am

randomeister wrote:So what flags should I use for the new 64 bit Sempron? Does that depend on whether I'm in amd64 arch or the usual 32 bit x86?

Should I even consider using amd64? Perhaps the advantages are too small for the effort.

Well the 'effort' really isn't that much greater than an x86 install nowadays. outside of a few things, you'll find that the process is really the same.

I also have a 64bit Sempron, and these are the settings i use:

Code: Select all

CFLAGS="-O2 -march=athlon64 -msse3 -pipe -fomit-frame-pointer -fno-align-labels -fno-align-loops"
CHOST="x86_64-pc-linux-gnu"
CXXFLAGS="${CFLAGS}"

Not extreme, to say the least. I have added -fno-align-labels and -fno-align-loops because of cache considerations. The 64-bit Sempron has the same amount of L1/L2 cache as the Athlon XP. These two flags help since the amount of cache is minimal compared to that of an Athlon 64. I consider it the best of -O2 and -Os. From my own experience it really does make a noticeable difference in app start-up time and general responsiveness.

randomeister · Post by **randomeister** » Mon Feb 06, 2006 6:35 pm

Thanks a lot fellows! I really appreciate the efforts people here put in to helping other users!

I'm trying the amd64 arch now, with na641's flags. We'll see how it goes, but I guess I've got nothing to compare to in terms of performance. I've read other Linux forums, where people claim using 64 bit support doesn't improve that much on a Sempron64. I don't know what to believe, but it's a cool experience to have set up an amd64 gentoo system!
