-O3 Optimizations - Implications on portage:

nelsonwcf · Tux's lil' helper Joined: 31 Oct 2012 Posts: 112

Hi,

I'm looking for a list of packages that work/doesn't work with -O3 at the Gentoo documentation but I couldn't find anything. The only reference I could find was on the Gentoo Handbook mentioning that using -O3 is not a good idea as some packages will have problems. However, in my ARM Banana Pi, my CFLAGS have -O3 and I have been using it for more than a year without any implications. Are there any information sources available on this subject, but specific to Gentoo?

Thank you,
Nelson

eccerr0r · Posted: Thu Mar 16, 2017 5:34 am Post subject:

Technically programs that compile incorrectly with -O3 is a gcc bug.
However as a lot of the -O3 are experimental, it can change from version to version, and if things are stable, these optimizations will go to -O2.

I'd treat using -O3 the equivalent of using ~arch ... Assumed unstable but likely will work.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

Akkara · Posted: Thu Mar 16, 2017 8:10 am Post subject:

I don't recommend -O3 globally.

It often causes immense code-bloat as it loop-unrolls anything it can, vectorizing anything it can -- while at the same time keeping a copy of the scalar code and dynamically picking which one to use each time thru because it generally can't prove that the vectors will be aligned properly or that the iteration count will be a even multiple of the vector length.

... and topping it off, usually the only loops that are hyper-optimized in this way, are initialization loops. Those tend to be the only ones simple enough for its heuristics to find something.

In the end, it often makes things slower because the reduced effectiveness of the cache swamps any performance benefit it might have otherwise achieved.

However, it can be an excellent flag to use on a per-file basis: after you've profiled the code, found the hot-spots; improved the algorithms as much as possible; reduced the data inter-dependencies as much as possible; peppered your argument lists with "restrict"s to indicate which pointers never alias with any other; attached __attributes(...) to indicate buffer alignments (and allocated the buffers for maximally friendly alignment); sprinkled whatever #pragmas further assists conveying your intentions ... after all that, -O3, used when compiling the files thusly blessed (and only those files), can be an invaluable asset to getting a nice performance boost.

Try it for yourself: Use 'objdump' to look at the '.o's after the compilation stage has finished, or, better yet, pass -S as part of CFLAGS and inspect the generated assembler and compare the -O -O2 and -O3 versions. (But don't expect emerge to complete successfully if you add it to make.conf

)
_________________
Many think that Dilbert is a comic. Unfortunately it is a documentary.

NeddySeagoon · Posted: Thu Mar 16, 2017 9:40 am Post subject:

nelsonwcf,

-O3, if it works can produce slower code than -O2 or -Os.
The compiler makes the code bigger to eliminate instructions, particularly branch instructions that add nothing to solving the problem.
This bigger code no longer fits into the CPU cache which increases cache evictions, cache misses and fetches from much slower main memory.
As a result of this 'cache thrashing' execution slows down.

ARM CPUs are not noted for huge CPU caches, so a global -O3 is probably counter productive.
A few apps may benefit but the only way to find out is to compare -O2, -O3 and -Os.

nelsonwcf · Tux's lil' helper Joined: 31 Oct 2012 Posts: 112

Very nice answers, guys. Thank you very much. Will test changing the -O3 to -O2 in my ARMv7.

As an additional but related question, is it worth changing from gcc to icc in Gentoo (not for ARM, obviously)? Since the main benefit from Gentoo is to have the packages optimized to your system, I'm guessing that it would be possible to get an additional boost by using icc. Is my assumption correct?

Thank you again!

Yamakuzure · Posted: Thu Mar 16, 2017 3:34 pm Post subject:

According to https://software.intel.com/en-us/forums/intel-c-compiler/topic/327585 the answer is no.

Generally speaking, you can get real speed gains using icc if, and only if, the source code is written in the right way.

However, here are some real numbers:
http://insights.dice.com/2013/11/04/speed-test-comparing-intel-c-gnu-c-and-llvm-clang-compilers/
_________________
Important German:

"Aha" - German reaction to pretend that you are really interested while giving no f*ck.
"Tja" - German reaction to the apocalypse, nuclear war, an alien invasion or no bread in the house.

nelsonwcf · Tux's lil' helper Joined: 31 Oct 2012 Posts: 112

Hi Yamakuzure,

I've saw these posts as well but they are old and consider only punctual applications. I'm looking for some insight on more current versions and using it as the general compiler in portage from Gentoo. Obviously, not all packages can be compiled with icc due to different "dialects" of C (this is especially try for the GLIBC and GCC).

If fact, if it was possible to set icc as the general compiler but force portage to use gcc on a package basis (or the other way around), that would be a great solution. However, I don't know if there is any simple way to do that in Gentoo, reason I'm looking for an updated Gentoo packages that are known to work/don't work with icc. Newer benchmarks are also useful, but I couldn't find any.

Thank you again.

Drone4four · Apprentice Joined: 09 May 2006 Posts: 247

Now if only we add a compiler which used GPUs instead of CPUs, then we could put to good use the 3584 cuda cores potentially at our disposal. I wonder how long a GPU based compiler would build the linux kernel or the Gnome DE.

What an unrealistic fantasy! har har
_________________
My rig:
IBM Personal System/2 Model 30-286 - - Intel 80286 (16 bit) 10 Mhz - - 1MB DRAM - - Integrated VGA Display adapter
1.44MB capacity Floppy Disk - - PS/2 keyboard (no mouse)

axl · Veteran Joined: 11 Oct 2002 Posts: 1144 Location: Romania

-O3 is not as experimental as it use to be back when gcc was age 2.

these stories, it's funny. it's like the chinese whispers game. in my country it's called the telephone without the wire game.

https://en.wikipedia.org/wiki/Chinese_whispers

some packages will not compile with -O3. right now, chromium is the only one i know.

and yes, binaries will be phater. bigger. and it's a really bad idea for an arm platform.

it only makes sense when you have a fast / or /usr storage. like an m2 ssd.

eccerr0r · Posted: Sun Mar 19, 2017 6:44 am Post subject:

Well, if the compilation (or runtime speed) breaks with -O3, wouldn't that mean it's not quite prime time and thus "experimental"? Not only the cache size hit, -O3 may generate very slow code sequences for x86 too; no way to tell without trying (or knowing what your code is and what gcc does with the code).

Until the day gcc can automatically tell what optimizations are best during static code analysis and always generate the fastest/smallest/code compatible with anything, the optimizations in -O3 are just experimental - experiment with it, it could go one way or the other, and badly.

To be safe for most cases, simply use -O2 - where the gcc developers deem the optimizations tend to not cause worst case behavior. YMMV.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

Akkara · Posted: Sun Mar 19, 2017 8:06 am Post subject:

Ant P. · Watchman Joined: 18 Apr 2009 Posts: 6920

Autoconf is pretty awful. Is confcache still maintained nowadays? I don't see it in portage any more.

Mind you, Portage itself can be just as bad at times... that "resolving dependencies" spinner is often 50% of the time spent installing single packages for me.

Roman_Gruber · Posted: Sun Mar 19, 2017 6:35 pm Post subject:

Did not some ebuilds remove any bad optimizations?

I use this for quite a while

frostschutz · Advocate Joined: 22 Feb 2005 Posts: 2977 Location: Germany

march native pretty much eliminated the necessity for custom CFLAGS

used to be you had to look up the correct safe cflags for your processor, now the compiler does it for you. yay.

O3 makes for slower binaries, sometimes. I once had a broken system like that.