Gentoo Forums
LTO: We are almost there

Gentoo Forums Forum Index Unsupported Software

costel78
Guru


Joined: 20 Apr 2007
Posts: 402

Posted: Sun Oct 09, 2016 6:19 pm    Post subject: LTO: We are almost there

With gcc-6.2.0, glibc-2.24 and binutils-2.27 it is quite easy to compile firefox, thunderbird, webkit-gtk and libreoffice with Link Time Optimization (LTO).
I did not try older versions of the toolchain, but they may work as well.

Code:
AR="/usr/bin/gcc-ar"
NM="/usr/bin/gcc-nm"
RANLIB="/usr/bin/gcc-ranlib"

FLTO="-flto=8 -fuse-linker-plugin -fno-fat-lto-objects"
FGRAPHITE="-fgraphite-identity -floop-interchange -floop-strip-mine -floop-block"
CFLAGS="-O2 -pipe -march=native -mtune=skylake -w ${FLTO} ${FGRAPHITE}"
CXXFLAGS="${CFLAGS} -fno-delete-null-pointer-checks -flifetime-dse=1"
LDFLAGS="-Wl,-O1,--sort-common,--hash-style=gnu,--as-needed,-z,now ${CXXFLAGS}"


Firefox and thunderbird require a small patch from the Firefox Bugzilla. It is easy to adapt it for thunderbird (prefix the paths with mozilla/):
Code:
diff -Naur a/ipc/app/moz.build b/ipc/app/moz.build
--- a/ipc/app/moz.build   2016-06-01 12:11:45.000000000 +0800
+++ b/ipc/app/moz.build   2016-06-30 13:38:44.418231590 +0800
@@ -85,7 +85,7 @@
     # from the function using it which breaks the build.  Work around that by
     # forcing there to be only one partition.
     if '-flto' in CONFIG['OS_CXXFLAGS'] and not CONFIG['CLANG_CXX']:
-        LDFLAGS += ['--param lto-partitions=1']
+        LDFLAGS += ['--lto-partition=one']
 
 if CONFIG['MOZ_SANDBOX'] and CONFIG['OS_TARGET'] == 'Darwin':
     # For sandbox includes and the include dependencies those have
diff -Naur a/ipc/app/pie/moz.build b/ipc/app/pie/moz.build
--- a/ipc/app/pie/moz.build   2016-05-13 01:13:13.000000000 +0800
+++ b/ipc/app/pie/moz.build   2016-06-30 13:38:31.791619842 +0800
@@ -25,7 +25,7 @@
     # from the function using it which breaks the build.  Work around that by
     # forcing there to be only one partition.
     if '-flto' in CONFIG['OS_CXXFLAGS'] and not CONFIG['CLANG_CXX']:
-   LDFLAGS += ['--param lto-partitions=1']
+   LDFLAGS += ['--lto-partition=one']
 
 LDFLAGS += ['-pie']
 
diff -Naur a/security/sandbox/linux/moz.build b/security/sandbox/linux/moz.build
--- a/security/sandbox/linux/moz.build   2016-06-01 12:11:46.000000000 +0800
+++ b/security/sandbox/linux/moz.build   2016-06-30 13:38:52.561530457 +0800
@@ -79,7 +79,7 @@
 # from the function using it which breaks the build.  Work around that by
 # forcing there to be only one partition.
 if '-flto' in CONFIG['OS_CXXFLAGS'] and not CONFIG['CLANG_CXX']:
-    LDFLAGS += ['--param lto-partitions=1']
+    LDFLAGS += ['--lto-partition=one']
 
 DEFINES['NS_NO_XPCOM'] = True
 DISABLE_STL_WRAPPING = True


For webkit-gtk, just disable LTO for one part of it. That part (llint) cannot be compiled with LTO yet:
Code:
diff --git a/Source/JavaScriptCore/llint/LowLevelInterpreter.cpp b/Source/JavaScriptCore/llint/LowLevelInterpreter.cpp
--- a/Source/JavaScriptCore/llint/LowLevelInterpreter.cpp   2016-01-14 21:20:42.304902905 +0100
+++ b/Source/JavaScriptCore/llint/LowLevelInterpreter.cpp   2016-01-14 21:25:52.000000000 +0100
@@ -23,6 +23,15 @@
  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+// If we are using gcc >= 5.0, make sure that this compilation unit is
+// compiled with -fno-lto, because we may be using inline assembly
+// included from LLIntAssembly.h
+#ifdef __GNUC__
+#if __GNUC__ >= 5
+#pragma GCC optimize ("no-lto")
+#endif
+#endif
+
 #include "config.h"
 #include "LowLevelInterpreter.h"


Libreoffice is ready to be compiled with LTO, but it is not enabled in the ebuild. All that is needed is to pass --enable-lto to configure:
Code:
--- libreoffice-5.2.2.2.ebuild   2016-10-09 21:00:04.140865463 +0300
+++ libreoffice-5.2.2.2-r1.ebuild   2016-10-09 20:59:21.324450098 +0300
@@ -78,7 +78,7 @@
 LO_EXTS="nlpsolver scripting-beanshell scripting-javascript wiki-publisher"
 
 IUSE="bluetooth +branding coinmp collada +cups dbus debug eds firebird gltf gnome googledrive
-gstreamer +gtk gtk3 jemalloc kde libressl mysql odk pdfimport postgres quickstarter telepathy test vlc
+gstreamer +gtk gtk3 jemalloc kde libressl +lto mysql odk pdfimport postgres quickstarter telepathy test vlc
 $(printf 'libreoffice_extensions_%s ' ${LO_EXTS})"
 
 LICENSE="|| ( LGPL-3 MPL-1.1 )"
@@ -404,6 +404,7 @@
    #   not linked or anything else, worthless to depend on
    econf \
       --docdir="${EPREFIX}/usr/share/doc/${PF}/" \
+      $(use_enable lto) \
       --with-system-dicts \
       --with-system-headers \
       --with-system-jars \


Only nine packages failed with LTO; this is how they are listed in my package.env:
Code:
dev-lang/spidermonkey:0/mozjs185 no-lto-graphite
dev-libs/libgcrypt no-lto-graphite
dev-python/notify-python no-lto-graphite
gnome-base/gnome-shell no-lto-graphite
media-libs/alsa-lib no-lto-graphite
net-misc/dhcp no-lto-graphite
sys-apps/man-db no-lto-no-graphite
sys-apps/pciutils no-lto-graphite
x11-drivers/xf86-video-intel no-lto-graphite


Unfortunately, there are more packages that fail when LTO and graphite are combined. This bug keeps reappearing (in different variations) with every major version of gcc and is normally fixed by the second release of the same major branch.
For example, it was (re)introduced in gcc-6.1.0 and will hopefully be fixed in 6.3.0.
If anyone knows how to fix it, or has at least a partial patch, please post here.
Here is the bug in action:
Code:
internal compiler error: in add_loop_constraints, at graphite-sese-to-poly.c:933


and here are the affected packages:
Code:
app-arch/cpio lto-no-graphite
app-arch/tar lto-no-graphite
dev-lang/orc lto-no-graphite
media-libs/flac lto-no-graphite
media-video/ffmpeg lto-no-graphite
net-mail/mailutils lto-no-graphite
sys-apps/gawk lto-no-graphite
sys-apps/groff lto-no-graphite
sys-apps/man-db no-lto-no-graphite
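
The thread does not show what the no-lto-graphite, lto-no-graphite and no-lto-no-graphite env files contain. Judging by their names, each one presumably redefines the compiler flags without the LTO flags, without the graphite flags, or without both; package.env then maps each atom to one of these files in /etc/portage/env/. A minimal Python sketch that derives such reduced CFLAGS from the first post's make.conf values, assuming plain prefix matching is good enough (the prefix lists and the strip_flags helper are my own assumptions, not the poster's actual files):

```python
# Hypothetical reconstruction of the three env profiles used in package.env.
LTO_PREFIXES = ("-flto", "-fuse-linker-plugin", "-fno-fat-lto-objects")
GRAPHITE_PREFIXES = ("-fgraphite", "-floop-")  # prefix match is a simplification

PROFILES = {
    "no-lto-graphite": LTO_PREFIXES,                         # drop LTO, keep graphite
    "lto-no-graphite": GRAPHITE_PREFIXES,                    # keep LTO, drop graphite
    "no-lto-no-graphite": LTO_PREFIXES + GRAPHITE_PREFIXES,  # drop both
}


def strip_flags(flags, prefixes):
    """Return flags without any word that starts with one of the prefixes."""
    return " ".join(f for f in flags.split() if not f.startswith(prefixes))


# CFLAGS from the first post, with ${FLTO} and ${FGRAPHITE} expanded.
CFLAGS = ("-O2 -pipe -march=native -mtune=skylake -w "
          "-flto=8 -fuse-linker-plugin -fno-fat-lto-objects "
          "-fgraphite-identity -floop-interchange -floop-strip-mine -floop-block")

for name, prefixes in PROFILES.items():
    print(f'{name}: CFLAGS="{strip_flags(CFLAGS, prefixes)}"')
```

The -floop- prefix also catches -floop-nest-optimize, which is discussed later in the thread as the replacement for the older loop flags.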


LTO does not provide any significant speed improvement, but it does reduce binary size (by up to 18%) and, on a mechanical HDD, improves start-up time. On an SSD, only firefox and thunderbird start visibly faster.
Since version 53, chromium has behaved somewhat erratically and buggily, so I replaced it with google-chrome-beta, but I wouldn't be surprised if, with some patches, chromium also joins the "LTO-ready packages".
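
To check a figure like the 18% size reduction on your own system, compare the same installed file from a build without LTO and a build with it (for example, a copy saved before re-emerging). A minimal Python sketch; the helper names are hypothetical and the two paths are whatever copies you kept:

```python
import os


def size_reduction_percent(before, after):
    """Percent by which `after` is smaller than `before` (negative if it grew)."""
    if before <= 0:
        raise ValueError("size before must be positive")
    return 100.0 * (before - after) / before


def file_size_reduction(path_before, path_after):
    """Same calculation for two copies of a binary on disk."""
    return size_reduction_percent(os.path.getsize(path_before),
                                  os.path.getsize(path_after))


print(size_reduction_percent(100, 82))  # 18.0
```

For whole packages, the totals reported by equery s before and after the rebuild can be passed to size_reduction_percent in the same way.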

If you have similar experiences, fixes or ideas regarding LTO and graphite, please post here.
_________________
Sorry for my English. I'm still learning this language.
haarp
Guru


Joined: 31 Oct 2007
Posts: 535

Posted: Tue Oct 11, 2016 10:25 am

Thanks, I finally managed to compile Firefox with this patch.
costel78
Guru


Joined: 20 Apr 2007
Posts: 402

Posted: Tue Oct 11, 2016 12:28 pm

You are most welcome, but the patch is not mine. It belongs to James Shelton.
Here is the bug report: https://bugzilla.mozilla.org/show_bug.cgi?id=1258215
NTU
Apprentice


Joined: 17 Jul 2015
Posts: 187

Posted: Thu Oct 13, 2016 4:14 am

In case anyone else wants to know, trippels explains why this patch is needed here:

https://forums.gentoo.org/viewtopic-p-7971722.html#7971722

-floop-nest-optimize replaces -floop-interchange -floop-strip-mine -floop-block btw.

The flags I'm using are as follows:

Code:
# Cherry-picked from -O3; these don't seem to globally waste CPU cycles or bloat file sizes
OPT="-fpredictive-commoning -fgcse-after-reload -fvect-cost-model -ftree-partial-pre"
# -funsafe-math-optimizations enables our vectorization friend, -fassociative-math, and its requirements.
# -fassociative-math allows re-ordering of operands to help auto-vectorization along without
# all the other unsafe optimizations that come with -ffast-math.
VECOPT="-ftree-vectorize -funsafe-math-optimizations"
# ISL / Graphite optimization flags without blindly parallelizing everything
# -floop-nest-optimize is the new -ftree-loop-linear -floop-interchange etc.
ISLOPT="-floop-nest-optimize -fgraphite-identity"
# Now for LTO!
LTOOPT="-flto -fuse-linker-plugin -fno-fat-lto-objects"
# Specify a CPU type that explicitly switches on -msse -msse2 for auto vec to actually do something
# or at least -mtune=generic -msse -msse2 (and -mavx if you can / want)
CFLAGS="-march=core2 -O2 -pipe -fomit-frame-pointer ${OPT} ${VECOPT} ${ISLOPT} ${LTOOPT}"
CXXFLAGS="${CFLAGS}"
LDFLAGS="${LDFLAGS} ${CFLAGS}"
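
The comments above say that -fassociative-math lets GCC reorder operands so that auto-vectorization can go ahead. GCC will not do this by default because floating-point addition is not associative: a vectorized sum keeps one partial sum per SIMD lane and combines them at the end, which can change the result. (Per the GCC manual, -funsafe-math-optimizations turns on -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math.) A minimal Python sketch that imitates that reordering with a two-lane split chosen only for illustration; it is not the code GCC actually generates:

```python
def sequential_sum(xs):
    """Strict left-to-right summation, the order C requires without -fassociative-math."""
    total = 0.0
    for x in xs:
        total += x
    return total


def lane_sum(xs, lanes=2):
    """Reassociated summation: one partial sum per 'lane', combined at the end,
    roughly the shape of a vectorized reduction loop."""
    partial = [0.0] * lanes
    for i, x in enumerate(xs):
        partial[i % lanes] += x
    total = 0.0
    for p in partial:
        total += p
    return total


print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
data = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(data), lane_sum(data))    # 1.0 2.0
```

Both results are valid for some evaluation order, which is why this kind of reordering has to be requested explicitly.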


As you can see, I forced -fuse-linker-plugin. I explain more about this at the end of step 2 in the first post of my thread linked above; feel free to chime in if you know the underlying reason why it is not switched on automatically in my test program.

It'd be interesting to pair LTO + Graphite with musl and see if we can squeeze even more performance! I compiled my kernel with my OPT+VECOPT+ISLOPT flags as above, haven't noticed any issues. Remember, speed demon, not ricer!

With graphite alone I haven't run into a single compile error (or any runtime issue); all the trouble has been LTO-related. If you're running into trouble with LTO + Graphite, try leaving off LTO (KEEP graphite) and give it another go.

More info about auto vectorization btw in case you're interested:

https://gcc.gnu.org/projects/tree-ssa/vectorization.html#using

Speed of Firefox (both runtime and load time) is much improved and CPU usage by the application dropped like a freakin ROCK!! (I've been monitoring it using top)

One more thing: I noticed that you have -z,now in your LDFLAGS. If you care about security, I suggest adding -z,relro as well, like so:

Code:
LDFLAGS="${LDFLAGS} -Wl,-z,now -Wl,-z,relro"


I believe Gentoo's profile already specifies -Wl,-O1 and -Wl,--as-needed, but not all packages get read-only relocations and immediate binding; more here:

https://forums.gentoo.org/viewtopic-p-7974722-highlight-.html#7974722
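
To see whether a given binary actually ended up with read-only relocations and immediate binding, readelf -l (look for a GNU_RELRO program header) and readelf -d (look for BIND_NOW in FLAGS, or NOW in FLAGS_1) are the usual tools. A minimal Python sketch of the same check, assuming a 64-bit little-endian ELF file such as an amd64 binary (the function names are my own):

```python
import struct

PT_DYNAMIC = 2
PT_GNU_RELRO = 0x6474E552
DT_NULL = 0
DT_BIND_NOW = 24
DT_FLAGS = 30
DF_BIND_NOW = 0x8
DT_FLAGS_1 = 0x6FFFFFFB
DF_1_NOW = 0x1


def hardening_info(data):
    """Report RELRO / BIND_NOW for a 64-bit little-endian ELF image (bytes)."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    # ELF64 header: e_phoff at offset 32, e_phentsize and e_phnum at 54 and 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    relro = False
    bind_now = False
    for k in range(e_phnum):
        # Program header: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, ...
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, e_phoff + k * e_phentsize)
        if p_type == PT_GNU_RELRO:
            relro = True
        elif p_type == PT_DYNAMIC:
            # Dynamic section: 16-byte (d_tag, d_val) entries ending with DT_NULL.
            for off in range(p_offset, p_offset + p_filesz, 16):
                d_tag, d_val = struct.unpack_from("<qQ", data, off)
                if d_tag == DT_NULL:
                    break
                if (d_tag == DT_BIND_NOW
                        or (d_tag == DT_FLAGS and d_val & DF_BIND_NOW)
                        or (d_tag == DT_FLAGS_1 and d_val & DF_1_NOW)):
                    bind_now = True
    return {"relro": relro, "bind_now": bind_now}


def check_file(path):
    with open(path, "rb") as fh:
        return hardening_info(fh.read())
```

With -z,relro alone part of the GOT stays writable; it only becomes fully read-only together with -z,now, which is why the two are usually combined.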

I'm not trying to advertise my threads (even though it may look like that); you and I just think alike and want to help out. :wink:

Using GCC 5.4 / binutils 2.26.1 / glibc 2.23 / ISL 0.17.1 / MPFR 3.1.5 / GMP 6.1.1 here!

P.S. -fgcse-las anyone? :lol:


Last edited by NTU on Thu Oct 13, 2016 7:57 pm; edited 1 time in total
haarp
Guru


Joined: 31 Oct 2007
Posts: 535

Posted: Thu Oct 13, 2016 5:05 am

Looks like I celebrated too early. Firefox compiles now, but segfaults when I try to start it.

Quote:
Speed of Firefox (both runtime and load time) is much improved and CPU usage by the application dropped like a freakin ROCK!! (I've been monitoring it using top)

Eh, I doubt that. LTO doesn't make that much of a difference. This is most likely due to the fact that Firefox has been freshly started.
costel78
Guru


Joined: 20 Apr 2007
Posts: 402

Posted: Thu Oct 13, 2016 7:10 am

@NTU
As far as I understand, -floop-nest-optimize is a new way to implement graphite optimizations. That's why it replaces -floop-interchange -floop-strip-mine -floop-block.
I will try it. The Pluto optimization algorithms it is based on are described here: http://pluto-compiler.sourceforge.net/, if anyone is interested.
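
To make the loop-blocking part (-floop-block) a bit less abstract: blocking, or tiling, splits an iteration space into small tiles so that each tile's data stays in cache, without changing what the loop computes. GCC performs this on its internal representation, not on source code; the following Python sketch only shows the shape of the transformation on a matrix transpose, with an arbitrary tile size (function names are my own, and Python itself gets no cache benefit from it):

```python
def transpose(a):
    """Plain loop nest: walks the input row by row."""
    rows, cols = len(a), len(a[0])
    out = [[0] * rows for _ in range(cols)]
    for i in range(rows):
        for j in range(cols):
            out[j][i] = a[i][j]
    return out


def transpose_blocked(a, tile=2):
    """Same computation after loop blocking: the outer two loops step over
    tile x tile blocks, the inner two loops stay inside one block."""
    rows, cols = len(a), len(a[0])
    out = [[0] * rows for _ in range(cols)]
    for ii in range(0, rows, tile):
        for jj in range(0, cols, tile):
            for i in range(ii, min(ii + tile, rows)):
                for j in range(jj, min(jj + tile, cols)):
                    out[j][i] = a[i][j]
    return out


m = [[r * 10 + c for c in range(3)] for r in range(5)]
print(transpose(m) == transpose_blocked(m))  # True
```

Only the visiting order changes; a polyhedral optimizer's job is to prove that such a reordering is legal and then pick one that suits the cache.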

Regarding the graphite bug: it is a known one, but its priority is 4 (P4), so there is not much hope here.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71824

-z,relro is always a good idea. I had used it, but somehow it got lost during testing :lol:

I will make a new backup, try a full recompilation with -floop-nest-optimize and, why not, -fgcse-las, and then report back.
Thank you very much. This type of information exchange was the core purpose of this thread.

@haarp
Here, firefox is rock stable, with 15 extensions installed. I am sorry that it is not working for you.
Do you get any error message?
costel78
Guru


Joined: 20 Apr 2007
Posts: 402

Posted: Thu Oct 13, 2016 7:29 am

-floop-nest-optimize does not eliminate graphite bug.
Code:
x86_64-pc-linux-gnu-gcc  -O2 -pipe -march=native -mtune=skylake -w -flto=8 -fuse-linker-plugin -fno-fat-lto-objects -fgraphite-identity -floop-nest-optimize -fpredictive-commoning -fgcse-after-reload -fvect-cost-model -ftree-partial-pre -ftree-vectorize -fgcse-las  -Wl,-O1,--sort-common,--hash-style=gnu,--as-needed,-z,now,-z,relro -O2 -pipe -march=native -mtune=skylake -w -flto=8 -fuse-linker-plugin -fno-fat-lto-objects -fgraphite-identity -floop-nest-optimize -fpredictive-commoning -fgcse-after-reload -fvect-cost-model -ftree-partial-pre -ftree-vectorize -fgcse-las -fno-delete-null-pointer-checks -flifetime-dse=1 -o cpio copyin.o copyout.o copypass.o defer.o dstring.o global.o fatal.o main.o tar.o util.o filemode.o idcache.o makepath.o userspec.o ../lib/libpax.a ../gnu/libgnu.a 
getopt.c: In function ‘exchange’:
getopt.c:144:1: internal compiler error: in add_loop_constraints, at graphite-sese-to-poly.c:933
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://bugs.gentoo.org/> for instructions.
make[3]: *** [/var/tmp/portage/app-arch/cpio-2.12-r1/temp/cc7aviVP.mk:23: /var/tmp/portage/app-arch/cpio-2.12-r1/temp/ccrUCC7y.ltrans7.ltrans.o] Error 1
make[3]: *** Waiting for unfinished jobs....


Correction: does not eliminate
NTU
Apprentice


Joined: 17 Jul 2015
Posts: 187

Posted: Thu Oct 13, 2016 7:56 pm

haarp wrote:
Looks like I celebrated too early. Firefox compiles now, but segfaults when I try to start it.

Quote:
Speed of Firefox (both runtime and load time) is much improved and CPU usage by the application dropped like a freakin ROCK!! (I've been monitoring it using top)

Eh, I doubt that. LTO doesn't make that much of a difference. This is most likely due to the fact that Firefox has been freshly started.


I was referring to the optimizations picked from -O3, -funsafe-math-optimizations and graphite; I haven't tried LTO with Firefox yet.

costel78, cpio compiles fine here with graphite and all the other flags I specified above (except LTO). Maybe try GCC 5.4?

Code:
x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I..  -I. -I.. -I../gnu -I../gnu  -I../lib -I../lib   -march=core2 -O2 -pipe -fomit-frame-pointer -fpredictive-commoning -fgcse-after-reload -fvect-cost-model -ftree-partial-pre -ftree-vectorize -funsafe-math-optimizations -floop-nest-optimize -fgraphite-identity -Wformat -Wformat-security -Werror=format-security --param ssp-buffer-size=4 -c -o userspec.o userspec.c
x86_64-pc-linux-gnu-gcc  -march=core2 -O2 -pipe -fomit-frame-pointer -fpredictive-commoning -fgcse-after-reload -fvect-cost-model -ftree-partial-pre -ftree-vectorize -funsafe-math-optimizations -floop-nest-optimize -fgraphite-identity -Wformat -Wformat-security -Werror=format-security --param ssp-buffer-size=4  -Wl,-O1 -Wl,--as-needed -Wl,-z,now -Wl,-z,relro -o cpio copyin.o copyout.o copypass.o defer.o dstring.o global.o fatal.o main.o tar.o util.o filemode.o idcache.o makepath.o userspec.o ../lib/libpax.a ../gnu/libgnu.a 
make[2]: Leaving directory '/var/tmp/portage/app-arch/cpio-2.12-r1/work/cpio-2.12/src'
Making all in po
make[2]: Entering directory '/var/tmp/portage/app-arch/cpio-2.12-r1/work/cpio-2.12/po'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/var/tmp/portage/app-arch/cpio-2.12-r1/work/cpio-2.12/po'
Making all in tests
make[2]: Entering directory '/var/tmp/portage/app-arch/cpio-2.12-r1/work/cpio-2.12/tests'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/var/tmp/portage/app-arch/cpio-2.12-r1/work/cpio-2.12/tests'
make[2]: Entering directory '/var/tmp/portage/app-arch/cpio-2.12-r1/work/cpio-2.12'
make[2]: Leaving directory '/var/tmp/portage/app-arch/cpio-2.12-r1/work/cpio-2.12'
make[1]: Leaving directory '/var/tmp/portage/app-arch/cpio-2.12-r1/work/cpio-2.12'
>>> Source compiled.


Why are all your CFLAGS repeated? Is that a copy-and-paste error or an issue with emerge --info?

Code:
CFLAGS="-march=core2 -O2 -pipe -fomit-frame-pointer -fpredictive-commoning -fgcse-after-reload -fvect-cost-model -ftree-partial-pre -ftree-vectorize -funsafe-math-optimizations -floop-nest-optimize -fgraphite-identity -Wformat -Wformat-security -Werror=format-security --param ssp-buffer-size=4"
<scrubbed>
CXXFLAGS="-march=core2 -O2 -pipe -fomit-frame-pointer -fpredictive-commoning -fgcse-after-reload -fvect-cost-model -ftree-partial-pre -ftree-vectorize -funsafe-math-optimizations -floop-nest-optimize -fgraphite-identity -Wformat -Wformat-security -Werror=format-security --param ssp-buffer-size=4"
<scrubbed>
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,-z,now -Wl,-z,relro"


cpio 2.12-r1 compiles fine here with those. No issues with -fgcse-las?
costel78
Guru


Joined: 20 Apr 2007
Posts: 402

Posted: Thu Oct 13, 2016 8:24 pm

app-arch/cpio, app-arch/tar, dev-lang/orc, media-libs/flac, media-video/ffmpeg, net-mail/mailutils, sys-apps/gawk, sys-apps/groff and sys-apps/man-db do not fail for me with LTO alone or with graphite alone, only with both of them at the same time.
The flags are repeated because the pasted portion is from the linking phase and LDFLAGS contains CFLAGS. Or are you referring to something else?
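
The repetition follows from how the first post's make.conf is built: CXXFLAGS starts with ${CFLAGS}, LDFLAGS ends with ${CXXFLAGS}, and automake's default link rule passes $(CFLAGS) followed by $(LDFLAGS) to the compiler driver. A minimal Python sketch that expands a shortened copy of those assignments with string.Template (which understands the same ${NAME} syntax) and rebuilds a link command; the shortened flag lists and the object file name are illustrative:

```python
from string import Template

# Shortened copy of the make.conf assignments from the first post, in order.
MAKE_CONF = [
    ("FLTO", "-flto=8 -fuse-linker-plugin -fno-fat-lto-objects"),
    ("CFLAGS", "-O2 -pipe -march=native ${FLTO}"),
    ("CXXFLAGS", "${CFLAGS} -fno-delete-null-pointer-checks -flifetime-dse=1"),
    ("LDFLAGS", "-Wl,-O1,--as-needed,-z,now,-z,relro ${CXXFLAGS}"),
]


def expand(assignments):
    """Expand each ${NAME} using only the variables defined before it."""
    env = {}
    for name, value in assignments:
        env[name] = Template(value).substitute(env)
    return env


env = expand(MAKE_CONF)
# Shape of automake's default link rule: $(CCLD) $(CFLAGS) $(LDFLAGS) -o $@ ...
link = " ".join(["gcc", env["CFLAGS"], env["LDFLAGS"], "-o", "cpio", "main.o"])
print(link)
print(link.split().count("-flto=8"))  # 2
```

Repeating identical flags is generally harmless; carrying the compile flags into LDFLAGS is what makes the LTO link step see the same options, which the GCC manual recommends.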

So far, no runtime errors with -fgcse-las, but I have only tested services and briefly started gnome. Tomorrow I will have time for more tests.
mir3x
Guru


Joined: 02 Jun 2012
Posts: 455

Posted: Thu Oct 13, 2016 8:51 pm

I just did my first compiles with LTO, thanks to your posts.

Quote:
I have only nine packages with failed with lto in my package.env:
Code:
dev-lang/spidermonkey:0/mozjs185 no-lto-graphite


But that one didn't fail for me.
dev-lang/spidermonkey-24.2.0-r3::gentoo emerged OK with +icu +systemicu on gcc 5.4,
with these flags:
Code:
CFLAGS="-mtune=native -O2 -pipe -fgraphite -flto=8 -fuse-linker-plugin -fno-fat-lto-objects -fgraphite-identity -floop-interchange -floop-strip-mine -floop-block"
CXXFLAGS="-mtune=native -O2 -pipe  -fgraphite -flto=8 -fuse-linker-plugin -fno-fat-lto-objects -fgraphite-identity -floop-interchange -floop-strip-mine -floop-block -fno-delete-null-pointer-checks"
LDFLAGS="-O2 -Wl,-O1 -Wl,--enable-new-dtags -Wl,--sort-common -s -flto=4 -fno-fat-lto-objects"


Btw, the Qt wiki (https://wiki.qt.io/Performance_Tip_Startup_Time) says: "(GNU ld) Use -Bsymbolic-functions for your shared libraries."

Is this enabled by something else, or should I add it?
_________________
Sent from Windows
NTU
Apprentice


Joined: 17 Jul 2015
Posts: 187

Posted: Thu Oct 13, 2016 11:19 pm

If this forum had a friends list feature, costel78 would be on it. Thank you very much for testing LTO and other things that others may not be as brave to do. The more people we can get to rally behind the killer combo of graphite + LTO + OpenMP (and perhaps musl) the more QA (and Q/A) we can get done. Fast, stable, and secure performance. It's nice knowing I'm not the only one who uses -z,donthackmeplz :lol:
costel78
Guru


Joined: 20 Apr 2007
Posts: 402

Posted: Fri Oct 14, 2016 9:46 am

@mir3x
Yes, you are right: version 24.2.0-r3 of the JS library compiles and works perfectly well. But it is a slotted package, and the older version, 1.8.5-r6, fails to compile (:0/mozjs185 is its slot/sub-slot).
gnome-shell is also on the list: it compiles fine, but there are runtime errors, freezes, etc.
Regarding Qt, I will quote NTU:
Quote:
Remember, speed demon, not ricer!

I use an SSD, so start-up time improvements are insignificant.

Thank you NTU, but I don't feel I deserve it. At the end of the day it is, first of all, a matter of curiosity and PC resources. A large SSD and a powerful CPU help when it comes to Gentoo testing.
So far, no errors whatsoever with -fgcse-las. I cannot test -ffast-math or similar flags: I am a heavy user of postgresql, and it does not like them at all.

Later:
I was under the impression that -fomit-frame-pointer is not needed on amd64 and, moreover, that it is enabled by default there. So I didn't use it, but I was wrong.
https://wiki.gentoo.org/wiki/GCC_optimization#-fomit-frame-pointer
However, equery s package-name shows no size difference with or without it.

If you are interested in hardened too, you should dig into mv's posts. I consider him a hardened guru, and he was and is a source of inspiration for me.
mir3x
Guru


Joined: 02 Jun 2012
Posts: 455

Posted: Sat Oct 15, 2016 3:52 pm

What about the kernel? Is there some easy way? (I've become very greedy; anyway, emerging doesn't hurt :D)
Does any XXX-sources have an LTO patch?
costel78
Guru


Joined: 20 Apr 2007
Posts: 402

Posted: Sat Oct 15, 2016 4:24 pm

A few years back, when I tried it, it was way too buggy. Oh, and it "ate" ~8GB of RAM.
https://kernel.googlesource.com/pub/scm/linux/kernel/git/mmarek/kbuild/+/lto
I know nothing about the current status. I'm a little busy for a few days; I will test again on Monday.
costel78
Guru


Joined: 20 Apr 2007
Posts: 402

Posted: Mon Oct 17, 2016 4:35 pm

It seems that we'll have to wait for kernel with lto.
charles17
Advocate


Joined: 02 Mar 2008
Posts: 3664

Posted: Tue Oct 18, 2016 6:37 am

costel78 wrote:
It seems that we'll have to wait for kernel with lto.

Already in main tree: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=84d69848c97faab0c25aa2667b273404d2e2a64a
costel78
Guru


Joined: 20 Apr 2007
Posts: 402

Posted: Tue Oct 18, 2016 7:41 am

That's a nice find. Thank you!
Michal Marek and Andi Kleen have worked on this for a few years now; back in 2014, Andi Kleen's patch was rejected by Linus Torvalds. I am glad he changed his mind.

Later: I had some free time, so I made some tests.
First, the patch mentioned above seems to do the opposite? Am I wrong?
Second, with a git clone of the kernel, the build fails when using LTO:
Code:
  gcc -Wp,-MD,arch/x86/kernel/.asm-offsets.s.d  -nostdinc -isystem /usr/lib/gcc/x86_64-pc-linux-gnu/6.2.0/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -march=native -fuse-linker-plugin -flto=4 -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_AVX512=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -Wno-maybe-uninitialized -Wno-frame-address -O2 --param=allow-store-data-races=0 -Wframe-larger-than=2048 -fstack-protector -Wno-unused-but-set-variable -Wno-unused-const-variable -fomit-frame-pointer -fno-var-tracking-assignments -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -DCC_HAVE_ASM_GOTO    -DKBUILD_BASENAME='"asm_offsets"'  -DKBUILD_MODNAME='"asm_offsets"'  -fverbose-asm -S -o arch/x86/kernel/asm-offsets.s arch/x86/kernel/asm-offsets.c
In file included from ./include/linux/mmzone.h:18:0,
                 from ./include/linux/gfp.h:5,
                 from ./include/linux/slab.h:14,
                 from ./include/linux/crypto.h:24,
                 from arch/x86/kernel/asm-offsets.c:8:
./include/linux/page-flags-layout.h:14:5: warning: "MAX_NR_ZONES" is not defined [-Wundef]
 #if MAX_NR_ZONES < 2
     ^~~~~~~~~~~~
./include/linux/page-flags-layout.h:57:63: warning: "NR_PAGEFLAGS" is not defined [-Wundef]
 #if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
                                                               ^~~~~~~~~~~~
./include/linux/page-flags-layout.h:78:81: warning: "NR_PAGEFLAGS" is not defined [-Wundef]
 #if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
                                                                                 ^~~~~~~~~~~~
In file included from ./include/linux/gfp.h:5:0,
                 from ./include/linux/slab.h:14,
                 from ./include/linux/crypto.h:24,
                 from arch/x86/kernel/asm-offsets.c:8:
./include/linux/mmzone.h:363:22: error: ‘MAX_NR_ZONES’ undeclared here (not in a function)
  long lowmem_reserve[MAX_NR_ZONES];
                      ^~~~~~~~~~~~
In file included from ./include/linux/sched.h:27:0,
                 from ./include/linux/kasan.h:4,
                 from ./include/linux/slab.h:118,
                 from ./include/linux/crypto.h:24,
                 from arch/x86/kernel/asm-offsets.c:8:
./include/linux/mm_types.h:30:30: warning: "SPINLOCK_SIZE" is not defined [-Wundef]
 #define ALLOC_SPLIT_PTLOCKS (SPINLOCK_SIZE > BITS_PER_LONG/8)
                              ^
./include/linux/mm_types.h:185:5: note: in expansion of macro ‘ALLOC_SPLIT_PTLOCKS’
 #if ALLOC_SPLIT_PTLOCKS
     ^~~~~~~~~~~~~~~~~~~
In file included from ./include/linux/dcache.h:12:0,
                 from ./include/linux/fs.h:7,
                 from ./include/linux/cgroup.h:16,
                 from ./include/linux/memcontrol.h:22,
                 from ./include/linux/swap.h:8,
                 from ./include/linux/suspend.h:4,
                 from arch/x86/kernel/asm-offsets.c:12:
./include/linux/lockref.h:22:29: warning: "SPINLOCK_SIZE" is not defined [-Wundef]
   IS_ENABLED(CONFIG_SMP) && SPINLOCK_SIZE <= 4)
                             ^
./include/linux/lockref.h:26:5: note: in expansion of macro ‘USE_CMPXCHG_LOCKREF’
 #if USE_CMPXCHG_LOCKREF
     ^~~~~~~~~~~~~~~~~~~
In file included from ./include/linux/highmem.h:7:0,
                 from ./include/linux/bio.h:21,
                 from ./include/linux/writeback.h:192,
                 from ./include/linux/memcontrol.h:30,
                 from ./include/linux/swap.h:8,
                 from ./include/linux/suspend.h:4,
                 from arch/x86/kernel/asm-offsets.c:12:
./include/linux/mm.h:726:62: warning: "NR_PAGEFLAGS" is not defined [-Wundef]
 #if SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS
                                                              ^~~~~~~~~~~~
In file included from ./include/linux/sched.h:27:0,
                 from ./include/linux/kasan.h:4,
                 from ./include/linux/slab.h:118,
                 from ./include/linux/crypto.h:24,
                 from arch/x86/kernel/asm-offsets.c:8:
./include/linux/mm_types.h:30:30: warning: "SPINLOCK_SIZE" is not defined [-Wundef]
 #define ALLOC_SPLIT_PTLOCKS (SPINLOCK_SIZE > BITS_PER_LONG/8)
                              ^
./include/linux/mm.h:1605:5: note: in expansion of macro ‘ALLOC_SPLIT_PTLOCKS’
 #if ALLOC_SPLIT_PTLOCKS
     ^~~~~~~~~~~~~~~~~~~
make[1]: *** [Kbuild:82: arch/x86/kernel/asm-offsets.s] Error 1
make: *** [Makefile:1025: prepare0] Error 2

NTU
Apprentice


Joined: 17 Jul 2015
Posts: 187

Posted: Thu Oct 20, 2016 2:56 am

Don't worry guys, I got this! Posting patch shortly :lol:

edit: Jeez, that bit me in the arse hard...

Code:
arch/x86/built-in.o: In function `calibrate_delay_is_known':
(.text+0x47a0): multiple definition of `calibrate_delay_is_known'
init/built-in.o:(.text+0x6a0): first defined here
kernel/built-in.o: In function `smp_announce':
(.text+0x1ba20): multiple definition of `smp_announce'
arch/x86/built-in.o:(.text+0x12660): first defined here
kernel/built-in.o: In function `arch_early_irq_init':
(.init.text+0x17fe): multiple definition of `arch_early_irq_init'
arch/x86/built-in.o:(.init.text+0xb5de): first defined here
kernel/built-in.o: In function `save_stack_trace_regs':
(.text+0x26380): multiple definition of `save_stack_trace_regs'
arch/x86/built-in.o:(.text+0x95f0): first defined here
kernel/built-in.o: In function `arch_cpu_idle_dead':
(.text+0x3b7a0): multiple definition of `arch_cpu_idle_dead'
arch/x86/built-in.o:(.text+0x24a0): first defined here
kernel/built-in.o: In function `arch_cpu_idle':
(.text+0x3b790): multiple definition of `arch_cpu_idle'
arch/x86/built-in.o:(.text+0x2490): first defined here
kernel/built-in.o: In function `arch_probe_nr_irqs':
(.init.text+0x1801): multiple definition of `arch_probe_nr_irqs'
arch/x86/built-in.o:(.init.text+0xd12f): first defined here
kernel/built-in.o: In function `arch_show_interrupts':
(.text+0x57550): multiple definition of `arch_show_interrupts'
arch/x86/built-in.o:(.text+0x9c90): first defined here
kernel/built-in.o: In function `arch_dynirq_lower_bound':
(.text+0x32100): multiple definition of `arch_dynirq_lower_bound'
arch/x86/built-in.o:(.text+0x23fd0): first defined here
kernel/built-in.o: In function `arch_remove_reservations':
(.text+0x31b30): multiple definition of `arch_remove_reservations'
arch/x86/built-in.o:(.text+0x44d0): first defined here
kernel/built-in.o: In function `arch_disable_smp_support':
(.text+0x1ba60): multiple definition of `arch_disable_smp_support'
arch/x86/built-in.o:(.text+0xe4a0): first defined here
kernel/built-in.o: In function `arch_irq_work_raise':
(.text+0x34680): multiple definition of `arch_irq_work_raise'
arch/x86/built-in.o:(.text+0x11600): first defined here
kernel/built-in.o: In function `arch_cpu_idle_enter':
(.text+0x3b7c0): multiple definition of `arch_cpu_idle_enter'
arch/x86/built-in.o:(.text+0x24f0): first defined here
kernel/built-in.o: In function `arch_cpu_idle_exit':
(.text+0x3b7b0): multiple definition of `arch_cpu_idle_exit'
arch/x86/built-in.o:(.text+0x24b0): first defined here
kernel/built-in.o: In function `perf_callchain_kernel':
(.text+0x77ea0): multiple definition of `perf_callchain_kernel'
arch/x86/built-in.o:(.text+0x1ac50): first defined here
kernel/built-in.o: In function `read_persistent_clock':
(.text+0x610f0): multiple definition of `read_persistent_clock'
arch/x86/built-in.o:(.text+0x45d0): first defined here
kernel/built-in.o: In function `arch_perf_update_userpage':
(.text+0x76040): multiple definition of `arch_perf_update_userpage'
arch/x86/built-in.o:(.text+0x1acd0): first defined here
kernel/built-in.o: In function `arch_dup_task_struct':
(.text+0x8c0): multiple definition of `arch_dup_task_struct'
arch/x86/built-in.o:(.text+0x4910): first defined here
kernel/built-in.o: In function `save_stack_trace_tsk':
(.text+0x263d0): multiple definition of `save_stack_trace_tsk'
arch/x86/built-in.o:(.text+0x95a0): first defined here
kernel/built-in.o: In function `perf_callchain_user':
(.text+0x78120): multiple definition of `perf_callchain_user'
arch/x86/built-in.o:(.text+0x1ab20): first defined here
kernel/built-in.o: In function `arch_jump_label_transform_static':
(.init.text+0x1b3b): multiple definition of `arch_jump_label_transform_static'
arch/x86/built-in.o:(.init.text+0x62aa): first defined here
kernel/built-in.o: In function `sched_clock':
(.text+0x3f0c0): multiple definition of `sched_clock'
arch/x86/built-in.o:(.text+0x5f90): first defined here
kernel/built-in.o: In function `arch_release_task_struct':
(.text+0xd20): multiple definition of `arch_release_task_struct'
arch/x86/built-in.o:(.text+0x44a0): first defined here
kernel/built-in.o: In function `perf_event_print_debug':
(.text+0x707f0): multiple definition of `perf_event_print_debug'
arch/x86/built-in.o:(.text+0x15760): first defined here
kernel/built-in.o: In function `arch_task_cache_init':
(.init.text+0x100): multiple definition of `arch_task_cache_init'
arch/x86/built-in.o:(.text+0x48c0): first defined here
mm/built-in.o: In function `get_user_pages_fast':
(.text+0xabd0): multiple definition of `get_user_pages_fast'
arch/x86/built-in.o:(.text+0x2b930): first defined here
mm/built-in.o: In function `vmalloc_sync_all':
(.text+0x2fa00): multiple definition of `vmalloc_sync_all'
arch/x86/built-in.o:(.text+0x264b0): first defined here
mm/built-in.o: In function `vmemmap_populate_print_last':
(.meminit.text+0x15d): multiple definition of `vmemmap_populate_print_last'
arch/x86/built-in.o:(.meminit.text+0x5c6): first defined here
mm/built-in.o: In function `__get_user_pages_fast':
(.text+0xa490): multiple definition of `__get_user_pages_fast'
arch/x86/built-in.o:(.text+0x2bae0): first defined here
fs/built-in.o: In function `arch_report_meminfo':
(.text+0x61590): multiple definition of `arch_report_meminfo'
arch/x86/built-in.o:(.text+0x2c820): first defined here
arch/x86/lib/built-in.o: In function `__iowrite32_copy':
(.text+0x940): multiple definition of `__iowrite32_copy'
lib/built-in.o:(.text+0x8ad0): first defined here
drivers/built-in.o: In function `arch_restore_msi_irqs':
(.text+0xcaa0): multiple definition of `arch_restore_msi_irqs'
arch/x86/built-in.o:(.text+0x1850): first defined here
drivers/built-in.o: In function `arch_teardown_msi_irq':
(.text+0xc230): multiple definition of `arch_teardown_msi_irq'
arch/x86/built-in.o:(.text+0x1860): first defined here
drivers/built-in.o: In function `arch_setup_msi_irqs':
(.text+0xc570): multiple definition of `arch_setup_msi_irqs'
arch/x86/built-in.o:(.text+0x1880): first defined here
drivers/built-in.o: In function `arch_msi_mask_irq':
(.text+0xcb00): multiple definition of `arch_msi_mask_irq'
arch/x86/built-in.o:(.text+0x1840): first defined here
drivers/built-in.o: In function `arch_msix_mask_irq':
(.text+0xc1d0): multiple definition of `arch_msix_mask_irq'
arch/x86/built-in.o:(.text+0x1830): first defined here
drivers/built-in.o: In function `unxlate_dev_mem_ptr':
(.text+0x76390): multiple definition of `unxlate_dev_mem_ptr'
arch/x86/built-in.o:(.text+0x2cdf0): first defined here
drivers/built-in.o: In function `arch_teardown_msi_irqs':
(.text+0xc320): multiple definition of `arch_teardown_msi_irqs'
arch/x86/built-in.o:(.text+0x1870): first defined here
drivers/built-in.o: In function `phys_mem_access_prot_allowed':
(.text+0x76380): multiple definition of `phys_mem_access_prot_allowed'
arch/x86/built-in.o:(.text+0x292b0): first defined here
arch/x86/pci/built-in.o: In function `pcibios_root_bridge_prepare':
(.text+0x2970): multiple definition of `pcibios_root_bridge_prepare'
drivers/built-in.o:(.text+0xd6d0): first defined here
arch/x86/pci/built-in.o: In function `pcibios_add_bus':
(.text+0x31c0): multiple definition of `pcibios_add_bus'
drivers/built-in.o:(.text+0xd6f0): first defined here
arch/x86/pci/built-in.o: In function `pcibios_add_device':
(.text+0x30c0): multiple definition of `pcibios_add_device'
drivers/built-in.o:(.text+0x39e0): first defined here
arch/x86/pci/built-in.o: In function `pcibios_enable_device':
(.text+0x3080): multiple definition of `pcibios_enable_device'
drivers/built-in.o:(.text+0x4330): first defined here
arch/x86/pci/built-in.o: In function `pci_ext_cfg_avail':
(.text+0x3040): multiple definition of `pci_ext_cfg_avail'
drivers/built-in.o:(.text+0x830): first defined here
arch/x86/pci/built-in.o: In function `pcibios_resource_survey_bus':
(.text+0x1240): multiple definition of `pcibios_resource_survey_bus'
drivers/built-in.o:(.text+0x157d0): first defined here
arch/x86/pci/built-in.o: In function `pcibios_retrieve_fw_addr':
(.text+0xf30): multiple definition of `pcibios_retrieve_fw_addr'
drivers/built-in.o:(.text+0x8060): first defined here
arch/x86/pci/built-in.o: In function `pcibios_penalize_isa_irq':
(.text+0x1d40): multiple definition of `pcibios_penalize_isa_irq'
drivers/built-in.o:(.text+0x39b0): first defined here
arch/x86/pci/built-in.o: In function `pcibios_remove_bus':
(.text+0x31b0): multiple definition of `pcibios_remove_bus'
drivers/built-in.o:(.text+0xd6e0): first defined here
arch/x86/pci/built-in.o: In function `pcibios_disable_device':
(.text+0x3050): multiple definition of `pcibios_disable_device'
drivers/built-in.o:(.text+0x39c0): first defined here
arch/x86/pci/built-in.o: In function `pcibios_setup':
(.init.text+0xfa8): multiple definition of `pcibios_setup'
drivers/built-in.o:(.init.text+0x546): first defined here
net/built-in.o: In function `skb_copy_bits':
(.text+0x8f60): multiple definition of `skb_copy_bits'
kernel/built-in.o:(.text+0x6f0e0): first defined here
collect2: error: ld returned 1 exit status
Makefile:947: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1
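
With a log this long it helps to reduce it to the list of clashing symbols first. Below is a minimal sketch. It assumes the build output was captured to a file (build.log is a hypothetical name) and that ld words its messages exactly as above. Several of the names, such as sched_clock and arch_cpu_idle, are, as far as I know, weak default definitions in generic code that arch/x86 overrides. That is worth keeping in mind when reading the list.

```shell
#!/bin/sh
# Print each symbol that ld reported as "multiple definition of `sym'",
# once, sorted in C-locale order. Reads the build log on stdin.
dup_symbols() {
    sed -n "s/.*multiple definition of \`\([^']*\)'.*/\1/p" | LC_ALL=C sort -u
}

# Usage (build.log is a hypothetical file name):
#   make -j1 V=1 2>&1 | tee build.log
#   dup_symbols < build.log          # the names
#   dup_symbols < build.log | wc -l  # how many distinct clashes
```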


https://github.com/NTULINUX/kernel-patches/blob/master/kernel_lto_patch.patch

I modified this:

https://git.kernel.org/cgit/linux/kernel/git/mmarek/kbuild.git/commit/?h=lto&id=19a3cc83353e3bb4bc28769f8606139a3d350d2d

FIXED: Misc cleanup, etc. I probably did something stupid; it's 2:40 am and I should get some sleep. Maybe one of you can catch my mistake. Enjoy!

FIXED: I'm awake now! The issue seems to be that LTO_FINAL_CFLAGS is not being used or inherited by anything (at least not correctly): nothing actually picks up LTO_FINAL_CFLAGS. I'll keep working on it on and off; I'm sure I'll figure it out.

FIXED: Still ironing out the last few problems, will update the patch when I'm done!

Update: Try now! :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol:
NTU
Apprentice


Joined: 17 Jul 2015
Posts: 187

Posted: Fri Oct 21, 2016 4:53 pm

C'mon guys, don't die on me! Where you LTO boys at? Any tips on my kernel patch?
costel78
Guru


Joined: 20 Apr 2007
Posts: 402

Posted: Fri Oct 21, 2016 8:47 pm

Great work! :D

Don't worry, I haven't forgotten or given up on LTO. I'm just a little busy with work.
I tried the patch and, with minimal modifications, it applies against the stable git sources. Unfortunately, my C/C++ skills are (more than) a little rusty; I mostly work with databases and web development and, of course, there is an error.

Here is the error:
Code:
(cat /dev/null;   cat net/core/modules.order;   cat net/llc/modules.order;   cat net/ethernet/modules.order;   cat net/802/modules.order;   cat net/sched/modules.order;   cat net/netlink/modules.order;   cat net/netfilter/modules.order;   cat net/ipv4/modules.order;   cat net/xfrm/modules.order;   cat net/unix/modules.order;   cat net/ipv6/modules.order;   cat net/packet/modules.order;   cat net/key/modules.order;   cat net/bridge/modules.order;   cat net/8021q/modules.order;   cat net/wireless/modules.order;   cat net/mac80211/modules.order;   cat net/netlabel/modules.order;   cat net/dns_resolver/modules.order;) > net/modules.order
make -f ./scripts/Makefile.build obj=lib
make -f ./scripts/Makefile.build obj=lib/fonts
(cat /dev/null; ) > lib/fonts/modules.order
make -f ./scripts/Makefile.build obj=lib/lz4
(cat /dev/null; ) > lib/lz4/modules.order
make -f ./scripts/Makefile.build obj=lib/lzo
(cat /dev/null; ) > lib/lzo/modules.order
make -f ./scripts/Makefile.build obj=lib/xz
(cat /dev/null; ) > lib/xz/modules.order
make -f ./scripts/Makefile.build obj=lib/zlib_deflate
(cat /dev/null; ) > lib/zlib_deflate/modules.order
make -f ./scripts/Makefile.build obj=lib/zlib_inflate
(cat /dev/null; ) > lib/zlib_inflate/modules.order
  objdump -h lib/lib.a | sed -ne '/___ksymtab/{s/.*+/EXTERN(/;s/ .*/)/;p}' >lib/.lib-ksyms.o.lds; rm -f lib/.lib_exports.o; ar rcsD lib/.lib_exports.o; ./scripts/gcc-ld -flto=8 -fuse-linker-plugin -fno-toplevel-reorder -fno-fat-lto-objects -allow-multiple-definition -dH -fdump-ipa-cgraph -fdump-ipa-inline-details -flto=8 -fuse-linker-plugin -fno-toplevel-reorder -fno-fat-lto-objects -fipa-cp-clone  -O2 -fno-strict-aliasing -fno-common -falign-jumps=1 -falign-loops=1 -funit-at-a-time -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -fstack-protector -fomit-frame-pointer -fno-var-tracking-assignments -fno-strict-overflow -fconserve-stack -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -maccumulate-outgoing-args -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -Werror-implicit-function-declaration -Wno-format-security -Wno-sign-compare -Wno-maybe-uninitialized -Wno-frame-address -Wframe-larger-than=2048 -Wno-unused-but-set-variable -Wno-unused-const-variable -Wdeclaration-after-statement -Wno-pointer-sign -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -m elf_x86_64   -r -o lib/lib-ksyms.o -T lib/.lib-ksyms.o.lds lib/.lib_exports.o; rm lib/.lib_exports.o lib/.lib-ksyms.o.lds
/usr/lib/gcc/x86_64-pc-linux-gnu/6.2.0/../../../../x86_64-pc-linux-gnu/bin/ld:lib/.lib-ksyms.o.lds:1: syntax error
collect2: error: ld returned 1 exit status
make[1]: *** [scripts/Makefile.build:432: lib/lib-ksyms.o] Error 1
make: *** [Makefile:1001: lib] Error 2


As you can see, I used V=1.
It seems that gcc does not recognize the -nostdlib added by ./scripts/gcc-ld here:
Code:
./scripts/gcc-ld -flto=8 -fuse-linker-plugin -fno-toplevel-reorder -fno-fat-lto-objects -allow-multiple-definition -dH -fdump-ipa-cgraph -fdump-ipa-inline-details -flto=8 -fuse-linker-plugin -fno-toplevel-reorder -fno-fat-lto-objects -fipa-cp-clone  -O2 -fno-strict-aliasing -fno-common -falign-jumps=1 -falign-loops=1 -funit-at-a-time -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -fstack-protector -fomit-frame-pointer -fno-var-tracking-assignments -fno-strict-overflow -fconserve-stack -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -maccumulate-outgoing-args -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -Werror-implicit-function-declaration -Wno-format-security -Wno-sign-compare -Wno-maybe-uninitialized -Wno-frame-address -Wframe-larger-than=2048 -Wno-unused-but-set-variable -Wno-unused-const-variable -Wdeclaration-after-statement -Wno-pointer-sign -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -m elf_x86_64   -r -o lib/lib-ksyms.o -T lib/.lib-ksyms.o.lds lib/.lib_exports.o;


As far as I can tell, it only happens here, at the final symbol linking step. No clue why yet, but I'll keep trying.

Did I say that you did a great job? :)

Later: I am definitely stupid and blind. The real error is that CC is not defined in this case, so the command line becomes exec -nostdlib ...
Need some sleep to recharge my batteries.
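
The failure described above is easy to reproduce on its own. If a wrapper assembles its command line as $CC followed by its arguments and CC is empty, the shell is left running exec -nostdlib ... on its own.

Below is a minimal sketch of a guard. It assumes the wrapper works roughly like that; I haven't checked it line by line against scripts/gcc-ld. gcc_ld_cmd is a made-up name. It only prints the command instead of exec-ing it, so it can be tested.

```shell
#!/bin/sh
# Hypothetical sketch (not the real scripts/gcc-ld): build the
# "gcc as linker" command line, but fail loudly when CC is empty
# instead of ending up with "exec -nostdlib ...".
gcc_ld_cmd() {
    if [ -z "${CC:-}" ]; then
        echo "gcc-ld: CC is not set; export CC=gcc or pass CC=... to make" >&2
        return 1
    fi
    # Start from -nostdlib, as the wrapper does, then pass the arguments on
    printf '%s\n' "$CC -nostdlib $*"
}

# In a real wrapper the last step would be something like:
#   cmd=$(gcc_ld_cmd "$@") || exit 1
#   exec $cmd   # unquoted on purpose: split into words
```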
_________________
Sorry for my English. I'm still learning this language.
NTU
Apprentice


Joined: 17 Jul 2015
Posts: 187

Posted: Fri Oct 21, 2016 11:44 pm

Glad you got it going as I did! Woo! And thank you :D
CaptainBlood
Advocate


Joined: 24 Jan 2010
Posts: 3519

Posted: Sat Oct 22, 2016 3:59 pm

Hi,
Does it apply to 4.8.2 out of the box?
I'd be glad to try, but I don't know how to apply patches other than through ebuilds.

Any tips, please?

Thanks for your attention.
costel78
Guru


Joined: 20 Apr 2007
Posts: 402

Posted: Sat Oct 22, 2016 5:33 pm

CaptainBlood wrote:
Does it apply to 4.8.2 out of the box?

Applying out of the box is the easy part. :wink: Working out of the box is a completely different story...
I had to switch some modules to built-in and manually investigate a few build errors. This is where make -j1 V=1 all helps a lot.

!!! DISCLAIMER !!!
I have almost zero experience in kernel development. All I know comes from a little hacking on patch-o-matic and, later, xtables-addons, so use it at your own risk.
The patch is based on the work of Andi Kleen and Alec Ari, and on our fellow forum member NTU's update to it.

Code:
diff -urN linux-4.8.3-gentoo.orig/arch/x86/Kconfig linux-4.8.3-gentoo/arch/x86/Kconfig
--- linux-4.8.3-gentoo.orig/arch/x86/Kconfig   2016-10-03 02:24:33.000000000 +0300
+++ linux-4.8.3-gentoo/arch/x86/Kconfig   2016-10-22 21:22:35.893273858 +0300
@@ -665,7 +665,7 @@
 
 config SCHED_OMIT_FRAME_POINTER
    def_bool y
-   prompt "Single-depth WCHAN output"
+   prompt "Single-depth WCHAN output" if !LTO && !FRAME_POINTER
    depends on X86
    ---help---
      Calculate simpler /proc/<PID>/wchan values. If this option
diff -urN linux-4.8.3-gentoo.orig/Documentation/lto-build linux-4.8.3-gentoo/Documentation/lto-build
--- linux-4.8.3-gentoo.orig/Documentation/lto-build   1970-01-01 02:00:00.000000000 +0200
+++ linux-4.8.3-gentoo/Documentation/lto-build   2016-10-22 21:22:22.435128606 +0300
@@ -0,0 +1,104 @@
+Link time optimization (LTO) for the Linux kernel
+
+This is an experimental feature.
+
+Link Time Optimization allows the compiler to optimize the complete program
+instead of just each file.  LTO requires at least gcc 4.8 (but
+works more efficiently with 4.9+). LTO requires Linux binutils 2.26.1 or
+newer.
+
+The compiler can inline functions between files and do various other global
+optimizations, like specializing functions for common parameters,
+determining when global variables are clobbered, making functions pure/const,
+propagating constants globally, removing unneeded data and others.
+
+It will also drop unused functions which can make the kernel
+image smaller in some circumstances, in particular for small kernel
+configurations.
+
+For small monolithic kernels it can throw away unused code very effectively
+(especially when modules are disabled) and usually shrinks
+the code size.
+
+Build time and memory consumption at build time will increase, depending
+on the size of the largest binary. Modular kernels are less affected.
+With LTO incremental builds are less incremental, as always the whole
+binary needs to be re-optimized (but not re-parsed)
+
+Oops can be somewhat more difficult to read, due to the more aggressive
+inlining.
+
+Normal "reasonable" builds work with less than 4GB of RAM, but very large
+configurations like allyesconfig may need more memory. The actual
+memory needed depends on the available memory (gcc sizes its garbage
+collector pools based on that or on the ulimit -m limits) and
+the compiler version.
+
+gcc 4.9+ has much better build performance and less memory consumption
+
+- A few kernel features are currently incompatible with LTO, in particular
+function tracing, because they require special compiler flags for
+specific files, which is not supported in LTO right now.
+- Jobserver control for -j does not work correctly for the final
+LTO phase due to some problems with the kernel's pipe code.
+The makefiles hard-code -j<number of online cpus> for the final
+LTO phase to work around this.
+
+Configuration:
+- Enable CONFIG_LTO_MENU and then disable CONFIG_LTO_DISABLE.
+This is mainly to not have allyesconfig default to LTO.
+- FUNCTION_TRACER, STACK_TRACER, FUNCTION_GRAPH_TRACER, KALLSYMS_ALL, GCOV
+have to be disabled because they are currently incompatible with LTO.
+- MODVERSIONS have to be disabled (may work with 4.9+)
+
+Requirements:
+- Enough memory: 4GB for a standard build, more for allyesconfig
+The peak memory usage happens single threaded (when lto-wpa merges types),
+so dialing back -j options will not help much.
+
+A 32bit compiler is unlikely to work due to the memory requirements.
+You can however build a kernel targeted at 32bit on a 64bit host.
+
+FAQs:
+
+Q: I get a section type attribute conflict
+A: Usually because of someone doing
+const __initdata (should be const __initconst) or const __read_mostly
+(should be just const). Check both symbols reported by gcc.
+
+Q: I see lots of undefined symbols for memcmp etc.
+A: Usually because NM=gcc-nm AR=gcc-ar are missing.
+The Makefile tries to set those automatically, but it doesn't always
+work. Better to set it manually on the make command line.
+
+Q: It's quite slow / uses too much memory.
+A: Consider a gcc 4.9 snapshot/release (not released yet)
+The main problem in 4.8 is the type merging in the single threaded WPA pass,
+which has been improved considerably in 4.9 by running it distributed.
+
+Q: It's still slow
+A: It'll always be somewhat slower than non LTO sorry.
+
+Q: What's up with .XXXXX numeric post fixes
+A: This is due to LTO turning (nearly) all symbols static.
+Use gcc 4.9, it avoids them in most cases. They are also filtered out
+in kallsyms.
+
+References:
+
+Presentation on Kernel LTO
+(note, performance numbers/details outdated.  In particular gcc 4.9 fixed
+most of the build time problems):
+http://halobates.de/kernel-lto.pdf
+
+Generic gcc LTO:
+http://www.ucw.cz/~hubicka/slides/labs2013.pdf
+http://www.hipeac.net/system/files/barcelona.pdf
+
+Somewhat outdated too:
+http://gcc.gnu.org/projects/lto/lto.pdf
+http://gcc.gnu.org/projects/lto/whopr.pdf
+
+Happy Link-Time-Optimizing!
+
+Andi Kleen and Alec Ari
diff -urN linux-4.8.3-gentoo.orig/init/Kconfig linux-4.8.3-gentoo/init/Kconfig
--- linux-4.8.3-gentoo.orig/init/Kconfig   2016-10-03 02:24:33.000000000 +0300
+++ linux-4.8.3-gentoo/init/Kconfig   2016-10-22 21:22:35.893273858 +0300
@@ -1330,6 +1330,77 @@
 
 endchoice
 
+config LTO_MENU
+   bool "Enable gcc link time optimization (LTO)"
+   # Only tested on X86 for now. For other architectures you likely
+   # have to fix some things first, like adding asmlinkages etc.
+   depends on X86
+   # lto does not support excluding flags for specific files
+   # right now. Can be removed if that is fixed.
+   depends on !FUNCTION_TRACER
+   help
+     With this option gcc will do whole program optimizations for
+     the whole kernel and module. This increases compile time, but can
+     lead to better code. It allows gcc to inline functions between
+     different files and do other optimization.  It might also trigger
+     bugs due to more aggressive optimization. It allows gcc to drop unused
+     code. On smaller monolithic kernel configurations
+     it usually leads to smaller kernels, especially when modules
+     are disabled.
+
+     With this option gcc will also do some global checking over
+     different source files. It also disables a number of kernel
+     features.
+
+     This option is recommended for release builds. With LTO
+     the kernel always has to be re-optimized (but not re-parsed)
+     on each build.
+
+     This requires a gcc 4.8 or later compiler and
+     Linux binutils 2.21.51.0.3 or later.  gcc 4.9 builds significantly
+     faster than 4.8. It does not currently work with an FSF release of
+     binutils or with the gold linker.
+
+     On larger configurations this may need more than 4GB of RAM.
+     It will likely not work on those with a 32bit compiler.
+
+     When the toolchain support is not available this will (hopefully)
+     be automatically disabled.
+
+     For more information see Documentation/lto-build
+
+config LTO_DISABLE
+         bool "Disable LTO again"
+         depends on LTO_MENU
+         default n
+         help
+           This option is merely here so that allyesconfig or allmodconfig do
+           not enable LTO. If you want to actually use LTO do not enable.
+
+config LTO
+   bool
+   default y
+   depends on LTO_MENU && !LTO_DISABLE
+
+config LTO_DEBUG
+   bool "Enable LTO compile time debugging"
+   depends on LTO
+   help
+     Enable LTO debugging in the compiler. The compiler dumps
+     some log files that make it easier to figure out LTO
+     behavior. The log files also allow to reconstruct
+     the global inlining and a global callgraph.
+     They however add some (single threaded) cost to the
+     compilation.  When in doubt do not enable.
+
+config LTO_CP_CLONE
+   bool "Allow aggressive cloning for function specialization"
+   depends on LTO
+   help
+     Allow the compiler to clone and specialize functions for specific
+     arguments when it determines these arguments are very commonly
+     called.  Experimental. Will increase text size.
+
 config SYSCTL
    bool
 
@@ -1934,6 +2005,8 @@
 
 config MODVERSIONS
    bool "Module versioning support"
+   # LTO should work with gcc 4.9
+   depends on !LTO
    help
      Usually, you have to use modules compiled with your kernel.
      Saying Y here makes it sometimes possible to use modules
diff -urN linux-4.8.3-gentoo.orig/kernel/gcov/Kconfig linux-4.8.3-gentoo/kernel/gcov/Kconfig
--- linux-4.8.3-gentoo.orig/kernel/gcov/Kconfig   2016-10-03 02:24:33.000000000 +0300
+++ linux-4.8.3-gentoo/kernel/gcov/Kconfig   2016-10-22 21:22:35.894273869 +0300
@@ -2,7 +2,7 @@
 
 config GCOV_KERNEL
    bool "Enable gcov-based kernel profiling"
-   depends on DEBUG_FS
+   depends on DEBUG_FS && !LTO
    select CONSTRUCTORS if !UML
    default n
    ---help---
diff -urN linux-4.8.3-gentoo.orig/lib/Kconfig.debug linux-4.8.3-gentoo/lib/Kconfig.debug
--- linux-4.8.3-gentoo.orig/lib/Kconfig.debug   2016-10-03 02:24:33.000000000 +0300
+++ linux-4.8.3-gentoo/lib/Kconfig.debug   2016-10-22 21:22:35.895273880 +0300
@@ -216,7 +216,7 @@
 
 config READABLE_ASM
         bool "Generate readable assembler code"
-        depends on DEBUG_KERNEL
+        depends on DEBUG_KERNEL && !LTO
         help
           Disable some compiler optimizations that tend to generate human unreadable
           assembler output. This may make the kernel slightly slower, but it helps
diff -urN linux-4.8.3-gentoo.orig/Makefile linux-4.8.3-gentoo/Makefile
--- linux-4.8.3-gentoo.orig/Makefile   2016-10-22 13:34:24.338809497 +0300
+++ linux-4.8.3-gentoo/Makefile   2016-10-22 21:22:35.891273837 +0300
@@ -343,13 +343,24 @@
 scripts/Kbuild.include: ;
 include scripts/Kbuild.include
 
+include ${srctree}/scripts/Makefile.lto
+
 # Make variables (CC, etc...)
 AS      = $(CROSS_COMPILE)as
-LD      = $(CROSS_COMPILE)ld
+LD      = ${srctree}/scripts/gcc-ld ${LTO_FINAL_CFLAGS}
+LDFINAL   = $(LD)
 CC      = $(CROSS_COMPILE)gcc
 CPP      = $(CC) -E
+ifdef CONFIG_LTO
+AR      = $(CROSS_COMPILE)gcc-ar
+else
 AR      = $(CROSS_COMPILE)ar
+endif
+ifdef CONFIG_LTO
+NM      = $(CROSS_COMPILE)gcc-nm
+else
 NM      = $(CROSS_COMPILE)nm
+endif
 STRIP      = $(CROSS_COMPILE)strip
 OBJCOPY      = $(CROSS_COMPILE)objcopy
 OBJDUMP      = $(CROSS_COMPILE)objdump
@@ -414,7 +425,7 @@
 
 export VERSION PATCHLEVEL SUBLEVEL KERNELRELEASE KERNELVERSION
 export ARCH SRCARCH CONFIG_SHELL HOSTCC HOSTCFLAGS CROSS_COMPILE AS LD CC
-export CPP AR NM STRIP OBJCOPY OBJDUMP
+export CPP AR NM STRIP OBJCOPY OBJDUMP LDFINAL
 export MAKE AWK GENKSYMS INSTALLKERNEL PERL PYTHON UTS_MACHINE
 export HOSTCXX HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS
 
@@ -425,6 +436,17 @@
 export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
 export KBUILD_ARFLAGS
 
+ifdef CONFIG_LTO
+# LTO gcc creates a lot of files in TMPDIR, and with /tmp as tmpfs
+# it's easy to drive the machine OOM. Use the object directory
+# instead.
+ifndef TMPDIR
+TMPDIR ?= $(objtree)
+export TMPDIR
+$(info setting TMPDIR=$(objtree) for LTO build)
+endif
+endif
+
 # When compiling out-of-tree modules, put MODVERDIR in the module
 # tree rather than in the kernel tree. The kernel tree might
 # even be read-only.
@@ -789,6 +811,7 @@
 include scripts/Makefile.kasan
 include scripts/Makefile.extrawarn
 include scripts/Makefile.ubsan
+include scripts/Makefile.lto
 
 # Add any arch overrides and user supplied CPPFLAGS, AFLAGS and CFLAGS as the
 # last assignments
diff -urN linux-4.8.3-gentoo.orig/scripts/link-vmlinux.sh linux-4.8.3-gentoo/scripts/link-vmlinux.sh
--- linux-4.8.3-gentoo.orig/scripts/link-vmlinux.sh   2016-10-03 02:24:33.000000000 +0300
+++ linux-4.8.3-gentoo/scripts/link-vmlinux.sh   2016-10-22 21:22:35.895273880 +0300
@@ -53,7 +53,7 @@
    local lds="${objtree}/${KBUILD_LDS}"
 
    if [ "${SRCARCH}" != "um" ]; then
-      ${LD} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2}                  \
+      ${LDFINAL} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2}                  \
          -T ${lds} ${KBUILD_VMLINUX_INIT}                     \
          --start-group ${KBUILD_VMLINUX_MAIN} --end-group ${1}
    else
diff -urN linux-4.8.3-gentoo.orig/scripts/Makefile.lto linux-4.8.3-gentoo/scripts/Makefile.lto
--- linux-4.8.3-gentoo.orig/scripts/Makefile.lto   1970-01-01 02:00:00.000000000 +0200
+++ linux-4.8.3-gentoo/scripts/Makefile.lto   2016-10-22 21:22:22.435128606 +0300
@@ -0,0 +1,45 @@
+#
+# Support for gcc link time optimization
+#
+
+DISABLE_LTO :=
+LTO_CFLAGS :=
+
+export DISABLE_LTO
+export LTO_CFLAGS
+
+ifdef CONFIG_LTO
+# the -fno-toplevel-reorder is to preserve the order of initcalls
+# everything else should tolerate reordering
+   LTO_CFLAGS := -flto=$(shell cat /proc/cpuinfo | grep processor | wc -l) \
+      -fuse-linker-plugin -fno-toplevel-reorder -fno-fat-lto-objects
+   LTO_FINAL_CFLAGS := ${LTO_CFLAGS} -allow-multiple-definition
+
+# Used to disable LTO for specific files (e.g. vdso)
+   DISABLE_LTO := -fno-lto
+
+ifdef CONFIG_LTO_DEBUG
+   LTO_FINAL_CFLAGS += -dH -fdump-ipa-cgraph -fdump-ipa-inline-details
+endif
+ifdef CONFIG_LTO_CP_CLONE
+   LTO_CFLAGS += -fipa-cp-clone
+   LTO_FINAL_CFLAGS += ${LTO_CFLAGS}
+endif
+
+   # In principle gcc should pass through options in the object files,
+   # but it doesn't always work. So do it here manually
+   # Note that special options for individual files does not
+   # work currently (except for some special cases that only
+   # affect the compiler frontend)
+   # The main offenders are FTRACE and GCOV -- we exclude
+   # those in the config.
+   LTO_FINAL_CFLAGS += $(filter -g%,${KBUILD_CFLAGS})
+   LTO_FINAL_CFLAGS += $(filter -O%,${KBUILD_CFLAGS})
+   LTO_FINAL_CFLAGS += $(filter -f%,${KBUILD_CFLAGS})
+   LTO_FINAL_CFLAGS += $(filter -m%,${KBUILD_CFLAGS})
+   LTO_FINAL_CFLAGS += $(filter -W%,${KBUILD_CFLAGS})
+
+   KBUILD_CFLAGS += ${LTO_CFLAGS}
+
+   LDFINAL := ${CONFIG_SHELL} ${srctree}/scripts/gcc-ld ${LTO_FINAL_CFLAGS}
+endif
diff -urN linux-4.8.3-gentoo.orig/scripts/Makefile.modpost linux-4.8.3-gentoo/scripts/Makefile.modpost
--- linux-4.8.3-gentoo.orig/scripts/Makefile.modpost   2016-10-03 02:24:33.000000000 +0300
+++ linux-4.8.3-gentoo/scripts/Makefile.modpost   2016-10-22 21:22:35.895273880 +0300
@@ -78,7 +78,8 @@
  $(if $(KBUILD_EXTMOD),-o $(modulesymfile))      \
  $(if $(CONFIG_DEBUG_SECTION_MISMATCH),,-S)      \
  $(if $(CONFIG_SECTION_MISMATCH_WARN_ONLY),,-E)  \
- $(if $(KBUILD_EXTMOD)$(KBUILD_MODPOST_WARN),-w)
+ $(if $(KBUILD_EXTMOD)$(KBUILD_MODPOST_WARN),-w) \
+ $(if $(CONFIG_LTO),-w)
 
 MODPOST_OPT=$(subst -i,-n,$(filter -i,$(MAKEFLAGS)))
 
@@ -116,8 +117,8 @@
 targets += $(modules:.ko=.mod.o)
 
 # Step 6), final link of the modules
-quiet_cmd_ld_ko_o = LD [M]  $@
-      cmd_ld_ko_o = $(LD) -r $(LDFLAGS)                                 \
+quiet_cmd_ld_ko_o = LDFINAL [M]  $@
+      cmd_ld_ko_o = $(LDFINAL) -r $(LDFLAGS)                            \
                              $(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) \
                              -o $@ $(filter-out FORCE,$^)
 


You can put it in /etc/portage/patches/sys-kernel/gentoo-sources-4.8.3 or, from /usr/src/linux, run patch -p1 < /path/lto-4.8.3.patch

I successfully applied the patch and compiled the kernel on my main desktop, my NAS and one server.

For those to whom that sounds a little crazy: the desktop and NAS belong to me, and the server is used for testing and development, not production.
Under no circumstances would I mess that much with the kernel in production.

@NTU Hey, you're thanking me? I gave up on the kernel idea; you were the one who insisted and did everything :D :D :D
So thank YOU!

Edit reason: Incomplete patch posted (missing -N)


Last edited by costel78 on Sat Oct 22, 2016 6:26 pm; edited 1 time in total
NTU
Apprentice


Joined: 17 Jul 2015
Posts: 187

Posted: Sat Oct 22, 2016 6:08 pm

I forgot to test module support; I always build my kernels with everything built in nowadays. Sorry, everyone.

What changes did you have to make for my patch to work? I'll review them and put them into my patch. Btw I am the Alec you are referring to :P

To apply patches, it's very simple.

For example:

Code:
cd ~/
git clone https://github.com/NTULINUX/kernel-patches
cd <top dir of your source tree>
patch -p1 < ~/kernel-patches/kernel_lto_patch.patch


If your kernel source tree is a git tree, you can use "git apply" instead; applying the patch and then committing your changes works as well. When you're either desperate or paranoid, you can adjust the "fuzz" factor of GNU patch. This is the maximum number of context lines patch may ignore when looking for the place to apply a hunk; the default is 2. Say a patch has a lot of conflicts. With --fuzz=3 more hunks will go in, but you may have to fix the resulting compile errors (if any) yourself. With --fuzz=0, all the context has to line up exactly.

More info:

https://git-scm.com/docs/git-apply
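
To see what the fuzz factor actually does, here is a small self-contained experiment. It assumes GNU patch is installed; the file name, its contents and the try_fuzz helper are all made up for the demo. The target file's third line no longer matches the first context line of the hunk. The dry run is therefore rejected with --fuzz=0, but it goes through with GNU patch's default fuzz of 2 (it only needs a fuzz of 1).

```shell
#!/bin/sh
# Demonstrate GNU patch's fuzz factor on a throwaway directory.
# try_fuzz FUZZ: dry-runs the patch and prints "applies" or "rejected".
try_fuzz() {
    dir=$(mktemp -d) || return 1
    # Target file: line 3 says "THREE", but the hunk's context expects "three"
    printf '%s\n' one two THREE four five six seven eight nine ten > "$dir/f.txt"
    cat > "$dir/fix.patch" <<'EOF'
--- a/f.txt
+++ b/f.txt
@@ -3,7 +3,7 @@
 three
 four
 five
-six
+SIX
 seven
 eight
 nine
EOF
    # -f only stops patch from asking questions; --dry-run changes nothing
    if (cd "$dir" && patch -p1 -f --dry-run --fuzz="$1" < fix.patch) > /dev/null 2>&1; then
        echo applies
    else
        echo rejected
    fi
    rm -rf "$dir"
}
```

try_fuzz 0 should print rejected and try_fuzz 2 should print applies. When patching for real, leave out -f so patch can still ask about reversed patches.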

I was thanking you btw for telling me I did a nice job.

Feel free to make pull requests on the patch! Again, I apologize for not testing module support.
costel78
Guru


Joined: 20 Apr 2007
Posts: 402

Posted: Sat Oct 22, 2016 6:54 pm

I just adjusted it to apply cleanly; no functional changes in the end.
When I saw the kernel compile fail I modified it and made a stupid mistake, so at one point CC was not defined in gcc-ld. In the end, this morning, no modification was required.

Some "side effects":
Code:
gentoo costel # systemd-analyze
Startup finished in 10.302s (firmware) + 5.474s (loader) + 2.157s (kernel) + 6.787s (userspace) = 24.720s

Previously the total time was around 27s.

It didn't always fail with modules; I don't know the cause. Well, considering I am playing with things I do not fully understand, I guess ignorance is bliss. :)

All times are GMT