Gentoo Forums
Building clang with clang
incripshin
n00b

Joined: 07 Oct 2005
Posts: 53
Location: Seattle, WA, US

Posted: Sun Sep 17, 2017 4:13 am    Post subject: Building clang with clang

I've done a lot of investigation, and it appears the culprit is that my gcc (5.4.0) built libLLVMSupport.so (among others) with symbols that carry the cxx11 ABI tag. What I'm wondering about:

  • Install lld with GCC, then use clang+lld, however that works. Maybe that linker will understand? [edit: I decided this would never work, as per my edit below; I think I really need a single-ABI system.]
  • Downgrade GCC to 4.9, build llvm, clang, etc and retry building clang with clang
  • Upgrade GCC to something modern. 6.4? 7.1?
  • Some USE flag tweaks?


It seems this should have been fixed in LLVM 3.9. See LLVM bug 23529. But maybe I don't understand the issue.

What I did

First, I have gcc 5.4.0. My libc is musl, but that should be irrelevant. I installed llvm with the following USE flags (which I'm copying manually :()

Code:
sys-devel/llvm-common-5.0.0::gentoo
sys-libs/libomp-5.0.0::musl USE="-hwloc -ompt {-test}"
sys-devel/llvm-5.0.0:5::gentoo USE="gold libedit libffi ncurses -debug -doc {-test}" LLVM_TARGETS="AMDGPU (X86) -AArch64 -ARM -BPF -Hexagon -Lanai -MSP430 -Mips -NVPTX -PowerPC -Sparc -SystemZ -XCore"
sys-devel/llvm-gold-5::gentoo
sys-libs/compiler-rt-sanitizers-5.0.0:5.0.0::musl USE="{-test}"
sys-libs/llvm-libunwind-5.0.0::gentoo USE="libunwind static-libs {-test}"
sys-libs/libcxxabi-5.0.0::gentoo USE="libunwind static-libs {-test}"
sys-libs/libcxx-5.0.0::gentoo USE="libcxxabi libunwind static-libs -libcxxrt {-test}"
sys-devel/clang-5.0.0:5::gentoo USE="default-compiler-rt default-libcxx static-analyzer -debug -doc {-test} -xml (-z3)"  LLVM_TARGETS="AMDGPU (X86) -AArch64 -ARM -BPF -Hexagon -Lanai -MSP430 -Mips -NVPTX -PowerPC -Sparc -SystemZ -XCore" PYTHON_TARGETS="python2_7"
sys-libs/compiler-rt-5.0.0:5.0.0::gentoo USE="clang {-test}"
sys-devel/clang-runtime-5.0.0:5.0.0::gentoo USE="compiler-rt libcxx openmp sanitize"


I then switched my CC/CXX/AR/NM/RANLIB to the clang/llvm versions. Then I attempted to emerge -1v clang.
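
A rough sketch of what such a switch might look like in a shell. The exact values used here are not given in the post, so the tool names below (clang, clang++, llvm-ar, llvm-nm, llvm-ranlib) are assumptions; on Gentoo they would more usually live in make.conf or a per-package env file than be exported by hand.

```shell
# Sketch: point the build at the LLVM toolchain instead of GCC/binutils.
# The tool names are assumptions; adjust to whatever is actually installed.
export CC=clang
export CXX=clang++
export AR=llvm-ar
export NM=llvm-nm
export RANLIB=llvm-ranlib

# Print what the next build will see.
for var in CC CXX AR NM RANLIB; do
    eval "printf '%s=%s\n' \"$var\" \"\$$var\""
done
```

Printing the variables before starting a long build is a cheap way to confirm that nothing still points at the GNU tools.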

What happens

When clang tries to link against the gcc-built libLLVMSupport.so (using ld.bfd ... maybe I should try installing lld), I get 12 undefined-symbol errors. One of them:

Code:
utils/TableGen/CMakeFiles/obj.clang-tblgen.dir/ClangDiagnosticsEmitter.cpp.o: In function `clang::EmitClangDiagsDefs(llvm::RecordKeeper&, llvm::raw_ostream&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)':
/var/tmp/portage/sys-devel/clang-5.0.0/work/x/y/cfe-5.0.0.src/utils/TableGen/ClangDiagnosticsEmitter.cpp:(.text._ZN5clang18EmitClangDiagsDefsERN4llvm12RecordKeeperERNS0_11raw_ostreamERKNSt3__112basic_stringIcNS5_11char_traitsIcEENS5_9allocatorIcEEEE+0x65): undefined reference to `llvm::StringRef::upper() const'


So I looked around for that symbol. I found it in /usr/lib/llvm/5/lib/libLLVMSupport.so. The output below shows the symbol as seen by nm. In order: GNU nm, GNU nm demangled, LLVM nm, LLVM nm demangled:

Code:
% nm -D /usr/lib/llvm/5/lib/libLLVMSupport.so | grep 'StringRef.*upper'
00000000000af2f0 T _ZNK4llvm9StringRef5upperB5cxx11Ev
% nm -DC /usr/lib/llvm/5/lib/libLLVMSupport.so | grep 'StringRef.*upper'
00000000000af2f0 T llvm::StringRef::upper[abi:cxx11]() const
% llvm-nm -dynamic /usr/lib/llvm/5/lib/libLLVMSupport.so | grep 'StringRef.*upper'
00000000000af2f0 T _ZNK4llvm9StringRef5upperB5cxx11Ev
% llvm-nm -dynamic -demangle /usr/lib/llvm/5/lib/libLLVMSupport.so | grep 'StringRef.*upper'
00000000000af2f0 T _ZNK4llvm9StringRef5upperB5cxx11Ev


So you can see that it has the 'cxx11' ABI tag, which is a GCC-5.1 invention. Also, LLVM's nm doesn't understand it. Let's see what ClangDiagnosticsEmitter.cpp.o says:

Code:
% nm -g ClangDiagnosticsEmitter.cpp.o | grep 'StringRef.*upper'
                 U _ZNK4llvm9StringRef5upperEv
% nm -gC ClangDiagnosticsEmitter.cpp.o | grep 'StringRef.*upper'
                 U llvm::StringRef::upper() const
% llvm-nm -extern-only ClangDiagnosticsEmitter.cpp.o | grep 'StringRef.*upper'
                 U _ZNK4llvm9StringRef5upperEv
% llvm-nm -extern-only -demangle ClangDiagnosticsEmitter.cpp.o | grep 'StringRef.*upper'
                 U llvm::StringRef::upper() const


LLVM & GNU nm agree that the object file doesn't use the cxx11 ABI. So that's it. I don't know what to do.
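
The mismatch can be checked mechanically. The sketch below uses only the two mangled names from the nm output above; the helper functions has_cxx11_tag and strip_cxx11_tag are hypothetical names made up for illustration. It relies on the ABI tag appearing in the mangled name as the substring "B5cxx11".

```shell
# Sketch: compare the symbol the object file wants with what the library exports.
# Both names are copied from the nm output above.
wanted='_ZNK4llvm9StringRef5upperEv'           # undefined in ClangDiagnosticsEmitter.cpp.o
provided='_ZNK4llvm9StringRef5upperB5cxx11Ev'  # defined in libLLVMSupport.so

# has_cxx11_tag NAME: succeed if the mangled name carries the cxx11 ABI tag.
has_cxx11_tag() {
    case "$1" in
        *B5cxx11*) return 0 ;;
        *)         return 1 ;;
    esac
}

# strip_cxx11_tag NAME: print NAME with every "B5cxx11" removed.
strip_cxx11_tag() {
    printf '%s\n' "$1" | sed 's/B5cxx11//g'
}

if has_cxx11_tag "$provided" && ! has_cxx11_tag "$wanted" \
   && [ "$(strip_cxx11_tag "$provided")" = "$wanted" ]; then
    echo "same function, different ABI tag: the linker treats them as unrelated"
fi
```

To the linker these are simply two different strings, which is why the reference stays undefined even though the function is present in the library.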

edit: Just some more thoughts. My first thought was "oh, duh, clang should see the cxx11 symbol and use that," but that means either the compiler needs to notice, or the linker needs to notice and rewrite the object file to use the different ABI. Both are wrong, because the two-ABI thing GCC decided was a good idea was actually a dumb idea. I want a single ABI on my system. So it seems the only solution may be to use a single-ABI GCC to compile LLVM.

P.S. I'm super sad this forum still uses BBCode and not Markdown.

ct85711
Veteran

Joined: 27 Sep 2005
Posts: 1791

Posted: Sun Sep 17, 2017 4:38 am

Quote:
Downgrade GCC to 4.9, build llvm, clang, etc and retry building clang with clang


One thing you need to remember is that >=gcc-5 is NOT compatible with <gcc-5. The reason is that in gcc-5 the devs changed the libstdc++ ABI (the layout of std::string and std::list) in a way that is NOT backwards compatible. This is why you had to recompile most of your machine when you switched to gcc-5 (or you should have, if you paid attention to the news item about it). So trying to downgrade will fail, because code built by the older compiler expects the old layout, not the new one.

incripshin
n00b

Joined: 07 Oct 2005
Posts: 53
Location: Seattle, WA, US

Posted: Sun Sep 17, 2017 5:56 am

Yeah, this is a new installation. I stopped using Gentoo for so long that my installation was severely out of date. This is also why I took the opportunity to switch to musl (which got me thinking about LLVM and the BSD/MIT/ISC life).

If all I use GCC-4.9 for is to build a working LLVM, it's worth it. I can then proceed to rebuild anything built with that broken ABI.

I've also been thinking... what I really need is a statically linked clang. It would have everything it needs to generate all LLVM libraries & binaries without any pesky GCC ABI dependency. LLVM has an option, BUILD_SHARED_LIBS, which is OFF by default but forced ON in the ebuild. I bet if I changed my ebuild to *not* build them shared (which is probably, honestly, a better approach), I'd get my mostly-statically-linked clang.

I'm building gcc-4.9.4 right now, but I think I like this new angle because maybe I can fix this in the future through a Gentoo bug report.

ct85711
Veteran

Joined: 27 Sep 2005
Posts: 1791

Posted: Sun Sep 17, 2017 3:45 pm

Honestly, I don't think you are going to have much luck getting a patch accepted just because you don't like an ABI change that upstream made intentionally. From what I have seen, the gcc devs have been fairly good at keeping ABI changes backwards compatible. Gcc-5 is the one exception: the change was too big to keep backwards compatible while staying reasonable about the amount of extra code needed.

The big thing you need to keep in mind is whether upstream will accept the patch or not. From my experience, Gentoo's devs are less likely to carry and maintain a patch when they know upstream won't accept it. Now, if it is a patch that you can show upstream accepts and will commit (or already has), you'd have a decent chance of it being accepted, as it is only a temporary thing and they don't have to keep maintaining it when a version change breaks it.

Beyond that, the gcc-5 changeover is a known breakage point, which is why there is a news item about it and why we bring it up so often.
https://www.gentoo.org/support/news-items/2015-10-22-gcc-5-new-c++11-abi.html

Hu
Administrator

Joined: 06 Mar 2007
Posts: 23366

Posted: Sun Sep 17, 2017 5:06 pm

The cxx11 ABI tag was added because the generated code isn't compatible, so without that tag, your link would succeed, but you would get undefined results at runtime. You can turn off the new ABI with a preprocessor flag, but you need to keep the system consistent. I am not aware of any plans by the gcc developers to revert this, so upgrading to newer gcc will not eliminate this issue.
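
The preprocessor flag referred to here is libstdc++'s `_GLIBCXX_USE_CXX11_ABI` macro. A minimal sketch of passing it to a build follows. It only makes sense when every C++ object involved uses libstdc++ and agrees on the value; whether it suits a particular system (for example one that mixes in libc++) is an assumption, not a recommendation.

```shell
# Sketch: ask libstdc++ for the old (pre-GCC-5) std::string/std::list ABI.
# Setting the macro to 0 selects the old ABI; 1 selects the new, tagged one.
CXXFLAGS="${CXXFLAGS:+$CXXFLAGS }-D_GLIBCXX_USE_CXX11_ABI=0"
export CXXFLAGS

echo "CXXFLAGS=$CXXFLAGS"
```

Anything already built with the other value would have to be rebuilt, which is the "keep the system consistent" part.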

Please explain why you consider this multi-ABI design to be a bad idea. It is a bit ugly, but the only other alternatives I see are either to retain the old ABI indefinitely (and all problems it had) or to cause an extremely painful flag day where any pre-update files incorrectly link with post-update files, then die horribly when the ABI incompatibility causes problems. Do you have a better solution?

As for Markdown: please never suggest such a horrible language. Markdown is a plague upon the Internet. There are trivial constructs that common Markdown parsers cannot get right that are easily expressed even in the relatively limited BBcode available on this forum.

incripshin
n00b

Joined: 07 Oct 2005
Posts: 53
Location: Seattle, WA, US

Posted: Sun Sep 17, 2017 8:20 pm

ct85711 wrote:
Honestly, I don't think you are going to have much luck getting a patch accepted just because you don't like an ABI change that upstream made intentionally. From what I have seen, the gcc devs have been fairly good at keeping ABI changes backwards compatible. Gcc-5 is the one exception: the change was too big to keep backwards compatible while staying reasonable about the amount of extra code needed.

The big thing you need to keep in mind is whether upstream will accept the patch or not. From my experience, Gentoo's devs are less likely to carry and maintain a patch when they know upstream won't accept it. Now, if it is a patch that you can show upstream accepts and will commit (or already has), you'd have a decent chance of it being accepted, as it is only a temporary thing and they don't have to keep maintaining it when a version change breaks it.

Beyond that, the gcc-5 changeover is a known breakage point, which is why there is a news item about it and why we bring it up so often.
https://www.gentoo.org/support/news-items/2015-10-22-gcc-5-new-c++11-abi.html


I think there's some miscommunication. I'm not interested in patching away the multiple-ABI thing. But that's a GCC invention, and LLVM doesn't want or need it. The only thing I'm interested in patching is disabling shared libraries for LLVM in Gentoo's ebuilds, which would bring them in line with LLVM's defaults.

Hu wrote:
The cxx11 ABI tag was added because the generated code isn't compatible, so without that tag, your link would succeed, but you would get undefined results at runtime. You can turn off the new ABI with a preprocessor flag, but you need to keep the system consistent. I am not aware of any plans by the gcc developers to revert this, so upgrading to newer gcc will not eliminate this issue.

Please explain why you consider this multi-ABI design to be a bad idea. It is a bit ugly, but the only other alternatives I see are either to retain the old ABI indefinitely (and all problems it had) or to cause an extremely painful flag day where any pre-update files incorrectly link with post-update files, then die horribly when the ABI incompatibility causes problems. Do you have a better solution?

As for Markdown: please never suggest such a horrible language. Markdown is a plague upon the Internet. There are trivial constructs that common Markdown parsers cannot get right that are easily expressed even in the relatively limited BBcode available on this forum.


Again, if I have to go the GCC-4.9 route, it would only be to produce an LLVM installation that can build itself. I shouldn't get unpredictable results since GCC, itself, doesn't link with any external C++ libraries. Right now, LLVM built with GCC-5 and dynamically linked cannot build itself. I'm the only person who thinks this is bad, apparently.

Thanks for your exaggerated disgust at Markdown. I'm totally on your side, now.

edit
According to LLVM's build guide:
Quote:
BUILD_SHARED_LIBS:BOOL
Flag indicating if each LLVM component (e.g. Support) is built as a shared library (ON) or as a static library (OFF). Its default value is OFF. On Windows, shared libraries may be used when building with MinGW, including mingw-w64, but not when building with the Microsoft toolchain.

Note
BUILD_SHARED_LIBS is only recommended for use by LLVM developers. If you want to build LLVM as a shared library, you should use the LLVM_BUILD_LLVM_DYLIB option.


tl;dr we aren't LLVM developers so we should disable shared libraries.
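
For illustration, a hand-run CMake configuration with that option left at its documented default might be assembled as in the sketch below. BUILD_SHARED_LIBS is the option quoted above; the directory paths and the LLVM_SRC / LLVM_BUILD environment variable names are hypothetical placeholders, and this is not how the Gentoo ebuild itself passes its arguments.

```shell
# Sketch: assemble CMake arguments that build LLVM's component libraries
# statically. The source and build directories are hypothetical placeholders.
src_dir="${LLVM_SRC:-/path/to/llvm-source}"
build_dir="${LLVM_BUILD:-/path/to/llvm-build}"

cmake_args="-DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF"

# Print the command instead of running it, so nothing is built by accident.
echo "cd $build_dir && cmake $cmake_args $src_dir"
```

The command is only echoed; drop the echo and the quotes to actually configure a build.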

edit x2
There's a pretty good discussion over at https://www.reddit.com/r/cpp/comments/3b2glr/why_clang_cant_use_the_new_gcc_5_cxx11_abi/ on why it's not so terrible to have versioned symbols. All the same, that ABI change is preventing a coherent LLVM. Statically linking the libraries is the only way to escape. I just hope that it's enough to have statically compiled LLVM libraries. So far, I've rebuilt LLVM et al. with static libraries. Now the static clang has successfully built LLVM, and it's 25% of the way through rebuilding clang. This is the furthest it's gone, folks. Wish me luck.

Hu
Administrator

Joined: 06 Mar 2007
Posts: 23366

Posted: Sun Sep 17, 2017 11:53 pm

That disgust is not exaggerated. I really do dislike Markdown.

I'm not saying it's good that LLVM can't build itself. I'm only saying that it is nobody's fault but LLVM's that it is in this halfway state, where it wants to use files not built by it but assumes those files were built the same way it builds itself. It needs either to be properly self-hosting and not rely on details of how gcc builds it, or to be compatible with how gcc builds it.

incripshin
n00b

Joined: 07 Oct 2005
Posts: 53
Location: Seattle, WA, US

Posted: Mon Sep 18, 2017 12:02 am

clang-built-clang is at 49%!

Is it really LLVM's fault when Gentoo is overriding defaults to values that only LLVM developers are supposed to use? Given LLVM's architecture, Gentoo should be using static libraries [edit i.e. BUILD_SHARED_LIBS=NO].

edit aaaaand that did it! And now to rebuild everything that depends on libstdc++...

mv
Watchman

Joined: 20 Apr 2005
Posts: 6780

Posted: Tue Sep 19, 2017 7:09 am

incripshin wrote:
Gentoo is overriding defaults to values that only LLVM developers are supposed to use?

I am not so sure: Doesn't BUILD_SHARED_LIBS=no mean that every C++ program compiled with clang will have statically included the whole STL (or at least that part it uses)? This doesn't sound like people would want that by default.

Dr.Willy
Guru

Joined: 15 Jul 2007
Posts: 547
Location: NRW, Germany

Posted: Tue Sep 19, 2017 2:16 pm

Dumb question: Can't you just use gcc to build llvm+clang:4 and then use llvm+clang:4 to build llvm+clang:5?

incripshin
n00b

Joined: 07 Oct 2005
Posts: 53
Location: Seattle, WA, US

Posted: Wed Sep 27, 2017 7:55 pm

mv wrote:
incripshin wrote:
Gentoo is overriding defaults to values that only LLVM developers are supposed to use?

I am not so sure: Doesn't BUILD_SHARED_LIBS=no mean that every C++ program compiled with clang will have statically included the whole STL (or at least that part it uses)? This doesn't sound like people would want that by default.


I'm writing this from memory, since I still haven't had the time to make my little-netbook-that-could a usable computer. That flag is only set to yes on llvm, clang, and lld. As far as I know, the only things that depend on those libraries are the clang, lld, and mesa packages. LLVM's libc++ et al. still build shared libraries and shouldn't link against llvm/clang's libraries.

One annoying thing about the shared libraries is that the binaries from Mesa end up being megabytes each (I think). It appears to be an annoyingly difficult optimization to make a binary link with a static library while cutting out unused symbols.

Another annoyance is, of course, that plenty of people like that rebuilding a dependency causes the consumer to break. I don't like the idea that some binary might be linked against a version of a library that isn't installed.

I really only want clang, lld, llvm-ar, etc. to be linked against the static LLVM/clang libraries. I don't mind having shared libraries.

Dr.Willy wrote:
Dumb question: Can't you just use gcc to build llvm+clang:4 and then use llvm+clang:4 to build llvm+clang:5?


Yes. I realized this myself afterward, but it was definitely not documented anywhere. Not even on the Gentoo Wiki's Clang page.
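
Since this was not written down anywhere, here is a dry-run sketch of that two-stage route. The slot atoms follow the llvm+clang:4 / llvm+clang:5 wording above; overriding CC/CXX on the command line is an assumption about how the compiler switch would be made, and the run helper is a hypothetical wrapper that only prints.

```shell
# Sketch: two-stage bootstrap, printed as a dry run.
# run() only echoes; replace its body with "$@" to execute for real.
run() { echo "+ $*"; }

bootstrap_plan() {
    # Stage 1: build the older slot with GCC (the system default compiler).
    run emerge -1v sys-devel/llvm:4 sys-devel/clang:4
    # Stage 2: build the newer slot with the stage-1 clang.
    run env CC=clang CXX=clang++ emerge -1v sys-devel/llvm:5 sys-devel/clang:5
}

bootstrap_plan
```

The point of the second stage is that slot 5's libraries and its clang are then produced by the same compiler, so there is no mixed-ABI link step.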

Hu
Administrator

Joined: 06 Mar 2007
Posts: 23366

Posted: Thu Sep 28, 2017 12:59 am

incripshin wrote:
One annoying thing about the shared libraries is that the binaries from Mesa end up being megabytes each (I think). It appears to be an annoyingly difficult optimization to make a binary link with a static library while cutting out unused symbols.
Due to historical design decisions, linking to a static library is normally done at section granularity. If you need any symbol from a section in a translation unit, the entire section from that translation unit is linked in. If linking in that section introduces new dependencies, then other sections (possibly from other translation units) are in turn linked in to resolve them, recursively. This can cause a chain reaction including code and/or data that is never used. It would be nice to fix this, but most projects pay little attention to it, since it only impacts static libraries and the fixes can make the code substantially less maintainable. You can try to discourage this by placing each symbol in its own section, but this produces a very large number of very small sections, which stresses the linker (and may also lead to file size bloat, since now each section needs to be tracked separately in section index listings).
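
For reference, the per-symbol-section approach is usually requested with the GCC/Clang options -ffunction-sections and -fdata-sections, paired with the GNU ld option --gc-sections so the linker may discard sections nothing references. A minimal sketch of the flag setup follows; whether it helps a given package (Mesa included) is an assumption that would need measuring.

```shell
# Sketch: one section per function/data object at compile time, then let the
# linker drop unreferenced sections. Flags are for GCC/Clang with GNU ld.
CFLAGS="${CFLAGS:+$CFLAGS }-ffunction-sections -fdata-sections"
CXXFLAGS="${CXXFLAGS:+$CXXFLAGS }-ffunction-sections -fdata-sections"
LDFLAGS="${LDFLAGS:+$LDFLAGS }-Wl,--gc-sections"
export CFLAGS CXXFLAGS LDFLAGS

echo "CFLAGS=$CFLAGS"
echo "CXXFLAGS=$CXXFLAGS"
echo "LDFLAGS=$LDFLAGS"
```

The compile-time flags must be applied when the static library itself is built; adding only the linker flag later has little to work with.
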
incripshin wrote:
Another annoyance is, of course, that plenty of people like that rebuilding a dependency causes the consumer to break. I don't like the idea that some binary might be linked against a version of a library that isn't installed.
Citation needed here, please. I am not aware of anyone who likes this. I am aware of many people who consider it the least bad path practically available.
All times are GMT