Gentoo Forums
Distcc How-To
Vorlon
Apprentice


Joined: 16 May 2003
Posts: 246
Location: West Grove, PA

Posted: Sat Oct 31, 2020 11:19 pm    Post subject: Distcc How-To

Distcc is a fantastic idea: Use the power of multiple computers to speed up compiling programs on a target computer. Gentoo makes this process very easy. Simply install distcc on the "target" computer and the "helper" computer(s), make a few changes to "/etc/portage/make.conf", and go.

There is one HUGE caveat, however. The software environment (processor type and testing-level) must be the same on both the target and the helper, or this system gets very complicated very fast. But there is an easy way to make it work: use a virtual machine for the helper. Here's how.



Definitions - To Keep us Straight

There seems to be a lack of standard naming conventions within the cross-compiling documentation. To keep things straight, here are the definitions used in this How-To, which I think generally follow the conventions used in other Gentoo documents.

Build / emerge / compile - In this How-To, all of these are considered to be synonymous (I know, they're really not, but that's being too pedantic for the purposes of this How-To.)

distcc - A distributed C/C++ compiler: a method of using multiple computers to build programs for another computer, which may have a different architecture. For example, using a multi-core AMD Ryzen system to build programs for a Raspberry Pi. (Please note that bridging CPUs with fundamentally different architectures is beyond the scope of this How-To. This document assumes that both the Target and Helper are the same basic x86 type of CPU.)

Host - Any computer that compiles programs for another. This is often a synonym for "Helper". Please note that a computer can be a host for itself and other computers at the same time.

Target - The computer that needs the help. The Target is the computer that all the other computers are building software for.

Helper - The computer which compiles programs for another computer. There can be multiple helpers working at the same time to distribute the compiling load. In this How-To, all the Helpers will be VirtualBox virtual machines so that they can EXACTLY match the Target systems in terms of CPU, Tool Chain, and testing/stability (i.e.: "~amd64" or "amd64").

Tool Chain - The programs used to compile source code. Typically, this will consist of gcc, libc, binutils, and the kernel headers. If the Tool Chain the Target requires does not match the active Tool Chain on the Helper, the compiled programs may not function, or may function erratically. This incompatibility is a major stumbling block for distributed cross compilation.
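A quick way to compare the compiler part of the Tool Chain between machines is to ask gcc for its target triplet and exact version on each one. The sketch below is a minimal illustration, assuming GCC 7 or newer (for -dumpfullversion); the function names toolchain_signature and mismatches are made up for this example and are not part of distcc or Portage.

```python
import subprocess

def toolchain_signature(compiler="gcc"):
    """Return (target triplet, full version) as reported by the given compiler."""
    def ask(flag):
        out = subprocess.run([compiler, flag], capture_output=True, text=True, check=True)
        return out.stdout.strip()
    # -dumpmachine prints e.g. x86_64-pc-linux-gnu; -dumpfullversion prints e.g. 13.2.1
    return ask("-dumpmachine"), ask("-dumpfullversion")

def mismatches(target_sig, helper_sig):
    """Names of the fields that differ between two (triplet, version) signatures."""
    fields = ("triplet", "version")
    return [name for name, t, h in zip(fields, target_sig, helper_sig) if t != h]
```

Run print(toolchain_signature()) on the Target and on each Helper; any name returned by mismatches() points at a difference worth fixing before turning distcc on. This only checks gcc itself, not libc or binutils.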




Step 1 - Build a Helper virtual machine in VirtualBox

a) Build the Helper virtual machine as you would any typical Gentoo system. You do not need to install X or any GUI on the Helper. You will need to either set a static IP address for the Helper, or record the assigned IP address for later use.
b) Be sure to set CFLAGS to match the Target system. Set "-march=" to the same CPU architecture as the Target. Do NOT use "-march=native" on the Helper, because the Helper would then generate code for the VirtualBox host's CPU instead of the Target's.
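To find the explicit value to copy into the Helper's CFLAGS, you can ask gcc on the Target what -march=native resolves to. A common approach is gcc -march=native -Q --help=target; the sketch below wraps that command and pulls out the -march= line. The function names are hypothetical, and the exact output layout can vary a little between GCC versions, so treat the parser as an assumption.

```python
import subprocess

def parse_march(help_output):
    """Extract the value GCC reports for -march= from --help=target output."""
    for line in help_output.splitlines():
        fields = line.split()
        # The relevant line looks like "  -march=      <whitespace>      skylake"
        if len(fields) >= 2 and fields[0] == "-march=":
            return fields[1]
    return None

def resolve_native_march(compiler="gcc"):
    """Ask the compiler (run this on the Target) what -march=native expands to."""
    result = subprocess.run(
        [compiler, "-march=native", "-Q", "--help=target"],
        capture_output=True, text=True, check=True,
    )
    return parse_march(result.stdout)
```

On the Target, print(resolve_native_march()) gives a name such as skylake, which you would then write as -march=skylake in the Helper's make.conf.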



Step 2 - Install distcc on all the computers (the Target and the VirtualBox Helpers)

a) Run "emerge distcc"



Step 3 - Enable distccd service on the Helper(s)

a) Edit /etc/conf.d/distccd to enable the subnet where the Target resides so that the Helper will compile for it. The "allow" variable needs to be set. For example, DISTCCD_OPTS="${DISTCCD_OPTS} --allow 192.168.1.0/24" will enable distccd to run for any Target in the 192.168.1.x subnet.
b) Run "rc-update add distccd default" to enable the service to start automatically at boot.
c) Run "/etc/init.d/distccd start" to start the service now.
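Because the --allow value is a CIDR network, it is easy to double-check which Targets it actually admits before starting the service. A small sketch using Python's standard ipaddress module; the function names are illustrative only, and the example network is the one from step a).

```python
import ipaddress

def is_allowed(client_ip, allow_spec):
    """True if client_ip falls inside an --allow network such as 192.168.1.0/24."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(allow_spec)

def distccd_opts_line(allow_spec):
    """Render the /etc/conf.d/distccd line from step a) for a given network."""
    # Validate first; Python's strict parsing rejects specs like 192.168.1.5/24
    ipaddress.ip_network(allow_spec)
    return f'DISTCCD_OPTS="${{DISTCCD_OPTS}} --allow {allow_spec}"'
```

For example, is_allowed("192.168.2.7", "192.168.1.0/24") is False, so a machine on the 192.168.2.x subnet would be refused by that Helper.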



Step 4 - Update make.conf on the Target

a) Update the MAKEOPTS variable to account for both the helper cores and the local cores.
a1) -j now signifies the total number of parallel jobs. It is set to the total number of threads across the Target and all the Helpers, + 1. For example, if the local computer has 4 cores and there are 2 helpers with 4 cores each, then the parameter would be "-j13" (4 + 4 + 4 + 1).
a2) -l (lowercase "L") sets make's load-average limit, so the Target itself is not swamped with jobs. It is set to the number of threads on the local machine + 1. For our previous example, the parameter is "-l5".
a3) Combine the two parameters in the MAKEOPTS variable. In our example, the line should be MAKEOPTS="-j13 -l5"
b) Enable distcc in Portage by adding "distcc" to the FEATURES variable, e.g. FEATURES="distcc". If make.conf already has a FEATURES line, add distcc to it rather than adding a second line.
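The arithmetic in a1) through a3) can be written out as a tiny function, which makes it easy to redo when a Helper is added or removed. This simply encodes this How-To's rule of thumb (other guides suggest different multipliers), and the function names are made up.

```python
def makeopts(local_threads, helper_threads):
    """-j: every thread on the Target and Helpers, plus one; -l: local threads plus one."""
    total = local_threads + sum(helper_threads)
    return f"-j{total + 1} -l{local_threads + 1}"

def makeopts_line(local_threads, helper_threads):
    """The make.conf line from step a3)."""
    return f'MAKEOPTS="{makeopts(local_threads, helper_threads)}"'
```

With the example above, makeopts_line(4, [4, 4]) returns MAKEOPTS="-j13 -l5".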



Step 5 - Set distcc variables on the Target

a) Run the program distcc-config with the "--set-hosts" option to list the Helper system(s). Be sure to append ",cpp,lzo" (no spaces) to each Helper host. You can list multiple hosts, separated by spaces. For example, if the Helpers are at 192.168.1.23 & 192.168.1.54, the command will be distcc-config --set-hosts "192.168.1.23,cpp,lzo 192.168.1.54,cpp,lzo". This creates the file "/etc/distcc/hosts", which lists all the Helper hosts. You can also create this file manually.
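Since a stray space inside an entry splits it into two hosts, it can help to generate the host list rather than type it. A minimal sketch, assuming Helpers are given as IP addresses or hostnames; hosts_spec and set_hosts_command are hypothetical names, and the command is only built as a string, not run.

```python
import shlex

def hosts_spec(helpers, options=("cpp", "lzo")):
    """Space-separated host list with the options joined on by commas (no spaces)."""
    return " ".join(",".join([host, *options]) for host in helpers)

def set_hosts_command(helpers):
    """The Step 5 command as one shell-safe string (printed, not executed)."""
    return "distcc-config --set-hosts " + shlex.quote(hosts_spec(helpers))
```

print(set_hosts_command(["192.168.1.23", "192.168.1.54"])) produces the same command as the example in step a).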



Step 6 - Emerge on the Target

a) Emerge packages on the Target as you normally would. You will see distcc messages on the Target as the packages emerge.




Miscellaneous Notes:

• Sometimes distcc induces errors in the build process and the emerge will fail. I have not been able to determine why some packages build and some don't. Instead, I simply emerged as many as possible using distcc, then commented out the "FEATURES" variable and emerged the rest.
• Based on my somewhat limited reading of anecdotal evidence, distributed compiling is still something of a black art. It should work, but often doesn't. At least one person has told me they eventually found distcc more trouble than it is worth. Depending on the Target machine, the help may or may not be worth the extra effort. Using distcc, I was able to install Gentoo on an old Pentium III with only 512 MB RAM. Distcc was critical to that since the Pentium III is actually waaaaay too old and feeble, and 512 MB RAM is waaaaay too tiny to build Gentoo.
• I recommend making the Helper a VirtualBox virtual machine so you can ensure the Target and Helper have identical settings and Tool Chains. But the Helper can be anything as long as these items match the Target.
• There are ways of building a unique Tool Chain on the Helper without going through all the VirtualBox stuff, but I found creating a new VirtualBox machine a lot easier than trying to figure out the confusing Wiki info.
• There are 2 different monitoring programs mentioned in the Gentoo wikis, but I have never been able to get them to work. Instead, I simply ran htop on the Helpers and watched as the distcc program executed on them.
• Distcc is potentially a security risk because it allows remote machines to execute programs on the Helper. You need to be careful to restrict which computers can use the Helper in the configuration file /etc/conf.d/distccd.
_________________
Casey Bralla
Chief Nerd in Residence
The NerdWorld Organisation
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54252
Location: 56N 3W

Posted: Sun Nov 01, 2020 10:03 am

Moved from Installing Gentoo to Documentation, Tips & Tricks.

It's one of these.

distcc just works, even across architectures. It's key to have identical versions of gcc everywhere.
When jobs are distributed, distcc tells the Helper exactly what must be done, but not with what compiler version.

The monitoring programs are run on the Target.
DISTCC_DIR must be defined, as that's where they look to see what distcc is doing.

Not all phases of a build can be distributed, so long stretches with nothing going on at the Helpers are expected too.
e.g. the target has to do its own preprocessing and linking.
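As a sketch of what this looks like in practice, the snippet below starts distccmon-text on the Target with DISTCC_DIR set. The directory /var/tmp/portage/.distcc is an assumption based on Portage's default PORTAGE_TMPDIR of /var/tmp; adjust it if your make.conf says otherwise. The numeric argument is the refresh interval in seconds, and the function names are illustrative.

```python
import os
import shutil
import subprocess

# Assumption: where emerge keeps distcc's state with the default PORTAGE_TMPDIR
PORTAGE_DISTCC_DIR = "/var/tmp/portage/.distcc"

def monitor_env(base_env, distcc_dir=PORTAGE_DISTCC_DIR):
    """Copy of base_env with DISTCC_DIR pointing at the directory emerge uses."""
    env = dict(base_env)
    env["DISTCC_DIR"] = distcc_dir
    return env

def run_text_monitor(delay_seconds=1):
    """Run distccmon-text on the Target, refreshing every delay_seconds."""
    if shutil.which("distccmon-text") is None:
        raise FileNotFoundError("distccmon-text not found; is distcc installed?")
    subprocess.run(["distccmon-text", str(delay_seconds)], env=monitor_env(os.environ))
```

Call run_text_monitor() in a second terminal on the Target while an emerge is running; stop it with Ctrl-C.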
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Hu
Moderator


Joined: 06 Mar 2007
Posts: 21637

Posted: Sun Nov 01, 2020 6:25 pm    Post subject: Re: Distcc How-To

Vorlon wrote:
• Distcc is potentially a security risk because it allows remote machines to execute programs on the Helper. You need to be careful to restrict which computers can use the Helper in the configuration file /etc/conf.d/distccd.
The helper can also be instructed to restrict which programs it will run. In the daemon's environment, set DISTCC_CMDLIST to a file which lists approved compilers, one per line. This may not be perfect if you assume that the approved compiler can be exploited with the right options, but it narrows the set of available programs considerably. This mechanism could also be used to enforce that only properly qualified compiler names can be used, so that a Target that requests gcc gets a failure, but a Target that requests x86_64-pc-linux-gnu-gcc succeeds. This is helpful for environments where the Target and Helper have different values for CHOST, and thus gcc means different things to each of them. You could go a step further and restrict the allowed program to a specific version of gcc, but that will likely require the Target to set CC/CXX explicitly, as few build systems would automatically use a version-qualified compiler. Many build systems can be readily encouraged to use a CHOST-qualified non-version-qualified compiler.
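The DISTCC_CMDLIST idea can be made concrete with a sketch that writes out a list naming only CHOST-qualified compilers, plus a simplified model of the accept/refuse decision. The /usr/bin paths and the basename-only comparison are assumptions for illustration; distccd's real matching rules are described in its man page and may differ in detail.

```python
from pathlib import PurePosixPath

def cmdlist_contents(chost, compilers=("gcc", "g++")):
    """Text for a DISTCC_CMDLIST file listing only CHOST-qualified compilers."""
    # Hypothetical install location; check where your compilers really live.
    return "".join(f"/usr/bin/{chost}-{name}\n" for name in compilers)

def would_accept(requested, cmdlist_text):
    """Simplified model: accept a request only if its basename is on the list."""
    allowed = {
        PurePosixPath(line.strip()).name
        for line in cmdlist_text.splitlines()
        if line.strip()
    }
    return PurePosixPath(requested).name in allowed

if __name__ == "__main__":
    text = cmdlist_contents("x86_64-pc-linux-gnu")
    print(text, end="")
    for name in ("gcc", "x86_64-pc-linux-gnu-gcc"):
        print(name, "->", "accepted" if would_accept(name, text) else "refused")
```

Under this model a Target asking for plain gcc is refused while x86_64-pc-linux-gnu-gcc is accepted, which is the behaviour described above.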
Lemon-Lime
n00b


Joined: 27 Apr 2023
Posts: 54

Posted: Fri Sep 01, 2023 1:39 pm

NeddySeagoon wrote:
distcc just works, even across architectures. It's key to have identical versions of gcc everywhere.


Will distcc work if emerged with different USE flags on different machines?
Say for instance a machine has emerged the package with the "hardened" USE flag and the other didn't.
Or for instance if the different machines have different USE flags for gcc (even if the gcc version is the same).

Will it work properly?
_________________
Crazy frog is the artist, not the song
pingtoo
l33t


Joined: 10 Sep 2021
Posts: 926
Location: Richmond Hill, Canada

Posted: Fri Sep 01, 2023 2:20 pm

Lemon-Lime wrote:
Will distcc work if emerged with different USE flags on different machines?
Say for instance a machine has emerged the package with the "hardened" USE flag and the other didn't.
Or for instance if the different machines have different USE flags for gcc (even if the gcc version is the same).

Will it work properly?


Maybe answers to my question will make the distcc environment easier to understand.

My question: can someone definitively define exactly what binaries need to be installed on the "Helper" in order for distcc to work as a helper?

My current understanding/guess is that the helper only needs gcc (the right version, of course) and "as", the assembler. Taking "crossdev" as the way to build a "Tool Chain", crossdev will build "gcc", "binutils", "libc" and the kernel headers.

So is it necessary to run "crossdev" to build an entire "Tool Chain" in order to make a distcc helper environment? That is my question.

So in my mind nothing else but gcc (the package) and "as" is needed, and no setting (of any Portage USE flags, or CFLAGS) on the Helper will influence the build results on the Target nodes. Am I right?
Hu
Moderator


Joined: 06 Mar 2007
Posts: 21637

Posted: Fri Sep 01, 2023 3:47 pm

Lemon-Lime wrote:
Will distcc work if emerged with different USE flags on different machines?
That depends on the flags. Generally, you need distcc on the build machine to cause the remote machine to produce exactly the same object file that would have been produced locally. Therefore, flags that are not relevant to what is produced can be out of sync. Flags that can impact what is produced need to be synchronized.
Lemon-Lime wrote:
Say for instance a machine has emerged the package with the "hardened" USE flag and the other didn't.
As I read the ebuild for distcc, USE=hardened enables a patch that probably ought to be enabled everywhere, although in practice it seems not to be needed on systems that use a non-hardened gcc. I think it could be safely enabled everywhere, since it just makes distcc more cautious about what it passes to the remote system.
Lemon-Lime wrote:
Or for instance if the different machines have different USE flags for gcc (even if the gcc version is the same).

Will it work properly?
The goal is that you need to produce the same output. If the mismatched USE flags do not affect that goal, then they can be safely mismatched. Some flags, such as lto or pgo, ought to only impact how well the compiler performs, but not what output it produces. Those can be mismatched. Others may impact how it changes C/C++ source text into GNU as assembly. Those need to be matched.
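One practical way to test "produces the same output" is to build a single file twice with identical flags, once with plain gcc and once through distcc (for example gcc -O2 -c foo.c -o local.o and distcc gcc -O2 -c foo.c -o remote.o), and compare the results. The comparison helper below is a minimal sketch; foo.c and the output names are placeholders. Leave out -g for this check, since debug info can embed paths that differ harmlessly, and confirm the job really ran remotely (DISTCC_VERBOSE=1 helps), because distcc falls back to compiling locally when a Helper fails.

```python
import hashlib
from pathlib import Path

def file_digest(path):
    """SHA-256 of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def same_output(local_obj, remote_obj):
    """True if the local and distcc-built object files are byte-for-byte identical."""
    return file_digest(local_obj) == file_digest(remote_obj)
```

If print(same_output("local.o", "remote.o")) shows False with matching flags, some difference between the two compilers is affecting the output.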
Lemon-Lime
n00b


Joined: 27 Apr 2023
Posts: 54

Posted: Fri Sep 01, 2023 9:11 pm

Noted Hu! Will give it a try and I'll update this post with my findings.
Thank you so much for your help!
_________________
Crazy frog is the artist, not the song
All times are GMT