Gentoo Forums
[HOWTO] Testing Emerge with ccache, distcc and pump emerge
dwbowyer
Apprentice


Joined: 18 Apr 2008
Posts: 155

Posted: Sun Nov 27, 2011 6:24 pm    Post subject: [HOWTO] Testing Emerge with ccache, distcc and pump emerge

NOTE: I was going to post this over a year ago, but didn't (guessing due to user-space memory errors, i.e. I forgot to).
Short summary of what follows: I eventually opted to set up the faster computer as a build- and bin-host for the slower computer. That was not only easier, it also saved about 40% of the time for a full world compile and avoided the issues some packages have with distcc and distcc-pump.

Since there have been a few questions about ccache, distcc, and distcc pump mode lately, I now post my research for those that may find it useful.
END NOTE



I have two computers in a home network.

The 1st: AMD64x2-2.41GHz, 2GB DDR2@800MHz ram
The 2nd: Pentium3-600MHz, 384MB DIMMs@(whatever) ram

I estimate the dual-core AMD to be about 7 times faster than the Pentium3.

I have the first set up as a firewall, router and host via NFS to
/home
/mnt/multimedia
/usr/portage
/usr/portage/packages/ which then has ./distfiles and ./bin-pent3
(the latter presently unused)

Pentium3 has 183 packages installed, and a full "emerge -e world" would take 2 days, 23 hours, 12 minutes according to genlop. Previously, I've used a 512MB ccache. I know that's small (I originally kept it small when installing, since that computer has a 20GB hard drive) and was considering increasing the size. I've also used distcc, which seemed to give better results, even with default settings. I've also been debating setting up the AMD64 to serve as a build- and bin-host for the Pentium3.

Recently, while doing routine maintenance, cleaning USE flags, etc., I decided to tweak the build system and actually test some of the options. I looked into the Gentoo docs and forums for advice on ccache and distcc, and then looked a bit further, including at distcc-pump.

For those who may have similar home networks and may want to know, I've decided to post my results. First I began with a simple test, compiling one package while varying a handful of options.

The options being varied are as follows:

On Pentium3, in /etc/make.conf
Code:

1: MAKEOPTS="-jN"
2: EMERGE_DEFAULT_OPTS="--jobs=N --load-average=N.N"
3: FEATURES="ccache distcc userpriv usersandbox"
    The latter two I'll discuss later.
4: EMERGE_DEFAULT_OPTS="--buildpkg --usepkg"
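
A minimal sketch of how the four options might sit together in one file. The concrete values are illustrative assumptions, not recommendations: -j3 and the load limit need tuning per machine, and CCACHE_SIZE is shown at the 512MB cache mentioned earlier. Since options 2 and 4 both set EMERGE_DEFAULT_OPTS, and a second assignment in make.conf should replace the first rather than add to it, they are combined on one line here.

```shell
# /etc/make.conf on the Pentium3 (illustrative values only)

# Option 1: Portage reads the make job count from MAKEOPTS (no underscore).
MAKEOPTS="-j3"

# Option 3: enable ccache and distcc support in Portage.
FEATURES="ccache distcc userpriv usersandbox"

# Upper limit for the ccache directory (assumed value, see above).
CCACHE_SIZE="512M"

# Options 2 and 4 combined into a single assignment.
EMERGE_DEFAULT_OPTS="--jobs=2 --load-average=3.5 --buildpkg --usepkg"
```

For the single-package tests, only MAKEOPTS and FEATURES change between runs; the rest can stay fixed.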

On Pentium3, in /etc/distcc/hosts
# The first line tells the Pentium3 to use the AMD64 as a distcc helper
# The second tells Pentium3 the same, but using pump mode. More on that
# later. The /8 tells Pentium3 to send up to 8 jobs to the host, rather
# than the default 4 per host. In fact it seems I could set this higher.
Code:

192.168.0.1/8
#192.168.0.1/8,cpp,lzo

On AMD64 /etc/distcc/hosts
# A much faster computer can be slowed using a much slower computer as
# a helper, so AMD64 shouldn't include Pentium3.
Code:

localhost

On AMD64, in /etc/conf.d/distccd I changed
Code:

DISTCCD_OPTS="${DISTCCD_OPTS} --allow 192.168.0.0/24"
DISTCCD_OPTS="${DISTCCD_OPTS} --listen 192.168.0.1"
DISTCCD_OPTS="${DISTCCD_OPTS} -N 5"

I'll discuss the distccd options here, for those who don't know how distcc works and may have found this thread while searching for help on setting up distcc. Skip to the break if you already know this.

Any computer on a network that should act as a host (helper) needs to run distccd, preferably started in the default run-level, and be properly configured. The above three options are all that need to be configured on each host. Then simply add FEATURES="distcc" to the clients and list the hosts in /etc/distcc/hosts, as shown above.

The first option, --allow, is required with more recent distcc versions. It allows only the local network to make use of the distccd service provided by my server. Anything attempting to connect across my internet connection on net.eth0 would be ignored.

The second option is optional, and tells distccd to listen only on the network interface on the local computer with the specific address. A separate --listen option should be provided for each interface that distcc will provide service to.

In my case 192.168.0.1 (net.eth1) connects directly to my second computer. If I had multiple subnets or network interfaces in this computer connected to other computers, distcc wouldn't provide service to, for instance, 192.168.1.1 or anything connected through 192.168.0.2 (net.eth2). That's the network connection for the windows laptop I refuse to touch.

The final option changes the nice value that the distccd service and all its subprocesses run with. I've lowered it to 5 from 15, since my AMD64 is a perfectly responsive desktop when compiling software with emerge niced at anywhere from 3 to 5. So I expect it to be fine compiling for another computer through distcc.
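
A short sketch of bringing the service up and checking it, assuming the OpenRC init script that Gentoo's distcc package installs as /etc/init.d/distccd. The first two commands need root on the helper; the last one runs on the client.

```shell
# On the helper (AMD64), as root:
# start distccd at every boot in the default run-level, then start it now.
rc-update add distccd default
/etc/init.d/distccd start

# On the client (Pentium3):
# print the host list distcc would use, to confirm /etc/distcc/hosts is read.
distcc --show-hosts
```

If the host list printed on the client does not include 192.168.0.1, the hosts file is the first thing to check.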



---------------------------------------------------------------------
The package I chose for my tests was "media-video/mplayer-1.0_rc4_p20101114". I chose it for being a fair-sized package with multiple code types: C, some C++, assembly, and some awk work are involved.

First I decided to test how my faster computer compiled the package for itself, as a baseline. All results are from "genlop -t mplayer" in minutes and seconds.

AMD64
Code:

     | nothing | ccache |
  -j1|   6:35  |   2:13 |
  -j2|   4:04  |   1:29 |
  -j3|   3:58  |   1:28 |
  -j4|   4:00  |   1:28 |
  -j5|   4:02  |   1:29 |
  -j6|   4:02  |   1:28 |

No surprises there. A dual-core processor should have one task per core, plus one waiting, to stay fully busy. Even with all the C code pre-compiled and waiting in the cache, -j3 is best.
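
The "one task per core, plus one waiting" rule of thumb can be sketched as a small shell helper. The function name suggest_jobs is made up for this example, and the result is only a starting point for local builds; measured times should decide the final value.

```shell
# Print a -j value following the cores + 1 rule of thumb.
# With no argument, the core count is taken from nproc (coreutils).
suggest_jobs() {
    cores=${1:-$(nproc)}
    echo "-j$(( cores + 1 ))"
}

# Dual-core AMD64 from the table above:
suggest_jobs 2
```

For the dual-core case this prints -j3, matching the best row in the AMD64 table.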

The next tests compile the same package for Pentium3. Here there were a few surprises.

Pentium3-600MHz-32bit
Code:

     | nothing  | ccache* | distcc | distcc   | distcc  | distcc |
     |          |         |        |   &      |   &     |  pump  |
     |          |         |        | ccache   |  pump   | ccache |
------------------------------------------------------------------
  -j2|  45:55   |  10:18  |   .    |    .     |   .     |   .    |
  -j3| >43:16 ! | > 9:58  |   .    |    .     |   .     |   .    |
  -j4|  43:26   |  10:11  |   .    |    .     |   .     |   .    |
  -j5|  43:25   |  10:30  |  10:55 | >10:49** |  9:03   |   .    |
  -j6|  43:24   |  10:27  | >10:44 | >10:50** |  8:46   |   .    |
  -j7|  46:45   |    .    |  10:51 |  10:59** |  8:46   |   .    |
  -j8|    .     |    .    |  11:03 |  11:37** |  8:45 + |   .    |
  -j9|    .     |    .    |   .    |    .     |  8:35   |   .    |
 -j10|    .     |    .    |   .    |    .     | >8:34 ++|   .    |
 -j11|    .     |    .    |   .    |    .     |  8:39+++|   .    |


> Optimum time
! Even without a pre-compiled cache, the single-core Pentium3 shaved 2:39 off its time using -j3 rather than -j2 in MAKEOPTS (94.2% of the -j2 time).
* Of course, compiling the same package over and over again, ccache got its best possible result. A world rebuild is unlikely to show such a speedup unless a 2GB+ ccache size is set.
** Note that distcc worked better without ccache also enabled.
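
The percentages quoted for the table can be rechecked with a little shell arithmetic. This is only a sketch; the helper names to_seconds and percent_of are made up for the example, and the inputs are the minutes:seconds values reported by genlop above.

```shell
# Convert "M:SS" to seconds. A single leading zero is stripped so that
# values like "08" are not misread as octal by the shell.
to_seconds() {
    min=${1%%:*}; sec=${1##*:}
    min=${min#0}; sec=${sec#0}
    echo $(( ${min:-0} * 60 + ${sec:-0} ))
}

# Print the first time as a percentage of the second, to one decimal place.
percent_of() {
    a=$(to_seconds "$1"); b=$(to_seconds "$2")
    tenths=$(( (a * 1000 + b / 2) / b ))
    echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

percent_of 43:16 45:55   # -j3 vs -j2, no cache
percent_of 8:34 9:58     # best distcc + pump vs best ccache
```

The two calls print 94.2 and 86.0, which agree with the figures given for the -j3 footnote and for distcc + pump against ccache.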

When running -jN with distcc only, the Pentium3 showed a system load of 6.50 to 9.50, or N+1.50, at peak. With pump mode it was quite different. Not only was the system load fairly flat, there were periods during the compile phase when the system was idle, although there were also brief periods during ./configure and again at linking when tasks were waiting. On average:
+ load 3.40, @25% idle Pentium3; load 3.40 AMD64
++ load 3.60, @10-25% idle Pentium3; load 3.60 AMD64
+++ load 3.80, @10-20% idle Pentium3; load 3.70 AMD64

Most astounding of all: with the Pentium3 doing the unpack, configure, yasm, awk, then the linking, stripping of the binaries, and docs, but leaving the C and C++ preprocessing and compiling to the AMD64, the builds not only took less time than with ccache (with perfect cache hits), I was also able to gain performance by raising the number of make jobs higher than I had expected either system could handle.


The obvious conclusions:
CCache
1. If you need to compile a package multiple times, such as for dev purposes, ccache is near best, I would assume on any computer. The exception is distcc with pump, which is not an officially supported tool under Gentoo.

2. For an update rather than a rebuild, distcc would be faster than ccache. When packages are being updated, rather than only recompiled, ccache SHOULD miss on any source files that have changed, and without a cache hit emerge (make) still needs to compile them, which increases the build time.

3. I say SHOULD because sometimes it doesn't, and that can cause build failures. Some Gentoo devs have been arguing against using ccache for this very reason. This alone gives distcc the edge over ccache, although there are reports of distcc causing build failures too, which I'll be investigating in the next round of tests.

Distcc
4. Distcc posted excellent results with a single host, but worked better without ccache also in use. So it seems it should be either/or, not both.

5. Either distcc option would scale with multiple hosts, giving a network with 3 or more computers better results with distcc than with ccache.

Distcc + Pump
6. Distcc with pump bears further examination. With the slower computer unpacking the source and running configure, but allowing the host to preprocess and compile all C and C++ code (the Pentium3 still ran awk and yasm), and then linking the results, the best distcc + pump result (8:34) took 86% of the time of the best ccache result (9:58).

7. The 10-25% average system idle, the fairly flat increase in load on the client and, again, the scalability of distcc to multiple hosts make "pump emerge PACKAGES" ripe for building 2 packages in parallel by using EMERGE_DEFAULT_OPTS="--jobs=2 --load-average=N.N" in /etc/make.conf, which I'll experiment with in the next round of tests.

8. Given only the 2 computers I have, the AMD64 is still at least 2x faster building the package (it would take 17.6% the time with ccache) than even the best result above. It would seem the ultimate time-saving method would be to set up a full chroot and use the host system as a build- and bin-host, then install binary packages onto the Pentium3. Since this would require only 10GB of space, I may still do that. I could then strip out much of the Gentoo base system (no need for build tools) on the client computer, as you would when building for an embedded or thin client system (the latter is also an option for me).
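
A rough sketch of what the build- and bin-host idea could look like in configuration terms. It assumes a 32-bit Pentium3 chroot on the AMD64 with its own /etc/make.conf (CHOST, CFLAGS and USE matching the real Pentium3), and reuses the NFS-exported bin-pent3 directory mentioned earlier as the package directory. Both fragments are shown together only for comparison; each belongs in its own file.

```shell
# --- /etc/make.conf inside the Pentium3 chroot on the AMD64 ---
# Save every package that gets built as a binary package in PKGDIR.
FEATURES="buildpkg"
PKGDIR="/usr/portage/packages/bin-pent3"

# --- /etc/make.conf on the Pentium3 itself ---
# Same directory, mounted over NFS; prefer binary packages when available.
PKGDIR="/usr/portage/packages/bin-pent3"
EMERGE_DEFAULT_OPTS="--usepkg"
```

With this layout the chroot does the compiling and the Pentium3 only unpacks and merges the resulting binary packages.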


The next round of tests will be rebuilding the entire world set using

emerge -e world
1: ccache + CCACHE_SIZE="3G" + MAKEOPTS="-j3" (2 times, first run will take 3 days, ugh!)
2: distcc + MAKEOPTS="-jN", N=5-7

pump emerge -e world
3: distcc + MAKEOPTS="-jN" <<<<< may cut these tests short if I
see the Pentium3 processor idling much
4: distcc + MAKEOPTS="-jN", EMERGE_DEFAULT_OPTS="--jobs=2 --load-average=3.5"
Where N=8-10 for the first group and N=4-6 for parallelizing the ebuilds, but I'll adjust N up or down if speed test results warrant more data points.


If you like where this investigation is going, or have thoughts on varying the tests, please let me know. Please bear in mind the first build alone will take 3 days. I hope to be back with results comparing ccache to plain distcc in about a week, and with pump 2-3 days after that.
All times are GMT