Author |
Message |
metafarion n00b
Joined: 15 Mar 2012 Posts: 13 Location: Madison, WI
|
Posted: Sun Nov 12, 2023 9:24 pm Post subject: Could distcc be smarter about distributing the Load? |
|
|
My understanding of the MAKEOPTS --jobs and --load-average parameters is that they are intended to help control how much parallelization the make process attempts, and how much load is placed on the local system, respectively. However, if distcc is enabled in portage, then --load-average seems to have the annoying side effect of also limiting how much the local machine is willing to distribute to compile nodes. Effectively, setting --load-average tells make not to attempt any parallelization beyond what the local machine itself is willing to compile directly, which kinda defeats the purpose of using distcc at all for me.
For example, in my setup on my 4C/8T workstation where I'd like to keep the system responsive for work, MAKEOPTS="-j9 -l6" results in one or two parallel gcc processes locally, which is good, but almost no jobs EVER being sent to the distcc nodes, because -l6 never allows those jobs to start, even if they wouldn't be adding to local load. If I change my MAKEOPTS to -j9 -l12, then my distcc nodes are fully utilized.... but my local machine is also severely taxed because of the increased load limit.
Is there a way for make to be made aware of what's being processed elsewhere, so it can intelligently spawn compile jobs without overloading the local system? Is there a smarter way to handle this situation? |
|
sublogic Apprentice
Joined: 21 Mar 2022 Posts: 226 Location: Pennsylvania, USA
|
Posted: Mon Nov 13, 2023 2:35 am Post subject: |
|
|
I don't think it's distcc; it's make. From the info manual: Quote: | You can use the '-l' option
to tell 'make' to limit the number of jobs to run at once, based on the
load average. The '-l' or '--max-load' option is followed by a
floating-point number. For example,
-l 2.5
will not let 'make' start more than one job if the load average is above 2.5. | (--max-load is a synonym for --load-average.) So once the load limit is reached, make falls back to a serial build until the load average comes back down, which takes time even after all the jobs have finished.
Maybe lower --jobs and don't use --load-average at all? Or at least raise it by 4 to 8X. From the uptime man page: Quote: | Load averages are not normalized for the number of CPUs in a system, so a load
average of 1 means a single CPU system is loaded all the time while on a 4
CPU system it means it was idle 75% of the time. | (The load is the number of jobs in the run queue, plus on Linux those in uninterruptible sleep, and it can exceed the number of processors.) |
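To put the uptime note in concrete terms, here is a minimal sketch that divides a load average by the CPU count. The function name per_cpu_load is made up for illustration and is not part of make or distcc; it assumes only a POSIX shell and awk.

```shell
# per_cpu_load LOAD NCPUS
# Prints the load average divided by the number of CPUs, to two decimals.
# (Hypothetical helper for illustration only.)
per_cpu_load() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# The -l6 limit from the first post, on a machine with 8 hardware threads:
per_cpu_load 6 8    # prints 0.75
```

On a live Linux system the two inputs could come from `cut -d' ' -f1 /proc/loadavg` and `nproc`.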
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54454 Location: 56N 3W
|
Posted: Mon Nov 13, 2023 10:05 am Post subject: |
|
|
metafarion,
The order of hosts in /etc/distcc/hosts matters.
localhost should be last, if it appears at all.
Each entry is hostname/jobs. jobs is optional and defaults to 4.
distcc allocates jobs in the order hosts appear here, moving on to subsequent helpers when earlier ones are busy.
localhost is always used to process any jobs that fail to build on helpers.
Oh, I think that the keyword --randomize is allowed, to shuffle job allocation among available helpers. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
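As an illustration of the ordering described above, a hosts file might look like the sketch below. The helper names buildbox1 and buildbox2 and the job limits are invented for the example, so adjust them to your own network.

```
# /etc/distcc/hosts
# buildbox1 and buildbox2 are hypothetical helpers, tried in this order.
# localhost is listed last and limited to 2 jobs.
buildbox1/8 buildbox2/8 localhost/2
```

With this layout, work should go to the helpers first and only spill over to localhost once their slots are busy.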
|
wjb l33t
Joined: 10 Jul 2005 Posts: 614 Location: Fife, Scotland
|
Posted: Mon Nov 13, 2023 3:09 pm Post subject: |
|
|
distcc has some other options that help control what the client does:
--localslots_cpp limits the number of processes doing pre-processing; this defaults to 8.
--localslots limits the number of processes running jobs that cannot be run remotely. I set it to 1 on my N030 because, memory-wise, it can't really cope with more than one big link/compile at a time. |
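As far as I can tell from the distcc man page, both options are written as entries in the host list, so a hosts file using them might look like the sketch below. The helper name buildbox1 and the numbers are assumptions chosen for illustration.

```
# /etc/distcc/hosts
# At most 1 non-distributable job and 4 preprocessors run locally.
# buildbox1 is a hypothetical helper.
--localslots=1 --localslots_cpp=4 buildbox1/8
```

Check `man distcc` on your own system before relying on the exact spelling, since the supported options depend on the distcc version.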
|
metafarion n00b
Joined: 15 Mar 2012 Posts: 13 Location: Madison, WI
|
Posted: Mon Nov 13, 2023 3:59 pm Post subject: |
|
|
NeddySeagoon wrote: | The order of hosts in /etc/distcc/hosts matters.
localhost should be last, if it appears at all.
|
Do you know if it's possible to omit localhost when using zeroconf? Trying to build anything with a hosts file that ONLY has +zeroconf in it seems to fail outright. |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54454 Location: 56N 3W
|
Posted: Mon Nov 13, 2023 4:24 pm Post subject: |
|
|
metafarion,
I've never tried zeroconf. That comes under the heading of autoblackmagic, which is minimised or banned altogether here.
man distcc: |
+zeroconf
This option is only available if distcc was compiled with Avahi
support enabled at configure time. When this special entry is
present in the hosts list, distcc will use Avahi Zeroconf DNS
Service Discovery (DNS-SD) to locate any available distccd
servers on the local network. This avoids the need to explicitly
list the host names or IP addresses of the distcc server
machines. The distccd servers must have been started with the
"--zeroconf" option to distccd. An important caveat is that in
the current implementation, pump mode (",cpp") and compression
(",lzo") will never be used for hosts located via zeroconf. |
That's the limit of my knowledge of distcc and zeroconf. |
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9714 Location: almost Mile High in the USA
|
Posted: Mon Nov 13, 2023 6:19 pm Post subject: |
|
|
It definitely depends on the package one is trying to build what the best options are, and sometimes it depends on the phase of the build - rust comes to mind.
I've found that dynamically futzing with the hosts file during builds sometimes helps. Except on single-core machines, having localhost in hosts definitely helps, but sometimes it's better listed earlier and sometimes later. If a job has a lot of C++, listing localhost later or not at all helps. But if it's a lot of small C files, the latency of sending them over the network hurts, and having localhost first is better.
It's a tough call when your machine is multicore. I've seen distcc/make leave cores idle on localhost when it isn't actively preprocessing, so I had to put localhost to use so that it is at least building something... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
szatox Advocate
Joined: 27 Aug 2013 Posts: 3203
|
Posted: Mon Nov 13, 2023 7:41 pm Post subject: |
|
|
metafarion wrote: |
Do you know if it's possible to omit localhost when using zeroconf? Trying to build anything with a hosts file that ONLY has +zeroconf in it seems to fail outright. |
I think I eventually got distcc to work with avahi. I wonder if you're running into the same problem I had back then: distcc listening on ipv4 when avahi exchanges ipv6 addresses.
https://forums.gentoo.org/viewtopic-t-1083326-highlight-distcc.html |
|
metafarion n00b
Joined: 15 Mar 2012 Posts: 13 Location: Madison, WI
|
Posted: Tue Nov 14, 2023 5:44 am Post subject: |
|
|
Zeroconf totally works, but you can't specify the order of remote hosts if you use it. I like it because there are lots of times in my environment where the other compile nodes are off or asleep or not present for some other reason, and it provides some flexibility there. I tested a little more today, and I was remembering incorrectly that having ONLY +zeroconf in your distcc hosts causes the job to fail. What it actually does, at least for me, is throw up a couple errors like this:
Code: | distcc[150] (dcc_parse_hosts) Warning: /var/tmp/portage/.distcc/zeroconf/hosts contained no hosts; can't distribute work
distcc[150] (dcc_zeroconf_add_hosts) CRITICAL! failed to parse host file
distcc[150] (dcc_build_somewhere) Warning: failed to distribute, running locally instead |
Contrary to these warnings, after they repeat once or twice, they disappear and jobs do indeed begin distributing as you'd expect. So yes, having localhost omitted or listed after the other hosts, zeroconf'd or otherwise, is a somewhat blunt measure one can take to reduce the load on a workstation emerging packages.
You can see what I'm after in the broad strokes though: A mechanism by which the compile jobs can be distributed to the available hosts, AND the local machine participates in a way that is elastically responsive to its current workload. --load-average sounded like a handy way to do that, but it seems like it can't really because it's an option for make. Make isn't even aware that distcc is in play, so setting it at a level I think is appropriate for my workstation prevents any distribution from occurring.
Maybe the next best option is to just limit the local machine to one or two threads via distcc settings, though that's under-utilizing it in certain cases. |
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 21920
|
Posted: Tue Nov 14, 2023 3:48 pm Post subject: |
|
|
Broadly, this is a problem with the model of replacing gcc with distcc gcc. The former is expensive on the workstation, while the latter can be cheap if it distributes successfully. However, the build tool has no insight into which will be the case for any given run, so it cannot intelligently adjust the job count up or down as needed. Even worse, most builds that can be distributed contain some steps which cannot be distributed, and make has no way to detect which will be distributed versus not. Ideally, there would be a --jobs=dynamic that would cause make (and every other make-like tool, of which there are many) to detect how many local and remote nodes it has, and to schedule jobs according to whether the job always runs locally or whether it can be distributed. Unfortunately, this requires a tighter integration than exists now, and in practice would require hundreds of projects to rework their build scripts (Makefile or equivalent) to communicate to make about how each line in the recipe will distribute (or not). Rust is a good example of where the current setup fails, since it has some C++ code that can be distributed, and some Rust code which is always local - but they are all launched out of a single package. |
|
metafarion n00b
Joined: 15 Mar 2012 Posts: 13 Location: Madison, WI
|
Posted: Mon May 27, 2024 2:32 pm Post subject: |
|
|
Months later, after gaining a few levels in bash scripting, I have the beginning of an idea to handle this, at least in an experimental and kludgey way: A daemon to dynamically alter MAKEOPTS and the contents of /etc/distcc/hosts and /etc/conf.d/distcc (and maybe other things) based on current system load.
I see the logic behind this being something like "Check the total CPU% load for processes with a niceness less than 1. For every X%, reduce the number of compile jobs by Y and the number of jobs distcc will accept by Z from their base values." Maybe we shuffle around if or where localhost appears in /etc/distcc/hosts at some point. It'd need tuning to find good values. |
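The rule described above can be sketched as a small shell function. This is only an illustration under stated assumptions: scaled_jobs and its parameters are hypothetical names, the CPU percentage is taken as an already-measured integer, and the X and Y values are placeholders that would need tuning. It covers just the arithmetic, not the measuring or the rewriting of any config file.

```shell
# scaled_jobs BASE CPU_PCT STEP_PCT PER_STEP [FLOOR]
# For every full STEP_PCT of CPU load, subtract PER_STEP jobs from BASE,
# never going below FLOOR (default 1). Hypothetical helper, integer math only.
# The body runs in a subshell so its variables do not leak into the caller.
scaled_jobs() (
    base=$1
    cpu_pct=$2
    step_pct=$3
    per_step=$4
    floor=${5:-1}
    jobs=$(( $base - ($cpu_pct / $step_pct) * $per_step ))
    if [ "$jobs" -lt "$floor" ]; then
        jobs=$floor
    fi
    echo "$jobs"
)

# Base of 9 jobs, 50% load, drop 2 jobs per 20% of load:
scaled_jobs 9 50 20 2    # prints 5
```

The same function could be called a second time with different numbers for the count of jobs distcc accepts (the Z in the rule above). Any value it produces would presumably only take effect for builds started after the change.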
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9714 Location: almost Mile High in the USA
|
Posted: Mon May 27, 2024 2:40 pm Post subject: |
|
|
How are you changing the MAKEOPTS dynamically? Seems that once make/ninja starts, it's stuck with those values until it finishes. It would be nice if they could be changed.
However, I did end up finding one thing that can be changed externally: /etc/distcc/hosts. I've manually futzed with the file, generally removing localhost when I see that rust is taking over. This still does not prevent rust from taking 20 job slots, but for firefox builds it does let rust run on localhost while all the C++ is sent off to other machines. For other builds, running tasks on localhost is faster than sending jobs off to another machine and getting the results back, as network latency is not 0. |
|