Gentoo Forums :: Portage & Programming

Could distcc be smarter about distributing the Load?
metafarion
n00b


Joined: 15 Mar 2012
Posts: 13
Location: Madison, WI

PostPosted: Sun Nov 12, 2023 9:24 pm    Post subject: Could distcc be smarter about distributing the Load?

My understanding of the MAKEOPTS --jobs and --load-average parameters is that they are intended to help control how much parallelization the make process attempts, and how much load is placed on the local system, respectively. However, if distcc is enabled in portage, then --load-average seems to have the annoying side effect of also limiting how much the local machine is willing to distribute to compile nodes. Effectively, setting --load-average tells make not to attempt any parallelization beyond what the local machine itself is willing to compile directly, which kinda defeats the purpose of using distcc at all for me.

For example, in my setup on my 4C/8T workstation where I'd like to keep the system responsive for work, MAKEOPTS="-j9 -l6" results in one or two parallel gcc processes locally, which is good, but almost no jobs EVER being sent to the distcc nodes, because -l6 never allows those jobs to start, even if they wouldn't be adding to local load. If I change my MAKEOPTS to -j9 -l12, then my distcc nodes are fully utilized.... but my local machine is also severely taxed because of the increased load limit.
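For reference, this is roughly the relevant part of my /etc/portage/make.conf (trimmed down to just the bits that matter here):
Code:
# distcc enabled in portage
FEATURES="distcc"
# keeps the workstation responsive, but starves the distcc helpers
MAKEOPTS="-j9 -l6"
# helpers get fed, but the workstation gets hammered
#MAKEOPTS="-j9 -l12"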

Is there a way for make to be made aware of what's being processed elsewhere so it can intelligently spawn compile jobs without overloading the local system? Is there a smarter way to handle this situation?
sublogic
Apprentice


Joined: 21 Mar 2022
Posts: 226
Location: Pennsylvania, USA

PostPosted: Mon Nov 13, 2023 2:35 am

I don't think it's distcc; it's make. From the info manual:
Quote:
You can use the '-l' option
to tell 'make' to limit the number of jobs to run at once, based on the
load average. The '-l' or '--max-load' option is followed by a
floating-point number. For example,

-l 2.5

will not let 'make' start more than one job if the load average is above 2.5.
(--max-load is a synonym for --load-average.) So if the load limit is reached, make falls back to a serial build until the load average comes back down, which takes time even after all the jobs have finished.

Maybe lower --jobs and don't use --load-average at all? Or at least increase the load limit by 4 to 8x. From the uptime man page:
Quote:
Load averages are not normalized for the number of CPUs in a system, so a load
average of 1 means a single CPU system is loaded all the time while on a 4
CPU system it means it was idle 75% of the time.
(The load is the number of jobs in the run queue and can exceed the number of processors.)
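To make the -l behaviour concrete (the load figures below are made up):
Code:
$ cat /proc/loadavg      # first three fields: 1, 5 and 15 minute load averages
5.87 4.99 3.50 2/612 12345
$ make -j9 -l6           # with the load average above 6, make won't start more than one job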
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54454
Location: 56N 3W

PostPosted: Mon Nov 13, 2023 10:05 am

metafarion,

The order of hosts in /etc/distcc/hosts matters.
localhost should be last, if it appears at all.

Each entry is hostname/jobs. jobs is optional and defaults to 4.
distcc allocates jobs in the order hosts appear here, moving on to subsequent helpers when earlier ones are busy.

localhost is always used to process any jobs that fail to build on helpers.

Oh, I think there is also a --randomize keyword to shuffle job allocation among the available helpers.
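Something like this, with made-up helper names:
Code:
# /etc/distcc/hosts
# helpers first, in the order you want them filled; localhost last (or absent)
bigbox/8
fastbox/4
localhost/2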
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
wjb
l33t


Joined: 10 Jul 2005
Posts: 614
Location: Fife, Scotland

PostPosted: Mon Nov 13, 2023 3:09 pm

distcc has some other options that help control what the client does:

--localslots_cpp limits the number of processes doing pre-processing; this defaults to 8.

--localslots limits the number of processes running jobs that cannot be run remotely. I set it to 1 on my N030 because memory-wise it can't really cope with more than one big link/compile at a time.
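They go in the hosts list itself, alongside the host entries, something like this (the helper name is made up):
Code:
# /etc/distcc/hosts on the N030
--localslots=1 --localslots_cpp=8 bighelper/8 localhost/1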
metafarion
n00b


Joined: 15 Mar 2012
Posts: 13
Location: Madison, WI

PostPosted: Mon Nov 13, 2023 3:59 pm

NeddySeagoon wrote:
The order of hosts in /etc/distcc/hosts matters.
localhost should be last, if it appears at all.


Do you know if it's possible to omit localhost when using zeroconf? Trying to build anything with a hosts file that ONLY has +zeroconf in it seems to fail outright.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54454
Location: 56N 3W

PostPosted: Mon Nov 13, 2023 4:24 pm

metafarion,

I've never tried zeroconf. That comes under the heading of autoblackmagic, which is minimised or banned altogether here.

man distcc:

       +zeroconf
              This option is only available if distcc was compiled with Avahi
              support enabled at configure time.  When this special entry is
              present in the hosts list, distcc will use Avahi Zeroconf DNS
              Service Discovery (DNS-SD) to locate any available distccd
              servers on the local network.  This avoids the need to
              explicitly list the host names or IP addresses of the distcc
              server machines.  The distccd servers must have been started
              with the "--zeroconf" option to distccd.  An important caveat
              is that in the current implementation, pump mode (",cpp") and
              compression (",lzo") will never be used for hosts located via
              zeroconf.


That's the limit of my knowledge of distcc and zeroconf.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9714
Location: almost Mile High in the USA

PostPosted: Mon Nov 13, 2023 6:19 pm

Which options are best definitely depends on the package one is trying to build, and sometimes on the phase of the build - rust comes to mind.

I've found that dynamically futzing with the hosts file during builds sometimes helps. Except on single core machines, having localhost in hosts definitely helps, but sometimes it's better placed earlier and sometimes later. If a job has a lot of C++, listing localhost later or not at all helps. But if it's a lot of small C files, the latency of sending them over the network hurts and having localhost first is better.

It's a tough call when your machine is a multicore machine. I've seen distcc/make leave cores on localhost idle when it's not actively preprocessing, so I had to keep localhost in the list so it's at least building something...
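When I do futz with it, it's nothing fancy - just a couple of prepared lists that get copied into place depending on what's building (helper names are placeholders):
Code:
# heavy C++ packages: keep localhost out of the compile pool
echo "bigbox/8 fastbox/4" > /etc/distcc/hosts.cxx
# lots of small C files: localhost first so latency doesn't eat the gains
echo "localhost/4 bigbox/8 fastbox/4" > /etc/distcc/hosts.c
# swap whenever it makes sense; distcc re-reads the list for each new job
cp /etc/distcc/hosts.cxx /etc/distcc/hosts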
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
szatox
Advocate


Joined: 27 Aug 2013
Posts: 3203

PostPosted: Mon Nov 13, 2023 7:41 pm

metafarion wrote:

Do you know if it's possible to omit localhost when using zeroconf? Trying to build anything with a hosts file that ONLY has +zeroconf in it seems to fail outright.


I think I eventually got distcc to work with avahi. I wonder if you're running into the same problem I had back then: distcc listening on ipv4 when avahi exchanges ipv6 addresses.
https://forums.gentoo.org/viewtopic-t-1083326-highlight-distcc.html
metafarion
n00b


Joined: 15 Mar 2012
Posts: 13
Location: Madison, WI

PostPosted: Tue Nov 14, 2023 5:44 am

Zeroconf totally works, but you can't specify the order of remote hosts if you use it. I like it because there are lots of times in my environment where the other compile nodes are off or asleep or not present for some other reason, and it provides some flexibility there. I tested a little more today, and I was remembering incorrectly that having ONLY +zeroconf in your distcc hosts causes the job to fail. What it actually does, at least for me, is throw up a couple errors like this:
Code:
distcc[150] (dcc_parse_hosts) Warning: /var/tmp/portage/.distcc/zeroconf/hosts contained no hosts; can't distribute work
distcc[150] (dcc_zeroconf_add_hosts) CRITICAL! failed to parse host file
distcc[150] (dcc_build_somewhere) Warning: failed to distribute, running locally instead

Contrary to these warnings, after they repeat once or twice, they disappear and jobs do indeed begin distributing as you'd expect. So yes, having localhost omitted or listed after the other hosts, zeroconf'd or otherwise, is a somewhat blunt measure one can take to reduce the load on a workstation emerging packages.

You can see what I'm after in the broad strokes though: A mechanism by which the compile jobs can be distributed to the available hosts, AND the local machine participates in a way that is elastically responsive to its current workload. --load-average sounded like a handy way to do that, but it seems like it can't really because it's an option for make. Make isn't even aware that distcc is in play, so setting it at a level I think is appropriate for my workstation prevents any distribution from occurring.

Maybe the next best option is to just limit the local machine to one or two threads via distcc settings, though that's under-utilizing it in certain cases.
Hu
Administrator


Joined: 06 Mar 2007
Posts: 21920

PostPosted: Tue Nov 14, 2023 3:48 pm

Broadly, this is a problem with the model of replacing gcc with distcc gcc. The former is expensive on the workstation, while the latter can be cheap if it distributes successfully. However, the build tool has no insight into which will be the case for any given run, so it cannot intelligently adjust the job count up or down as needed. Even worse, most builds that can be distributed contain some steps which cannot be distributed, and make has no way to detect which will be distributed versus not.

Ideally, there would be a --jobs=dynamic that would cause make (and every other make-like tool, of which there are many) to detect how many local and remote nodes it has, and to schedule jobs according to whether the job always runs locally or whether it can be distributed. Unfortunately, this requires a tighter integration than exists now, and in practice would require hundreds of projects to rework their build scripts (Makefile or equivalent) to communicate to make how each line in the recipe will distribute (or not).

Rust is a good example of where the current setup fails, since it has some C++ code that can be distributed, and some Rust code which is always local - but they are all launched out of a single package.
metafarion
n00b


Joined: 15 Mar 2012
Posts: 13
Location: Madison, WI

PostPosted: Mon May 27, 2024 2:32 pm

Months later, after gaining a few levels in bash scripting, I have the beginning of an idea to handle this, at least in an experimental and kludgey way: A daemon to dynamically alter MAKEOPTS and the contents of /etc/distcc/hosts and /etc/conf.d/distcc (and maybe other things) based on current system load.

I see the logic behind this being something like "Check the total CPU% load for processes with a niceness less than 1. For every X%, reduce the number of compile jobs by Y and the number of jobs distcc will accept by Z from their base values." Maybe we shuffle around if or where localhost appears in /etc/distcc/hosts at some point. It'd need tuning to find good values.
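A very rough sketch of the hosts-file half of that idea (the thresholds, slot counts and helper names are placeholders, and I haven't solved the MAKEOPTS part at all):
Code:
#!/bin/bash
# kludgey load-watcher: rewrite /etc/distcc/hosts based on how busy the workstation is
HELPERS="bigbox/8 fastbox/4"

while sleep 10; do
    # total CPU% of processes with niceness less than 1 (i.e. my interactive work)
    busy=$(ps -eo ni,pcpu --no-headers | awk '$1 < 1 {sum += $2} END {printf "%d", sum}')

    if   [ "$busy" -gt 300 ]; then slots=0    # workstation busy: distribute only
    elif [ "$busy" -gt 150 ]; then slots=2
    else                           slots=4    # mostly idle: let localhost help
    fi

    if [ "$slots" -eq 0 ]; then
        echo "$HELPERS" > /etc/distcc/hosts
    else
        echo "$HELPERS localhost/$slots" > /etc/distcc/hosts
    fi
done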
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9714
Location: almost Mile High in the USA

PostPosted: Mon May 27, 2024 2:40 pm

How are you changing the MAKEOPTS dynamically? Seems that once make/ninja starts, it's stuck with those values until it finishes. It would be nice if they could be changed.

However, I did end up finding one thing that can be changed externally: /etc/distcc/hosts. I've manually futzed with the file, generally removing localhost when I see that rust is taking over. That still doesn't prevent rust from taking 20 job slots, but it does let rust run on the localhost during firefox builds while all the C++ gets sent off to other machines. For other builds, running tasks on localhost is faster than sending jobs off to another machine and getting them back, since network latency is not 0.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?