practicability of distcc?

crappuccino
n00b
Posts: 15
Joined: Fri Mar 03, 2006 3:43 pm

Post by crappuccino » Sat Jul 01, 2017 11:44 am

so i have a spare computer that i wanted to run distccd on in hopes of bringing down chromium compile times from 9hrs to 4.5 (that was the original idea anyway).
it took a while to get it running and i am severely underwhelmed by the results, am i doing it wrong or did i just expect too much?

so first of all, is there a way to force distcc to accept march=native? the two machines have almost identical cpus (e8400 and e8500 wolfdales, both e0 stepping) and i'd really like to avoid the argument mess.

secondly, i have noticed compile errors whenever the secondary machine was missing a library. since it's a barebones distcc server, this happens rather frequently. would i need to install everything on the secondary machine as well? that would defeat the purpose of distributing the compilation if i have to do it twice.

and lastly, when i compile something that's present on both machines (systemd or some such), the secondary machine only uses about 10% cpu on average, apparently because the overhead from passing hundreds of individually small files is vastly higher than actual cpu time.

so did i do something retarded or is it just not worth it?

szatox
Advocate
Posts: 3858
Joined: Tue Aug 27, 2013 12:35 pm

Post by szatox » Sat Jul 01, 2017 12:52 pm

crappuccino wrote: so first of all, is there a way to force distcc to accept march=native? the two machines have almost identical cpus (e8400 and e8500 wolfdales, both e0 stepping) and i'd really like to avoid the argument mess.
Nope. It was possible in old versions and gave quite a few people some headaches.
crappuccino wrote: secondly, i have noticed compile errors whenever the secondary machine was missing a library. [...] and lastly, when i compile something that's present on both machines (systemd or some such), the secondary machine only uses about 10% cpu on average [...] so did i do something retarded or is it just not worth it?
Use pump mode. You can either enable it in the config (add distcc and distcc-pump to Portage's FEATURES) or invoke `pump emerge <something>`.
Also, bump the total number of parallel operations by a factor of 3 (use MAKEOPTS for parallel compilation within a single package and EMERGE_DEFAULT_OPTS, with --jobs, for building multiple packages at once).
And keep in mind that some packages explicitly disable distcc, usually because of known bugs.
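A minimal sketch of the job-count rule of thumb above. It assumes "a factor of 3" means about three parallel jobs per core, counted across the local machine and the helper; `suggest_jobs` is a hypothetical helper, not part of distcc or Portage.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of distcc or Portage).
# Assumption: "a factor of 3" = about 3 parallel jobs per core, summed over all machines.
#   $1 local cores, $2 helper cores, $3 jobs per core (optional, default 3)
suggest_jobs() {
    local local_cores=$1 helper_cores=$2 factor=${3:-3}
    echo $(( (local_cores + helper_cores) * factor ))
}

# Two dual-core machines, as in this thread:
echo "MAKEOPTS=\"-j$(suggest_jobs 2 2)\""   # MAKEOPTS="-j12"
```

For two dual-core machines this gives 12, the same total the original poster ends up using later in the thread.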

Post by crappuccino » Sat Jul 01, 2017 1:35 pm

szatox wrote:Use pump mode.
i have to admit that i have no idea what "pump mode" is/does, but i have these in my make.conf:

Code:

MAKEOPTS="-j9 -l2"
FEATURES="distcc distcc-pump"
what's missing?

Post by szatox » Sat Jul 01, 2017 2:49 pm

Looks fine. Almost.

Code:

MAKEOPTS="-j9 -l2" 
Why would you limit the load to 2? Do you only have a single core there? Make it slightly bigger than the number of cores in this machine. It's a good starting point; you can fine-tune it later.
Building more packages at the same time would help too. Some packages don't like parallel make and disable it.
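A small sketch of the load-limit advice, assuming "slightly bigger" means one more than the local core count. `makeopts_for` is hypothetical, and `nproc` (GNU coreutils) is only used to read the local core count.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a MAKEOPTS line whose load limit is the local core count + 1.
# The "+ 1" is an assumption for "slightly bigger than the number of cores".
#   $1 local cores, $2 total parallel jobs (local + distcc helpers)
makeopts_for() {
    local cores=$1 job_count=$2
    echo "MAKEOPTS=\"-j${job_count} -l$(( cores + 1 ))\""
}

# Keep the -j9 from the post above; take the core count from this machine.
makeopts_for "$(nproc)" 9
```

On a dual-core machine this prints MAKEOPTS="-j9 -l3".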

What does `distcc-config --get-hosts` say? Does your distcc know where to send the jobs and how to do that?

Post by crappuccino » Sat Jul 01, 2017 3:20 pm

according to the wiki, -l2 means 2 local cores so it doesn't distribute work unless there are more than 2 possible jobs (which should usually be the case with thousands of individual source files per package).
szatox wrote:What does `distcc-config --get-hosts` say?

Code:

192.168.1.4,cpp,lzo
distcc knows about the other computer and it does distribute work (hence the 10% cpu usage there), but only if all necessary libraries are present (rarely happens); and when it does, speed seems to be limited by passing the jobs rather than actually compiling them.

edit:
for example, if i understand correctly, the chromium package consists of roughly 27000 individual jobs and compiles for 9 hours on a single machine, meaning about 1 second per file.
what i'd expect distcc to say is "here, have these 13500 files and the necessary .h's, compile them with these settings and give them back". instead it seems to go "here's a .c, try to compile that for me, if it fails i'll do it myself", for every single individual file. and that process takes longer than the one second to just compile it locally.
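The estimate above, worked through with shell integer arithmetic. The 9 hours and 27000 jobs are the figures from the post; everything else follows from them.

```bash
#!/usr/bin/env bash
# Figures from the post: about 27000 compile jobs in about 9 hours on one machine.
total_seconds=$(( 9 * 3600 ))                       # 32400 s
job_count=27000
ms_per_job=$(( total_seconds * 1000 / job_count ))
echo "average: ${ms_per_job} ms per job"            # average: 1200 ms per job

# Best case with two equal machines and zero distribution overhead:
echo "ideal: $(( total_seconds / 2 / 60 )) minutes" # ideal: 270 minutes
```

So each job averages roughly 1.2 seconds, and any fixed per-file cost of handing a job to the helper has to stay well below that for distribution to pay off.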

Post by szatox » Sat Jul 01, 2017 4:18 pm

Well, this is roughly what it's supposed to do.

Code:

man make:
       -l [load], --load-average[=load]
            Specifies that no new jobs (commands) should be started  if  there
            are  others  jobs running and the load average is at least load (a
            floating-point number).  With no argument, removes a previous load
            limit.
What wiki was that? You're passing that -l to make, which obviously is not what you want in this case. Increase this number.

Add information about the number of jobs distcc should delegate to your slave: at least twice the number of cores there, maybe three times. Something along the lines of: 192.168.1.4/16,cpp,lzo
You can add your localhost too, to increase total number of cores in use (I'd start with a limit equal to number of local cores - 2). Without that you will compile locally only after remote compilation fails.
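A minimal sketch of the host list described above. `build_hosts` is hypothetical; the multipliers follow the post (2 to 3 jobs per helper core, local cores minus 2 for localhost), and clamping localhost to at least one slot is an added assumption for small machines like the dual-cores in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a distcc host list following the advice above.
#   $1 local cores, $2 helper address, $3 helper cores, $4 jobs per helper core (default 3)
build_hosts() {
    local local_cores=$1 helper=$2 helper_cores=$3 factor=${4:-3}
    local helper_slots=$(( helper_cores * factor ))
    # "number of local cores - 2"; clamping to at least 1 slot is an assumption
    local local_slots=$(( local_cores - 2 ))
    if (( local_slots < 1 )); then
        local_slots=1
    fi
    # ",cpp" enables pump mode for the helper, ",lzo" compresses the traffic
    echo "${helper}/${helper_slots},cpp,lzo localhost/${local_slots}"
}

build_hosts 2 192.168.1.4 2    # 192.168.1.4/6,cpp,lzo localhost/1
```

The result could then be installed as root with Gentoo's `distcc-config --set-hosts "..."` and checked with `distcc-config --get-hosts`, as earlier in the thread.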
Try removing one of the distcc features from the Portage config; they could conflict with each other in some funny way. You can also try launching it as `pump emerge` instead of plain `emerge`. That should be redundant if you use the distcc-pump feature, but it's been a long time since I last played with distcc, so I'd try it anyway.
Pump mode is preferable to regular distcc because it delegates more work to the slaves and includes extra data, so the slaves don't need their own copy of all the headers and stuff. You still want to keep the compiler's version in sync to avoid unexplainable runtime errors.

Hu
Administrator
Posts: 24403
Joined: Tue Mar 06, 2007 5:38 am

Post by Hu » Sat Jul 01, 2017 4:29 pm

-l2, short for --load-average 2, is a request to GNU make (and possibly other build systems that understand MAKEOPTS). It is not about number of local cores (present, idle, or otherwise). It is about what the local machine's kernel reports as the load average of the system. If your local load average gets above 2, Make will not generate extra jobs, even if the --jobs parameter would allow it. Load average is roughly correlated with how busy your system is, but there are ways for it not to be well-aligned with what you want. If there is a Wiki page which contradicts this, please link to it.
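A simplified model of the rule described above (and quoted from man make earlier in the thread): a new job is held back only while other jobs are running and the load average is at least the limit. `may_start_job` is hypothetical, and real GNU make is more elaborate than this. On Linux the 1-minute load average is the first field of /proc/loadavg.

```bash
#!/usr/bin/env bash
# Simplified model of make's -l/--load-average check (hypothetical helper):
# hold back a new job while other jobs run and the load is at least the limit.
#   $1 load limit, $2 jobs already running, $3 current 1-minute load average
may_start_job() {
    awk -v limit="$1" -v running="$2" -v load="$3" \
        'BEGIN { if (running > 0 && load >= limit) print "wait"; else print "start" }'
}

# Read the current 1-minute load average, if this is a Linux system.
if [ -r /proc/loadavg ]; then
    read -r load1 _ < /proc/loadavg
    echo "load now: $load1 -> $(may_start_job 2 1 "$load1")"
fi
```

With -l2 on a busy dual-core box the load easily sits at or above 2, so make keeps most of its -j9 slots idle even though the helper could take the work.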

distcc is interposed in front of gcc, as a way of getting a compiled object without running the compiler locally. This means that, even in a perfectly compatible build system, the build system spins off one distcc process locally for each source file to be distributed. This adds notable overhead relative to your hypothetical model where half the sources are sent off as a batch.

distcc in historical (non-pump) mode distributed only the compilation phase. It did not distribute preprocessing, nor linking. As I understand it, even pump mode does not distribute linking. Based on my understanding, the presence or absence of libraries on the volunteers is irrelevant for both non-pump and pump modes. If you have specific output that contradicts this, please share it. In pump mode (and only pump mode), the presence of matching system headers on the volunteers is required (as documented in man distcc, section HOW DISTCC-PUMP MODE WORKS).
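To make the split concrete, a purely illustrative sketch that prints which machine handles each step for one source file in the two modes, following the description above and man distcc. `distcc_plan` is hypothetical and runs nothing.

```bash
#!/usr/bin/env bash
# Illustrative only (hypothetical helper): prints who does what for one source file.
#   $1 mode (plain|pump), $2 source file name
distcc_plan() {
    local mode=$1 src=$2
    case $mode in
        plain)
            echo "local:  preprocess $src with the local headers"
            echo "remote: compile the already preprocessed source"
            ;;
        pump)
            echo "local:  include server finds the project headers $src needs and sends them along"
            echo "remote: preprocess and compile; system headers must already be installed and match"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
    echo "local:  any later linking step (not distributed in either mode)"
}

distcc_plan plain thunarx-renamer.c
distcc_plan pump thunarx-renamer.c
```

This is why a missing gtk/gtk.h on the helper only bites in pump mode: in plain mode the helper never looks at headers at all.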

As for -march=native, as szatox said, that caused more problems than it solved. Pick a -march suitable for the machine which will run the code and use that explicitly. gcc can tell you the exact meaning of -march on your hardware if you want to look up and hardcode an equivalent (and distcc-compatible) list of options. You only need to do this once per CPU upgrade, so the burden is small (and might make for a tiny (but likely immeasurable) improvement in build speed).

Post by crappuccino » Sat Jul 01, 2017 4:55 pm

the -l2 setting is proposed on the distcc page of the gentoo wiki:
https://wiki.gentoo.org/wiki/Distcc#With_Portage

i have removed load limit altogether and increased total job number to 12 (i don't want to get too crazy here since my computers only have 4gb ram each). so far so good, i'm getting higher usage on the secondary machine now.

the reason why i'm talking about libraries is this error message that i got when i tried to distcc thunar:

Code:

In file included from thunarx-renamer.c:27:0:
../thunarx/thunarx-renamer.h:28:21: fatal error: gtk/gtk.h: No such file or directory
 #include <gtk/gtk.h>
                     ^
compilation terminated.
i've seen similar things in other packages.

last question though: do i need to add localhost to the known workers (and start a local distccd) to use local cpus or is this always implied?

patrix_neo
Guru
Posts: 520
Joined: Thu Jan 08, 2004 1:59 pm
Location: The Maldives

Post by patrix_neo » Sat Jul 01, 2017 5:18 pm

If you could make distcc distribute the headers as well as the necessary code, I think it would work. I stopped using distcc back in 2006 because of all the pitfalls it involves.

For mirrored systems, it is a valid use case.

Post by Hu » Sat Jul 01, 2017 5:30 pm

That is a problem with a system header, not a library. As cautioned in the man page I mentioned above, you need system headers to match if you want to use pump mode. Install the headers on the volunteer or disable pump mode. Non-pump mode does not require matching system headers on the volunteers, because it runs all preprocessing locally.

As I read the section HOST SPECIFICATIONS, you need to tell distcc to use localhost, but you do not need to run a distccd locally. It will run the local workload directly when it picks localhost as the worker.

Ant P.
Watchman
Posts: 6920
Joined: Sat Apr 18, 2009 7:18 pm

Post by Ant P. » Sat Jul 01, 2017 6:54 pm

Instead of -march=native, use what this gives you:

Code:

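# Print the explicit -m flags (and --param values) that -march=native expands to
# on this CPU, skipping the -mno-* negations.
echo $(
    gcc -v -march=native -x c /dev/null 2>&1 \
        | grep -F -- '-march' \
        | grep -Eo ' (-m|--param )\S+' \
        | grep -Fv -- '-mno-'
)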

Post by patrix_neo » Sat Jul 01, 2017 6:58 pm

Hu wrote: That is a problem with a system header, not a library. [...]
I did not say anything about you being wrong; I just fleshed things out, or so I thought. You seem to have way better know-how than I will, like, ever have (smiley here).

Post by crappuccino » Sat Jul 01, 2017 7:47 pm

alright i got it mostly working i guess, better for some packages than others (i had to completely disable distcc for llvm). chromium happily uses 100% on both machines though.
is there a cheap and easy way to provide the secondary machine with the necessary headers to do pump for all packages?
Ant P. wrote:Instead of -march=native, use what this gives you:

yeah i did that and got 15 arguments or so, had hoped i could use native since they are the same for both cpus.

NeddySeagoon
Administrator
Posts: 56104
Joined: Sat Jul 05, 2003 9:37 am
Location: 56N 3W

Post by NeddySeagoon » Sat Jul 01, 2017 8:32 pm

crappuccino,

distcc is more general than you need. -march=native used to work, but the result was a bit of a mess when distcc invoked the native gcc on a helper instead of a cross compiler (so "native" was resolved for the helper's CPU rather than for the machine the code was meant for).

Speed gains vary. Only compiling and, optionally, preprocessing are distributed. The host has to do everything else.
To see what's going on run distccmon on the system doing the distributing.

Code:

DISTCC_DIR="/var/tmp/portage/.distcc/" distccmon-text 5
will print a snapshot of what's being built where, every 5 seconds.
It's quite normal to see nothing from time to time.

distcc only distributes C and C++.
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Post by szatox » Sat Jul 01, 2017 8:56 pm

I'm gonna bring distcc back with multiple machines running from a single disk image. Have tested stuff like that before, it is simple and low maintenance as long as you can keep that image read-only. Ever wanted to roll your own liveCD? Now you have a valid reason :lol:
I'll most likely virtualise it for convenience, though PXE is also a viable option if you have spare hardware. In this case, "spare" means "you would not use it otherwise".

Another way, easier to start with but less strict (read: more error prone and requiring some attention) would involve a shared portage tree and synchronized updates. E.g. make sure your package masks/keywords on both machines match, sync portage, and then update both machines before another sync.
Building binary packages helps to avoid redundant compilations.
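For the binary-package idea, a make.conf fragment sketch. `buildpkg` (a FEATURES value) and `--usepkg` (an emerge option) are standard Portage options; the PKGDIR path and sharing it over NFS are assumptions for illustration.

```bash
# /etc/portage/make.conf fragment (sketch) on the machine that compiles.
# buildpkg makes Portage save a binary package of everything it emerges.
FEATURES="distcc distcc-pump buildpkg"
# Assumption: this directory is exported (e.g. via NFS) and mounted at the
# same path on the other machine.
PKGDIR="/var/cache/binpkgs"

# On the other machine, install from those packages when they are available:
# EMERGE_DEFAULT_OPTS="--usepkg"
```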

Post by Hu » Sat Jul 01, 2017 9:21 pm

patrix_neo wrote:
Hu wrote:...
I did not say anything about you being wrong, I just fleshed things out, or so I thought. You seems to have way better know-how than I will, like, ever have (smiley here).
I was responding to the post above you. I don't remember if your post was present when I began composing, but I remember I was responding specifically to the code block because it showed a compiler error with headers, but OP kept referring to problems caused by missing libraries. :)

darklegion
Guru
Posts: 468
Joined: Sun Nov 14, 2004 1:47 am

Post by darklegion » Tue Jul 04, 2017 1:41 am

A post mentioned this earlier, but to spell it out explicitly, consider configuring this to build multiple packages at once:

Code:

EMERGE_DEFAULT_OPTS="--jobs=n --load-average=n"
(you can omit the load-average part if you like. I like to use it with machines that are prone to overheating)
It won't help you with compiling just chromium alone, but in cases where you are compiling a whole system, having multiple programs compiled at once helps fill the gaps when other compiles are being configured/installing/etc. I'd start with something like --jobs=4. You could also try using --jobs without an argument, and then set --load-average to something reasonable; I'm not sure which is the better approach.
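As an example of the template above filled in, a make.conf fragment with assumed values for the two dual-core, 4 GB machines in this thread. Note that emerge's --jobs and MAKEOPTS multiply: four packages at -j12 could mean up to 48 compiler processes at once, which is worth keeping in mind with 4 GB of RAM.

```bash
# /etc/portage/make.conf fragment (example values, not tuned recommendations)
MAKEOPTS="-j12"                                  # per package, spread over both machines by distcc
EMERGE_DEFAULT_OPTS="--jobs=4 --load-average=3"  # up to 4 packages at once; 3 is an assumed limit
```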

© 2001–2026 Gentoo Foundation, Inc.
