Shan wrote:I hate to post a "well, it works for me," but quite simply... it does. Admittedly it didn't to begin with, when I used the (apparently out-of-date) formula for figuring the -j value: #CPUs + 1. I can't recall where I read it (I think the gentoo-wiki), but the general consensus now seems to be #CPUs (or cores, as the case may be) * 2 + 1. E.g., for my AMD64 X2 4800 my make.conf has -j5, and both cores seem thoroughly utilized during compilation.
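The rule of thumb above can be sketched as a tiny shell helper; the function name is mine, purely illustrative, not part of any standard tool:

```shell
# Hypothetical helper for the "#CPUs * 2 + 1" rule of thumb quoted above.
calc_jobs() {
    echo $(( $1 * 2 + 1 ))
}

# Dual-core X2 4800, as in the post:
calc_jobs 2    # prints 5, i.e. MAKEOPTS="-j5"
```

On a live box you could feed it the detected core count (e.g. `calc_jobs "$(getconf _NPROCESSORS_ONLN)"`), assuming your libc exposes that getconf variable.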
Yes, I used -j5, I believe.
My guess is that this "rule" only applies to dual-core systems, and not true multi-CPU: that is, multiple cores on one die as opposed to multiple separate chips. Furthering the speculation, I would imagine this is because a dual-core system has much lower latency than a multi-chip setup, since intercommunication is handled right on the die. With less time spent "working" on communication, there is more free time for actual compilation, so things get done quicker, and the cores end up waiting for work unless you have enough jobs queued up.
Multi-core is (99+%) the same as multi-proc: most implementations share the external bus (Intel) or connect to the crossbar switch (AMD). The Core2Duo, since it shares the L2, can snoop and talk with the second core without bus cycles going out on the pins; but before that, dual cores were connected to the same I/O pins as if they were separate chips.
Now, while none of this is what you're proposing, I personally think having more "makes" running for one program is better than having multiple program compilations going. Unlike CPU and RAM speeds, hard drive performance hasn't increased much over the last decade or so. To be sure, capacity has skyrocketed, and newer technologies are indeed faster, but comparatively speaking your bottleneck is still disk I/O. With parallel compilation of a single package, you have the initial package decompression, configuring, and so on all the way through installation, but the main section (compilation) is largely done in memory (on systems with sufficient RAM). Unfortunately there is still enough disk I/O going on that if you were to toss in a second (third, or even fourth) package being compiled, any performance gains would likely be negated by the degraded disk I/O. I think, even at best, you would only attain the same performance as a system with a properly set -j flag. Remember, the recommendations in make.conf aren't laws, merely guidelines that will be sufficient for most people.
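One hedged mitigation for the disk-I/O bottleneck described above is the classic trick of mounting tmpfs over Portage's build area, so the compile scratch space lives in RAM (the size must fit your RAM; the path is Portage's default PORTAGE_TMPDIR location):

```shell
# Put Portage's build scratch space in RAM; adjust size to your machine.
mount -t tmpfs -o size=1536m tmpfs /var/tmp/portage

# Or persistently, via /etc/fstab:
# tmpfs   /var/tmp/portage   tmpfs   size=1536m   0 0
```

This doesn't remove the serial phases, but it takes the heaviest I/O out of the picture during step #4.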
On, say, my system, a 3.15GHz Core2Duo box with 2GB RAM and 4MB cache, the system was barely paging with the above-mentioned MAKEOPTS. But you reminded me of something I was going to discuss and forgot.
A package build is more than compiling. And I will use "single-threaded" here loosely, as at a very fine grain interrupt handling could land on a second proc, but that doesn't change the overall process. And yes, you can have multiple badly written threads, or an algorithm that just didn't need to be multithreaded to create speed gains, but I digress.
1) Source download. (Single-threaded)
2) Source expansion. (Single-threaded; not too bad speed-wise)
-- #3 can take a good portion of the time in a medium-sized package.
3) Automake, autoconf, ./configure... (again single-threaded)
4) Compilation. (each compiler process is single-threaded)
5) Link step.
6) Package compression.
7) Install.

Portage DB work.
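The phases above map fairly directly onto Portage's `ebuild(1)` tool, which can drive them one at a time by hand. A hedged sketch (the .ebuild path is illustrative, and in older EAPIs the compile phase runs ./configure as well):

```shell
# Illustrative path; substitute a real ebuild from your tree.
EBUILD=/usr/portage/x11-libs/gtk+/gtk+-2.10.0.ebuild

ebuild "$EBUILD" fetch      # 1) source download
ebuild "$EBUILD" unpack     # 2) source expansion
ebuild "$EBUILD" compile    # 3-5) configure, compile, link
ebuild "$EBUILD" package    # 6) binary package compression
ebuild "$EBUILD" qmerge     # 7) install into the live filesystem
```

Stepping through them this way makes it obvious how serial the chain is: each phase must finish before the next starts.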
Now, with all these linear processes chaining together to form a 'job'/'batch', they may not 'sit' on a proc, but once scheduled for a timeslice each step runs on one proc, idling the other. distcc speeds up step #4; yes, on BIG compiles it can cut build time significantly, but the rest of the batch is still linear.
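For reference, wiring distcc into step #4 on Gentoo is roughly this; the host addresses are placeholders, and -j should be tuned to the total local plus remote slots:

```shell
# In /etc/make.conf:
FEATURES="distcc"
MAKEOPTS="-j8"    # roughly local cores + remote slots

# In /etc/distcc/hosts (local machine first):
# localhost 192.168.0.2 192.168.0.3
```

Note this only parallelizes the compile phase; fetch, configure, link, and install stay serial, which is the whole point of the paragraph above.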
Which package was it that drives me nuts? GTK+? While making some files, the program xslt<something> sits at 100% CPU, single-threaded, with nothing else getting done for a great number of minutes, whereas if other ebuilds were chugging along on the other processor, things could be moving forward. And please don't mention some USE flag; that's not the main point.
Package installation is a linear pipeline: data in, process, data out, into another process, munge that data, output the result. By emerging X non-dependent builds together we can bring some multitasking back to this process.
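For what it's worth, this idea eventually landed in Portage itself as the `--jobs` option, which builds non-dependent packages in parallel (availability depends on your Portage version):

```shell
# Build up to two non-dependent packages at once, backing off when
# system load passes 4; requires a Portage new enough to have --jobs.
emerge --update --deep --jobs=2 --load-average=4 world
```

The `--load-average` cap matters here, since two parallel builds each running `make -j5` can otherwise swamp a dual-core box.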
Two final things, however: an offshoot of your idea would be a version of Portage that continues a merge even when a package fails. For example, it peeves me to no end when I set up my machine to update overnight and it bombs out on two or three packages, wasting a good night's compiling. It would be incredibly lovely if Portage could detect A) the failure and B) continue on with the emerge command, removing any packages that depended on the failed compile. At least that way SOME of the work would get done without a total waste.
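Portage later grew exactly this behavior as the `--keep-going` option: on failure it drops the failed package and anything that depends on it, then carries on with the rest of the list (again, version permitting):

```shell
# Keep merging the remaining packages when one build fails.
emerge --update --deep --keep-going=y world
```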
Yes, I have been bitten by this too. It always seems to happen 10 minutes after you walk away from the monitor, no matter how long you sat there watching it.