Well, it chokes on stuff that won't distribute (like gcc or Qt) even with just 2 threads, so it's rather logical that it chokes with 8.
What I did observe recently is gcc. It took a while to build, and I did multiple builds on both the B+ and the Pi 2 (the one with ARMv7-A, 4 cores and 1 GB RAM). gcc has tools like genattrtab and genautomata that hog incredible amounts of RAM (way above what the B+ has) and get stuck for days if you throw other threads on top of that. And since gcc doesn't distribute over distcc, there's no point in having multiple threads. Again, in builds like gcc.
For other stuff, I'll run a few tests with postfix.
First: -j8. Compile times are around 3 seconds in the distcc log. Some are bigger (4-6 s), some are smaller (1-2 s), but most are around 3-4 s. I've seen some very long ones; 17 seconds is the biggest. With -j8 the CPU had almost zero idleness according to /proc/stat.
total compile time:
real 20m44.202s
user 13m41.640s
sys 4m35.410s
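For reference, here is a minimal sketch of how that idleness number can be read out of /proc/stat. It samples the aggregate "cpu" line twice, one second apart, and reports the idle share of the delta (the exact field layout comes from proc(5); the helper name is mine):

```shell
# Rough CPU-idleness check: sample the aggregate "cpu" line of /proc/stat
# twice, one second apart. Field layout on that line:
#   cpu user nice system idle iowait irq softirq ...
read_stat() {
    awk '/^cpu / { total = 0
                   for (i = 2; i <= NF; i++) total += $i
                   print $5, total }' /proc/stat   # $5 is the idle counter
}

set -- $(read_stat); idle1=$1 total1=$2
sleep 1
set -- $(read_stat); idle2=$1 total2=$2

# Idle jiffies as a percentage of all jiffies in the sample window
# (+1 guards against a zero-length delta)
idle_pct=$(( 100 * (idle2 - idle1) / (total2 - total1 + 1) ))
echo "cpu idle: ${idle_pct}%"
```

Run it during a build: near 0% means the cores are saturated, which is what I saw with both -j8 and -j2.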
Second: -j2. Compile times dropped significantly; most are under the half-second mark, and the biggest ones I've seen are 1.5 s. This also didn't increase CPU idleness according to /proc/stat.
total compile time:
real 19m51.747s
user 13m29.840s
sys 4m31.240s
The reason behind these numbers is simple. On an SMP machine, when 2 processes fight for CPU power, they get moved onto 2 different cores. On a single core, the scheduler has to distribute time slices between the processes, and it loses time doing that management. So, in theory, running 2 tasks on one core simultaneously takes longer than running them one after the other, and the more tasks you add to the equation, the more you lose managing them on one core.
However, I suspect it makes sense to always have one extra task: if you have 1 core, run 2 parallel tasks. The reason is that they almost never hog the CPU at the same time; while one hogs the CPU, the other one finishes up and prepares, and then they swap. Adding a third makes two of them hog the CPU at the same time, and build time increases dramatically. Even with 4 cores you should run 5 tasks, not 8; that 5th one can step in for any of the other 4, and that's sufficient to keep things moving.
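The rule of thumb above can be sketched in one line, assuming GNU make and coreutils' nproc:

```shell
# "cores + 1" rule: run one more parallel job than there are cores,
# so something is always ready to take over when a job blocks.
jobs=$(( $(nproc) + 1 ))
echo "would build with: make -j$jobs"   # 2 on a single-core B+, 5 on a Pi 2
# make -j"$jobs"   # uncomment inside the actual build tree
```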
I understand why intuitively it makes sense to add more threads: you'd think the Pi has to wait for the server to reply with compiled code, sitting idle when it could be sending another file to another server. In a single-core environment (like the B+) that only happens, and only very briefly, with -j1. Starting with -j2, while one distcc job cleans up the other one starts, and idleness drops to zero. That's when it works best; adding anything on top of that just wastes speed. Compression, for instance, hogs the CPU for no gain at all. Again, intuitively you'd think: I compress the data, so there's less to transmit. But it takes more time to compress and decompress that data than it would to transmit it whole, and it burns CPU cycles you don't have.
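For what it's worth, distcc's compression is a per-host option (",lzo") in DISTCC_HOSTS, so it is easy to A/B test the claim above. A hedged sketch, with placeholder hostnames; "/4" is distcc's per-host job limit:

```shell
# Hypothetical DISTCC_HOSTS settings (hostnames are placeholders).
# Without compression -- what the timings above favour on a fast LAN:
export DISTCC_HOSTS="server1/4 server2/4"
# With LZO compression: trades CPU for bandwidth, which on a Pi's slow
# CPU usually costs more than it saves. Uncomment to compare:
# export DISTCC_HOSTS="server1/4,lzo server2/4,lzo"
echo "DISTCC_HOSTS=$DISTCC_HOSTS"
```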
