Gentoo Forums
VirtualBox: Running skinny VMs on Windows for distcc service
eohrnberger
Apprentice
Joined: 09 Dec 2004
Posts: 169

Posted: Sat Mar 10, 2018 5:27 pm    Post subject: VirtualBox: Running skinny VMs on Windows for distcc service

One of the best things about the Gentoo distribution is that it’s 100% source and highly customizable.
One of the worst things about the Gentoo distribution is that it’s 100% source and you have to compile everything.

So how do you throw more CPU cores at the emerge compilation?

With VirtualBox, build a really skinny distcc VM, and run copies of it headless on Windows machines.

A VM that is idle consumes a little bit of memory (about 40 MB – your mileage may vary) and virtually no CPU cycles yet is at the ready to perform work when called on.

In my home network, I have 4 physical 64-bit dual core Gentoo machines running distcc as well as 3 64-bit dual core Windows machines running the skinny distcc host VMs, for a total of 14 CPU cores in the distcc network. So my /etc/portage/make.conf contains a DISTCC_HOSTS that lists them all, and MAKEOPTS has -j 24 -l14. The -j value is calculated as (N CPUs – 2) * 2, which here is (14 – 2) * 2 = 24. The -l value is the load limiter, so if your distcc VM hits a load of 14, it won’t be sent any more distcc jobs until that load comes down (at least this is my understanding – post a correction if I’m wrong, I’d welcome it).
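The rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the post; the function name `makeopts` is made up for the example, and defaulting the load limit to the core count is an assumption taken from the -l14 value quoted above. As far as I know, GNU make applies -l to the load average of the machine running make, so treat that figure as a local throttle.

```python
from typing import Optional


def makeopts(total_cores: int, load_limit: Optional[int] = None) -> str:
    """Build a MAKEOPTS value using the post's rule: -j = (N CPUs - 2) * 2."""
    if total_cores < 3:
        raise ValueError("the (N - 2) * 2 rule needs at least 3 cores")
    jobs = (total_cores - 2) * 2
    # Assumption: the load limit defaults to the total core count,
    # matching the -l14 used for 14 cores in the post.
    limit = total_cores if load_limit is None else load_limit
    return f"-j{jobs} -l{limit}"


# 4 Gentoo machines + 3 Windows-hosted VMs, 2 cores each = 14 cores
cores = (4 + 3) * 2
print(makeopts(cores))  # prints "-j24 -l14"
```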

The first task is to build the first distcc VM (you only build it once, and then copy the VM to your Windows hosts). This is just like building out a regular Gentoo machine, following the Gentoo Handbook. There’s virtually no need for a high number of kernel modules, just enough to talk to the devices that VirtualBox presents to the VM (it should be very similar to Optimizing the kernel for VMware). Usually allocating 2048 MB to the VM, along with swap space, is sufficient to run the VM and distcc. I configure a 200 GB VDI for the VM’s hard drive. VDI files grow as needed, and when all is said and done mine is a reasonable 10 GB VDI file (your mileage may vary). The VM’s build culminates with emerging the distcc and ccache packages.

Once the VM is completed and tested to accept distcc requests from other Gentoo machines on the network, shut it down, and copy the VDI file to all the Windows machines that’ll be hosts for the VM.

On each of the Windows machines, configure the VM and change its IP address and system name. A few files need revision: /etc/conf.d/net and /etc/conf.d/hostname for sure (there might be others if you have the VMs offering other services).
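As a rough sketch of the per-clone changes, the Python below renders the contents of the two files named above for one cloned VM. It assumes an OpenRC/netifrc guest with a static address on an interface called eth0; the interface name, addresses, gateway and host name are all placeholders for illustration, and a guest using DHCP or a different network stack would look different.

```python
from typing import Dict


def clone_config(hostname: str, ip: str,
                 netmask: str = "255.255.255.0",
                 gateway: str = "192.168.1.1",
                 iface: str = "eth0") -> Dict[str, str]:
    """Return {path: file contents} for one cloned distcc VM.

    Assumes netifrc-style /etc/conf.d/net syntax; iface, netmask and
    gateway defaults are hypothetical values, not taken from the post.
    """
    return {
        "/etc/conf.d/hostname": f'hostname="{hostname}"\n',
        "/etc/conf.d/net": (
            f'config_{iface}="{ip} netmask {netmask}"\n'
            f'routes_{iface}="default via {gateway}"\n'
        ),
    }


# Print what each file would contain for a clone named distcc-win1.
for path, text in clone_config("distcc-win1", "192.168.1.51").items():
    print(f"# {path}")
    print(text)
```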

Configuring the VMs on the Windows machines to start up on boot is left to the reader. I wrote a .NET program that registers itself as a Windows service to start and stop the VM, but it’s not really code that’s ready for prime time, from my view. But it solved my problem. There are other free tools to do this, discussed in the VirtualBox forums.

If you’re concerned that the Windows machines are sometimes unavailable on the network, not to worry: distcc ends up running the compilation locally if it can’t hand it off to a distcc server.
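distcc handles the fallback itself, so nothing extra is required. If you want to see which helpers are actually up before a long emerge, a small probe like the following could do it. It is a sketch, not part of distcc: it simply tries a TCP connection to distccd's default port 3632, and the host names in the usage line are made up.

```python
import socket
from typing import Callable, Iterable, List

DISTCC_PORT = 3632  # distccd's default TCP port


def reachable_hosts(hosts: Iterable[str],
                    port: int = DISTCC_PORT,
                    timeout: float = 0.5,
                    connect: Callable = socket.create_connection) -> List[str]:
    """Return the hosts that accept a TCP connection on the distcc port."""
    alive = []
    for host in hosts:
        try:
            conn = connect((host, port), timeout)
        except OSError:
            # Covers refused connections, timeouts and name lookup failures.
            continue
        conn.close()
        alive.append(host)
    return alive


# Hypothetical usage: the names below are placeholders for your own hosts.
# print(" ".join(reachable_hosts(["gentoo1", "gentoo2", "winvm1"])))
```

The `connect` parameter exists only so the function can be exercised without a network; in normal use the default is fine.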

Also, the distcc VMs’ gcc should stay in lock step with the gcc version on the rest of the Gentoo machines. But since you can easily build binary packages and then emerge them, you only have to compile GCC once.

eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 6740
Location: almost Mile High in the USA

Posted: Sun Mar 11, 2018 2:18 am

This is an alternative?

https://forums.gentoo.org/viewtopic-t-66930-start-0.html

Not sure how well maintained this path is, however; even if cygwin is slower, at least virtualization (and the spectre/meltdown problems) won't have to be paid multiple times.
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?

eohrnberger
Apprentice
Joined: 09 Dec 2004
Posts: 169

Posted: Sun Mar 11, 2018 4:46 am

While I have run Cygwin in the past, I never tried to set up a Cygwin or colinux distcc host for Gentoo, so I can't judge how easy or difficult it would be.

What is nice is that headless distcc VMs are transparent to the Windows user (my wife didn't even know that I had set it up).

eohrnberger
Apprentice
Joined: 09 Dec 2004
Posts: 169

Posted: Sun Mar 11, 2018 4:56 am

Yeah, threw up colinux quick to take a look at it, but didn't like it much.

Seriously, why have a hobbled linux when you can have a real Gentoo VM? And a 64 bit one at that?

eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 6740
Location: almost Mile High in the USA

Posted: Sun Mar 11, 2018 5:46 pm

Because a full VM is "costly" in both RAM and CPU cycles. I wouldn't call cygwin "hobbled linux" ... it's running natively under Windows, Windows is dealing with memory/context swaps the best way it knows how to, and no memory/cpu cycles are wasted in a scheduler running in a scheduler.

It just doesn't look like Linux/Unix, that's about it.

eohrnberger
Apprentice
Joined: 09 Dec 2004
Posts: 169

Posted: Sun Mar 11, 2018 7:28 pm

eccerr0r wrote:
Because a full VM is "costly" in both RAM and CPU cycles. I wouldn't call cygwin "hobbled linux" ... it's running natively under Windows, Windows is dealing with memory/context swaps the best way it knows how to, and no memory/cpu cycles are wasted in a scheduler running in a scheduler.

It just doesn't look like Linux/Unix, that's about it.


While true, we've long since passed the days where machine resources were valued more than people's time and effort.

Given the capabilities of the typical Windows machine these days (>2.2 GHz, dual core 64 bit, >8 GB RAM, and now SSDs even more prevalent), there's lots of cycles there to be had.

Reconciling the cygwin or colinux gcc / tool chain environment with the full Gentoo system, such that the object files produced by cygwin or colinux can be used by the full Gentoo one, is a pitfall that didn't seem straightforward to me at all.

But the 'P' in 'PC' is personal, so to each their own.

eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 6740
Location: almost Mile High in the USA

Posted: Mon Mar 12, 2018 12:28 am

You should time with and without the windows helper.
You'll notice that if the helper is slow enough, it's not worth it to even have the helper...
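One way to run that comparison is a tiny timing harness such as the sketch below. It assumes the build honours the DISTCC_HOSTS environment variable (distcc reads it, but a Portage setup may take its host list from elsewhere), and the build command and host names in the usage comment are placeholders, not anything from this thread.

```python
import os
import subprocess
import time
from typing import List


def time_build(cmd: List[str], distcc_hosts: str) -> float:
    """Run cmd with DISTCC_HOSTS set and return the elapsed wall time in seconds."""
    env = dict(os.environ, DISTCC_HOSTS=distcc_hosts)
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


# Hypothetical usage: build the same clean tree twice and compare.
# without_helper = time_build(["make", "-j8", "CC=distcc gcc"], "localhost gentoo2")
# with_helper = time_build(["make", "-j8", "CC=distcc gcc"], "localhost gentoo2 winvm1")
# print(f"without: {without_helper:.1f}s  with: {with_helper:.1f}s")
```

For a fair comparison, start each run from a clean tree and keep ccache out of the picture, otherwise the second run mostly measures the cache.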

eohrnberger
Apprentice
Joined: 09 Dec 2004
Posts: 169

Posted: Mon Mar 12, 2018 12:53 am

eccerr0r wrote:
You should time with and without the windows helper.
You'll notice that if the helper is slow enough, it's not worth it to even have the helper...


I have my solution and I'm satisfied with it. But don't let me stop you from exploring it.

eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 6740
Location: almost Mile High in the USA

Posted: Mon Mar 12, 2018 2:12 am

Don't be surprised if you find out the helper doesn't really help after all is said and done. From my long experimentation with distcc, having a slow helper, whether it's due to network latency or slow compilation, is a net wash or even a net detriment for small file compilation.

Just a warning: your solution may not turn out to be as optimal as you might think.

pjp
Administrator
Joined: 16 Apr 2002
Posts: 17349

Posted: Mon Mar 12, 2018 2:36 am

eccerr0r wrote:
Don't be surprised if you find out the helper doesn't really help after all is said and done. From my long experimentation with distcc, having a slow helper, whether it's due to network latency or slow compilation, is a net wash or even a net detriment for small file compilation.

Just a warning: your solution may not turn out to be as optimal as you might think.

Given your distcc experience, do you think that is due to the limitations of the VM in CPU and RAM? I'm wondering if use of Cygwin (or WSL?) would be "more efficient" in allowing more usage of the system's resources, along with less abstraction getting in the way.
_________________
The whole system has to go. The modern criminal justice system is incompatible with Neuroscience. --Sapolsky

eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 6740
Location: almost Mile High in the USA

Posted: Mon Mar 12, 2018 3:06 am

The major problem with cygwin is compiling new versions of gcc that match that of your Gentoo boxes ... I don't know how well the environment works in building gcc, especially 'bleeding edge' compilers and whether cygwin is even a "supported target". As I have not really experimented much with cygwin (due to lack of worthwhile windows boxes) I can't give any direction on this except the theory.

The major problem with VMs of any sort is that they need to emulate all the privileged instructions. One might think that gcc does not use any, but keep in mind the privilege changes when it accesses the disk (and in the case of distcc, network) as well as the VM scheduling overhead that's double scheduled by the windows scheduler. Plus I don't know what penalties the meltdown/spectre mitigation will impose on VMs, so it would be most ideal not to have to emulate them.

It's been said that good VMMs can get about 90% of native speed - but this is only memory execution with minimal disk IO. I've never seen good results with disk/network IO on VMMs. I personally use KVM QEMU and get nowhere near this 90% and overall speeds have been closer to the 70% mark (and I've heard reports that the meltdown/spectre mitigation is said to drop it to 50%...)

eohrnberger
Apprentice
Joined: 09 Dec 2004
Posts: 169

Posted: Mon Mar 12, 2018 3:07 am

pjp wrote:
eccerr0r wrote:
Don't be surprised if you find out the helper doesn't really help after all is said and done. From my long experimentation with distcc, having a slow helper, whether it's due to network latency or slow compilation, is a net wash or even a net detriment for small file compilation.

Just a warning: your solution may not turn out to be as optimal as you might think.
Given your distcc experience, do you think that is due to the limitations of the VM in CPU and RAM? I'm wondering if use of Cygwin (or WSL?) would be "more efficient" in allowing more usage of the system's resources, along with less abstraction getting in the way.


Maybe, if you can resolve the differences in the gcc and tool chain so they are object compatible.

From what I've read up on, VMs generally do pretty well with compute, but less well with disk IO performance.

eohrnberger
Apprentice
Joined: 09 Dec 2004
Posts: 169

Posted: Mon Mar 12, 2018 3:22 am

eccerr0r wrote:
The major problem with cygwin is compiling new versions of gcc that match that of your Gentoo boxes ... I don't know how well the environment works in building gcc, especially 'bleeding edge' compilers and whether cygwin is even a "supported target". As I have not really experimented much with cygwin (due to lack of worthwhile windows boxes) I can't give any direction on this except the theory.

The major problem with VMs of any sort is that they need to emulate all the privileged instructions. One might think that gcc does not use any, but keep in mind the privilege changes when it accesses the disk (and in the case of distcc, network) as well as the VM scheduling overhead that's double scheduled by the windows scheduler. Plus I don't know what penalties the meltdown/spectre mitigation will impose on VMs, so it would be most ideal not to have to emulate them.

It's been said that good VMMs can get about 90% of native speed - but this is only memory execution with minimal disk IO. I've never seen good results with disk/network IO on VMMs. I personally use KVM QEMU and get nowhere near this 90% and overall speeds have been closer to the 70% mark (and I've heard reports that the meltdown/spectre mitigation is said to drop it to 50%...)


Still, even if it's at 50% loss over theoretical for the 3 Windows hosted VMs on the machines that are dedicated for Windows use primarily, the 50% that you do gain helps over not using them at all.

Hey, I'm not trying to force anyone to do anything they don't want to. I'm just sharing what I did, and saying it helps.

eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 6740
Location: almost Mile High in the USA

Posted: Mon Mar 12, 2018 3:30 am

That's not exactly the way to think about it. Consider that you have, say, four 2GHz boxes, and you end up basically using one 2GHz box as your host and three 1GHz boxes as your helpers. Couple that with network latency, and it does get to a point where the "1GHz" box might not even be worth having in the pool.

Try running distccmon-gui and watching it. I've seen many times where the "2GHz" machine has an idle core while it waits for data to come back from the "1GHz" boxes, when that 2GHz machine could have been doing the job of the "1GHz" machine instead of waiting for it. If you have enough "1GHz" machines it may help as you have enough of them to wait on, but this depends on what you're compiling/how it's being scheduled by make/ninja/....
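The argument can be put into numbers with a toy model. The sketch below hands jobs out round-robin to a "2GHz" host and three "1GHz" helpers and reports when the last machine finishes. Everything in it is an assumption made for illustration: real distcc does not schedule strictly round-robin, the 0.5 s per-job network overhead is invented, and work is measured in made-up "GHz-seconds".

```python
from typing import List


def makespan(num_jobs: int, work_per_job: float,
             speeds: List[float], overheads: List[float]) -> float:
    """Round-robin jobs over the machines; return when the last one finishes.

    speeds are in GHz, work_per_job in GHz-seconds, overheads in seconds
    of extra per-job cost (network transfer) on each machine.
    """
    finish = [0.0] * len(speeds)
    for job in range(num_jobs):
        m = job % len(speeds)
        finish[m] += work_per_job / speeds[m] + overheads[m]
    return max(finish)


host_only = ([2.0], [0.0])
pool = ([2.0, 1.0, 1.0, 1.0], [0.0, 0.5, 0.5, 0.5])

# Large files: 8 jobs of 2.0 GHz-seconds each
print(makespan(8, 2.0, *host_only))   # 8.0
print(makespan(8, 2.0, *pool))        # 5.0, the helpers pay off

# Small files: 8 jobs of 0.25 GHz-seconds each
print(makespan(8, 0.25, *host_only))  # 1.0
print(makespan(8, 0.25, *pool))       # 1.5, the host ends up waiting on the helpers
```

With large compilation units the fixed overhead is amortised and the pool wins; with small ones the overhead dominates and the fast host would have been quicker on its own, which is the "net detriment" case described above.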

pjp
Administrator
Joined: 16 Apr 2002
Posts: 17349

Posted: Mon Mar 12, 2018 4:22 am

Interesting, thanks. At least until spectre issues are resolved, gcc & toolchain compatibility could be a challenge.

Maybe a VM set for the max CPUs and memory, and then remotely controlling whether or not it is paused or resumed. A 200GB VDI seems large if the host is on an SSD. I think the one I just bought was 250GB (never mind that the system will likely never independently use more than 50GB).

Or even a small distcc/OS partition and remotely controlling which OS is active and boots next. If a native Windows program can set a "next boot only" option, that could work really well. Interesting project options. :)

eohrnberger
Apprentice
Joined: 09 Dec 2004
Posts: 169

Posted: Tue Mar 13, 2018 12:57 am

pjp wrote:
Interesting, thanks. At least until spectre issues are resolved, gcc & toolchain compatibility could be a challenge.

Maybe a VM set for the max CPUs and memory, and then remotely controlling whether or not it is paused or resumed. A 200GB VDI seems large if the host is on an SSD. I think the one I just bought was 250GB (never mind that the system will likely never independently use more than 50GB).

Or even a small distcc/OS partition and remotely controlling which OS is active and boots next. If a native Windows program can set a "next boot only" option, that could work really well. Interesting project options. :)


Hmm. Not sure which VDI would grow to 200 GB. My little distcc VM is only a little over 10 GB. None of my VDIs are over 77 GB.

pjp
Administrator
Joined: 16 Apr 2002
Posts: 17349

Posted: Tue Mar 13, 2018 1:21 am    Post subject: Re: VirtualBox: Running skinny VMs on Windows for distcc ser

What did you mean with 200 GB VDI in the following section?

eohrnberger wrote:
I configure a 200 GB VDI for the VM’s hard drive. VDI files grow as needed, and when all is said and done mine is a reasonable 10 GB VDI file (your mileage may vary). The VM’s build culminates with emerging the distcc and ccache packages.


eohrnberger
Apprentice
Joined: 09 Dec 2004
Posts: 169

Posted: Tue Mar 13, 2018 2:15 am    Post subject: Re: VirtualBox: Running skinny VMs on Windows for distcc ser

pjp wrote:
What did you mean with 200 GB VDI in the following section?

eohrnberger wrote:
I configure a 200 GB VDI for the VM’s hard drive. VDI files grow as needed, and when all is said and done mine is a reasonable 10 GB VDI file (your mileage may vary). The VM’s build culminates with emerging the distcc and ccache packages.


Oh. Let me clarify. The internal size for the HD is 200 GB (max storage). The VDI file is only storing the data that you write to it, so around 10 GB.

pjp
Administrator
Joined: 16 Apr 2002
Posts: 17349

Posted: Tue Mar 13, 2018 5:02 am

You're referring to dynamically allocated VDI, aren't you? Such that if some event caused spurious logging, it could eventually consume 200GB, correct?

eohrnberger
Apprentice
Joined: 09 Dec 2004
Posts: 169

Posted: Wed Mar 14, 2018 2:52 am

pjp wrote:
You're referring to dynamically allocated VDI, aren't you? Such that if some event caused spurious logging, it could eventually consume 200GB, correct?


I suppose so. However, there's an option that emulates an SSD, which trims the file system allocation of free space (if I understand it correctly), so I suppose that I would do the following after a spurious logging event had consumed 200GB:

  • Clean up and eliminate the spurious logging event (something's not happy)
  • Turn on SSD emulation to trim down the free space from the file system
  • Shut down the VM and run VBoxManage's compact operation (modifymedium --compact) on the idle VDI
  • Resume running the VM
Now, not having done this before, and having logrotate in place to clean up and keep the log space down, I can't really comment on the efficacy of this procedure.
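For reference, the host-side part of that procedure might look like the sketch below, which only builds the command lines and does not run anything. The VM name, controller name and VDI path are placeholders; `--nonrotational`, `--discard` and `modifymedium --compact` are VBoxManage options as I understand them, but the overall procedure is as untested here as it is in the post. The guest would also have to issue the trims itself between the two steps (for example with fstrim), which is an assumption on my part.

```python
from typing import List


def compact_commands(vm: str, vdi_path: str,
                     controller: str = "SATA",
                     port: int = 0, device: int = 0) -> List[List[str]]:
    """Return the host-side VBoxManage command lines, in order.

    Nothing is executed. The controller name and port/device numbers
    are hypothetical defaults; check them against your own VM.
    """
    return [
        # 1. With the VM powered off, mark the disk as SSD-like and
        #    enable discard so guest trims can release space.
        ["VBoxManage", "storageattach", vm,
         "--storagectl", controller,
         "--port", str(port), "--device", str(device),
         "--type", "hdd", "--medium", vdi_path,
         "--nonrotational", "on", "--discard", "on"],
        # (Boot the VM, clean up the logs, trim inside the guest, shut down.)
        # 2. Compact the idle VDI file on the host.
        ["VBoxManage", "modifymedium", "disk", vdi_path, "--compact"],
    ]


for cmd in compact_commands("distcc-vm", "distcc-vm.vdi"):
    print(" ".join(cmd))
```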

On the other hand, you could create a new VDI, and tar over from the starting VDI to reduce the allocation of the VDI.

pjp
Administrator
Joined: 16 Apr 2002
Posts: 17349

Posted: Wed Mar 14, 2018 4:28 am

Interesting, thanks. I'll have to look into that feature. I'm not sure how I feel about VirtualBox manipulating something inside the OS to stop the logging event. For example, how can it know it is a problem as opposed to a desirable event to be logged?

eohrnberger
Apprentice
Joined: 09 Dec 2004
Posts: 169

Posted: Wed Mar 14, 2018 4:45 am

pjp wrote:
Interesting, thanks. I'll have to look into that feature. I'm not sure how I feel about VirtualBox manipulating something inside the OS to stop the logging event. For example, how can it know it is a problem as opposed to a desirable event to be logged?


I don't think that VBox is going to change anything about the logging event, but it could address the excess disk allocation resulting from the logging event to keep the host's VDI file from growing overly large.

The configuration issue causing the logging event would still be up to you to resolve, but if the VDI file grows too large, I think there are ways to squeeze it back down to manageable sizes.

pjp
Administrator
Joined: 16 Apr 2002
Posts: 17349

Posted: Wed Mar 14, 2018 5:06 am

Oh, OK. I'd probably just go with a smaller fixed size.

eohrnberger
Apprentice
Joined: 09 Dec 2004
Posts: 169

Posted: Wed Mar 14, 2018 5:44 am

pjp wrote:
Oh, OK. I'd probably just go with a smaller fixed size.

From what I've seen, there's not all that much penalty for a larger fixed size, but do as you will.

All times are GMT