popel n00b
Joined: 29 May 2004 Posts: 3
Posted: Sat May 29, 2004 6:26 pm Post subject: multi-threaded emerge - downloading while compiling
|
|
Is there any way to download and compile at the same time?
Background:
I have a "small" internet connection. Downloading the bigger packages often takes more than 5 minutes. By compiling while downloading, I think I could cut the build time for my system in half.
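Whether overlapping really halves the build time depends on how download and compile times compare. A minimal sketch, assuming each package is a hypothetical (download_seconds, compile_seconds) pair and that exactly one download and one compile run at a time; the function names are illustrative and not part of Portage:

```python
def sequential_time(jobs):
    """Download everything first, then compile everything (no overlap)."""
    return sum(d for d, _ in jobs) + sum(c for _, c in jobs)


def pipelined_time(jobs):
    """One downloader and one compiler working side by side.

    Downloads run back to back in list order. Package i can only start
    compiling once its own download is done AND the previous compile ended.
    """
    download_done = 0.0
    compile_done = 0.0
    for d, c in jobs:
        download_done += d                                   # downloader never waits
        compile_done = max(compile_done, download_done) + c  # compiler may wait
    return compile_done
```

With four packages that each take 300 s to download and 300 s to compile, `sequential_time` gives 2400 s and `pipelined_time` gives 1500 s. The ratio only approaches one half as the number of balanced packages grows; when downloads dominate (D much larger than C), the gain is small.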
|
|
fifo Guru
Joined: 14 Jan 2003 Posts: 437
Posted: Sat May 29, 2004 6:31 pm Post subject:
|
|
Yes: use emerge with the "-f" (fetch-only) switch to do the downloading, then, once it has downloaded the first package or two, start the actual emerge command alongside it to do the compiling.
|
|
popel n00b
Joined: 29 May 2004 Posts: 3
Posted: Sat May 29, 2004 6:38 pm Post subject:
|
|
Yes, I tried this.
But sometimes my CPU is just faster than the download, and then something ugly sometimes happens to the partly downloaded file: it gets corrupted.
Something needs to be built with monitoring that checks whether the file is complete and otherwise waits for it to finish.
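The monitoring popel describes could look roughly like this. It is a sketch that assumes the expected size and checksum of each distfile are known in advance (Portage does keep size and checksum digests for distfiles, but their format is not modelled here, and SHA-256 is used as a stand-in algorithm). `file_matches` and `wait_for_complete` are hypothetical helpers:

```python
import hashlib
import os
import time


def file_matches(path, expected_size, expected_sha256):
    """True if the file exists with the expected size and SHA-256 digest."""
    try:
        if os.path.getsize(path) != expected_size:
            return False  # cheap check first: a partial file is too small
    except OSError:
        return False      # the download has not even created the file yet
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256


def wait_for_complete(path, expected_size, expected_sha256,
                      poll_seconds=5.0, timeout=None):
    """Block until the file is complete, polling its size and checksum."""
    start = time.monotonic()
    while not file_matches(path, expected_size, expected_sha256):
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError(f"{path} still incomplete after {timeout}s")
        time.sleep(poll_seconds)
```

Checking the size before hashing avoids re-reading a half-downloaded file on every poll. A file that reaches full size but is actually corrupted will never match, which is why the optional timeout exists.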
|
|
Nate_S Guru
Joined: 18 Mar 2004 Posts: 414
Posted: Sat May 29, 2004 7:31 pm Post subject:
|
|
This topic has been thoroughly discussed. I believe the problem is that Portage has trouble when more than one instance of it is running at a time. Several workarounds have been proposed; take a look at the thread about this in Bugzilla.
|
|
robmoss Retired Dev
Joined: 27 May 2003 Posts: 2634 Location: Jesus College, Oxford
|
|
meowsqueak Veteran
Joined: 26 Aug 2003 Posts: 1549 Location: New Zealand
Posted: Sat Jul 03, 2004 2:32 am Post subject:
|
|
What if emerge created a lock indicating 'file is being downloaded', and the other instance of emerge saw that lock and either built other already-downloaded packages (as long as their dependencies are met) or else blocked on the lock? You would just need a flag for emerge that tells it not to download anything and to wait on locks.
Then you could have one emerge running with -f, downloading the files in a suitable order (dependency order, as it does now, would be fine), and another emerge running with -?? that tells it to do as I describe above.
This wouldn't be perfect, since the optimal build time depends on optimising the availability of source packages to the building 'thread', and that would be tricky to estimate (maybe use source tarball size, if we could assume that build time, and certainly download time, are proportional to file size?). But in general the method above would work quite well.
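A minimal sketch of meowsqueak's download-lock idea, assuming a hypothetical `<distfile>.lock` naming convention and caller-supplied `download` and `build` callbacks; none of these names exist in Portage. It also assumes every dependency listed in `deps` is itself one of the packages being built:

```python
import os
import time


def lock_path(distfile):
    return distfile + ".lock"  # hypothetical naming convention


def fetch_all(distfiles, download):
    """Downloader (the 'emerge -f' side): hold a lock for each whole download."""
    for path in distfiles:
        open(lock_path(path), "w").close()  # 1. announce 'being downloaded'
        try:
            download(path)                  # 2. write the distfile
        finally:
            os.remove(lock_path(path))      # 3. release the lock when done


def is_ready(path):
    # Check the file first, then the lock: the lock is created before the
    # file and removed only after it is complete, so "file exists and no
    # lock" means the download has finished.
    return os.path.exists(path) and not os.path.exists(lock_path(path))


def build_all(packages, deps, distfile_of, build, poll_seconds=0.05):
    """Builder (the hypothetical 'emerge -??' side): never downloads.

    Builds any fetched package whose dependencies are already built;
    if nothing qualifies, it sleeps and re-checks the locks.
    """
    built, pending = [], list(packages)
    while pending:
        for pkg in pending:
            deps_met = all(d in built for d in deps.get(pkg, ()))
            if deps_met and is_ready(distfile_of[pkg]):
                build(pkg)
                built.append(pkg)
                pending.remove(pkg)
                break  # restart the scan after every build
        else:
            time.sleep(poll_seconds)  # nothing buildable yet: wait on the locks
    return built
```

In use, `fetch_all` would run in one process or thread and `build_all` in another. If a download fails and the file never appears, this builder waits forever, so a real version would need a timeout or an error signal from the downloader.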
|
|
robmoss Retired Dev
Joined: 27 May 2003 Posts: 2634 Location: Jesus College, Oxford
Posted: Sat Jul 03, 2004 6:25 am Post subject:
|
|
Well, you probably want to download things in the same order you're going to emerge them. But your idea is nice, if a little tricky to implement. Maybe one to have a go at once the Portage API shows up?
_________________
Reality is for those who can't face Science Fiction.
emerge -U will kill your Gentoo
ecatmur, Lord of Portage Bash Scripts
|
|
meowsqueak Veteran
Joined: 26 Aug 2003 Posts: 1549 Location: New Zealand
Posted: Sat Jul 03, 2004 7:13 am Post subject:
|
|
Well, I actually think it would be fairly simple to implement to start with: "oops, this package is 'download-locked', we'll sit here and wait for it to finish". This will result in stop-start behaviour, but it will still produce good results if you have a fast downlink, and it will prevent the problem of the second emerge catching up with the download. Since this already seems to be a popular solution (two emerges, one downloading and one building, started some time later), waiting on 'download locks' will also prevent the rather nasty effect of trying to download the same thing twice at the same time. People already do this (and in a way it's risky), so this could make it perfectly safe. After that, yes, it starts to get a little more complicated. We'd need a good heuristic to decide which package to download next for optimal performance (based on download speed, compilation speed, package size, maybe something that learns over time?).
Typing as I think: consider a package of size N bytes that takes D seconds to download and C seconds to compile (remember we often don't know C and D in advance; most of the time we'd have to make a good guess based on past history and N). If D << C, download this package early, since we'll have lots of time to download other packages while it's building. If D >> C, we won't gain much by building it early, so look for better cases first.
For the collection of packages, estimate C and D for each, and pick the one with the largest difference between C and D where D is less than C. Download this first.
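The greedy rule above, sketched under the assumption that per-package estimates (D, C) are already available. The thread does not say what to do once only D >= C packages remain; taking the smallest D first is an assumption, and the depgraph is ignored entirely. Function names are hypothetical:

```python
def pick_next_download(estimates):
    """Greedy choice: among packages with D < C, pick the largest C - D.

    `estimates` maps package name -> (D, C), i.e. estimated download
    seconds and estimated compile seconds. Returns None if no package
    has D < C.
    """
    candidates = {pkg: c - d for pkg, (d, c) in estimates.items() if d < c}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def greedy_download_order(estimates):
    """Repeatedly apply the greedy choice; ignores dependencies."""
    remaining = dict(estimates)
    order = []
    while remaining:
        pkg = pick_next_download(remaining)
        if pkg is None:
            # Only D >= C packages are left ("look for better cases first"
            # means these go last). Assumption: shortest download first.
            pkg = min(remaining, key=lambda p: remaining[p][0])
        order.append(pkg)
        del remaining[pkg]
    return order
```

With made-up estimates `{"gcc": (120, 3000), "nvidia": (600, 20), "bash": (30, 120), "perl-app": (300, 60)}`, the order comes out as gcc, bash, perl-app, nvidia. Because the estimates here are static, the greedy loop is equivalent to a sort; it only becomes genuinely greedy if estimates are updated between picks.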
This algorithm is 'greedy' and may not always work very well, since a) the heuristic estimation may fail horrendously, or b) the first package may take so darn long to download that you would have been better off downloading everything else first, compiling those, and downloading the big package last.
Some packages contain a lot of non-compiled data (e.g. nvidia binary drivers and big Tcl/Tk or Perl apps), so they may take a long time to download but are built in seconds. In such cases, using N as a metric isn't going to work too well. We might need some way of feeding the success of the estimates back into the system, so that the ebuilds record it and the entire system learns... or something...
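One possible way to feed results back, offered only as an illustration: keep a per-package running estimate of compile time, updated with an exponential moving average after each build, and fall back to a crude size-based guess when there is no history. The class, the fallback rate, and the smoothing factor `alpha` are all assumptions; nothing like this is described as existing in Portage:

```python
class CompileTimeEstimator:
    """Per-package compile-time estimates that improve with each build."""

    def __init__(self, seconds_per_byte, alpha=0.3):
        self.seconds_per_byte = seconds_per_byte  # crude N-based fallback rate
        self.alpha = alpha                        # weight given to the newest build
        self.history = {}                         # package -> smoothed seconds

    def estimate(self, pkg, size_bytes):
        """Use learned history if available, else guess from tarball size N."""
        if pkg in self.history:
            return self.history[pkg]
        return size_bytes * self.seconds_per_byte

    def record(self, pkg, observed_seconds):
        """Blend an observed build time into the running estimate."""
        old = self.history.get(pkg)
        if old is None:
            self.history[pkg] = observed_seconds
        else:
            self.history[pkg] = (self.alpha * observed_seconds
                                 + (1 - self.alpha) * old)
```

For a package like the nvidia drivers, the size-based guess starts out far too high; after one recorded build, the estimate reflects the real (short) build time instead.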
We could call it SkyNET...
|
|
robmoss Retired Dev
Joined: 27 May 2003 Posts: 2634 Location: Jesus College, Oxford
Posted: Sat Jul 03, 2004 7:40 am Post subject:
|
|
It's very tricky, yes! But I think a naive implementation would certainly be better than no implementation at all. I think it's probably more sensible to put a flag in the ebuild that denotes whether or not a package is "pure source": gcc is, openoffice is, but openoffice-bin isn't, and neither is the nvidia stuff. You can put those last for compiling. As for the others, I suspect it would make more sense to order things so that the smallest go first and the biggest go last, without breaking the depgraph. Of course, we can't do this until we get a proper depgraph.
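robmoss's ordering rule can be sketched as a topological sort (Kahn's algorithm) whose ready queue is a min-heap keyed on (not pure source, size), so binary packages go last and smaller packages go first wherever the depgraph allows. The `pure_source` value stands in for his proposed ebuild flag, which did not exist; the function name is hypothetical, and every dependency is assumed to be in the package set:

```python
import heapq


def build_order(packages, deps):
    """Topological order preferring pure-source packages, then smaller ones.

    packages: name -> (pure_source: bool, size_bytes: int)
    deps:     name -> iterable of names that must be built first
    Raises ValueError on a dependency cycle.
    """
    remaining_deps = {p: set(deps.get(p, ())) for p in packages}
    dependents = {p: [] for p in packages}
    for p, ds in remaining_deps.items():
        for d in ds:
            dependents[d].append(p)

    def key(p):
        pure_source, size = packages[p]
        return (not pure_source, size, p)  # binaries last, then smallest first

    ready = [key(p) for p, ds in remaining_deps.items() if not ds]
    heapq.heapify(ready)
    order = []
    while ready:
        *_, p = heapq.heappop(ready)       # best package whose deps are all built
        order.append(p)
        for q in dependents[p]:
            remaining_deps[q].discard(p)
            if not remaining_deps[q]:      # q's last dependency just finished
                heapq.heappush(ready, key(q))
    if len(order) != len(packages):
        raise ValueError("dependency cycle")
    return order
```

With made-up sizes for gcc, zlib, libpng (depending on zlib), nvidia-kernel and openoffice-bin, the result is zlib, libpng, gcc, nvidia-kernel, openoffice-bin: the small source packages come first, and the binary packages are deferred to the end because nothing depends on them.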
|
|
tomk Bodhisattva
Joined: 23 Sep 2003 Posts: 7221 Location: Sat in front of my computer
|
|
|