devsk wrote:But I am not building clang with -ggdb...
Currently, you are not. My point was that there exist some extremely large packages, and it's not even that hard to drive them into serious failure modes. Thus, I appreciate why Portage gained this feature.
devsk wrote:I think you failed to notice that when the /var/tmp/portage size is 25GB, it said it needs 27GB, and when the FS size is 35GB, it said it needs 37GB. If it needed 27GB, why is it complaining when it has 35GB? Without building any big package, it's not happy that the FS got past its initial request of 27GB by 8GB.
No, I saw that, but I have not looked at the Portage code involved, and had nothing helpful to say about the calculation.
devsk wrote:Why I think it's overly aggressive is because if chromium is emerging at 18GB, the load average is going to be such that no other package requiring 9GB of space is going to be emerge'ing at the same time, because it's going to be a heavyweight if it's 9GB.
Load average is inherently a lagging signal, though. Suppose you had emerge set at --jobs=2 --load-average=4, and there are exactly 2 packages available to start: chromium and clang, both of which are known to be huge. Since neither has started yet, load average is still sitting near system idle, so your --load-average parameter will not prevent starting both at once. Absent this feature, Portage would then start both, and yes, once they get past their configure stage and start spawning compile jobs, load average will climb, but by then it's too late for Portage to realize that starting both at once was a bad idea. With this feature, assuming the calculation is decent[1], Portage would recognize the upcoming problem and avoid starting both.
devsk wrote:I think the answer may be to scale the space required with the size of the download for each package in the emerge set. The set of packages and their sizes are known; we can multiply by a factor of 10 to account for tar.xz to expanded disk usage. That way, if it's chromium being emerged, you know you need to start at 20GB, but if it's just some 50-odd small python packages, you don't need to reduce the parallelism!
This is an interesting idea, but a flat multiplier of 10 seems overly optimistic to me. Going back again to my example of clang, the main source archive for clang-20.1.8 is only 141M, which your multiplier would predict to need ~1.41G. However, when built with -ggdb (which, again, uses a lot of extra space for debug symbols), clang easily exceeds 25G before it finishes building. Therefore, we either need to stipulate that there are user configurations for which this feature will be too optimistic[2] (and therefore it allows builds that an ideal algorithm would prohibit), or we need the feature to be more clever: by using a much more pessimistic multiplier, by examining more user state to pick a situation-appropriate multiplier, or by using some other algorithm entirely.
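As a rough back-of-the-envelope illustration of how far the flat factor undershoots, here is a hedged Python sketch; the function and constants are mine, plugging in the clang figures quoted above:

```python
# Hedged sketch of the proposed estimate: predict peak build-tree usage
# as a flat multiple of the source archive size. The numbers below show
# why a factor of 10 is optimistic for clang-20.1.8 built with -ggdb
# (figures taken from the discussion above; helper name is mine).

def estimated_build_space(tarball_bytes, factor=10):
    """Naive prediction: unpacked source + build tree ~= factor * archive."""
    return tarball_bytes * factor

GiB = 1024 ** 3
MiB = 1024 ** 2

clang_tarball = 141 * MiB                 # main source archive, ~141M
predicted = estimated_build_space(clang_tarball)   # ~1.41G predicted
observed = 25 * GiB                       # observed with -ggdb: >25G

# The flat multiplier undershoots by more than an order of magnitude:
print(observed / predicted)               # roughly 18x too optimistic
```

A per-package multiplier would have to be on the order of 180 to cover this case, which would then be wildly pessimistic for those 50-odd small python packages; that tension is why a single flat factor is hard to defend.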
[1]: I read the code snippet pasted here, but have not tried to evaluate whether this is a good choice of algorithm.
[2]: It's fine if this feature doesn't cover everyone. Building clang with full debug symbols is probably pretty unusual, and I have no complaints if the Gentoo developers want to take the position that this feature only covers people using the recommended simple flags of -O2 -march=something. I know to be careful when I'm building clang with debug symbols.