Portage-NG: brainstorming

Al-Caveman · n00b Joined: 21 Sep 2014 Posts: 39

Hi folks,

Introduction
Portage is the best package manager in existence in my view. However, I am
very confident that a much faster design exists within the space of valid
designs, so I think it's worth pursuing.

I've been thinking about a redesign of portage and have some theories about
how it should be designed. I've also searched the forums, and it seems that
people are still asking for a faster portage:
https://forums.gentoo.org/viewtopic-t-994458-highlight-faster+portage.html

Note: what I describe here is only a dependency resolver, which should be
really fast (probably something like O(log n) or thereabouts). For now, my
goal is to resolve the dependency tree really quickly, then install the
resulting packages with `emerge -O <pkg>` (i.e. --nodeps). Later on we can
optimize more and more parts, but for now I think it's a good idea to stick
to improving the dependency resolver. Feel free to post differing opinions.

More details (features, design, etc.):
I've moved the text into the GitHub wiki: https://github.com/Al-Caveman/Portage-NG/wiki

apathetic · n00b Joined: 28 Aug 2014 Posts: 36

Al-Caveman · n00b Joined: 21 Sep 2014 Posts: 39

apathetic · n00b Joined: 28 Aug 2014 Posts: 36

Roman_Gruber · Posted: Mon Sep 22, 2014 8:37 am Post subject:

For me it is a duplicate.

It has been discussed several times already.

What I remember is the suggestion to use git instead of rsync, and that we should improve the current system rather than make a new one.

My 2c.

Al-Caveman · n00b Joined: 21 Sep 2014 Posts: 39

Al-Caveman · n00b Joined: 21 Sep 2014 Posts: 39

One thing I didn't add previously is handling stable, unstable (~amd64) or
masked (9999**) packages.

To address this, I added one more field in pkgs_universe: the "Stability
class" field. This field is numeric (possibly an int), so that all stable
packages share one number, all ~amd64 ones get another number, and masked
ones get yet another.

I think portage is currently hard-coded with ~amd64, 9999** or similar
stability levels. This field would generalize that, so that we could add new
stability classes such as "Ultra Stable", if someone is willing to maintain
them, without needing to modify the code.
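A minimal sketch of how such a field might be used, assuming the stability classes form one ordered scale so that a user's preference is just a threshold. The class names and numbers, the PkgVersion type and the acceptable() helper are hypothetical illustrations, not portage APIs.

```python
from dataclasses import dataclass

# Hypothetical stability classes stored as plain ints. Lower means more
# conservative; a new class is just a new table entry, not new code.
STABILITY_CLASSES = {"ultra-stable": 0, "stable": 1, "testing": 2, "masked": 3}


@dataclass(frozen=True)
class PkgVersion:
    name: str
    version: str
    stability: int  # the numeric "Stability class" field


def acceptable(versions, max_class):
    """Return the versions whose stability class is at most max_class."""
    return [v for v in versions if v.stability <= max_class]


# Hypothetical rows of pkgs_universe for one package.
universe = [
    PkgVersion("dev-lang/foo", "1.0", STABILITY_CLASSES["stable"]),
    PkgVersion("dev-lang/foo", "1.1", STABILITY_CLASSES["testing"]),
    PkgVersion("dev-lang/foo", "9999", STABILITY_CLASSES["masked"]),
]

print([v.version for v in acceptable(universe, STABILITY_CLASSES["testing"])])
# prints ['1.0', '1.1']
```

The `<=` comparison only works if every pair of classes is comparable; if some classes were not (say, two unrelated experimental tracks), a set of accepted classes would be needed instead of a single threshold.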

steveL · Posted: Mon Sep 22, 2014 5:34 pm Post subject: Re: Portage-NG: brainstorming

Al-Caveman · n00b Joined: 21 Sep 2014 Posts: 39

mv · Watchman Joined: 20 Apr 2005 Posts: 6747

I am afraid that you are highly underestimating the algorithmic complexity of dependency resolution:
You do not have a tree but a graph with cycles (actually, at least three such graphs: one for DEPEND, one for RDEPEND, and one due to the subslot dependencies), and you have a huge number of branches at each node due to all sorts of USE flag combinations, etc.
I never thought about the actual complexity, but some portage developers spoke about NP-completeness. In any case, O(n^2) is probably a dream.
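The point that the input is a graph with cycles rather than a tree can be made concrete with a standard three-colour depth-first search. This is a minimal sketch over a single dependency graph held as a plain dict (for example, only RDEPEND); the find_cycle() helper and the package names are hypothetical, and USE flags and slots are ignored entirely.

```python
def find_cycle(deps, start):
    """Depth-first search for a dependency cycle reachable from start.

    deps maps a package name to the names it depends on. Returns the
    cycle as a list whose first and last elements are the same package,
    or None if no cycle is reachable from start.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(pkg):
        color[pkg] = GREY
        path.append(pkg)
        for dep in deps.get(pkg, ()):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: back edge, so a cycle
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[pkg] = BLACK
        return None

    return visit(start)


# Hypothetical packages: lib/b and lib/c depend on each other.
deps = {"app/a": ["lib/b"], "lib/b": ["lib/c"], "lib/c": ["lib/b"]}
print(find_cycle(deps, "app/a"))  # ['lib/b', 'lib/c', 'lib/b']
```

For very deep graphs, Python's recursion limit would call for an iterative version of the same search.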

Improving the database format and introducing parallelization will only decrease the constants (unless you find some tricky database format which really reduces complexity; the format you suggested certainly does not). So in this sense, this is the same sort of snake oil that you complain about in pkgcore: implementing in C instead of Python is about reducing the constants, and usually by a much higher factor than you can get by parallelization or straightforward database format changes.

Al-Caveman · n00b Joined: 21 Sep 2014 Posts: 39

Al-Caveman · n00b Joined: 21 Sep 2014 Posts: 39

Added "karma" for packages. It can be manually set, but by default it is incremented every time some other package uses it as a dependency.

It will be helpful in deciding which dependency to choose when multiple options are available. E.g. if package X requires (Y or Z) and Z has more karma, Z will be chosen.
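A minimal sketch of the default karma rule and the (Y or Z) choice, assuming the dependency data is available as a dict from each package to its dependency list. compute_karma(), pick_alternative() and the package names are hypothetical, and the tie-break (keep the first option listed) is a choice made here, not something stated above.

```python
from collections import Counter


def compute_karma(deps, manual=None):
    """Default karma: +1 each time a package appears as someone's dependency.

    deps maps a package to the packages it depends on. manual holds
    hand-set karma values that replace the computed count.
    """
    karma = Counter()
    for dep_list in deps.values():
        for dep in dep_list:
            karma[dep] += 1
    for pkg, value in (manual or {}).items():
        karma[pkg] = value
    return karma


def pick_alternative(alternatives, karma):
    """For an (A or B or ...) dependency, take the option with the most karma.
    On a tie, max() keeps the first option in the order given."""
    return max(alternatives, key=lambda pkg: karma[pkg])


# Hypothetical universe: lib/z is used by two packages, lib/y by one.
deps = {
    "app/p": ["lib/z"],
    "app/q": ["lib/z"],
    "app/r": ["lib/y"],
}
karma = compute_karma(deps)
print(pick_alternative(["lib/y", "lib/z"], karma))  # lib/z
```

Passing `manual={"lib/y": 10}` to compute_karma() would flip the choice to lib/y, which is how a hand-set value overrides the usage count.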

This reveals an interesting optimization problem that has always existed but, I think, was not discussed previously: should we choose the dep with the highest karma, or should we choose the full dep tree with the highest overall karma? Obviously the 1st is faster to resolve, but the 2nd could result in a more stable system with fewer installed packages.
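The two options can be compared on a toy example. This sketch assumes (Y or Z) dependencies are modelled as groups of alternatives and that "best overall" means fewest packages first, then highest total karma; DEPS, KARMA, closure(), greedy(), best_tree() and fewest_then_karma() are all hypothetical. In this instance the greedy pick (lib/z, karma 5) drags in an extra helper package, while the exhaustive search settles on lib/y.

```python
from itertools import product

# Hypothetical universe: each package maps to a list of dependency groups.
# A group with more than one entry is an "any of" choice such as (Y or Z).
DEPS = {
    "app/x": [["lib/y", "lib/z"]],
    "lib/y": [],
    "lib/z": [["lib/z-helper"]],
    "lib/z-helper": [],
}
KARMA = {"lib/y": 1, "lib/z": 5, "lib/z-helper": 2}


def closure(root, choose):
    """Packages pulled in from root; choose(pkg, idx, group) picks one entry
    from the idx-th dependency group of pkg."""
    seen, todo = set(), [root]
    while todo:
        pkg = todo.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        for idx, group in enumerate(DEPS.get(pkg, [])):
            todo.append(choose(pkg, idx, group))
    return seen


def greedy(root):
    """Option 1: at each choice, take the alternative with the most karma."""
    return closure(root, lambda pkg, idx, group: max(group, key=lambda p: KARMA.get(p, 0)))


def best_tree(root, score):
    """Option 2: try every combination of choices, keep the best-scoring tree."""
    points = [(pkg, idx) for pkg, groups in DEPS.items()
              for idx, group in enumerate(groups) if len(group) > 1]
    best = None
    for picks in product(*(DEPS[pkg][idx] for pkg, idx in points)):
        table = dict(zip(points, picks))
        # Non-choice groups have a single entry, so fall back to group[0].
        tree = closure(root, lambda pkg, idx, group, table=table:
                       table.get((pkg, idx), group[0]))
        if best is None or score(tree) > score(best):
            best = tree
    return best


def fewest_then_karma(tree):
    # Assumed objective: fewer packages first, then higher total karma.
    return (-len(tree), sum(KARMA.get(p, 0) for p in tree))


print(sorted(greedy("app/x")))                        # ['app/x', 'lib/z', 'lib/z-helper']
print(sorted(best_tree("app/x", fewest_then_karma)))  # ['app/x', 'lib/y']
```

The exhaustive search is where the cost explodes: with c choice points of two alternatives each, it evaluates 2^c candidate trees.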

steveL · Posted: Tue Sep 23, 2014 10:51 am Post subject: Re: Portage-NG: brainstorming

mv · Watchman Joined: 20 Apr 2005 Posts: 6747

hasufell · Retired Dev Joined: 29 Oct 2011 Posts: 429

dalu · Guru Joined: 20 Jan 2003 Posts: 530

Searching for dependencies of portage, I stumbled on this thread. (Good that you didn't delete my account in my moment of rage at Gentoo stubbornness.)

Why not rewrite it in Go?
It would speed things up (concurrency) and it would get rid of the Python dependency.
And you could still have it on multiple archs.

Al-Caveman · n00b Joined: 21 Sep 2014 Posts: 39

Thank you guys for your valuable input.

Al-Caveman · n00b Joined: 21 Sep 2014 Posts: 39

mv: Hi again :lol:

I've finished rubbing my brain cells together for as long as I deem
satisfactory, and my thoughts are below:

First point: obviously, not all packages in portage are subject to
selection, only the ones reachable downstream in the graph that starts at
the target package p1.

The full portage graph is massive (it includes the packages P = {p1, p2, ...}),
but if we search the graph downstream from p1, we will only see the N packages
connected to it.
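Restricting the search to the N packages downstream of p1 is plain graph reachability. A minimal breadth-first sketch, assuming the universe P is a dict from each package to its (already chosen) dependencies; the reachable() helper and the p1..p6 data are hypothetical, and USE flags and (Y or Z) choices are ignored.

```python
from collections import deque


def reachable(deps, target):
    """All packages reachable downstream from target (target included)."""
    seen = {target}
    queue = deque([target])
    while queue:
        pkg = queue.popleft()
        for dep in deps.get(pkg, ()):
            if dep not in seen:  # each package is enqueued at most once
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical universe P: only p1..p4 are downstream of p1.
P = {
    "p1": ["p2", "p3"],
    "p2": ["p4"],
    "p3": ["p4"],
    "p4": [],
    "p5": ["p1"],  # depends on p1, but is not downstream of it
    "p6": [],
}
print(sorted(reachable(P, "p1")))  # ['p1', 'p2', 'p3', 'p4']
```

Because p4 is reached from both p2 and p3 but added to `seen` only once, this step costs time proportional to the reachable packages and their dependency edges.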

So now, our space of packages is N (usually much smaller than |P|), and let's
assume that the average number of deps per package among those N is X.

E.g. the number of packages that need to be tested is X at the 1st level,
X*X at the 2nd level, and X*X*X at the 3rd level. Say wut.. Oh crap

You seem right.

That sounds like X^num_levels. num_levels = log_X(n), but who cares.
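Adding up the per-level counts above gives the number of checks for a naive traversal that re-expands a dependency every time it is reached, writing L for num_levels and assuming X > 1:

```latex
\sum_{k=1}^{L} X^{k} = X + X^{2} + \dots + X^{L} = \frac{X\,(X^{L}-1)}{X-1}
```

For example, with X = 4 and L = 5 this is 4 + 16 + 64 + 256 + 1024 = 1364 = 4 * (4^5 - 1) / 3, so each extra level multiplies the work by roughly X.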

This is madness.. You're right, I'd be lucky to get O(n^2), let alone some
fluffy O(m log n).

So clearly this is not scalable, and all I can do is hope that in reality
X and num_levels are not too large.

I need to think more about it... It seems graph traversal is a bad idea.

Any suggestions on what could possibly be relevant? Even remotely?

:cry:

steveL · Posted: Wed Sep 24, 2014 7:22 am Post subject:

Al-Caveman · n00b Joined: 21 Sep 2014 Posts: 39

hasufell · Retired Dev Joined: 29 Oct 2011 Posts: 429

IMO, if you really want to help, don't go the "I'll rewrite foo from scratch" way. It'll take a few years just for people to recognize your reimplementation and even more years for users to actually switch. What we are left with then is yet another package manager with ~1 active dev.

Instead, do one of these:
* contribute to pkgcore revival
* gather people who are willing to fork paludis and make it a bit more user friendly

We already have solutions. But whatever you do... please don't contribute to portage. It's a codebase that should really die. It's making gentoo worse as a whole, because we keep adding broken features and hackeries to portage that are not even part of PMS and ebuild developers start relying on them, breaking cross-PM compatibility and whatnot.

steveL · Posted: Wed Sep 24, 2014 3:56 pm Post subject:

Al-Caveman · n00b Joined: 21 Sep 2014 Posts: 39

But pkgcore is kinda, IMO, boring because it has Python deps. It's like
solving the problem halfway. Yeah, it looks relatively fast (thanks to emerge
being slow). But I think, while we're at it, we should fix all the problems
and not settle for just "good enough".

IMO we should focus on having:

* An optimal algorithm (which turns out to be some exponential madness, but
  it seems things don't grow so large that it stops being computable).
* Removal of all possible constants, except the little added by C, because I
  can't code a package manager in assembly without feeling sad. pkgcore seems
  to have removed a good bunch of such constants, but removing a crap load of
  them is even better.
* No run-time deps except the Linux kernel and a few mandatory things, i.e.
  it will be statically linked. No fluffy Python bits.
* Good documentation from the start. I should have said this first, but I
  think it's boring, so I put it last.

So yeah, another problem with pkgcore is that it's not documented well enough.
If it were, it would be in the stage tarballs already.

Got a slightly off-topic question: how do some of you know that I don't know
how to handle the complexity of USE flags? :p Did I say anything wrong
(except my O(m log n) madness)?

krinn · Watchman Joined: 02 May 2003 Posts: 7470

Al-Caveman · n00b Joined: 21 Sep 2014 Posts: 39

Thanks man, because of you posting that link, I learned about the BMH
algorithm =)

But this is sort of superficial. Sounding doubtful about the time
complexity analysis (one topic) does not mean that I don't know how portage
should work to properly resolve deps (another topic).

I could obviously be wrong about the dep resolution process too, but it would
be better if someone told me what is wrong about it (as opposed to saying
"you were wrong in a different topic, therefore you must be wrong now too").