Gentoo Forums
Gentoo Forums
Gentoo Forums
Quick Search: in
Portage-NG: brainstorming
View unanswered posts
View posts from last 24 hours

Goto page 1, 2  Next  
Reply to topic    Gentoo Forums Forum Index Gentoo Chat
View previous topic :: View next topic  
Author Message
Al-Caveman
n00b
n00b


Joined: 21 Sep 2014
Posts: 39

PostPosted: Mon Sep 22, 2014 12:15 am    Post subject: Portage-NG: brainstorming Reply with quote

Hi folks,

Introduction
Portage is the best package manager in existence in my view. However, I feel
very confident that a much faster implementation design exists in the space of
valid designs, therefore I think it's worth pursuing.

I've been thinking about a redesign of portage and got some theories around
how it should be designed. I've also searched the forums and it seems that
there are still recent requests for a faster portage:
https://forums.gentoo.org/viewtopic-t-994458-highlight-faster+portage.html

Note: what I describe here is only a dependency resolver that should be
really fast (probably something like O(log n) or around that). For now, my
goal is to resolve the dependency tree real quick, then install them by
`emerge -O <pkg>`. In a later stage, we will optimize more and more bits, but
for now, I think it's a good idea to stick to enhancing the dependency
resolver. Feel free to post your different opinions.


More details (features, design, etc):
I've moved the text here into the Github's wiki here: https://github.com/Al-Caveman/Portage-NG/wiki


Last edited by Al-Caveman on Mon Sep 22, 2014 8:57 pm; edited 2 times in total
Back to top
View user's profile Send private message
apathetic
n00b
n00b


Joined: 28 Aug 2014
Posts: 36

PostPosted: Mon Sep 22, 2014 4:45 am    Post subject: Re: Portage-NG: brainstorming Reply with quote

Al-Caveman wrote:

Note: what I describe here is only a dependency resolver that should be
really fast (probably something like O(log n) or around that). For now, my
goal is to resolve the dependency tree real quick, then install them by
`emerge -O <pkg>`. In a later stage, we will optimize more and more bits, but
for now, I think it's a good idea to stick to enhancing the dependency
resolver. Feel free to post your different opinions.


Too many words, I can't comprehend all of that given that I'm a bit stoned atm. Just one question - how exactly are you going to achieve graph traversal in O(log(n)) time? The best you can have methinks is O(n*log(n)).
Back to top
View user's profile Send private message
Al-Caveman
n00b
n00b


Joined: 21 Sep 2014
Posts: 39

PostPosted: Mon Sep 22, 2014 6:10 am    Post subject: Re: Portage-NG: brainstorming Reply with quote

apathetic wrote:
Al-Caveman wrote:

Note: what I describe here is only a dependency resolver that should be
really fast (probably something like O(log n) or around that). For now, my
goal is to resolve the dependency tree real quick, then install them by
`emerge -O <pkg>`. In a later stage, we will optimize more and more bits, but
for now, I think it's a good idea to stick to enhancing the dependency
resolver. Feel free to post your different opinions.


Too many words, I can't comprehend all of that given that I'm a bit stoned atm. Just one question - how exactly are you going to achieve graph traversal in O(log(n)) time? The best you can have methinks is O(n*log(n)).


You're probably right. I was throwing wild approximating guesses and meant to
say that it's logish (O(log n) or O(n*log n) or whatever log something;
the usual tree business).

Thanks to you for raising this up, I'm now thinking about it more closely and
wonder if I can come with an exact time complexity for it.

Let's say the following:

  • We wish to find the dependency graph of package P1 with USE flags U =
    {f1, f2, ...}.
  • P1 has x many candidate children/deps (based on chosen USE flags).
  • For each of the x many candidates, we need to test them to see if they
    are a good path down the tree. This test happens by <CONDITION> which
    itself could be composed of y many logical statements inside it (e.g.
    combining multiple logical statements with AND/OR/etc).


Therefore, all of the x candidates will be evaluated y many times
each. So x*y in the 1st level level of the tree.

Similarly, as we recurse the tree down towards the candidates, we perform x*y
operations at every level. Let's say that number of levels is z.

In average, to me it seems that z = log n, where n is total number of
candidate packages under P1 down to the end of the tree. In other words, n =
total number of children of P1 + total number of children of the children of
P1 + ... etc. Probably, in average, n = x + x^2 + x^3 + .. + x^z.

Then I guess it should be O(x * y * log n)

However, I think that x and y are usually small numbers. While n is usually
noticeably bigger.

So to me it seems that, in reality, it's like O(small_num * small_num * log
large_num).

I think all this is the average time complexity as I assumed z = log n.

But the worst time complexity would be z = n (instead of z = log n). So I
guess that the worst time complexity would be O(x*y*n) which is madness. But
this is very rare to happen in my view.

One point that I didn't talk about is how did we find the part of the tree
where P1 exists? This can be done by a hash table. I.e. in pkgs_universe,
there can be a field that points right to the index of pkgs_deptree where P1
lives. This can happen with O(1) (assuming no hash collisions, obviously).

Feel free to suggest any tweaks, throw wild guesses or ask any question. I am
sure they will help us (at least me) to find something better.
Back to top
View user's profile Send private message
apathetic
n00b
n00b


Joined: 28 Aug 2014
Posts: 36

PostPosted: Mon Sep 22, 2014 7:13 am    Post subject: Re: Portage-NG: brainstorming Reply with quote

Al-Caveman wrote:

Let's say the following:

  • We wish to find the dependency graph of package P1 with USE flags U =
    {f1, f2, ...}.
  • P1 has x many candidate children/deps (based on chosen USE flags).
  • For each of the x many candidates, we need to test them to see if they
    are a good path down the tree. This test happens by <CONDITION> which
    itself could be composed of y many logical statements inside it (e.g.
    combining multiple logical statements with AND/OR/etc).



Does the phrase "breadth-first search" ring any bells? Lets put it this way - if you're not a programmer willing to sacrifice his time to write most of this stuff (or at least a working PoC), you'll most likely fail.
Back to top
View user's profile Send private message
Roman_Gruber
Advocate
Advocate


Joined: 03 Oct 2006
Posts: 3846
Location: Austro Bavaria

PostPosted: Mon Sep 22, 2014 8:37 am    Post subject: Reply with quote

for me it is an duplicate

it was discussed several times already.

what i remember was using git instead of rysnc and we should improve the current system and do not make a new one.

my 2c
Back to top
View user's profile Send private message
Al-Caveman
n00b
n00b


Joined: 21 Sep 2014
Posts: 39

PostPosted: Mon Sep 22, 2014 3:41 pm    Post subject: Re: Portage-NG: brainstorming Reply with quote

apathetic wrote:
Al-Caveman wrote:

Let's say the following:

  • We wish to find the dependency graph of package P1 with USE flags U =
    {f1, f2, ...}.
  • P1 has x many candidate children/deps (based on chosen USE flags).
  • For each of the x many candidates, we need to test them to see if they
    are a good path down the tree. This test happens by <CONDITION> which
    itself could be composed of y many logical statements inside it (e.g.
    combining multiple logical statements with AND/OR/etc).



Does the phrase "breadth-first search" ring any bells? Lets put it this way - if you're not a programmer willing to sacrifice his time to write most of this stuff (or at least a working PoC), you'll most likely fail.

Yeah it's a BFS. And yes, I am assuming that I'll be the only programmer.
Others can chime in too and amplify the fun, but it's OK if I do it alone.

tw04l124 wrote:
for me it is an duplicate

it was discussed several times already.

what i remember was using git instead of rysnc and we should improve the current system and do not make a new one.

my 2c

I don't think that this is a duplicate, as it's the 1st attempt ever to use
binary formats with proper indexing for fastest possible dependency
resolution.

We can consider this as a research project to identify what we can find. Later
on, if we succeed, we can have an ebuild for this package manager for further
testing. If it shows to be reliable, we can then port the good bits to emerge.
Or, it can be ignored by the Gentoo devs and it would still be fine (I have no
hidden agendas in spreading my code :) -- I'm just doing what's fun).

I don't see how git vs. rsync is relevant to enhancing the dependency
resolution speed.
Back to top
View user's profile Send private message
Al-Caveman
n00b
n00b


Joined: 21 Sep 2014
Posts: 39

PostPosted: Mon Sep 22, 2014 4:08 pm    Post subject: Reply with quote

One thing I didn't add previously is handling stable, non-stable (~amd) or
masked (9999**) packages.

To address this, I added one more field in pkgs_universe, which is the
"Stability class" field. This field is numeric (possibly an int). So that all
stable packages will have the same number, and all ~amd ones will be given yet
another number. Masked ones would have even a different one.

I think currently portage is hard-coded with ~amd, 9999** or whatever
stability levels. But this field would generalize it so that we can add new
stability classes such as "Ultra Stable" if there is one who is welling to
maintain them, without needing to modify the code.
Back to top
View user's profile Send private message
steveL
Watchman
Watchman


Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Mon Sep 22, 2014 5:34 pm    Post subject: Re: Portage-NG: brainstorming Reply with quote

Al-Caveman wrote:
Portage is the best package manager in existence in my view.

Heh I used to think that too.
Quote:
However, I feel very confident that a much faster implementation design exists in the space of
valid designs, therefore I think it's worth pursuing.

Yes, it's called pkgcore. Everything you're talking about has already been done, by the previous lead developer of portage (ferringb), who stepped back to do a rewrite.

It's blazingly fast. The backend work is done in C, by the snakeoil package.

It fell behind a year or two ago, though radhermit is working on bringing it back to EAPI-5/6.

Hmm been a while, but last time I asked for a heads-up I was accused of trolling, so haven't checked in for a while. Wouldn't have minded, but I'd just offered to do the bash side of the work, which was what I wanted to discuss. Oh well.

Anyhow, there's no need to resort to binary hoopla: check out pkgcore, and see about helping out in #pkgcore on IRC: chat.freenode.net.
Back to top
View user's profile Send private message
Al-Caveman
n00b
n00b


Joined: 21 Sep 2014
Posts: 39

PostPosted: Mon Sep 22, 2014 7:01 pm    Post subject: Re: Portage-NG: brainstorming Reply with quote

steveL wrote:
Al-Caveman wrote:
Portage is the best package manager in existence in my view.

Heh I used to think that too.
Quote:
However, I feel very confident that a much faster implementation design exists in the space of
valid designs, therefore I think it's worth pursuing.

Yes, it's called pkgcore. Everything you're talking about has already been done, by the previous lead developer of portage (ferringb), who stepped back to do a rewrite.

It's blazingly fast. The backend work is done in C, by the snakeoil package.

It fell behind a year or two ago, though radhermit is working on bringing it back to EAPI-5/6.

Hmm been a while, but last time I asked for a heads-up I was accused of trolling, so haven't checked in for a while. Wouldn't have minded, but I'd just offered to do the bash side of the work, which was what I wanted to discuss. Oh well.

Anyhow, there's no need to resort to binary hoopla: check out pkgcore, and see about helping out in #pkgcore on IRC: chat.freenode.net.


I did check them already! I think their speed boost is because of doing it in
C via snakeoil and probably some neat algorithmic tricks.

But I think that pkgcore's algorithmic tricks don't include indexing portage
itself. I guess they juse use /usr/portage/* as it is except for being more
efficient at some points. But it is still not fully efficient in my view.

Here, our goal is achieving perfect efficiency. I think if pkgcore is much
faster without indexing /usr/portage/*, then it means that we can be
ridiculously faster (almost like pacman or apt-get) if we index portage (as
far as dep. resolution is concerned).
Back to top
View user's profile Send private message
mv
Watchman
Watchman


Joined: 20 Apr 2005
Posts: 6747

PostPosted: Mon Sep 22, 2014 7:28 pm    Post subject: Reply with quote

I am afraid that you are highly underestimating the algorithmic complexity of dependency resolution:
You do not have a tree but a graph with cycles (actually, at least three such graphs: one of DEPEND and one for RDEPEND, one due to the subslot dependencies), you have a huge number of branches in each node due to all sort of useflag combinations, etc.
I never thought about the actual complexity, but some portage developers spoke about NP-completeness. In any case, O(n^2) is probably a dream.

Improving the database format and introducing parallelization will only decrease the constants (unless you find some tricky database format which really reduces complexity - the format you suggested certainly does not). So in this sense, this is the same sort of snakeoil which you complain in pkgcore: Implementing in C instead of python is about reducing the constants, and usually by a much higher factor that you can get by parallelization or straightforward database format changes.
Back to top
View user's profile Send private message
Al-Caveman
n00b
n00b


Joined: 21 Sep 2014
Posts: 39

PostPosted: Mon Sep 22, 2014 9:08 pm    Post subject: Reply with quote

mv wrote:
I am afraid that you are highly underestimating the algorithmic complexity of dependency resolution:
You do not have a tree but a graph with cycles (actually, at least three such graphs: one of DEPEND and one for RDEPEND, one due to the subslot dependencies), you have a huge number of branches in each node due to all sort of useflag combinations, etc.
I never thought about the actual complexity, but some portage developers spoke about NP-completeness. In any case, O(n^2) is probably a dream.

Improving the database format and introducing parallelization will only decrease the constants (unless you find some tricky database format which really reduces complexity - the format you suggested certainly does not). So in this sense, this is the same sort of snakeoil which you complain in pkgcore: Implementing in C instead of python is about reducing the constants, and usually by a much higher factor that you can get by parallelization or straightforward database format changes.


Good points. I replaced "tree" by "graph" as trees don't cycle around
(graphs do).

As for O(n^2), isn't that the worst case? I wonder what the average case would
be.

Back to O(x * y * z), I think z is usually between 1 and n. I said log n is
the average, but it was a guess.

I've been looking for a time complexity for the package dependency resolution
in emerge and I couldn't find any. If anyone can construct a better
representation kindly chime in!

Forgot to add, as for cyclic dependencies, I plan to only warn the user and
request him/her to fix it manually (for now). From what I heard, I think it's
also done manually in Gentoo too? But putting some intelligence to resolver
such cyclic deps automatically is just a matter of coding a function to
resolve it automatically (which is another task)
Back to top
View user's profile Send private message
Al-Caveman
n00b
n00b


Joined: 21 Sep 2014
Posts: 39

PostPosted: Mon Sep 22, 2014 10:54 pm    Post subject: Reply with quote

Added "karma" for packages. It can be manually set, but by default it is incremented every time some other package uses it as a dependency.

It will be helpful in deciding which dependency to choose when multiple options are available. I.e. if package X requires (Y or Z), then if Z has more karma, Z will be chosen.

This reveals an interesting optimization problem that always existed but I think was not discussed previously: should we choose the dep with the highest karma? Or should we choose the full dep tree with highest overall karma? Obviously the 1st is faster to resolve, but the 2nd could result in a more stable system with less installed packages.
Back to top
View user's profile Send private message
steveL
Watchman
Watchman


Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Tue Sep 23, 2014 10:51 am    Post subject: Re: Portage-NG: brainstorming Reply with quote

Al-Caveman wrote:
steveL wrote:
Yes, it's called pkgcore. Everything you're talking about has already been done, by the previous lead developer of portage (ferringb), who stepped back to do a rewrite.

It's blazingly fast. The backend work is done in C, by the snakeoil package.

I did check them already! I think their speed boost is because of doing it in
C via snakeoil and probably some neat algorithmic tricks.

But I think that pkgcore's algorithmic tricks don't include indexing portage
itself. I guess they juse use /usr/portage/* as it is except for being more
efficient at some points. But it is still not fully efficient in my view.

This makes no sense. You never mentioned pkgcore once in your post, yet now you've already dismissed it as "not indexing portage."
Quote:
Here, our goal is achieving perfect efficiency. I think if pkgcore is much
faster without indexing /usr/portage/*, then it means that we can be
ridiculously faster (almost like pacman or apt-get) if we index portage (as
far as dep. resolution is concerned).

It already is; and your entire post was about how portage is slow, how we should speed up that with your "great idea". Suddenly that same idea is needed in pkgcore, something you clearly haven't actually explored.

Whenever I see people write things like "our goal is perfect efficiency" it makes me think they don't really know what they're talking about. Everything else you've written makes me think that too.

Good luck.
Back to top
View user's profile Send private message
mv
Watchman
Watchman


Joined: 20 Apr 2005
Posts: 6747

PostPosted: Tue Sep 23, 2014 6:35 pm    Post subject: Reply with quote

Al-Caveman wrote:
As for O(n^2), isn't that the worst case? I wonder what the average case would be.

O(n^2) would be an average case one cannot even dream about:
If the comments about NP-complete are right, the worst case is O(2^n) (unless you solve the famous P=NP problem).
It looks reasonable: For each package "pulling" or "non-pulling" is possible. Unless you are a genius and solve P=NP in the positive, you practically have to try all combinations which are 2^n for n packages. Even if you can elmiinate by pruning (where the graph becomes a tree) most of the packages, if just 20 "hard" or so are left, you need a tremendous amount of time. Playing with the constants hardly would help.
Back to top
View user's profile Send private message
hasufell
Retired Dev
Retired Dev


Joined: 29 Oct 2011
Posts: 429

PostPosted: Tue Sep 23, 2014 11:20 pm    Post subject: Re: Portage-NG: brainstorming Reply with quote

Al-Caveman wrote:
(almost like pacman or apt-get)

pacman is fast because it doesn't do much and because the complexity of dependencies in arch linux is a lot less (e.g. no USE flags)
Back to top
View user's profile Send private message
dalu
Guru
Guru


Joined: 20 Jan 2003
Posts: 530

PostPosted: Wed Sep 24, 2014 1:03 am    Post subject: Reply with quote

Searching for dependencies of portage I stumbled on this thread. (Good that you didn't delete my account in my moment of rage at Gentoo stubbornness ;) )

Why not re-write it in Go?
It would speed things up (concurrency) and it would get rid of the python dependecy.
And you could still have it on multiple archs.
Back to top
View user's profile Send private message
Al-Caveman
n00b
n00b


Joined: 21 Sep 2014
Posts: 39

PostPosted: Wed Sep 24, 2014 2:11 am    Post subject: Reply with quote

Thank you guys for your valuable input.

steveL wrote:
Al-Caveman wrote:
Here, our goal is achieving perfect
efficiency. I think if pkgcore is much faster without indexing /usr/portage/*,
then it means that we can be ridiculously faster (almost like pacman or
apt-get) if we index portage (as far as dep. resolution is concerned).

It already is; and your entire post was about how portage is slow, how we
should speed up that with your "great idea". Suddenly that same idea is needed
in pkgcore, something you clearly haven't actually explored.

Do you have any links that show that pmerge actually uses some indexing method
instead of directly using the text files in /usr/portage/*? pmerge is the bit
that does graph stuff.


mv wrote:
...
It looks reasonable: For each package "pulling" or "non-pulling" is
possible
. Unless you are a genius and solve P=NP in the positive, you
practically have to try all combinations which are 2^n for n packages. Even if
you can elmiinate by pruning (where the graph becomes a tree) most of the
packages, if just 20 "hard" or so are left, you need a tremendous amount of
time. Playing with the constants hardly would help.

Not true for these reasons (unless you show me otherwise; I would be thankful):

  • It's not each package. It's rather each package that is connected
    downstream[ the package that is desired to be installed. Why would I
    consider every package, even those that are not connected to the subset of
    the graph that relates to the target packae?
  • As raised earlier by apathetic, this is a BFS search, except that I
    stop as soon as I detect a circular dependency (which is easy). And the
    time complexity of BFS is explained here: http://en.wikipedia.org/wiki/Breadth-first_search#Time_and_space_complexity

Update: sorry, I need to re-think a bit about the time complexity. I'm
re-thinking now. I'll post my thoughts once I'm done. You could be right about
all your comments. Sorry for the mess.


hasufell wrote:
Al-Caveman wrote:
(almost like pacman or apt-get)

pacman is fast because it doesn't do much and because the complexity of
dependencies in arch linux is a lot less (e.g. no USE flags)

I know, this is why I said "almost like ... etc". Thank you for ensuring
that this point is not misunderstood by others.


dalu wrote:
...
Why not re-write it in Go?
It would speed things up (concurrency) and it would get rid of the python dependecy.
And you could still have it on multiple archs.

I haven't decided the implementation yet. Now let's sort the feasibility of
this new design. I'll code some prototypes and run some tests. Finally we will
discuss the implementation details of the 1st function prototype!
Back to top
View user's profile Send private message
Al-Caveman
n00b
n00b


Joined: 21 Sep 2014
Posts: 39

PostPosted: Wed Sep 24, 2014 2:59 am    Post subject: Reply with quote

mv Hi again :lol: done rubbing my brain cells for a time as I deem
satisfactory, and my thoughts are below:

So first point it's not all packages in portage that are subject to selection
(obviously). It's rather only all packages that are connected downstream the
graph that begins from the target package p1

The full portage graph is massive (includes packages P = {p1, p2, ...}, and if
we start searching the graph downstream p1, we will only see N many packages
connected.

So now, our pace of packages is N (usually much smaller than |P|), and let's
assume that the average number of deps per the the N many packages is X.

E.g. packages that need to be tested at the 1st level is X, and X*X in the 2nd
level, X*X*X in the 3rd level. Say wut.. Oh crap :( you seem right.

That sounds like X^num_levels. num_levels = Xth_root(n), but who cares.

This is madness.. You're right, I'd be lucky to get O(n^2) let alone some
fluffy O(m log n).

So clearly this is not scalable, and all I can do is just hope that in reality
X and num_levels is not too large.

I need to think more about it... It seems graph traversal is a bad idea.

Any suggestions on what could possibly be relevant? Even remotely?

:cry:
Back to top
View user's profile Send private message
steveL
Watchman
Watchman


Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Wed Sep 24, 2014 7:22 am    Post subject: Reply with quote

Al-Caveman wrote:
Do you have any links that show that pmerge actually uses some indexing method
instead of directly using the text files in /usr/portage/*? pmerge is the bit that does graph stuff.

Whut really? lul. You really are extraordinary.

First off I never said anything about pkgcore using an index: I merely said it already is ridiculously fast.

Secondly, do you really think no-one's ever thought of this before? portage has had the ability to use a sqlite db since forever.

I'm not saying it can't help: merely that it has nothing to do with the algorithmic complexity of dependency resolution (which you appear to be tying yourself in knots about. ;) And in fact, once you've used pkgcore for any length of time (which would mean over a year ago) all thoughts of bothering with any additional optimisation go out of the window; YAGNI.

And finally, there is a metadata cache which gets used by portage (and synced with the tree); so it's not hitting all those files you think it is. Effectively it already has a prebuilt index, which is not going to get much faster; certainly not effecting the orders of improvement you appear to be hoping for.
Quote:
dalu wrote:
...
Why not re-write it in Go?
It would speed things up (concurrency) and it would get rid of the python dependecy.
And you could still have it on multiple archs.

I haven't decided the implementation yet. Now let's sort the feasibility of
this new design. I'll code some prototypes and run some tests. Finally we will
discuss the implementation details of the 1st function prototype!

Lul: blimey I've seen this so many times on the forums, and you want to watch it with Dalu (hey Dalu, good to see ya), as she tends to come out with that kind of enthusiastic response, and as so many times before, it's all "requirements specification" and no code.

I don't want to rain on your parade: I just want you to consider that perhaps the best solution isn't some grand new design and implementation discussed to death, before you even know what the problem space is about. Nor do I want to stop you experimenting on whatever ideas you have. That you don't even have an implementation language in mind, though, only sets off more alarm-bells, as does the way you're talking about "deciding the implementation".
Quote:
Any suggestions on what could possibly be relevant? Even remotely?

Learning to walk before you try to run? Seriously, if you're truly interested in learning this stuff, the best guy to learn from is ferringb, closely followed by zmedico. Neither have time to teach you the basics, but they both encourage new people (though ferringb can be kinda moody; zmedico is a much more effective lead overall, imo, and ferringb is deeper-tech. It'd be a lot better for Gentoo if they both worked on portcore or w/e.)

If you want to learn how to implement dependency resolution, learn how to write a basic make (which is just topological sort). Then add the USE-flag (optional) element to the dependency chains, and explore switching them on and off, at resolution as well as in config (input). The latter obviously adds dependencies, and the former decides if maybe we can switch them off instead, in case of conflict. There should never be any unconditional cycles, so the first part can be done with basic data about unconditional deps, and abort on cycle.

Once you have a skeleton resolver, then you will be in a position to assess what you're doing. ATM it's all talk with no exploration of the problem-space under your belt. And don't get attached to that skeleton: plan to rewrite it at least twice.

"The Awk Programming Language" (Aho, Kernighan & Weinberger, 1988) will teach you how to implement a basic make, with a script less than 50 lines long. See Chapter 7.4 (you'll need 7.3 for the background. Chapter 1 is a tutorial, and 2 is the language reference.)
Back to top
View user's profile Send private message
Al-Caveman
n00b
n00b


Joined: 21 Sep 2014
Posts: 39

PostPosted: Wed Sep 24, 2014 8:04 am    Post subject: Reply with quote

Quote:
First off I never said anything about pkgcore using an index: I merely said it already is ridiculously fast.
This answers my question. Thanks.
Back to top
View user's profile Send private message
hasufell
Retired Dev
Retired Dev


Joined: 29 Oct 2011
Posts: 429

PostPosted: Wed Sep 24, 2014 1:33 pm    Post subject: Reply with quote

IMO, if you really want to help don't go the "I'll rewrite foo from scratch" way. It'll take a few years just for people to recognize your reimplementation and even more years for users to actually switch. What we are left with then is yet another package manager with ~1 active dev.

Instead, do one of these:
* contribute to pkgcore revival
* gather people who are willing to fork paludis and make it a bit more user friendly

We already have solutions. But whatever you do... please don't contribute to portage. It's a codebase that should really die. It's making gentoo worse as a whole, because we keep adding broken features and hackeries to portage that are not even part of PMS and ebuild developers start relying on them, breaking cross-PM compatibility and whatnot.
Back to top
View user's profile Send private message
steveL
Watchman
Watchman


Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Wed Sep 24, 2014 3:56 pm    Post subject: Reply with quote

Al-Caveman wrote:
Quote:
First off I never said anything about pkgcore using an index: I merely said it already is ridiculously fast.
This answers my question. Thanks.

Hmm you appear to be underlining it, and focussing solely on that point, as if it were some sort of vindication of what you wrote.

I hope that I'm reading you wrong.

I don't have any issue with you exploring how this is done from scratch: you shouldn't need more than a couple of weeks to get a skeleton together. Not that it would deal with the various (at least 4) types of dependency, but enough for you to explore the nub of the issue: which is options/USE-flags that pull in more dependencies, and wherein you must have the capability for backtracking to test scenarios (ie turn off a flag, and consequent changes.) Really it's a closure over the latter, but there's no point discussing that if you don't know how to do normal dependency resolution.

As stated, though, what Gentoo needs most of all is to switch to pkgcore, so it's up to you what you want to explore.

For a start you must understand bash and how ebuilds are put together, which is a whole other language and set of constraints, as well as how eclasses interact, and how that is implemented, both on the bash side, and from the mangler side; which is based on the effects it has on the bash process.

Simply put, bash is the one language that is required by the ebuild format: no other is. If you don't learn it properly, you'll be in for a much harder time. #bash can help; beware 99% of sites on the internet, especially TLDP/ABS which is responsible for so many a travesty of scripting.

You also need an understanding of make, and autotools, with an awareness of cmake. ##workingset is the place to learn about those (and a whole lot more around the build process.) And ofc #awk will help you with awk, which is used in many more places than people realise.
Back to top
View user's profile Send private message
Al-Caveman
n00b
n00b


Joined: 21 Sep 2014
Posts: 39

PostPosted: Wed Sep 24, 2014 7:53 pm    Post subject: Reply with quote

But pkgcore is kinda, imo, boring because it has python deps. It's like
solving it half-way through. Yeah it looks relatively fast (thanks to emerge
being slow). But I think, while at it, we should fix all the problems and not
settle for just "good enough".

IMO we should focus on having:

  • An optimal algorithm (which turns out to be some exponential madness,
    but it seems things don't grow that large so that it's still computable).
  • Remove all possible constants, except the little added by C cause I
    can't code a package manager in assembly without without feeling sad.
    pkgcore seems to have removed a good bunch of such constants, but removing
    a crap load of such constants is even better.
  • No run-time deps except the Linux kernel and a few mandatory stuff.
    I.e. it will be statically linked. No fluffy python bits.
  • Good documentation from the start. I had to say this first but I think
    it's boring so I put it last.


So yeah, another problem with pkgcore is that it's not documented well enough.
Because it if was, it was in the stage tarballs already.

Got a slightly off-topic question: how do some of you know that I don't know
how to handle the complexity of use flags? :p Did I say anything wrong
(except my O(m log n) madness)?
Back to top
View user's profile Send private message
krinn
Watchman
Watchman


Joined: 02 May 2003
Posts: 7470

PostPosted: Wed Sep 24, 2014 8:34 pm    Post subject: Reply with quote

Al-Caveman wrote:

Got a slightly off-topic question: how do some of you know that I don't know
how to handle the complexity of use flags? :p Did I say anything wrong
(except my O(m log n) madness)?

Personally how you speak about it
Quote:
In average, to me it seems that z = log n, where n is total number of
candidate packages under P1 down to the end of the tree. In other words, n =
total number of children of P1 + total number of children of the children of
P1 + ... etc. Probably, in average, n = x + x^2 + x^3 + .. + x^z.

Then I guess it should be O(x * y * log n)

However, I think that x and y are usually small numbers. While n is usually
noticeably bigger.

So to me it seems that, in reality, it's like O(small_num * small_num * log
large_num).

I think all this is the average time complexity as I assumed z = log n.
But the worst time complexity would be z = n (instead of z = log n). So I
guess that the worst time complexity would be O(x*y*n) which is madness. But
this is very rare to happen in my view.

Just compare with http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
This time, the guy knows obviously what he is speaking about, the algo pickup was discuss in a book he cite (of course we can safely suppose he had read more than just that book aiming at that algo)... He also clearly look at the bsd implementation of grep (and not just tell anyone bsd grep is slow, but actually REALLY look at its code). He explains what tricks and how the tricks will be use to actually met the goal.



To sum up : he looks like a Jedi Master, and you looks like a padawan
Back to top
View user's profile Send private message
Al-Caveman
n00b
n00b


Joined: 21 Sep 2014
Posts: 39

PostPosted: Thu Sep 25, 2014 2:25 am    Post subject: Reply with quote

Thanks man, because of you posting that link, I learned about the BMH
algorithm =)

But this is sort of superficial. I.e. looking/sounding doubtful about the time
complexity analysis (a topic) does not mean that I don't know how portage
should work to properly resolve deps (another topic).

I could obviously be wrong about the dep resolution process too, but it would
be better if someone tells me what is wrong about them (as opposed to saying
"you were wrong in a different topic, therefore you must be wrong now too").
Back to top
View user's profile Send private message
Display posts from previous:   
Reply to topic    Gentoo Forums Forum Index Gentoo Chat All times are GMT
Goto page 1, 2  Next
Page 1 of 2

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum