Gentoo Forums
Gentoo Forums
Gentoo Forums
Quick Search: in
Alternative dependency solver for Portage (a prototype)
View unanswered posts
View posts from last 24 hours

 
Reply to topic    Gentoo Forums Forum Index Gentoo Chat
View previous topic :: View next topic  
Author Message
gzoumix
n00b
n00b


Joined: 22 Dec 2017
Posts: 11

PostPosted: Sat Dec 23, 2017 7:25 pm    Post subject: Alternative dependency solver for Portage (a prototype) Reply with quote

I am very happy to announce that few colleagues and I implemented the prototype of a new dependency solver for emerge
This prototype is implemented in python 2.7 and available here: https://github.com/HyVar/gentoo_to_mspl

emerge is an amazing tool, highly optimized and all, but has one default: it can (rarely) fail.
Indeed, in some cases, emerge might not find a solution to an installation problem and give bad advice on how to continue. For instance, trying to install gnome-base/gnome only following emerge's advices (without looking at the documentation https://wiki.gentoo.org/wiki/GNOME/Guide) will only clutter the package.use file.

Our new dependency solver is slower than the one of emerge and has far fewer functionalities (it is a prototype), but never fails:
if a package is installable, our solver will find a solution, and if it is not, it can tell why (with an ugly syntax, sorry).

Of course, this solver can be improved in many ways, and comments, tests, and suggestions will be welcome and very useful.
Back to top
View user's profile Send private message
stqn
n00b
n00b


Joined: 07 Apr 2015
Posts: 47

PostPosted: Sun Dec 24, 2017 9:55 am    Post subject: Reply with quote

To be honest when I read the subject of this thread, I thought you had implemented *faster* dependency calculations, because that’s the big problem of emerge IMO!
But otherwise if your system can be made faster than emerge, it would be great :).
Back to top
View user's profile Send private message
gzoumix
n00b
n00b


Joined: 22 Dec 2017
Posts: 11

PostPosted: Sun Dec 24, 2017 12:43 pm    Post subject: Reply with quote

stqn wrote:
To be honest when I read the subject of this thread, I thought you had implemented *faster* dependency calculations, because that’s the big problem of emerge IMO!
But otherwise if your system can be made faster than emerge, it would be great :).


Thank you for your comment :).
I agree with you that emerge could be quicker.
I see three reasons why emerge can be take a very long time:

  1. the number of packages and the .ebuild format:
    in the portage tree, there are more or less 40,000 packages, each described in one bash script that can inherit from several .eclass.
    Loading one .ebuild file can take time, and loading 40,000 of them even more. emerge already partially solves these problem by using egencache files when possible (it is not always the case), and by being implemented asynchronously (that way, the solving is done at the same time as loading).
  2. the implementation language:
    python, even if it is quite quick, will never be as quick as C. Moreover, python has a global lock which forbids the usage of threads (which could be very useful for emerge).
  3. backtracking:
    emerge explores all the possible installations to find a solution, and this can take a very long time (it can also fail, which thus takes even more time).
    This is where our solver can be quicker, as it does not iterate over all possible solutions but solves the constraints, guaranteeing an answer in limited time (few minutes).

There are numerous ways in which we can improve the speed of our prototype.
However, I think it is important to first ensure the validity of our approach with tests, comments, and suggestions that I didn't think of,
and once the basis is solid, make it as efficient as possible.
Back to top
View user's profile Send private message
Naib
Watchman
Watchman


Joined: 21 May 2004
Posts: 5468
Location: Removed by Neddy

PostPosted: Sun Dec 24, 2017 1:26 pm    Post subject: Reply with quote

gzoumix wrote:

[*] the implementation language:
python, even if it is quite quick, will never be as quick as C. Moreover, python has a global lock which forbids the usage of threads (which could be very useful for emerge).

Not strictly true... poor algorithms hurt more. Paludis is written in C++ and portage beats it (plenty of benchmarks on this). Yes python is "slow" but it doesn't have to be (''.join([str1,str2]) compared to str1+str2). There is then cython which converts python to C so you have the benefit of python to write and the speed of C

Also Python can have threads. The GIL exists because threads are a pain and it is still the simplest and fastest lock to ensure only one method accesses an attribute. You can fork processes in python, you can use the threading.thread module & then you can use the multiprocess module (3.5 onwards) but then that doesn't answer why multithreaded/process would improve things? the old regex adage is true to threading:
Quote:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

threading comes with worlds of pain: sync, locking (yes threading comes with locking that you would need to manage, something the GIL makes transparent from you..)
_________________
The best argument against democracy is a five-minute conversation with the average voter
Great Britain is a republic, with a hereditary president, while the United States is a monarchy with an elective king
Back to top
View user's profile Send private message
mv
Watchman
Watchman


Joined: 20 Apr 2005
Posts: 6240

PostPosted: Sun Dec 24, 2017 1:36 pm    Post subject: Reply with quote

gzoumix wrote:
backtracking:
emerge explores all the possible installations to find a solution, and this can take a very long time (it can also fail, which thus takes even more time).
This is where our solver can be quicker

Dependency resolution is known to be NP complete. There are of course various ways to attack NP complete problems, but none can be guaranteed to succeed within a reasonable time if the problem has a moderate size. Why do you think that you can give such a guarantee?
Back to top
View user's profile Send private message
gzoumix
n00b
n00b


Joined: 22 Dec 2017
Posts: 11

PostPosted: Sun Dec 24, 2017 4:04 pm    Post subject: Reply with quote

Naib wrote:

Not strictly true... poor algorithms hurt more


Naib wrote:

Also Python can have threads. The GIL exists because threads are a pain and it is still the simplest and fastest lock to ensure only one method accesses an attribute.


Naib wrote:

threading comes with worlds of pain: sync, locking (yes threading comes with locking that you would need to manage, something the GIL makes transparent from you..)


Of course, you are right.
Poorly designed algorithms hurt far more than interpreted languages, and threading (without concurrency) and multiprocessing (duplicating the memory usage of the program in Linux) are available in python. Hence, to the best of my knowledge, threading in python does not make the program quicker, and multiprocessing can take a lot of memory (and a lot of time for copying the memory during a fork).

EDIT:Thinking about your comment, I looked again at the multiprocessing documentation, and creating a pool of workers at the start (thus avoiding the memory usage duplication) and using them with a queue when necessary could work to improve the efficiency of the prototype.
Thank you.

mv wrote:

Dependency resolution is known to be NP complete. There are of course various ways to attack NP complete problems, but none can be guaranteed to succeed within a reasonable time if the problem has a moderate size. Why do you think that you can give such a guarantee?


You are right, there is no hard guarantee (except the fact that the constraint to solve is bounded by the size of the portage tree).
In practice, however, we observe two phenomena:

  1. the number of packages involved in the construction of a constraint is comparatively low (around 4000)
  2. most NP problems have some kind of regularity that is exploited by the solvers to find a solution very quickly, similarly to the OCaml type system that is exponential in general, but linear in practice (this property of the OCaml type system has been explained, while to the best of my knowledge no clear notion of regularity has been identified yet for NP problems).

In our different tests, the solving time never exceeded 3 minutes.

emerge is very efficient for simple (most) installation problems.
But its exploration using backtracking can get slower than an SMT solver that uses specific strategies.
Interestingly, a similar work has been done for debian https://potassco.org/aspcud/ and we expect similar result with this prototype:
- sligthly slower than emerge in most cases (so one can imagine combining the two approaches to have the advantages of both)
- better identification and resolution of problems (e.g., gnome-base/gnome)
This prototype is thus a way to test a new strategy for dependency resolution in portage and to compare it with the one of emerge.
Back to top
View user's profile Send private message
Dr.Willy
Guru
Guru


Joined: 15 Jul 2007
Posts: 459
Location: NRW, Germany

PostPosted: Mon Jan 01, 2018 4:17 pm    Post subject: Reply with quote

So, pkgcore has the speed, you guys have the completeness … you should team up …
Back to top
View user's profile Send private message
Ant P.
Watchman
Watchman


Joined: 18 Apr 2009
Posts: 5281

PostPosted: Mon Jan 01, 2018 9:32 pm    Post subject: Reply with quote

Paludis has the completeness too (and ugly errors, and it's miserably slow, and it's the most user-hostile user-interactive software I've ever seen...)
Anything that can spare users from installing that is probably a good thing.
Back to top
View user's profile Send private message
gzoumix
n00b
n00b


Joined: 22 Dec 2017
Posts: 11

PostPosted: Wed Jan 03, 2018 3:58 am    Post subject: Reply with quote

Dr.Willy wrote:
So, pkgcore has the speed, you guys have the completeness … you should team up …


This is very interesting, I wasn't aware of this project.
Many thanks for the suggestion, I'll try to contact them and see if they are interested to discuss and maybe work together.
pkgcore is far more advanced than our prototype, and their experience would be a huge benefit for us.

Ant P. wrote:

Paludis has the completeness too (and ugly errors, and it's miserably slow, and it's the most user-hostile user-interactive software I've ever seen...)
Anything that can spare users from installing that is probably a good thing.


I very quickly looked at Paludis, and it seems that their approach is different than ours.
In principle, we "simply" translate the REQUIRED_USE and *DEPEND into SAT constraints, feed them to z3 and interpret the result. This should lower the number of possible bugs. In the end, most of the bugs we encountered when implementing our prototype were caused by the difficulty we had to fully understand two things:
i) how portage configures the packages (implicit IUSE flags, USE flag selection, mask, keywords, and licenses);
ii) and how portage manages some constraints in the *DEPEND variables whose semantics is not fully described in the documentation: atom matching and compact forms.
I compiled my understanding of these two points here (for the first point) and here (for the second point). I should check it w.r.t. the official specification that I just found...

Currently, our prototype is not very user-friendly.
First, we need to translate all of portage's packages into SAT constraints (with few annex information).
Then an emerge request with our prototype generates two files (when an installation is possible): a script with a list of packages to install and uninstall, and a package.use file with their USE flag configuration.
Finally, if no installation is possible, the error message is written using the SAT constraints, not the original *DEPEND syntax...

Our goal is to make our prototype as useful as possible, and user-friendliness is a big part of it.
We know how to solve the problems with the initial translation and the bad error messages.
However, suggestions or help from interested people will be necessary to integrate our prototype into the package installation process.

Other aspects for which our prototype might be useful can be the analysis of a portage repository or the exploration of configurations: since packages are translated in SAT constraints, it is easy to check the consistency of the REQUIRED_USE and *DEPEND constraints, compute the minimal system that includes a specific package, etc. I might open a new topic to discuss this.
Back to top
View user's profile Send private message
steveL
Watchman
Watchman


Joined: 13 Sep 2006
Posts: 5147
Location: The Peanut Gallery

PostPosted: Sat Jan 06, 2018 12:45 pm    Post subject: Reply with quote

stqn wrote:
To be honest when I read the subject of this thread, I thought you had implemented *faster* dependency calculations, because that’s the big problem of emerge
Lul, me too :-)
gzoumix wrote:
emerge is an amazing tool, highly optimized and all, but has one default: it can (rarely) fail.
Indeed, in some cases, emerge might not find a solution to an installation problem and give bad advice on how to continue. For instance, trying to install gnome-base/gnome only following emerge's advices (without looking at the documentation https://wiki.gentoo.org/wiki/GNOME/Guide) will only clutter the package.use file.

Our new dependency solver is slower than the one of emerge and has far fewer functionalities (it is a prototype), but never fails:
if a package is installable, our solver will find a solution, and if it is not, it can tell why (with an ugly syntax, sorry).
IDK that sounds a lot like portage to me.
Can you give an example or two, where emerge "fails" and yours succeeds?

ATM it's a bit unclear what "fail" means -- crapping out and telling you with lots of detail about why, isn't failure; so I guess "fail" would mean it doesn't find a solution, when one exists?

Also, have you looked at pkgcore? It is blazingly fast, and correct; it's basically portage2 by the previous main developer of portage, ferringb.

It fell behind with EAPI6, but if you can python, then any work on it would be of real use for the community, as well as providing you with some very optimised python using a C backend (snakeoil.) There's a lot to learn from it, if you're into python, IOW. (Most of the speedups in emerge over the last few years, have been imported from pkgcore; eg: frozen-sets.)
#pkgcore on IRC: chat.freenode.net or .org (might be ##; used to be #gentoo-pkgcore.)
Back to top
View user's profile Send private message
gzoumix
n00b
n00b


Joined: 22 Dec 2017
Posts: 11

PostPosted: Sat Jan 06, 2018 4:14 pm    Post subject: Reply with quote

steveL wrote:

IDK that sounds a lot like portage to me.
Can you give an example or two, where emerge "fails" and yours succeeds?

ATM it's a bit unclear what "fail" means -- crapping out and telling you with lots of detail about why, isn't failure; so I guess "fail" would mean it doesn't find a solution, when one exists?

Sorry if it wasn't clear: like you guessed, when I say "emerge can fail", I mean that emerge may not find a solution to an installation problem, while one does exist.

Example
During the development of our prototype, we tested it on the osboxes virtual machine (minimal) (I didn't want to break my desktop), and tried to install gnome-base/gnome. At that time, I didn't know that gnome didn't work with udev (I use openbox on my desktop)... So, our prototype stated that it wasn't possible (which seemed erroneous), and to debug it, we checked emerge's answer. emerge said that some use flags must be changed in order to continue. I applied the changes, rerun emerge, which told me that other use flags must be changed in order to continue. I followed emerge's instructions 3 or 4 times before checking the documentation and seeing that indeed, our tool was right and emerge's instructions were leading nowhere.

Since then, we improved our prototype and fixed many bugs, and it is able to say that
- by default, gnome cannot be installed
- but changing the default use flags, it can be installed after removing udev

Example
It happened to me many times that, trying to install some packages (I don't remember which ones, sorry), emerge answered me that it wasn't possible due to some conflicts between packages, but increasing the backtracking limit might solve the problem: by default, emerge can miss a solution.

There is two possible solutions at that point: increasing the backtracking limit of emerge, hoping that it will help it to find a solution without taking too much time and without strange behavior (like what happened with gnome); or using another solver, still hoping that it will not take too much time (indeed, as pointed out by mv, dependency resolution is NP-complete).

Our prototype is thus an alternative to emerge's dependency solver with absolute answers: a negative answer means that there is no solution. Additionally, our prototype didn't take too much time during testing (but this has to be confirmed by willing testers/users).


steveL wrote:

Also, have you looked at pkgcore? It is blazingly fast, and correct; it's basically portage2 by the previous main developer of portage, ferringb.

It fell behind with EAPI6, but if you can python, then any work on it would be of real use for the community, as well as providing you with some very optimised python using a C backend (snakeoil.) There's a lot to learn from it, if you're into python, IOW. (Most of the speedups in emerge over the last few years, have been imported from pkgcore; eg: frozen-sets.)
#pkgcore on IRC: chat.freenode.net or .org (might be ##; used to be #gentoo-pkgcore.)

Thanks for the suggestion. Dr.Willy had the same idea, and I left a message on pkgcore's IRC.
I would be very interested to learn from them and maybe contribute to the community (our prototype, written in python and bash, is a first step).

I will also go to FOSDEM this year, to discuss IRL.
Back to top
View user's profile Send private message
krinn
Watchman
Watchman


Joined: 02 May 2003
Posts: 6855

PostPosted: Sat Jan 06, 2018 4:25 pm    Post subject: Reply with quote

gzoumix wrote:
I will also go to FOSDEM this year, to discuss IRL.

I was thinking people only go there to drink beers and have free goodies. :)
Back to top
View user's profile Send private message
gzoumix
n00b
n00b


Joined: 22 Dec 2017
Posts: 11

PostPosted: Sat Jan 06, 2018 5:29 pm    Post subject: Reply with quote

krinn wrote:

I was thinking people only go there to drink beers and have free goodies. :)

Good belgian beer is certainly a plus :)
Back to top
View user's profile Send private message
steveL
Watchman
Watchman


Joined: 13 Sep 2006
Posts: 5147
Location: The Peanut Gallery

PostPosted: Sun May 20, 2018 11:38 am    Post subject: Reply with quote

Sorry to bump after so long. Just wondering how things are going.
gzoumix wrote:
.. when I say "emerge can fail", I mean that emerge may not find a solution to an installation problem, while one does exist.

Example
During the development of our prototype, we tested it on the osboxes virtual machine (minimal) (I didn't want to break my desktop), and tried to install gnome-base/gnome. At that time, I didn't know that gnome didn't work with udev (I use openbox on my desktop)... So, our prototype stated that it wasn't possible (which seemed erroneous), and to debug it, we checked emerge's answer. emerge said that some use flags must be changed in order to continue. I applied the changes, rerun emerge, which told me that other use flags must be changed in order to continue. I followed emerge's instructions 3 or 4 times before checking the documentation and seeing that indeed, our tool was right and emerge's instructions were leading nowhere.

Since then, we improved our prototype and fixed many bugs, and it is able to say that
- by default, gnome cannot be installed
- but changing the default use flags, it can be installed after removing udev

Example
It happened to me many times that, trying to install some packages (I don't remember which ones, sorry), emerge answered me that it wasn't possible due to some conflicts between packages, but increasing the backtracking limit might solve the problem: by default, emerge can miss a solution.

There is two possible solutions at that point: increasing the backtracking limit of emerge, hoping that it will help it to find a solution without taking too much time and without strange behavior (like what happened with gnome); or using another solver, still hoping that it will not take too much time (indeed, as pointed out by mv, dependency resolution is NP-complete).

Our prototype is thus an alternative to emerge's dependency solver with absolute answers: a negative answer means that there is no solution. Additionally, our prototype didn't take too much time during testing (but this has to be confirmed by willing testers/users).
What you wrote here, sounds like it could first be plugged in as a fallback in case of error (to give the absolute negative answer), and optionally as the main resolver
For portage, in the latter case, not pkgcore in terms of need; though who knows: perhaps you can improve it algorithmically.

How did you get along with the pkgcore folks?

I would like to stress that it sounds promising, which I didn't before. (My bad.)
Back to top
View user's profile Send private message
Display posts from previous:   
Reply to topic    Gentoo Forums Forum Index Gentoo Chat All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum