Gentoo Forums
Why has portage become so slow

Aiken
Apprentice
Joined: 22 Jan 2003
Posts: 239
Location: Toowoomba/Australia

Posted: Thu Oct 03, 2013 3:11 am

XavierMiller wrote:
And compared to paludis ?


After being unhappy with the amount of time I spend waiting for emerge, do not ask how many hours this took to set up. I was not going to try an unknown package manager on a live system, but was curious enough to set up a test machine.

Replicating the server install

emerge -pvuDN world              1 min 16 sec
emerge -pveuDN world             1 min 17 sec (597 packages)
cave resolve world               1 min 36 sec
cave resolve --complete world    1 min 55 sec

Replicating one of the desktop installs

emerge -pvuDN world              2 min 18 sec
emerge -pveuDN world             2 min 30 sec (1209 packages)
cave resolve world               3 min 8 sec
cave resolve --complete world    3 min 30 sec
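
For a rough sense of scale, the timings above can be turned into ratios. This is only a minimal sketch: the figures are the ones posted above, while the `to_seconds` and `slowdown` helpers are hypothetical names made up for the illustration.

```python
# Timings copied from the post above, as (minutes, seconds).
TIMINGS = {
    "server": {
        "emerge -pvuDN world": (1, 16),
        "cave resolve world": (1, 36),
    },
    "desktop": {
        "emerge -pvuDN world": (2, 18),
        "cave resolve world": (3, 8),
    },
}


def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) pair to seconds."""
    return minutes * 60 + seconds


def slowdown(machine):
    """Return how many times longer cave took than emerge on one machine."""
    runs = TIMINGS[machine]
    emerge = to_seconds(*runs["emerge -pvuDN world"])
    cave = to_seconds(*runs["cave resolve world"])
    return cave / emerge


for name in TIMINGS:
    print(f"{name}: cave took {slowdown(name):.2f}x as long as emerge")
```

On these two single runs, cave resolve took roughly 26% and 36% longer than emerge; one run per command is of course not a careful benchmark.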
_________________
Beware the grue.
xaviermiller
Bodhisattva
Joined: 23 Jul 2004
Posts: 8708
Location: ~Brussels - Belgique

Posted: Thu Oct 03, 2013 10:06 am

OK, it's worse, and that is due to the sheer size of the Portage tree.

Can't we imagine a modular tree, grouping some packages and classes into... oh... overlays?

So have a base tree + KDE tree + Gnome tree + ...

And a function that adds the needed subtree if required.

?
_________________
Kind regards,
Xavier Miller
Yamakuzure
Advocate
Joined: 21 Jun 2006
Posts: 2284
Location: Adendorf, Germany

Posted: Thu Oct 03, 2013 12:51 pm

How about substituting /var/db/pkg with an sqlite database? Parsing over 44k files (on my laptop) on *each* run is surely nothing that makes portage (or paludis, or whatever) quick...
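
To make the proposal concrete, here is a minimal sketch of what importing such a directory tree into SQLite could look like. It is not how Portage works; it only assumes the usual `<category>/<package-version>/<FIELD>` layout of /var/db/pkg, and the table schema, the `FIELDS` selection and both function names are hypothetical.

```python
import os
import sqlite3

# Metadata files to import; these names occur in /var/db/pkg entries,
# but the selection here is only an illustrative subset.
FIELDS = ("SLOT", "USE", "DEPEND", "RDEPEND")


def build_index(vdb_root, db_path=":memory:"):
    """Walk vdb_root/<category>/<package-version>/ and load FIELDS into SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pkg ("
        "category TEXT, pv TEXT, field TEXT, value TEXT, "
        "PRIMARY KEY (category, pv, field))"
    )
    rows = []
    for category in sorted(os.listdir(vdb_root)):
        cat_dir = os.path.join(vdb_root, category)
        if not os.path.isdir(cat_dir):
            continue
        for pv in sorted(os.listdir(cat_dir)):
            for field in FIELDS:
                path = os.path.join(cat_dir, pv, field)
                if os.path.isfile(path):
                    with open(path, encoding="utf-8", errors="replace") as fh:
                        rows.append((category, pv, field, fh.read().strip()))
    conn.executemany("INSERT OR REPLACE INTO pkg VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn, category, pv, field):
    """Return one metadata value, or None if it was not recorded."""
    row = conn.execute(
        "SELECT value FROM pkg WHERE category = ? AND pv = ? AND field = ?",
        (category, pv, field),
    ).fetchone()
    return row[0] if row else None
```

Something like `conn = build_index("/var/db/pkg")` followed by `lookup(conn, "sys-libs", "zlib-1.2.8", "SLOT")` would then replace many small file reads with one query; whether that is actually faster than the existing caches is exactly what is in question here.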
_________________
Important German:
  1. "Aha" - German reaction to pretend that you are really interested while giving no f*ck.
  2. "Tja" - German reaction to the apocalypse, nuclear war, an alien invasion or no bread in the house.
mv
Watchman
Joined: 20 Apr 2005
Posts: 6747

Posted: Thu Oct 03, 2013 1:41 pm

Yamakuzure wrote:
How about substituting /var/db/pkg with an sqlite database?

Portage already caches it in /var/cache/edb/vdb_metadata.pickle, and even if it did not, the Linux kernel would have cached it after the first run. As I mentioned, you can also turn /var/db into a squashfs, which is then very likely much faster than an SQLite database, even on the first run.
TomWij
Retired Dev
Joined: 04 Jul 2012
Posts: 1553

Posted: Thu Oct 03, 2013 7:30 pm

Aiken wrote:
TomWij wrote:
I'm not too bothered by my entire world dependency tree resolution taking four minutes; with the number of packages I have, it doesn't really seem too much...


It adds up. By the time I get to the 12th machine the short time of apt-get upgrade to see if anything needs updating gets quite tempting.


Eh, why not just do all the machines at once in parallel?

XavierMiller wrote:
OK, it's worse, and that is due to the sheer size of the Portage tree.

Can't we imagine a modular tree, grouping some packages and classes into... oh... overlays?

So have a base tree + KDE tree + Gnome tree + ...

And a function that adds the needed subtree if required.

?


It is the dependency tree of your world set that needs to get smaller; you can easily exclude certain directories and ebuilds if you want to, but it will not really have an effect.

mv wrote:
Yamakuzure wrote:
How about substituting /var/db/pkg with an sqlite database?

Portage already caches it in /var/cache/edb/vdb_metadata.pickle, and even if it did not, the Linux kernel would have cached it after the first run. As I mentioned, you can also turn /var/db into a squashfs, which is then very likely much faster than an SQLite database, even on the first run.


Exactly; what might be needed here is a bit more caching. If you look at how much /etc/portage changes between merges (which is not much) and at the part of the Portage tree that changes between syncs (which is also not much), you can easily cache the calculation results that are based on /etc/portage entries, and then update the cache for the affected packages / USE flags when the set of USE flags changes. That should make USE reduce lightning fast in the general case, because it becomes a hash table lookup instead of parsing, matching USE flags and reducing. It seems like a simple idea, but it is a bit harder to implement; I would like to, but I don't know the Portage code well enough yet.
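
The caching idea can be illustrated with a toy reducer. This is a sketch under loud assumptions, not Portage's code: `use_reduce` and `_reduce` here are hypothetical helpers that only understand `flag? ( ... )` and `!flag? ( ... )` groups (no `|| ( ... )`), and the cache is simply Python's `functools.lru_cache` keyed on the dependency string plus the set of enabled USE flags.

```python
from functools import lru_cache


def _reduce(tokens, pos, use):
    """Reduce tokens from pos until a closing ')' or the end of input."""
    out = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == ")":
            return out, pos + 1
        if tok.endswith("?"):
            # "flag? ( ... )" applies when flag is enabled,
            # "!flag? ( ... )" applies when it is not.
            negate = tok.startswith("!")
            flag = tok.strip("!?")
            inner, pos = _reduce(tokens, pos + 2, use)  # skip "flag?" and "("
            if (flag in use) != negate:
                out.extend(inner)
        else:
            out.append(tok)
            pos += 1
    return out, pos


@lru_cache(maxsize=None)
def use_reduce(depstring, use):
    """Return the dependencies left for a frozenset of enabled USE flags."""
    reduced, _ = _reduce(tuple(depstring.split()), 0, use)
    return tuple(reduced)
```

The first call for a given (dependency string, USE set) pair parses and reduces; any repeat with the same pair is answered from the dictionary inside `lru_cache`. A persistent version would additionally have to invalidate entries when /etc/portage or the tree changes, which is the hard part mentioned above.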
mv
Watchman
Joined: 20 Apr 2005
Posts: 6747

Posted: Fri Oct 04, 2013 5:30 am

TomWij wrote:
if you look at how much /etc/portage changes between merges (which is not much)

Most people will have a rather small /etc/portage. Just recently I enlarged my /etc/portage/package.use by ~200 entries (now ~600 entries), which is probably much more than most people have, among them several with wildcards, and in practice I did not feel any time increase (I did not measure exactly, though). It seems that applying /etc/portage to the tree's data does not cost much time (OTOH, caching the modified data would cost a lot of memory and/or disk space, maybe even reducing efficiency on small-memory systems [1GB or less of RAM is still not so infrequent]). The tree's data is also obtained very quickly through the metadata/* directory. In summary, I do not see much potential for optimizing here.
Aiken
Apprentice
Joined: 22 Jan 2003
Posts: 239
Location: Toowoomba/Australia

Posted: Fri Oct 04, 2013 6:02 am

TomWij wrote:

Eh, why not just do all the machines at once in parallel?


They are not all available at the same time. Depending on what a computer is currently being used for, the extra CPU and I/O load from emerge would be unwelcome. For at least 1 computer it is easier to work around the user. I am planning a dummy spit the next time that computer gets turned off mid-update. They are not all exactly the same, so an emerge -pvuDN world on 1 machine won't be a definite guide for the others.

I had thought of having each machine sync via cron and then email me if there are updates. I just checked, and emerge returned 0 on a machine with no updates and also returned 0 on a machine with updates. If that had worked, it would have at least removed/hidden some of the waiting.
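
Since the exit status does not distinguish the two cases, one workaround is to look at the pretend output itself. A minimal sketch, assuming the usual `[ebuild ...] category/package-version` lines in `emerge --pretend` output; `planned_packages` and `check_world` are hypothetical helper names.

```python
import re
import subprocess

# Lines such as "[ebuild     U  ] sys-apps/foo-1.1 [1.0]" mark planned merges
# in emerge's --pretend output.
EBUILD_LINE = re.compile(r"^\[ebuild[^\]]*\]\s+(\S+)", re.MULTILINE)


def planned_packages(pretend_output):
    """Return the package-version strings listed in emerge --pretend output."""
    return EBUILD_LINE.findall(pretend_output)


def check_world():
    """Run the pretend update and return the packages it would merge."""
    result = subprocess.run(
        ["emerge", "-puDN", "world"],
        capture_output=True,
        text=True,
        check=False,
    )
    return planned_packages(result.stdout)
```

A cron job could call `check_world()` after the sync and only send mail when the returned list is non-empty. It still pays the full dependency calculation, just not while anyone is waiting for it.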
_________________
Beware the grue.
xaviermiller
Bodhisattva
Joined: 23 Jul 2004
Posts: 8708
Location: ~Brussels - Belgique

Posted: Fri Oct 04, 2013 6:39 am

TomWij wrote:
XavierMiller wrote:
OK, it's worse, and that is due to the sheer size of the Portage tree.

Can't we imagine a modular tree, grouping some packages and classes into... oh... overlays?

So have a base tree + KDE tree + Gnome tree + ...

And a function that adds the needed subtree if required.

?


It is the dependency tree of your world set that needs to get smaller; you can easily exclude certain directories and ebuilds if you want to, but it will not really have an effect.


My world is really simple, and only has entries for a common desktop environment: razor-qt, firefox, abiword, gnumeric, qt-creator, and some audio applications.
It is really clean, and I don't have exotic USE / masks / unmasks.

Don't blame the user; that config has been the same for about 10 years, but the PMS cannot manage that "so huge world set"...
_________________
Kind regards,
Xavier Miller
TomWij
Retired Dev
Joined: 04 Jul 2012
Posts: 1553

Posted: Fri Oct 04, 2013 7:49 am

mv wrote:
TomWij wrote:
if you look at how much /etc/portage changes between merges (which is not much)

Most people will have a rather small /etc/portage. Just recently I enlarged my /etc/portage/package.use by ~200 entries (now ~600 entries), which is probably much more than most people have, among them several with wildcards, and in practice I did not feel any time increase (I did not measure exactly, though). It seems that applying /etc/portage to the tree's data does not cost much time (OTOH, caching the modified data would cost a lot of memory and/or disk space, maybe even reducing efficiency on small-memory systems [1GB or less of RAM is still not so infrequent]). The tree's data is also obtained very quickly through the metadata/* directory. In summary, I do not see much potential for optimizing here.


It costs about 10% of the time, looking at the profile data (seen in the first image); that is like making 40 seconds into 4 seconds, which is quite a gain on 4 minutes. You can't really halve the time; so, we need to start somewhere. Please note that things like dependency string USE reduction are not cached in any way and are actually calculated each time. Embedded systems usually have smaller worlds; so, it scales down to them as well.

Aiken wrote:
TomWij wrote:

Eh, why not just do all the machines at once in parallel?


They are not all available at the same time. Depending on what a computer is currently being used for, the extra CPU and I/O load from emerge would be unwelcome. For at least 1 computer it is easier to work around the user. I am planning a dummy spit the next time that computer gets turned off mid-update. They are not all exactly the same, so an emerge -pvuDN world on 1 machine won't be a definite guide for the others.

I had thought of having each machine sync via cron and then email me if there are updates. I just checked, and emerge returned 0 on a machine with no updates and also returned 0 on a machine with updates. If that had worked, it would have at least removed/hidden some of the waiting.


That's a scheduling problem; you can just lower its priority (raise its niceness), and then in the majority of cases it shouldn't interfere with whatever the computer is being used for.

PORTAGE_NICENESS="19"
PORTAGE_IONICE_COMMAND="ionice -c 3 schedtool -D \${PID}"

Of course make sure to not set wild jobs or load averages in MAKEOPTS or EMERGE_DEFAULT_OPTS.

As for checking what emerge did, you can grep the messages added to /var/log/emerge.log since you started the emerge. The easiest way is to move emerge.log away first, as the new file will then only contain what's new; after you have grepped it, you can append it to the moved-away file. Yes, emerge itself can't really do that: given the multiple things it allows you to do, the return code only expresses whether there was a failure.
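
A variation on this that avoids moving the log is to filter it by timestamp. The sketch below swaps in that technique and assumes the usual emerge.log line shape, an epoch timestamp followed by `::: completed emerge (N of M) category/package-version to /`; the `completed_since` helper is a hypothetical name.

```python
import re

# Assumed line shape:
# "<epoch>:  ::: completed emerge (1 of 3) cat/pkg-1.0 to /"
COMPLETED = re.compile(r"^(\d+):\s+::: completed emerge \(\d+ of \d+\) (\S+)")


def completed_since(log_lines, start_epoch):
    """Return packages whose merge completed at or after start_epoch."""
    merged = []
    for line in log_lines:
        match = COMPLETED.match(line)
        if match and int(match.group(1)) >= start_epoch:
            merged.append(match.group(2))
    return merged
```

Record `int(time.time())` just before starting emerge, then pass an open handle on /var/log/emerge.log together with that value; an empty result means nothing was merged in that run.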

XavierMiller wrote:
My world is really simple, and only has entries for a common desktop environment: razor-qt, firefox, abiword, gnumeric, qt-creator, and some audio applications.
It is really clean, and I don't have exotic USE / masks / unmasks.

Don't blame the user; that config has been the same for about 10 years, but the PMS cannot manage that "so huge world set"...


Run `emerge -ep --tree --unordered-display ...` with my output patch on the five examples you gave and you will find out that it runs five times due to slot conflicts, each taking around a minute here; after that it gives you a whole tree of ~525 packages, and that's excluding the audio applications and anything else you run.
schorsch_76
Guru
Joined: 19 Jun 2012
Posts: 450

Posted: Fri Oct 04, 2013 8:23 am

Someone mentioned sys-apps/paludis in this thread. I have read its homepage. It seems to be a completely parallel approach to Portage, with no Python in its core. It uses the ebuilds, but maintains its own database. The mailing list archive is pretty quiet. Does anyone here use paludis?

Should this be taken to a new thread?
xaviermiller
Bodhisattva
Joined: 23 Jul 2004
Posts: 8708
Location: ~Brussels - Belgique

Posted: Fri Oct 04, 2013 8:57 am

I still don't understand why it is a problem to have 600 installed packages. Before, emerge -DuNav world was faster.
Don't shoot the messenger ;)
_________________
Kind regards,
Xavier Miller
TomWij
Retired Dev
Joined: 04 Jul 2012
Posts: 1553

Posted: Fri Oct 04, 2013 9:01 am

Because the underlying dependency tree has become more complex in size and diversity.
xaviermiller
Bodhisattva
Joined: 23 Jul 2004
Posts: 8708
Location: ~Brussels - Belgique

Posted: Fri Oct 04, 2013 9:23 am

I will stop here; I don't accept that kind of argument, which pushes the user to get more powerful hardware.

I don't find emerge usable because of the added complexity, and I will try to find simpler alternatives.
_________________
Kind regards,
Xavier Miller
ulenrich
Veteran
Joined: 10 Oct 2010
Posts: 1480

Posted: Fri Oct 04, 2013 9:59 am

XavierMiller wrote:
I don't find emerge usable because of the added complexity, and I will try to find simpler alternatives.

In contrast to Xavier it is bearable for me. I value the completeness of a software manager much higher. I know of Archlinux keeping that simplicity Xavier requests at the price of user intervention.

But I hope there is progress here at Gentoo:
- Instead of a new solver run every time a subslot is found, why not a single run with a kind of preprocessor?
- Or, would it help to externalize solving dependencies by using dev-libs/libzypp?

But I guess in the long run the portage tree will grow exponentially like it did in the past.
- Is it suitable then to partition the tree (@system @apps @emulation)?
- Another way would be to totally separate the two releases Gentoo maintains (stable~unstable).
- A third possibility is like the path of systemd: with rock-solid ground of underlying commodities, it will be possible to unify the @system.


Last edited by ulenrich on Fri Oct 04, 2013 10:15 am; edited 1 time in total
TomWij
Retired Dev
Joined: 04 Jul 2012
Posts: 1553

Posted: Fri Oct 04, 2013 10:10 am

Well, there is no need for argumentation; you are experiencing Moore's law.

There are not many alternatives. Paludis was shown to be slower in this thread, though I can't and won't confirm that. The other alternative is to wait for pkgcore to fully catch up with EAPI 5 (and soon EAPI 6); it was written to be more efficient, but I have not yet tried it out to see how much of that is true. Catching up with Portage is the main concern here; there is no use in running something when it doesn't work most of the time or misses features that save a lot of time in other places.

Why are there not many alternatives? Because not many people are interested in writing them. With a team of a few developers, you can get a lot further...

I made a small start on my own, but seeing the amount of work involved, it is too much to do alone if you are not deeply interested in it; the time it takes isn't worth it as a way to deal with the time I am trying to save. In the time it would take me to write an alternative, I could run the dependency tree calculation a thousand times....
xaviermiller
Bodhisattva
Joined: 23 Jul 2004
Posts: 8708
Location: ~Brussels - Belgique

Posted: Fri Oct 04, 2013 10:17 am

For the moment, a good alternative for me is :
- go back home
- start the Gentoo machines
- emerge --sync && emerge -DuNav @world
- go down with the family (eating, bath of the kid)
- go back to the Gentoo machines

;)

I don't think working on a yet-another-cool-pms-I-will-write-alone is a good long-term solution; I would prefer to let the Gentoo team continue its good work :)
_________________
Kind regards,
Xavier Miller
Yamakuzure
Advocate
Joined: 21 Jun 2006
Posts: 2284
Location: Adendorf, Germany

Posted: Fri Oct 04, 2013 10:19 am

mv wrote:
Yamakuzure wrote:
How about substituting /var/db/pkg with an sqlite database?

Portage already caches it in /var/cache/edb/vdb_metadata.pickle, and even if it did not, the Linux kernel would have cached it after the first run. As I mentioned, you can also turn /var/db into a squashfs, which is then very likely much faster than an SQLite database, even on the first run.

I have, I have. I am one of your happiest squashmount users. ;)
_________________
Important German:
  1. "Aha" - German reaction to pretend that you are really interested while giving no f*ck.
  2. "Tja" - German reaction to the apocalypse, nuclear war, an alien invasion or no bread in the house.
TomWij
Retired Dev
Joined: 04 Jul 2012
Posts: 1553

Posted: Fri Oct 04, 2013 10:39 am

XavierMiller wrote:
I don't think working on a yet-another-cool-pms-I-will-write-alone is a good long-term solution; I would prefer to let the Gentoo team continue its good work :)


Continuing that great work is really only a short term solution, and it is only a small team that works on it so it is not necessarily reliable in the long term; any day, its development could crawl to a halt, whatever the event is (age, illness, accident [see Seth Vidal], ...; hopefully not tragic). For a long term solution, refactoring or rewriting it with a larger team will eventually be necessary to have Gentoo live on with the increasing complexity.
Genone
Retired Dev
Joined: 14 Mar 2003
Posts: 9523
Location: beyond the rim

Posted: Fri Oct 04, 2013 10:54 am

TomWij wrote:
There are not many alternatives.

There is the alternative of solving the core problem and making the relevant EAPI5 features optional. Of course that would not go down well with the PMS, formal-correctness, everything-automagic and no-user-responsibility factions, so it is not likely to happen.
TomWij
Retired Dev
Joined: 04 Jul 2012
Posts: 1553

Posted: Fri Oct 04, 2013 11:01 am

Genone wrote:
TomWij wrote:
There are not many alternatives.

There is the alternative of solving the core problem and making the relevant EAPI5 features optional. Of course that would not go down well with the PMS, formal-correctness, everything-automagic and no-user-responsibility factions, so it is not likely to happen.


The ones contributing to the time are already optional, as shown earlier in this thread; even if they weren't, it is a fairly simple patch to apply. The problem in doing so is that by saving that half minute, you get to spend much more time elsewhere due to the reverse dependencies not having been rebuilt. It is a lot more than just EAPI 5 that this problem consists of...
gienah
Developer
Joined: 24 Nov 2010
Posts: 212
Location: AU

Posted: Fri Oct 04, 2013 11:04 am

Some things that may help:

1) For overlays that use thin manifests it may help to generate the metadata when syncing, as described here
in the section Quickest start:

https://github.com/gentoo-haskell/gentoo-haskell/blob/master/README.rst

which you would have to adapt for any overlays you are using. I just run it for
every overlay:

Code:

argus ~ # cat /etc/eix-sync.conf
*
@egencache --jobs="$(($(nproc) + 1))" --repo=emacs --update
@egencache --jobs="$(($(nproc) + 1))" --repo=gentoo-haskell --update
@egencache --jobs="$(($(nproc) + 1))" --repo=lisp --update
@egencache --jobs="$(($(nproc) + 1))" --repo=x-portage --update
argus ~ #
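
With more than a handful of overlays, the listing above gets repetitive to maintain by hand. A minimal sketch that generates the same lines from a list of repository names; the `eix_sync_conf` helper is hypothetical, and it only reproduces the `egencache` invocation and the leading `*` line exactly as they appear in the listing.

```python
def eix_sync_conf(repos):
    """Return eix-sync.conf text that regenerates metadata for each repo."""
    lines = ["*"]
    for repo in repos:
        lines.append(
            '@egencache --jobs="$(($(nproc) + 1))" --repo=%s --update' % repo
        )
    return "\n".join(lines) + "\n"


print(eix_sync_conf(["emacs", "gentoo-haskell", "lisp", "x-portage"]), end="")
```

The printed text matches the file shown above, so it could be redirected into /etc/eix-sync.conf after checking that the repository names are the ones actually configured on the machine.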


2) You could try portage with the python3 use flag:

sys-apps/portage python3

I guess there is no point listening to me though, as I do both of these and it's still slow
calculating dependencies (with python-3.3.2-r2). I suspect this is because I have hundreds
of Haskell packages installed, which use EAPI=5 subslot depends, in addition to lots of
other packages in portage which also use EAPI=5 subslot depends.

It used to take quite a while to run revdep-rebuild. The EAPI=5 subslot depends
stuff means that emerge is doing a lot of these calculations, so that running
revdep-rebuild usually finds nothing to do.

3) Instead of using the python3 portage use flag, you could try pypy. I have not
tried that. I am tempted, just have not got around to trying it yet. OK I'll come
clean, I'm hoping someone brave will try it and let us know if it works and is
any faster :D

pypy requires at least 4GB of memory and hours to build. It would be best to
monitor it at first, if the pypy build starts swapping then I'd kill it and try
reducing the values of:

Code:

EMERGE_DEFAULT_OPTS="-jM"
MAKEOPTS="-jN"


in /etc/portage/make.conf to lower values (like -j2 or -j1). Apparently if pypy
starts swapping during the build it may thrash for who knows how long.

With sufficient memory there is some chance that emerge may be able to calculate
its dependencies faster with the pypy jit.
xaviermiller
Bodhisattva
Joined: 23 Jul 2004
Posts: 8708
Location: ~Brussels - Belgique

Posted: Fri Oct 04, 2013 11:15 am

Hello,

Waiting for pkgcore, I will try your suggestions (egencache + python3 USE). For the moment, I blocked Python 3 in order to have only one common interpreter.
_________________
Kind regards,
Xavier Miller
TomWij
Retired Dev
Joined: 04 Jul 2012
Posts: 1553

Posted: Fri Oct 04, 2013 11:15 am

I am using PyPy here; just like Python 3, it doesn't really speed things up, because it results in no change in complexity.

It doesn't have the Global Interpreter Lock though; so, one could possibly turn it into a parallel dependency tree calculation.
Genone
Retired Dev
Joined: 14 Mar 2003
Posts: 9523
Location: beyond the rim

Posted: Fri Oct 04, 2013 2:39 pm

TomWij wrote:
It is a lot more than just EAPI 5 that this problem consists of...

Yeah, I know. People should have stopped messing with dependencies after slot operators; all the (dependency) features added after that look good on paper and make the vocal factions happy, but add tons of complexity and abuse potential while providing questionable practical benefit (like REQUIRED_USE). No idea how users are supposed to figure out some of the modern-day error messages that even I don't understand. Pretty sure magically triggered rebuilds without seeing the reason will blow up badly in the long run.
TomWij
Retired Dev
Joined: 04 Jul 2012
Posts: 1553

Posted: Fri Oct 04, 2013 4:49 pm

Genone wrote:
TomWij wrote:
It is a lot more than just EAPI 5 that this problem consists of...

Yeah, I know. People should have stopped messing with dependencies after slot operators; all the (dependency) features added after that look good on paper and make the vocal factions happy, but add tons of complexity and abuse potential while providing questionable practical benefit (like REQUIRED_USE).


Guess the features could be individually benchmarked to see which ones actually do add much more time to the dependency tree calculation.
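
Such a benchmark could start from a small timing harness like the sketch below. It is only an illustration: `time_command` and `compare` are hypothetical helpers, and which emerge invocations to compare is left to the caller, since the thread does not list options that switch individual dependency features on or off.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=3):
    """Run argv several times and return the median wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(variants, runs=3):
    """Map each label in variants to the median time of its command."""
    return {label: time_command(argv, runs) for label, argv in variants.items()}
```

For example, `compare({"update": ["emerge", "-pvuDN", "world"], "emptytree": ["emerge", "-pveuDN", "world"]})` would time the two commands measured earlier in the thread. Taking the median of several runs reduces the effect of the first, cache-cold run.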

Genone wrote:
No idea how users are supposed to figure out some of the modern-day error messages that even I don't understand. Pretty sure magically triggered rebuilds without seeing the reason will blow up badly in the long run.


While the error messages could be a bit more verbose (as in documenting the syntax in place, link to extra documentation, etc...) they aren't really hard to figure out once you know the syntax; or at least, I haven't seen an unresolvable one for a very long time. Errors really become more unresolvable if you start to play with things like --dynamic-deps, which changes the ways dependencies are read out; as you'll need to debug /var/db/pkg going down that road.

With --tree --unordered-display you can see the possible dependencies that cause a magically triggered rebuild (marked with a small "r"); but to actually figure out which one, you will need to look into the ebuild. I'm not quite sure how one would express that in the emerge output; well, one could add a letter "m" to denote that the parent will be rebuilt, but I think the output gets to be a mess at that point.

I don't really mind an optional feature like this adding half a minute of time to the dependency tree, or it not being entirely clear without looking at the ebuilds (why would I need to even know why it rebuilds?); because it saves me from having to waste minutes otherwise.
All times are GMT
Page 2 of 7