Emerge - the future?

Opinions, ideas and thoughts about Gentoo. Anything and everything about Gentoo except support questions.

Tronic
Apprentice
Posts: 194
Joined: Mon Jul 28, 2003 11:17 am
Location: Finland

Post by Tronic » Thu Aug 14, 2003 1:30 pm

Not going to start a flamewar, nor propose that all Gentoo tools must be rewritten right now. But really, sooner or later it will have to be rewritten. The purpose of this thread is to discuss what features it could have and how it could be implemented. Please don't post "funny" posts about coffee maker and kitchen sink integration, thanks.

I'm sorry if this thread is a dupe - didn't find one already existing with a quick look, so posted a new one.

So, here are some ideas I have:


Design / code:

-Written in C++.

-Using internal text parsing, etc. (not calling awk or other external tools, because that is just slow).

-Threaded - after figuring out the dependencies, it could begin downloading all the packages (with a limit on simultaneous downloads, of course), and as soon as some downloads finish, the building of those packages can begin (assuming, of course, that the deps required for building are met). This should improve performance a lot, because network I/O and CPU-intensive compiling running side by side hardly slow each other down.

-Caching of databases (so that it doesn't need to read thousands of files from the filesystem every time, but could just check the headers of all packages from a single file) (and better make the database binary, not XML: I can't understand why everyone wants to use XML for everything these days, even when it really doesn't fit).

-Probably no internal UI, but only an interface for making one (and then one for curses, one for gtk, etc)
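
To make the threaded idea above a bit more concrete, here is a minimal Python sketch (Python only because it is easy to try; the idea is language-neutral). It assumes the package list is already in dependency order, and fetch() and build() are hypothetical stand-ins for the real download and compile steps, not anything Portage provides.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def fetch(pkg):
    """Hypothetical stand-in for a network download; returns the tarball name."""
    return pkg + ".tar.gz"


def build(tarball):
    """Hypothetical stand-in for unpack + compile + install."""
    return "built " + tarball


def run_pipeline(packages, max_downloads=2):
    """Download with a bounded pool while one builder thread compiles.

    `packages` is assumed to be in dependency order already, so building
    in list order never starts a package before its dependencies.
    """
    ready = queue.Queue()
    results = []

    def builder():
        while True:
            tarball = ready.get()
            if tarball is None:  # sentinel: no more downloads coming
                break
            results.append(build(tarball))

    worker = threading.Thread(target=builder)
    worker.start()
    with ThreadPoolExecutor(max_workers=max_downloads) as pool:
        # map() yields results in input order, even if later downloads
        # finish first, so the build order stays the dependency order.
        for tarball in pool.map(fetch, packages):
            ready.put(tarball)
    ready.put(None)
    worker.join()
    return results


print(run_pipeline(["liba", "libb", "app"]))
```

The builder starts on the first tarball while the pool is still fetching the rest, which is the whole point: the network and the CPU are busy at the same time.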


User-oriented features:

-Instead of the regular 50 lines/second build dump, display good overall progress indicators (and optionally the detailed build process in some window). This is getting even more important with threads!

-When something is masked, or some other error occurs, automatically display the _entire_ dependency tree leading to that unsolvable error. This makes solving and understanding the problem a lot easier.

-Some intelligent system for pretty much automatically sharing the downloaded packages (distfiles) in a LAN (especially useful in campuses or home networks with many Gentoo boxes behind a single network link). Should probably be P2P, and not require setting up a real server and configuring all boxes to use it (and this is big enough to be a separate software project).
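
As a rough illustration of the "show the entire dependency tree leading to the error" idea, the sketch below lists every chain from the package you asked for down to the one that cannot be installed. The dependency graph and the package names are made up for the example; a real tool would read them from the ebuilds.

```python
def chains_to(deps, root, target, path=None):
    """Return every dependency chain from root to target as a list of names."""
    path = (path or []) + [root]
    if root == target:
        return [path]
    found = []
    for dep in deps.get(root, []):
        if dep not in path:  # guard against circular dependencies
            found.extend(chains_to(deps, dep, target, path))
    return found


# Hypothetical dependency graph: package -> packages it needs.
deps = {
    "app": ["libgui", "libnet"],
    "libgui": ["libmasked"],
    "libnet": ["libmasked"],
    "libmasked": [],
}

for chain in chains_to(deps, "app", "libmasked"):
    print(" -> ".join(chain))
```

With that output in front of you it is obvious which top-level choices drag the masked package in.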


So, go on, throw in your ideas and needs... I'd like to hear what others have had in mind on how to improve this.

Feel free to also tell where I made mistakes. I realize that some of the things in my list are very difficult to implement, but I also know that all are doable.

Tronic
Apprentice
Posts: 194
Joined: Mon Jul 28, 2003 11:17 am
Location: Finland

Post by Tronic » Thu Aug 14, 2003 1:39 pm

One addition, which I don't have a clue how to actually implement:

-Intelligently figure out which mirror to use, and download each package from a mirror that actually has it (at the rate new packages are appearing, soon it will be impossible to keep all packages on all mirrors; I think we have all also noticed the problem of recent software versions not being mirrored yet when we try to fetch them).
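
One possible starting point, purely as a sketch: ask each mirror whether it has the file, then take the quickest of those that do. The mirror names, the catalogue and the latency figures below are all invented, and has_file() stands in for whatever real availability check (an HTTP HEAD request, a published file list) a tool would use.

```python
def pick_mirror(mirrors, filename, has_file, latency_ms):
    """Return the fastest mirror that has filename, or None if none has it yet."""
    candidates = [m for m in mirrors if has_file(m, filename)]
    if not candidates:
        return None
    return min(candidates, key=latency_ms)


# Hypothetical catalogue standing in for real availability checks.
catalogue = {
    "mirror-a.example": {"foo-1.0.tar.gz"},
    "mirror-b.example": {"foo-1.0.tar.gz", "foo-1.1.tar.gz"},
    "mirror-c.example": {"foo-1.1.tar.gz"},
}
# Hypothetical measured round-trip times in milliseconds.
latency = {"mirror-a.example": 20, "mirror-b.example": 80, "mirror-c.example": 45}


def has_file(mirror, filename):
    return filename in catalogue[mirror]


print(pick_mirror(list(catalogue), "foo-1.1.tar.gz", has_file, latency.get))
```

When no mirror has the file yet, the function returns None, and the caller could fall back to the upstream download location.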

Lovechild
Advocate
Posts: 2858
Joined: Fri May 17, 2002 12:00 pm
Location: Århus, Denmark

Post by Lovechild » Thu Aug 14, 2003 1:55 pm

psstt...

www.zynot.org - the future of Portage :)
Don't listen to sparc developers....

Senso
Apprentice
Posts: 250
Joined: Tue Jun 17, 2003 12:40 am
Location: Montreal, Quebec

Post by Senso » Thu Aug 14, 2003 2:19 pm

Why do we see "Portage should be rewritten in C/C++" almost once a week? I'm a Python whore and so, I don't see the point in this statement. :)

Threading, caching and text parsing are all possible with Python. Sure, C/C++ is faster, but I really don't care about a speedup of 3% or 15%.
Python makes it easier for anyone to write ebuilds or fix a problem in an existing ebuild. Go try that with C. At least, C/C++ Portage could use some scripting language like Lua for the ebuilds to permit modularity but in this case, I don't see why we should drop Python.

I agree with a mid-to-long term rewrite, but I don't think it needs to be in another language.

Ox-
Guru
Posts: 305
Joined: Thu Jun 19, 2003 4:43 am

Post by Ox- » Thu Aug 14, 2003 2:49 pm

Lovechild wrote:psstt...

www.zynot.org - the future of Portage :)
http://www.joelonsoftware.com/articles/ ... 00069.html
Tronic wrote:Not going to start a flamewar, nor propose that all Gentoo tools must be rewritten right now. But really, sooner or later it will have to be rewritten.
Don't let the above essay stop you... scratch whatever itch is bothering you. You have some good ideas, although I suspect it's going to be a lot harder than you are expecting.

Anyway, the only feature I'll suggest is that this new tool should remain compatible with emerge as far as database formats, otherwise it'll never make it out of the gate unless you start your own new distribution as well :wink:

Ox-
Guru
Posts: 305
Joined: Thu Jun 19, 2003 4:43 am

Post by Ox- » Thu Aug 14, 2003 3:08 pm

Senso wrote:Threading, caching and text parsing are all possible with Python. Sure, C/C++ is faster, but I really don't care about a speedup of 3% or 15%.
To reinforce what you're saying, real speedups are going to come with improving the algorithms, not the language. If changing the algorithms to take advantage of threading will make a C++ version run faster, then it'll make a Python version run faster as well.

Also, I've only been using Gentoo a few months, but it looks to me like 99.9% of the time on an emerge is compiling with gcc (written in C) + rsync (yep, in C) + make (C) + wget (C). So, I'd be skeptical a C++ rewrite could even provide a 1% speedup of the overall emerge process.

far
Guru
Posts: 394
Joined: Mon Mar 10, 2003 12:30 am
Location: Stockholm, Sweden

Post by far » Thu Aug 14, 2003 3:10 pm

Tronic wrote:-Written in C++.
There is no point. The only thing about portage that is slow is searching for packages, but that has nothing to do with Python. A rewrite in C++ would merely introduce bugs and make the source code several times as big.
-Using internal text parsing, etc. (not calling awk or other external tools, because that is just slow).
Does Portage use awk now? This is also much easier to do in Python.
-Threaded
I know people are working on this, although I think they would rather use multiple processes.
-Caching of databases
I believe there is a portage implementation with a database back end. There is nothing wrong with using xml. Unlike binary data, it is readable (and debuggable) by human beings.
-Probably no internal UI, but only an interface for making one
There is a (low-level) Python api, but it is not well documented.
-Instead of the regular 50 lines/second build dump, display good overall progress indicators
And how would the program know how far the build process has come? Make does not provide such information.

All these things have been discussed many times in other threads.
The Porthole Portage Frontend

aethyr
Veteran
Posts: 1085
Joined: Sun Apr 06, 2003 5:16 pm
Location: NYC

Post by aethyr » Thu Aug 14, 2003 3:12 pm

Senso wrote:Why do we see "Portage should be rewritten in C/C++" almost once a week?
For the same reason that we see people saying that getting rid of the client/server model of XFree will magically make their desktops faster, without any real understanding of the issues involved. Frankly, it wouldn't matter much (proportionally) if emerge was an order of magnitude slower. I'd say that when it comes to any decent sized package, maybe 5% of the total time is due to the emerge tool.

Say emerge takes 60 seconds to get something done (which is longer than it really does take). You spend another 60 seconds downloading the package; at 100 KB a second, that's a 6 MB package, something the size of mozilla-firebird maybe. You then spend an hour compiling the package (I'm not sure how long it really takes, but we're working in easy units here).

You've just spent 3720 seconds, 1.61% of which was spent in the emerge tool.

Let's say you make emerge 100 times faster (extraordinarily unreasonable, since it's doing a lot of disk access). You now have spent 0.6 seconds in "emerge", for a total of 3660.6 seconds, 0.016% of which was spent in "emerge". However, you've only saved yourself 59.4 seconds, or 1.6% of the total 3720 seconds originally spent emerging the package.

That's for a 6MB package. Even for a 2MB package, you still only save yourself 4.5% of the time spent. And that's if you make emerge 100 times faster (which will never happen, since most of the time is probably spent reading files off the disk).

If you make it 10 times faster, you see those numbers drop to 1.45% saved, and 4.1% saved. And that's starting off with the assumption that "emerge" took a full minute to do its job (which it doesn't).

If you think that coding emerge in C/C++ will suddenly make things better, you're really looking at the wrong bottlenecks.
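
The arithmetic above for the 6 MB case is easy to check. The function below is just an illustrative helper (its name and parameters are not from any tool); it takes the time spent in the emerge tool, in downloading and in compiling, and returns the share of the original total that a given speedup of the tool alone would save.

```python
def fraction_saved(tool_s, download_s, compile_s, speedup):
    """Share of the original total time saved by speeding up only the tool."""
    total = tool_s + download_s + compile_s
    saved = tool_s - tool_s / speedup
    return saved / total


# 60 s in emerge, 60 s downloading, one hour compiling.
print(round(100 * fraction_saved(60, 60, 3600, 100), 2))  # 1.6
print(round(100 * fraction_saved(60, 60, 3600, 10), 2))   # 1.45
```

However large the speedup gets, the saving can never exceed the tool's original share of the total, 60 / 3720 or about 1.61% here.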
Last edited by aethyr on Thu Aug 14, 2003 3:17 pm, edited 1 time in total.

Senso
Apprentice
Posts: 250
Joined: Tue Jun 17, 2003 12:40 am
Location: Montreal, Quebec

Post by Senso » Thu Aug 14, 2003 3:15 pm

Ox- wrote: Also, I've only been using Gentoo a few months, but it looks to me like 99.9% of the time on an emerge is compiling with gcc (written in C) + rsync (yep, in C) + make (C) + wget (C). So, I'd be skeptical a C++ rewrite could even provide a 1% speedup of the overall emerge process.
True, it's an even better argument than what I wrote earlier. The Python stuff in Portage is used to call apps written in C. Python calls wget and tells it to get the source from $WEBSITE, etc. Most of the computing comes from C binaries.
So, I think there are many different ways to improve the Python code. Full threading like Tronic explained would greatly help. But since threading is "optional" in Python, not everyone could use it. In any case, it's a good idea.

axxackall
l33t
Posts: 651
Joined: Wed Nov 06, 2002 4:04 pm
Location: Toronto, Ontario, 3rd Rock From Sun

Post by axxackall » Thu Aug 14, 2003 3:19 pm

Senso wrote:Why do we see "Portage should be rewritten in C/C++" almost once a week? I'm a Python whore and so, I don't see the point in this statement. :)
Perhaps we should begin every week with a new thread like "let's rewrite all scripts in Python!", just to compensate for the C/C++ zealots :)

Hmm, that could be an interesting system, where ALL non-3rd-party software (all that belongs to Gentoo itself) is written in Python: Portage, initscripts, installation scripts, system (network, user, disk etc) management tools, various other tools and utilities ... Even the UI for all of that must be written with Tkinter.

So, what are C/C++ zealots supposed to do? If they are really skilled in C/C++ then they should help vendors of gcc, mozilla etc. Otherwise they should learn Python :)

P.S. Forgot to mention: in the future, Gentoo should have no place for Perl, Ruby, Tcl, Java - anything that is not Python.

P.P.S. It was really a joke ... mostly :)
"Lisp is a programmable programming language." - John Foderaro, CACM, September 1991

charlieg
Advocate
Posts: 2149
Joined: Tue Jul 30, 2002 11:05 am
Location: Manchester UK

Post by charlieg » Thu Aug 14, 2003 3:27 pm

The main new feature of portage will be the use of a DB.

Usage of a basic DB (Berkeley, anybody? or Metakit?) to start with. Then things like dependencies (forward and reverse) can be established incredibly quickly.

The 'rewrite in C++' arguments are always naive. There is no real reason to do this.

Zynot is going nowhere fast.
Want Free games?
Free Gamer - open source games list & commentary

Open source web-enabled rich UI platform: Vexi

Ox-
Guru
Posts: 305
Joined: Thu Jun 19, 2003 4:43 am

Post by Ox- » Thu Aug 14, 2003 3:35 pm

axxackall wrote:Perhaps we should begin every week with a new thread like "let's rewrite all scripts in Python!", just to compensate for the C/C++ zealots :)
I think we should change portage to use SCons instead of make! :twisted:

Senso
Apprentice
Posts: 250
Joined: Tue Jun 17, 2003 12:40 am
Location: Montreal, Quebec

Post by Senso » Thu Aug 14, 2003 3:40 pm

charlieg wrote:The main new feature of portage will be the use of a DB.

Usage of a basic DB (Berkeley, anybody? or Metakit?) to start with. Then things like dependencies (forward and reverse) can be established incredibly quickly.
Metakit... It uses a HUGE amount of RAM compared to other indexing/DB systems. Even Jakarta Lucene is maybe 10x better in terms of RAM usage. Metakit could be a problem for "low quality" hardware users (like me).

Senso
Apprentice
Posts: 250
Joined: Tue Jun 17, 2003 12:40 am
Location: Montreal, Quebec

Post by Senso » Thu Aug 14, 2003 3:43 pm

axxackall wrote: Hmm, that could be an interesting system, where ALL non-3rd-party software (all that belongs to Gentoo itself) is written in Python: Portage, initscripts, installation scripts, system (network, user, disk etc) management tools, various other tools and utilities ... Even the UI for all of that must be written with Tkinter.
Look at the GLIS (Gentoo Linux Install Script) thread. I'll soon (i.e. in 2-3 days) start a Tkinter UI for this project. :) The project is still in its infancy but it looks good to me.

MrPyro
Tux's lil' helper
Posts: 121
Joined: Thu Aug 14, 2003 10:01 am
Location: Sheffield, England

Post by MrPyro » Thu Aug 14, 2003 4:05 pm

An idea that was mentioned in this thread: http://forums.gentoo.org/viewtopic.php?t=74143

Having Portage mark security updates as such, so that a server administrator can decide to just update security fixes while sticking with earlier versions of other code for stability.

The threading idea sounds good: download the first package, begin to compile it, while downloading the second package in the background.

Also, one thing I find annoying about emerge at the moment is that some packages have warning or informational messages that come up during installations: not compiler warnings, things like the message that tells you an easy way to configure Apache to use mod_php during the mod_php build. I tend to start my "emerge -pU --deep world" process running then go out somewhere, rather than sit and stare at my monitor watching compiler output, so I miss these messages. Some system where these kinds of messages are logged, so that they can be read later, would be helpful.

Senso
Apprentice
Posts: 250
Joined: Tue Jun 17, 2003 12:40 am
Location: Montreal, Quebec

Post by Senso » Thu Aug 14, 2003 4:42 pm

MrPyro wrote: Also, one thing I find annoying about emerge at the moment is that some packages have warning or informational messages that come up during installations: not compiler warnings, things like the message that tells you an easy way to configure Apache to use mod_php during the mod_php build. I tend to start my "emerge -pU --deep world" process running then go out somewhere, rather than sit and stare at my monitor watching compiler output, so I miss these messages. Some system where these kinds of messages are logged, so that they can be read later, would be helpful.
Just a quick idea, but you could log normal output (1) to a file, cat it and remove all lines starting with "gcc" with sed. If you add more similar rules, you would maybe still have a lot of crap but it would be easier to browse the log and find useful messages.
An eventual command-line option doing this automatically is a good idea.
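
Something along these lines, sketched in Python instead of sed. The list of noisy prefixes is only a guess at what clutters a typical build log, and the sample lines are made up; you would tune both against real output.

```python
# Prefixes that usually mark compiler/make chatter (an assumption, tune to taste).
NOISE_PREFIXES = ("gcc ", "g++ ", "cc ", "make[", "make:", "libtool")


def interesting_lines(log_lines):
    """Drop blank lines and lines that start with a known noisy prefix."""
    kept = []
    for line in log_lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(NOISE_PREFIXES):
            kept.append(line)
    return kept


# Hypothetical sample of a captured build log.
sample = [
    "gcc -O2 -c foo.c -o foo.o",
    "make[1]: Entering directory 'src'",
    "",
    " * Remember to update your config file",
]

for line in interesting_lines(sample):
    print(line)
```

To use it on a real run you would first capture standard output to a file and then pass the file's lines through interesting_lines().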

()
l33t
Posts: 610
Joined: Mon Nov 25, 2002 4:10 pm

Post by () » Thu Aug 14, 2003 8:51 pm

Is there some lightweight (possibly object oriented) database system that could be worth looking at apart from metakit?

Senso
Apprentice
Posts: 250
Joined: Tue Jun 17, 2003 12:40 am
Location: Montreal, Quebec

Post by Senso » Thu Aug 14, 2003 8:56 pm

() wrote:Is there some lightweight (possibly object oriented) database system that could be worth looking at apart from metakit?
I've been planning on playing with SQLite for a while... It's a SQL database, but embedded in your app directly (so the user doesn't have to download/install MySQL). The original version is for C/C++ but there are *many* wrappers for other languages, including Python, of course. I love the idea of an embedded SQL db but I've never really tried it.
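
For a feel of what that could look like, here is a small sketch using the sqlite3 module from the Python standard library. The schema, table names and package names are invented for the example; nothing here is an existing Portage format. It answers both "what does X need?" and "what needs X?" from one indexed table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real cache would live in a file
conn.executescript("""
    CREATE TABLE package (name TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE depends (package TEXT, needs TEXT);
    CREATE INDEX depends_package ON depends (package);
    CREATE INDEX depends_needs ON depends (needs);
""")
conn.executemany(
    "INSERT INTO package VALUES (?, ?)",
    [("app", "example application"), ("libgui", "example GUI library"),
     ("libbase", "example base library")],
)
conn.executemany(
    "INSERT INTO depends VALUES (?, ?)",
    [("app", "libgui"), ("app", "libbase"), ("libgui", "libbase")],
)


def forward_deps(name):
    """Packages that `name` needs."""
    rows = conn.execute(
        "SELECT needs FROM depends WHERE package = ? ORDER BY needs", (name,))
    return [row[0] for row in rows]


def reverse_deps(name):
    """Packages that need `name`."""
    rows = conn.execute(
        "SELECT package FROM depends WHERE needs = ? ORDER BY package", (name,))
    return [row[0] for row in rows]


print(forward_deps("app"))
print(reverse_deps("libbase"))
```

Both lookups hit an index instead of opening one file per package, which is where the time goes today.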

Tronic
Apprentice
Posts: 194
Joined: Mon Jul 28, 2003 11:17 am
Location: Finland

Post by Tronic » Thu Aug 14, 2003 9:15 pm

Hmm. I was surprised that so many people responded to the C++ part, which I didn't think was a big thing anyway (or maybe my message was too long and they only read the first entry ;).

Okay, let's break the problem up a bit. We have the following slow points:
-Searching for packages (-s is too slow, -S is absofraggin'lutelyDAMNIT too slow)
-Figuring deps

The dependency check isn't too slow at the moment, but I think it is something that is prone to suffer a lot as package numbers increase (6000 packages today, but really it should scale to much, much more than that). I haven't really thought about the algorithm and don't know what the current emerge uses for this, but if it is something that requires n^2 work (where n is the number of packages) or the like... Well, it'll be real trouble.
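
For what it's worth, a plain depth-first walk does not have to be n^2. The sketch below is not what emerge does (I don't know its algorithm); it is just the textbook approach, with a made-up dependency graph. Each package is visited once, so the work grows with the number of packages and dependency links actually reachable from the target. It ignores version conflicts and circular dependencies, which is where the hard part would be.

```python
def install_order(deps, target):
    """Return target and everything it needs, dependencies first."""
    order = []
    seen = set()

    def visit(pkg):
        if pkg in seen:  # already handled (or in progress): skip
            return
        seen.add(pkg)
        for dep in deps.get(pkg, []):
            visit(dep)
        order.append(pkg)  # all of pkg's deps are in `order` by now

    visit(target)
    return order


# Hypothetical dependency graph: package -> packages it needs.
deps = {
    "app": ["libgui", "libnet"],
    "libgui": ["libbase"],
    "libnet": ["libbase"],
    "libbase": [],
}

print(install_order(deps, "app"))
```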

Maybe it would be worth it to write these pieces in C/C++ and keep the rest in Python? Especially the searching is something I highly doubt could be fast enough in Python (except if you use some kind of word cache for that, but then you need to generate it and that brings new problems).
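
By "word cache" I mean roughly the following: a table from each word to the set of packages whose name or description contains it, built once and then reused for every search. This is only a sketch; the function names are made up, and apart from the koules description the sample data is invented. Keeping the cache up to date after a sync is the part it does not solve.

```python
import re
from collections import defaultdict


def build_word_cache(descriptions):
    """Map each lowercase word to the set of package names mentioning it."""
    cache = defaultdict(set)
    for name, text in descriptions.items():
        for word in re.findall(r"[a-z0-9]+", (name + " " + text).lower()):
            cache[word].add(name)
    return cache


def search(cache, query):
    """Return packages matching every word of the query, sorted by name."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(cache.get(word, set()) for word in words))
    return sorted(hits)


descriptions = {
    "koules": "fast action arcade-style game w/sound and network support",
    "foo-editor": "simple text editor",  # hypothetical package
}
cache = build_word_cache(descriptions)
print(search(cache, "arcade game"))
```

Each search is then a handful of dictionary lookups and a set intersection, no matter how many packages there are.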

Mystilleef
Guru
Posts: 561
Joined: Sun Apr 27, 2003 6:12 pm
Location: Earth

Pure C with a Bash frontend.

Post by Mystilleef » Thu Aug 14, 2003 9:20 pm

C++!? Heck no! I'd rather portage was written in pure C with a Bash frontend. Python is less in the Unix/Linux spirit than Bash is.

Regards,

Mystilleef
simple, sleek and sexy text editor for gnome

"My logic is undeniable."

far
Guru
Posts: 394
Joined: Mon Mar 10, 2003 12:30 am
Location: Stockholm, Sweden

Post by far » Thu Aug 14, 2003 9:33 pm

Tronic wrote:The dependency check isn't too slow at the moment, but I think it is something that is prone to suffer a lot as package numbers increase
What does the number of packages have to do with dependencies?
If ebuild foo says "I depend on packages bar and baz", that will not change when the number of packages in portage increases.
Tronic wrote:Maybe it would be worth it to write these pieces in C/C++ and keep the rest in Python? Especially the searching is something I highly doubt could be fast enough in Python (except if you use some kind of word cache for that, but then you need to generate it and that brings new problems).
The speed of Python is not a problem. The problem is that you need to open and read 5000+ files. That will be slow no matter which language you use. Using a database backend could solve this problem.
The Porthole Portage Frontend

carambola5
Apprentice
Posts: 214
Joined: Wed Jul 10, 2002 8:53 pm

Post by carambola5 » Thu Aug 14, 2003 9:54 pm

Here's a question: why does each ebuild have its own DESCRIPTION variable? Shouldn't each package have this instead of each ebuild? Sure, the ebuilds could have EBUILD_DESCRIPTION variables that distinguish it from the other ebuilds in the same package, but overall, one piece of software should have one description.

And while we're at it, why does the description have to be a one-liner? Can we get a little more descriptive please? I mean...

Code:

emerge -s koules
......
Description: fast action arcade-style game w/sound and network support
Not very descriptive in my book.

far
Guru
Posts: 394
Joined: Mon Mar 10, 2003 12:30 am
Location: Stockholm, Sweden

Post by far » Thu Aug 14, 2003 10:08 pm

carambola5 wrote:Here's a question: why does each ebuild have its own DESCRIPTION variable? Shouldn't each package have this instead of each ebuild?
Well, a "package" is really just a directory containing ebuild files ...
carambola5 wrote:And while we're at it, why does the description have to be a one-liner?
Good question. I think multi-lingual descriptions should be possible too.
The Porthole Portage Frontend

Tronic
Apprentice
Posts: 194
Joined: Mon Jul 28, 2003 11:17 am
Location: Finland

Post by Tronic » Thu Aug 14, 2003 10:24 pm

far wrote:What does the number of packages have to do with dependencies?
If ebuild foo says "I depend on packages bar and baz", that will not change when the number of packages in portage increases.
Well, of course you then have to follow the trail, ask what bar and baz want, and later figure out whether you can automatically solve some conflicts, etc. Once you get into the deps of the XFree86 libs, you'll soon be effectively travelling via the deps of all the graphical apps... (of course this depends a lot on how "smart" the deps checking system is, because more features == more things to check for)
far wrote:The speed of Python is not a problem. The problem is that you need to open and read 5000+ files. That will be slow no matter which language you use. Using a database backend could solve this problem.
But if I had that database in one big file, with an average of 500 chars of information per package, you'd still have to scan through several megs of text and naturally also handle the database at the same time. Potentially you'd have to parse it as UTF-8 too. I don't have any Python benchmarks here, but that might still be too slow (is it?)

Someone suggested SQL.. Dunno about its search performance either, could be fast too..
far wrote:There is nothing wrong with using xml. Unlike binary data, it is readable (and debuggable) by human beings.
But at the same time it is difficult for machines to read (and it's only software that should ever be reading or writing it anyway). The things we are talking about here are simplicity of implementation (escaping all data that isn't ASCII text, escaping XML reserved characters, parsing tags that use loose syntax, such as the number of spaces between attributes and other small things), storage efficiency (big and ugly tags versus simple binary ones) and performance.
Last edited by Tronic on Thu Aug 14, 2003 10:32 pm, edited 1 time in total.

Tronic
Apprentice
Posts: 194
Joined: Mon Jul 28, 2003 11:17 am
Location: Finland

Post by Tronic » Thu Aug 14, 2003 10:32 pm

Oh, about the progress indicators..

This box has been doing emerge gnome for around 15 hours now. The obvious problem with the current output is that I don't know what it is installing (no, didn't -p first), what it already has built (the scrollback can't get that far) nor how many packages there still are to go...

While the progress bars can't work when only building a single package, they'd surely be very useful in those operations which take the most time - those which recompile half of the entire system.

(and it doesn't really have to be a bar, many other ways of displaying the information might actually be better)
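
Even something as small as one status line per package would do. A minimal sketch, assuming the tool already knows the full list of packages it is going to build (the function name and the output format are just made up here):

```python
def progress_line(done, total, current):
    """One-line status: which package is being built and how far along we are."""
    percent = 100 * done // total  # share of packages already finished
    return "[%d/%d, %d%% done] building %s" % (done + 1, total, percent, current)


print(progress_line(2, 40, "libfoo"))  # [3/40, 5% done] building libfoo
```

It counts packages, not build time, so a big package still looks like a stall, but at least you can tell what is being built and how many are left.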