Tronic wrote: Not going to start a flamewar, nor propose that all Gentoo tools must be rewritten right now. But really, sooner or later it will have to be rewritten.
http://www.joelonsoftware.com/articles/ ... 00069.html
Don't let the above essay stop you... scratch whatever itch is bothering you. You have some good ideas, although I suspect it's going to be a lot harder than you are expecting.
Senso wrote: Threading, caching and text parsing are all possible with Python. Sure, C/C++ is faster, but I really don't care about a speedup of 3% or 15%.
To reinforce what you're saying, real speedups are going to come from improving the algorithms, not the language. If changing the algorithms to take advantage of threading will make a C++ version run faster, then it'll make a Python version run faster as well.
Tronic wrote: -Written in C++.
There is no point. The only thing about Portage that is slow is searching for packages, but that has nothing to do with Python. A rewrite in C++ would merely introduce bugs and make the source code several times as big.
Quote: -Using internal text parsing, etc. (not calling awk or other external tools, because that is just slow).
Does Portage use awk now? This is also much easier to do in Python.
Quote: -Threaded
I know people are working on this, although I think they would rather use multiple processes.
Quote: -Caching of databases
I believe there is a Portage implementation with a database back end. There is nothing wrong with using XML. Unlike binary data, it is readable (and debuggable) by human beings.
Quote: -Probably no internal UI, but only an interface for making one
There is a (low-level) Python API, but it is not well documented.
Quote: -Instead of the regular 50 lines/second build dump, display good overall progress indicators
And how would the program know how far the build process has come? Make does not provide such information.
Senso wrote: Why do we see "Portage should be rewritten in C/C++" almost once a week?
For the same reason that we see people saying that getting rid of the client/server model of XFree will magically make their desktops faster, without any real understanding of the issues involved. Frankly, it wouldn't matter much (proportionally) if emerge was an order of magnitude slower. I'd say that when it comes to any decent sized package, maybe 5% of the total time is due to the emerge tool.
Ox- wrote: Also, I've only been using Gentoo a few months, but it looks to me like 99.9% of the time on an emerge is compiling with gcc (written in C) + rsync (yep, in C) + make (C) + wget (C). So, I'd be skeptical a C++ rewrite could even provide a 1% speedup of the overall emerge process.
True, it's an even better argument than what I wrote earlier. The Python stuff in Portage is used to call apps written in C. Python calls wget and tells it to get the source from $WEBSITE, etc. Most of the computing comes from C binaries.

Senso wrote: Why do we see "Portage should be rewritten in C/C++" almost once a week? I'm a Python whore and so, I don't see the point in this statement.
Perhaps we should begin every week with a new thread like "let's rewrite all scripts in Python!", just to compensate for the C/C++ zealots.
charlieg wrote: The main new feature of Portage will be the use of a DB.
Usage of a basic DB (Berkeley, anybody? or Metakit?) to start with. Then things like dependencies (forward and reverse) can be established incredibly quickly.
Metakit... It uses a HUGE amount of RAM compared to other indexing/DB systems. Even Jakarta Lucene is maybe 10x better in terms of RAM usage. Metakit could be a problem for users with "low quality" hardware (like me).
axxackall wrote: Hmm, that could be an interesting system, where ALL non-3rd-party software (all that belongs to Gentoo itself) is written in Python: Portage, initscripts, installation scripts, system (network, user, disk etc.) management tools, various other tools and utilities ... Even the UI for all of that must be written with Tkinter.
Look at the GLIS thread. I'll soon (i.e. in 2-3 days) start a Tkinter UI for this project.
MrPyro wrote: Also, one thing I find annoying about emerge at the moment is that some packages have warning or informational messages that come up during installations: not compiler warnings, things like the message that tells you an easy way to configure Apache to use mod_php during the mod_php build. I tend to start my "emerge -pU --deep world" process running then go out somewhere, rather than sit and stare at my monitor watching compiler output, so I miss these messages. Some system where these kinds of messages are logged, so that they can be read later, would be helpful.
Just a quick idea, but you could log normal output (stdout, file descriptor 1) to a file, cat it and remove all lines starting with "gcc" with sed. If you add more similar rules, you would maybe still have a lot of crap, but it would be easier to browse the log and find useful messages.
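The log-filtering idea above could also be done in Python instead of sed. This is only a minimal sketch: the function name `filter_build_log`, the `NOISY_PREFIXES` rule set and the sample lines are made up for illustration and are not part of Portage.

```python
# Hypothetical rule set: drop any line that starts with one of these prefixes.
NOISY_PREFIXES = ("gcc", "g++", "make")


def filter_build_log(lines, noisy_prefixes=NOISY_PREFIXES):
    """Yield only the lines that do not start with a noisy prefix."""
    for line in lines:
        # str.startswith accepts a tuple, so one call checks every rule.
        if not line.lstrip().startswith(noisy_prefixes):
            yield line


if __name__ == "__main__":
    # Invented sample log; a real one would be read from the saved file.
    sample = [
        "gcc -O2 -c foo.c\n",
        " * To enable mod_php, edit your Apache configuration\n",
        "make[1]: Entering directory\n",
    ]
    for kept in filter_build_log(sample):
        print(kept, end="")
```

Adding a rule is then just a matter of extending the tuple, which is the "more similar rules" part of the suggestion.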
() wrote: Is there some lightweight (possibly object oriented) database system that could be worth looking at apart from metakit?
I've been planning on playing with SQLite for a while... It's a SQL database, but embedded in your app directly (so the user doesn't have to download/install MySQL). The original version is for C/C++ but there are *many* wrappers for other languages, including Python, of course. I love the idea of an embedded SQL db but I've never really tried it.
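To make the embedded-database idea concrete, here is a rough sketch using the `sqlite3` module from the Python standard library. The table layout, the function names and the sample rows are assumptions made up for this example; nothing here reflects how Portage actually stores its data.

```python
import sqlite3


def build_index(packages):
    """Create an in-memory SQLite database from (name, description) pairs."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per package.
    conn.execute(
        "CREATE TABLE packages (name TEXT PRIMARY KEY, description TEXT)"
    )
    conn.executemany("INSERT INTO packages VALUES (?, ?)", packages)
    conn.commit()
    return conn


def search(conn, term):
    """Return the names of packages whose name or description contains term."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT name FROM packages "
        "WHERE name LIKE ? OR description LIKE ? ORDER BY name",
        (pattern, pattern),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Invented sample data for illustration only.
    conn = build_index([
        ("koules", "fast action arcade-style game w/sound and network support"),
        ("wget", "utility to retrieve files over the network"),
    ])
    print(search(conn, "koules"))
```

The whole database lives inside the calling process (here in memory, or in a single file if a path is passed to `sqlite3.connect`), so no separate server has to be installed.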

Tronic wrote: The dependency check isn't too slow at the moment, but I think that's something that is prone to suffer a lot from increased package numbers
What does the number of packages have to do with dependencies?
Tronic wrote: Maybe it would be worth it to write these pieces in C/C++ and keep the rest in Python? Especially the searching is something I highly doubt could be fast enough in Python (except if you use some kind of word cache for that, but then you need to generate it and that brings new problems).
The speed of Python is not a problem. The problem is that you need to open and read 5000+ files. That will be slow no matter which language you use. Using a database backend could solve this problem.

carambola5 wrote: Here's a question: why does each ebuild have its own DESCRIPTION variable? Shouldn't each package have this instead of each ebuild?
Code:
emerge -s koules
......
Description: fast action arcade-style game w/sound and network support
Well, a "package" is really just a directory containing ebuild files ...
carambola5 wrote: And while we're at it, why does the description have to be a one-liner?
Good question. I think multi-lingual descriptions should be possible too.
Quote: What does the number of packages have to do with dependencies?
If ebuild foo says "I depend on packages bar and baz", that will not change when the number of packages in Portage increases.
Well, of course you then have to follow the trail, ask what bar and baz want, later figure out if you can automatically solve some conflicts, etc. Once you get into the deps of the XFree86 libs, you'll soon be effectively travelling via the deps of all the graphical apps... (of course this depends a lot on how "smart" the deps checking system is, because more features == more things to check for)
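"Following the trail" amounts to a graph traversal over the direct dependencies. The sketch below is only an illustration of that idea: the function names and the toy dependency table are invented, and real dependency resolution (versions, USE flags, conflicts) is far more involved.

```python
from collections import deque


def transitive_deps(package, direct_deps):
    """Return every package reachable from `package` via dependency edges."""
    seen = set()
    queue = deque(direct_deps.get(package, ()))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already visited, avoids looping on cycles
        seen.add(dep)
        queue.extend(direct_deps.get(dep, ()))
    return seen


def reverse_deps(direct_deps):
    """Invert the table: map each package to the packages that depend on it."""
    reverse = {}
    for package, deps in direct_deps.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(package)
    return reverse


if __name__ == "__main__":
    # Toy data: foo depends on bar and baz, which in turn want libraries.
    deps = {
        "foo": ["bar", "baz"],
        "bar": ["libx"],
        "baz": ["libx", "liby"],
        "libx": [],
        "liby": [],
    }
    print(sorted(transitive_deps("foo", deps)))
    print(sorted(reverse_deps(deps)["libx"]))
```

The direct entry for foo never changes, but the work done by `transitive_deps` grows with the number of packages reachable from it, which is the point both posters are circling.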
Quote: The speed of Python is not a problem. The problem is that you need to open and read 5000+ files. That will be slow no matter which language you use. Using a database backend could solve this problem.
But if I had that database in one big file, with an average of 500 chars of information per package, you'd still have to scan through several megs of text and naturally also handle the database at the same time. Potentially you'd have to parse it in UTF-8 too. I don't have any benchmarks here on Python, but that might still be too slow (is it?)
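The question is easy to measure rather than guess at. The following is a rough sketch that builds a synthetic text of 5000 records of 500 characters each (the figures from the post) and times a plain substring scan; the record format and function names are invented, and the result will vary by machine, so no timing is claimed here.

```python
import time


def make_fake_db(n_packages=5000, chars_per_package=500):
    """Build one big string with a fixed-size fake record per line."""
    records = []
    for i in range(n_packages):
        header = f"pkg-{i:05d}: "
        # Pad with filler so each record is exactly chars_per_package long.
        records.append(header + "x" * (chars_per_package - len(header)))
    return "\n".join(records)


def timed_search(text, term):
    """Scan every line for `term`; return (matching lines, seconds taken)."""
    start = time.perf_counter()
    hits = [line for line in text.splitlines() if term in line]
    elapsed = time.perf_counter() - start
    return hits, elapsed


if __name__ == "__main__":
    db = make_fake_db()
    hits, elapsed = timed_search(db, "pkg-04242")
    print(f"{len(db)} chars scanned, {len(hits)} hit(s), {elapsed:.4f} s")
```

This only measures a naive in-memory scan; reading the data from disk and decoding it would have to be timed separately.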
Quote: There is nothing wrong with using XML. Unlike binary data, it is readable (and debuggable) by human beings.
But at the same time it is difficult for machines to read (and it's only software that should ever be reading or writing it anyway). The things we are talking about here are simplicity of implementation (escaping all data that isn't ASCII text, escaping XML reserved characters, parsing tags that use loose syntax, such as a varying number of spaces between attributes, and other such small things), storage efficiency (big and ugly tags versus simple binary ones) and performance.
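For what it's worth, the escaping and loose-syntax concerns are normally handled by an XML library rather than by hand. A small sketch using `xml.etree.ElementTree` from the Python standard library follows; the `package` and `description` element names are made up for illustration, and this says nothing about storage size or speed.

```python
import xml.etree.ElementTree as ET


def package_to_xml(name, description):
    """Serialize one package record; the library escapes reserved characters."""
    pkg = ET.Element("package", {"name": name})
    desc = ET.SubElement(pkg, "description")
    desc.text = description
    return ET.tostring(pkg, encoding="unicode")


def description_from_xml(xml_text):
    """Parse a record back and return the unescaped description text."""
    return ET.fromstring(xml_text).findtext("description")


if __name__ == "__main__":
    xml_text = package_to_xml("koules", "sound & network <support>")
    print(xml_text)
    print(description_from_xml(xml_text))
```

The parser also tolerates extra whitespace between attributes, so that part of the implementation cost falls on the library rather than on the tool author.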