Gentoo Forums
Long-lasting portage slowness issue: Really I/O's fault? No!
Gentoo Forums Forum Index Gentoo Chat
rvalles
Tux's lil' helper


Joined: 19 Feb 2003
Posts: 121

Posted: Sat Sep 10, 2005 3:03 pm    Post subject: Long-lasting portage slowness issue: Really I/O's fault? No!

We've discussed Portage's slowness many times, and in the end we always hear that it is caused by the huge number of small files it has to read, that filesystems handle this sort of I/O badly, etc.

Well, after many runs of "time emerge -Dupvt world", the results "stabilized" (a few ms more or less; the point is that the hard disk isn't touched anymore) at these figures:
Code:

real    0m46.013s
user    0m39.742s
sys     0m4.296s

As is well known in the Linux world, I/O isn't something that happens in userspace. So... where do these ~40 seconds of user time come from? I can't help but think it's purely the fault of the Portage implementation itself, and yes, I'm very biased now, but I suspect the choice of Python may play a part, too.
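The arithmetic behind this can be spelled out: whatever part of the wall-clock (real) time is not covered by user plus sys CPU time is time the process spent off the CPU, for example waiting on the disk. With the figures above that is 46.013 - (39.742 + 4.296) = 1.975 s, while user time alone is roughly 86% of the run. A minimal Python sketch of that calculation, plus a helper that measures a child command roughly the way `time` does; `cpu_breakdown` and `time_command` are hypothetical names for illustration, not part of Portage, and `time_command` assumes a Unix-like OS:

```python
import os
import subprocess
import sys
import time


def cpu_breakdown(real, user, sys_time):
    """Split wall-clock time into CPU time and time spent off the CPU.

    off_cpu covers everything the process was not running on a CPU,
    such as waiting for disk I/O or being scheduled out.
    """
    return {
        "off_cpu": real - (user + sys_time),
        "user_share": user / real,
    }


def time_command(argv):
    """Run argv and return (real, user, sys) seconds, similar to `time`.

    Assumes a Unix-like OS, where os.times() reports the CPU time of
    waited-for children in children_user / children_system.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    real = time.perf_counter() - start
    after = os.times()
    user = after.children_user - before.children_user
    sys_time = after.children_system - before.children_system
    return real, user, sys_time


if __name__ == "__main__":
    # Figures from the `time emerge -Dupvt world` run quoted above.
    b = cpu_breakdown(46.013, 39.742, 4.296)
    print(f"off-CPU: {b['off_cpu']:.3f} s, user share: {b['user_share']:.1%}")
    # Time a trivial child process as a demonstration.
    print(time_command([sys.executable, "-c", "pass"]))
```

Kernel work done on the process's behalf (including filesystem code serving files from the page cache) is counted as sys time, so a small sys figure plus a small off-CPU remainder is what points the finger at userspace code.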

For the curious, the system used for this little test is an Athlon @ 600 MHz with 768 MB of PC133 RAM and reiser4 on /. Portage is 2.0.51.22-r2.

Well, what do you think?
spb
Retired Dev


Joined: 02 Jan 2004
Posts: 2135
Location: Cambridge, UK

Posted: Sat Sep 10, 2005 4:05 pm

The problem is mainly that current Portage's dependency resolution code is, in a word, stupid.
marduk
Retired Dev


Joined: 20 Sep 2002
Posts: 78

Posted: Sat Sep 10, 2005 7:37 pm

Agreed, a lot of it can be attributed to the way Portage is written. For example, I still can't figure out why there are so many deep copies (which are expensive). That is one of the main reasons why "import portage" takes so long. I've had to rip parts out of portage.py and put them in my own code for packages.gentoo.org simply because importing the portage module takes far too long to be practical in a CGI script.
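To get a feel for why deep copies add up, a quick timeit comparison of a shallow and a deep copy helps. This is an illustrative sketch only: the nested `settings` dictionary is made up and only loosely imitates a config-style mapping; it is not Portage's actual data structure, and the absolute timings will vary by machine.

```python
import copy
import timeit

# Hypothetical nested mapping (variable name -> list of values).
# Not Portage's real data; just something with many inner containers.
settings = {
    f"VAR{i}": [f"value{i}_{j}" for j in range(20)] for i in range(200)
}


def shallow():
    # Copies only the outer dict; the inner lists are shared.
    return copy.copy(settings)


def deep():
    # Recursively copies every inner list (strings are immutable and are
    # reused), with memo bookkeeping for each object visited.
    return copy.deepcopy(settings)


if __name__ == "__main__":
    n = 200
    t_shallow = timeit.timeit(shallow, number=n)
    t_deep = timeit.timeit(deep, number=n)
    print(f"shallow: {t_shallow:.4f} s, deep: {t_deep:.4f} s for {n} copies")
```

Mutating a list inside `deep()`'s result leaves `settings` untouched, which is the only reason to pay for a deep copy; if neither the copy nor its nested values are ever modified, a shallow copy (or no copy at all) does the same job far more cheaply.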

Also, a lot of the code is just "old code" that hasn't been optimized. Consider the function grabfile() in portage_util.py. On my machine, when I "import portage", grabfile() is called 1276 times (before I even call a function). The total time is 0.310 seconds. That isn't a lot by itself, but it all adds up. If you look at grabfile(), it uses the old string module, which has long been deprecated; strings are first-class objects with their own methods in Python now. Also, file objects are iterators that yield one line at a time, so reading all the lines of a file into memory and then iterating over them is no longer necessary.

I took that simple function, grabfile(), and made a few changes to it:

Code:

--- portage_util.py.vanilla     2005-09-08 23:27:38.000000000 -0500
+++ portage_util.py     2005-09-10 14:22:44.000000000 -0500
@@ -21,22 +21,20 @@
                myfile=open(myfilename,"r")
        except IOError:
                return []
-       mylines=myfile.readlines()
-       myfile.close()
        newlines=[]
-       for x in mylines:
+       for x in myfile:
                #the split/join thing removes leading and trailing whitespace, and converts any whitespace in the line
                #into single spaces.
-               myline=string.join(string.split(x))
-               if not len(myline):
+               myline = ' '.join(x.split())
+               if not myline:
                        continue
                if myline[0]=="#":
                        # Check if we have a compat-level string. BC-integration data.
                        # '##COMPAT==>N<==' 'some string attached to it'
-                       mylinetest = string.split(myline, "<==", 1)
+                       mylinetest = myline.split( "<==", 1)
                        if len(mylinetest) == 2:
                                myline_potential = mylinetest[1]
-                               mylinetest = string.split(mylinetest[0],"##COMPAT==>")
+                               mylinetest = mylinetest[0].split("##COMPAT==>")
                                if len(mylinetest) == 2:
                                        if compat_level >= int(mylinetest[1]):
                                                # It's a compat line, and the key matches.
@@ -45,6 +43,7 @@
                        else:
                                continue
                newlines.append(myline)
+       myfile.close()
        return newlines


Then, when I profiled the code, I found that the same 1276 calls take only 0.020 seconds. That's about 15x faster!
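A rough way to reproduce this kind of measurement is to run a grabfile-style function under cProfile. The sketch below uses a simplified, hypothetical `grab_lines` that keeps only the whitespace normalisation and comment skipping from the patch above (the `##COMPAT==>` handling is left out), and profiles repeated calls against a temporary file. It is not the real `portage_util.grabfile`, and the timings it prints will not match the ones quoted here.

```python
import cProfile
import io
import os
import pstats
import tempfile


def grab_lines(filename):
    """Return non-empty, non-comment lines with whitespace collapsed.

    Simplified grabfile-style reader: iterates over the file object
    directly instead of calling readlines(), and uses str methods
    instead of the old string module.
    """
    try:
        f = open(filename, "r")
    except OSError:
        # Missing or unreadable file: return an empty list, like grabfile.
        return []
    lines = []
    with f:
        for raw in f:
            # Collapse runs of whitespace and strip both ends.
            line = " ".join(raw.split())
            if not line or line.startswith("#"):
                continue
            lines.append(line)
    return lines


def profile_calls(filename, calls=1276):
    """Call grab_lines `calls` times under cProfile; return the report text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(calls):
        grab_lines(filename)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time and only print rows matching "grab_lines".
    stats.sort_stats("cumulative").print_stats("grab_lines")
    return out.getvalue()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".mask", delete=False) as tmp:
        tmp.write("# a comment\n\n  sys-apps/foo   \n>=dev-lang/bar-1.0\n")
        path = tmp.name
    try:
        print(grab_lines(path))
        print(profile_calls(path))
    finally:
        os.remove(path)
```

In the report, the `ncalls` column gives the number of calls and `cumtime` the total time spent in the function, which are the two figures compared in the post.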

So yeah, a lot of the code needs to be rewritten or is just not "smart" code. Using a directory structure of 20,000+ files as a database doesn't help either. A redesign/rewrite of Portage is justified. This is being worked on, but it may be a while before users see any notable difference.

-m
Shadow Skill
Veteran


Joined: 04 Dec 2004
Posts: 1023

Posted: Sat Sep 10, 2005 8:28 pm

I think splitting the tree would also be very effective at speeding things up for users, and might even eliminate the need for the package.* files. If the tree is split into Core Library > Window Manager > CVS derivatives therein, you have approximately four to six trees, not counting different architectures [assuming you need a misc category for certain applications]. If people want CVS applications, they would just set the CVS tree as active: no more stupid "This package depends on foo but foo is masked by some really stupid config file." messages. The only files one would still need at all would be /etc/portage/package.mask and package.use. [There really needs to be a mask flag that only masks a package for the duration of the emerge operation; having to edit package.mask directly all the time is just dumb.]
_________________
Ware wa mutekinari.
Wa ga kage waza ni kanau mono nashi.
Wa ga ichigeki wa mutekinari.

"First there was nothing, so the lord gave us light. There was still nothing, but at least you could see it."
Jeremy_Z
l33t


Joined: 05 Apr 2004
Posts: 671
Location: Shanghai

Posted: Sun Sep 11, 2005 4:33 am

So when is the Gentoo Summer of Code: Rewrite Portage? :D
_________________
"Because two groups of consumers drive the absolute high end of home computing: the gamers and the porn surfers." /.
My gentoo projects, Kelogviewer and a QT4 gui for etc-proposals
All times are GMT