rvalles Tux's lil' helper
Joined: 19 Feb 2003 Posts: 121
|
Posted: Sat Sep 10, 2005 3:03 pm Post subject: Long-lasting portage slowness issue: Really I/Os fault? No! |
|
|
Portage slowness has been discussed here many times, and in the end we always hear the same explanations: that it is because of the great number of small files it has to read, that filesystems are bad at this sort of I/O, etc.
Well, after many runs of "time emerge -Dupvt world" the results "stabilized" (a few ms more, a few ms less; the point is that the HD isn't touched anymore) at these figures:
Code: |
real 0m46.013s
user 0m39.742s
sys 0m4.296s
|
As is well known in the Linux world, I/O isn't something that happens in userspace. Therefore... where do these ~40 seconds of user time come from? Yes, I cannot help but think that it's purely the fault of the portage implementation itself, and yeah, I'm very biased now, but I tend to think that the choice of Python may be related, too.
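To spell out the arithmetic behind that argument, here is a minimal sketch that splits the figures above into shares of wall-clock time. The function name `cpu_breakdown` is made up for this illustration, and treating "real minus user minus sys" as waiting time is an approximation that only makes sense on a single-CPU machine like this one.

```python
def cpu_breakdown(real, user, sys_time):
    """Split wall-clock time into user CPU, kernel CPU and time spent waiting."""
    # Whatever is left after CPU time is time the process was not running,
    # for example blocked on disk I/O (approximation, single CPU assumed).
    waiting = real - (user + sys_time)
    return {
        "user": user / real,
        "sys": sys_time / real,
        "waiting": waiting / real,
    }


# Figures from the `time emerge -Dupvt world` run quoted above.
shares = cpu_breakdown(real=46.013, user=39.742, sys_time=4.296)
for name, share in shares.items():
    print(f"{name:8s}{share:6.1%}")
```

User time accounts for roughly 86% of the run, while the part that could be attributed to waiting on the disk is under 5%.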
For the curious, the system used for this little test is an Athlon @ 600 MHz with 768 MB of PC133 RAM and reiser4 on /. Portage is 2.0.51.22-r2.
Well, what do you think? |
|
|
|
spb Retired Dev
Joined: 02 Jan 2004 Posts: 2135 Location: Cambridge, UK
|
Posted: Sat Sep 10, 2005 4:05 pm Post subject: |
|
|
The problem is mainly that current Portage's dependency resolution code is, in a word, stupid. |
|
|
|
marduk Retired Dev
Joined: 20 Sep 2002 Posts: 78
|
Posted: Sat Sep 10, 2005 7:37 pm Post subject: |
|
|
Agreed, a lot of it can be attributed to the way that portage is written. For example, I still can't figure out why there are so many deep copies (which are expensive). That is one of the main reasons why "import portage" takes so long. I've had to rip out parts of portage.py and put them in my own code for packages.gentoo.org simply because importing the portage module takes way too long to be practical in a CGI script.
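As a rough illustration of why deep copies cost more, here is a small sketch using the standard `copy` module. The `settings` dict is a hypothetical stand-in, not an actual portage data structure.

```python
import copy

# Hypothetical nested structure, loosely shaped like configuration data.
settings = {"USE": ["X", "gtk", "kde"], "FEATURES": ["sandbox", "ccache"]}

shallow = copy.copy(settings)   # new outer dict, inner lists are shared
deep = copy.deepcopy(settings)  # new outer dict and new copies of every inner list

# Mutating the original shows which copy still shares data with it.
settings["USE"].append("qt")
print(shallow["USE"])
print(deep["USE"])
```

A shallow copy only duplicates the top-level container, while `copy.deepcopy` has to walk and duplicate every nested object, so its cost grows with the total size of the structure.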
Also, a lot of the code is just "old code" that hasn't been optimized. Consider the function grabfile() in portage_util.py. On my machine when I "import portage", grabfile() is called 1276 times (before I even call a function). The total time is 0.310 seconds. This isn't a lot by itself, but it all adds up. If you look at grabfile(), it's using the old string module functions, which have long been deprecated in favor of string methods. Strings are first-class objects in Python now. Also, file objects are iterators, so reading all the lines of a file into memory and then iterating over those lines is no longer necessary.
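For anyone who wants to collect this kind of call count themselves, here is a minimal sketch using the standard `cProfile` and `pstats` modules. The functions `normalize` and `startup` are hypothetical stand-ins, not portage code; the loop count simply mirrors the 1276 calls mentioned above.

```python
import cProfile
import pstats


def normalize(line):
    """Hypothetical stand-in for a small helper such as grabfile()."""
    return " ".join(line.split())


def startup():
    """Pretend import-time work that calls the helper many times."""
    return [normalize("  some   text  ") for _ in range(1276)]


profiler = cProfile.Profile()
profiler.enable()
startup()
profiler.disable()

stats = pstats.Stats(profiler)
# stats.stats maps (filename, lineno, funcname) to
# (primitive calls, total calls, total time, cumulative time, callers).
calls = {func[2]: entry[1] for func, entry in stats.stats.items()}
print("normalize calls:", calls["normalize"])
```

Calling `stats.sort_stats("cumulative").print_stats(10)` afterwards prints the usual table with per-function times.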
I took that simple function, grabfile(), and made a few changes to it:
Code: |
--- portage_util.py.vanilla 2005-09-08 23:27:38.000000000 -0500
+++ portage_util.py 2005-09-10 14:22:44.000000000 -0500
@@ -21,22 +21,20 @@
         myfile=open(myfilename,"r")
     except IOError:
         return []
-    mylines=myfile.readlines()
-    myfile.close()
     newlines=[]
-    for x in mylines:
+    for x in myfile:
         #the split/join thing removes leading and trailing whitespace, and converts any whitespace in the line
         #into single spaces.
-        myline=string.join(string.split(x))
-        if not len(myline):
+        myline = ' '.join(x.split())
+        if not myline:
             continue
         if myline[0]=="#":
             # Check if we have a compat-level string. BC-integration data.
             # '##COMPAT==>N<==' 'some string attached to it'
-            mylinetest = string.split(myline, "<==", 1)
+            mylinetest = myline.split( "<==", 1)
             if len(mylinetest) == 2:
                 myline_potential = mylinetest[1]
-                mylinetest = string.split(mylinetest[0],"##COMPAT==>")
+                mylinetest = mylinetest[0].split("##COMPAT==>")
                 if len(mylinetest) == 2:
                     if compat_level >= int(mylinetest[1]):
                         # It's a compat line, and the key matches.
@@ -45,6 +43,7 @@
             else:
                 continue
         newlines.append(myline)
+    myfile.close()
     return newlines
|
Then when I profiled the code, I found that the same 1276 calls only take 0.020 seconds. That's 15x faster!
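For readers who would rather see the result than a diff, here is a sketch of what the patched function might look like as a standalone, modern Python function. This is an adaptation, not the actual portage source: the `with` block replaces the explicit close(), and the two lines marked as assumed sit in the gap between the diff hunks and are a guess at what the original does there.

```python
import os
import tempfile


def grabfile(myfilename, compat_level=0):
    """Return the whitespace-normalized, non-empty, non-comment lines of a file."""
    try:
        myfile = open(myfilename, "r")
    except IOError:
        return []
    newlines = []
    with myfile:  # closes the file even if a line fails to parse
        for x in myfile:  # iterate lazily instead of calling readlines()
            # split/join strips the ends and collapses inner whitespace
            myline = " ".join(x.split())
            if not myline:
                continue
            if myline[0] == "#":
                # '##COMPAT==>N<==' 'some string attached to it'
                mylinetest = myline.split("<==", 1)
                if len(mylinetest) == 2:
                    myline_potential = mylinetest[1]
                    mylinetest = mylinetest[0].split("##COMPAT==>")
                    if len(mylinetest) == 2:
                        if compat_level >= int(mylinetest[1]):
                            newlines.append(myline_potential)  # assumed, not shown in the diff
                    continue  # assumed, not shown in the diff
                else:
                    continue
            newlines.append(myline)
    return newlines


# Small demonstration on a throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as tmp:
    tmp.write("  sys-apps/portage   extra \n\n# a comment\n##COMPAT==>2<==dev-lang/python\n")
    path = tmp.name

plain = grabfile(path)
compat = grabfile(path, compat_level=2)
os.remove(path)
print(plain)
print(compat)
```

With the default compat_level only the ordinary line survives; with compat_level=2 the text attached to the COMPAT marker is returned as well.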
So yeah, a lot of the code needs to be rewritten or is just not "smart" code. Using a directory structure of 20,000+ files as a database doesn't help either. A redesign/rewrite of portage is justifiable. This is being worked on, but it may be a while before users see any notable differences.
-m |
|
|
|
Shadow Skill Veteran
Joined: 04 Dec 2004 Posts: 1023
|
Posted: Sat Sep 10, 2005 8:28 pm Post subject: |
|
|
I think that splitting the tree would also be very effective in terms of speeding things up for users, and could possibly even eliminate the need for the package.* files. If the tree is split by Core Library > Window Manager > CVS derivatives therein, you have approximately four to six trees, not counting different architectures. [Assuming you need a misc category for certain applications.] If people want CVS applications they would just set the CVS tree as active: no more stupid "This package depends on foo but foo is masked by some really stupid config file." messages. The only files one would still need at all would be /etc/portage/package.mask and package.use. [There really needs to be a mask flag that only masks a package for the duration of the emerge operation; constantly having to edit package.mask directly is just dumb.] _________________ Ware wa mutekinari.
Wa ga kage waza ni kanau mono nashi.
Wa ga ichigeki wa mutekinari.
"First there was nothing, so the lord gave us light. There was still nothing, but at least you could see it." |
|
|
|
Jeremy_Z l33t
Joined: 05 Apr 2004 Posts: 671 Location: Shanghai
|
|