tobidope n00b
Joined: 17 Aug 2003 Posts: 24 Location: Germany
|
Posted: Fri Dec 03, 2004 11:50 pm Post subject: [HOWTO] Speeding up portage with cdb -- UPDATE |
|
|
[HOWTO] Speeding up portage with cdb -- UPDATE
Changelog
- 19.12.2004 -- New portage_db_cdb.py module. It's a workaround to speed up building the database after an emerge sync. Hopefully portage will never use Jython, because I need the destructor. On my ancient computer emerge metadata now takes 10 minutes instead of 50!
- 19.12.2004 -- Second try. Note to self: "Even tiny changes can break your system"
- 04.02.2005 -- Deleted the advice to use emerge --rege
- 16.02.2005 -- Added some lines about copyright
When it comes to speed, the database backend of portage is rather slow because it is implemented as a raw abstraction on top of the filesystem. It is possible to replace it with a MySQL backend, but that is not ideal if you don't want to run a full-blown RDBMS. My solution is rather tiny in terms of both the code and the packages you need:
- You'll need the python interface python-cdb for DJB's cdb
Create /usr/lib/portage/pym/portage_db_cdb.py and put this in it:
Code: | # Copyright 2004, 2005 Tobias Bell <tobias.bell@web.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

# Note: this is Python 2 code (cPickle, iteritems, octal literals like 0664).
import portage_db_template
import os
import os.path
import cPickle
import cdb

class _data(object):
    # Holds one cdb file plus the pending changes that are not yet written.

    def __init__(self, path, category, uid, gid):
        self.path = path
        self.category = category
        self.uid = uid
        self.gid = gid
        self.addList = {}
        self.delList = []
        self.modified = False
        self.cdbName = os.path.normpath(os.path.join(
            self.path, self.category) + ".cdb")
        self.cdbObject = None

    def __del__(self):
        # Write pending changes when the object is destroyed.
        if self.modified:
            self.realSync()
        self.closeCDB()

    def realSync(self):
        # A cdb file cannot be changed in place, so rebuild it completely.
        if self.modified:
            self.modified = False
            newDB = cdb.cdbmake(self.cdbName, self.cdbName + ".tmp")

            for key, value in iter(self.cdbObject.each, None):
                if key in self.delList:
                    if key in self.addList:
                        newDB.add(key, cPickle.dumps(self.addList[key], cPickle.HIGHEST_PROTOCOL))
                        del self.addList[key]
                elif key in self.addList:
                    newDB.add(key, cPickle.dumps(self.addList[key], cPickle.HIGHEST_PROTOCOL))
                    del self.addList[key]
                else:
                    newDB.add(key, value)

            self.closeCDB()

            # Whatever is left in addList are new keys.
            for key, value in self.addList.iteritems():
                newDB.add(key, cPickle.dumps(value, cPickle.HIGHEST_PROTOCOL))
            newDB.finish()
            del newDB

            self.addList = {}
            self.delList = []

            self.openCDB()

    def openCDB(self):
        prevmask = os.umask(0)

        if not os.path.exists(self.path):
            os.makedirs(self.path, 02775)
            os.chown(self.path, self.uid, self.gid)

        if not os.path.isfile(self.cdbName):
            # Create an empty cdb file.
            maker = cdb.cdbmake(self.cdbName, self.cdbName + ".tmp")
            maker.finish()
            del maker
            os.chown(self.cdbName, self.uid, self.gid)
            os.chmod(self.cdbName, 0664)

        os.umask(prevmask)

        self.cdbObject = cdb.init(self.cdbName)

    def closeCDB(self):
        if self.cdbObject:
            self.cdbObject = None

class _dummyData:
    # Placeholder that fills the cache until real _data objects arrive.
    cdbName = ""

    def realSync():
        pass
    realSync = staticmethod(realSync)

_cacheSize = 4
_cache = [_dummyData()] * _cacheSize

class database(portage_db_template.database):

    def module_init(self):
        self.data = _data(self.path, self.category, self.uid, self.gid)

        # Reuse an already opened database if it is in the cache.
        for other in _cache:
            if other.cdbName == self.data.cdbName:
                self.data = other
                break
        else:
            self.data.openCDB()
            _cache.insert(0, self.data)
            _cache.pop().realSync()

    def has_key(self, key):
        self.check_key(key)
        retVal = 0

        if self.data.cdbObject.get(key) is not None:
            retVal = 1

        if self.data.modified:
            if key in self.data.delList:
                retVal = 0
            if key in self.data.addList:
                retVal = 1

        return retVal

    def keys(self):
        myKeys = self.data.cdbObject.keys()

        if self.data.modified:
            for k in self.data.delList:
                myKeys.remove(k)
            for k in self.data.addList.iterkeys():
                if k not in myKeys:
                    myKeys.append(k)

        return myKeys

    def get_values(self, key):
        values = None

        if self.has_key(key):
            if key in self.data.addList:
                values = self.data.addList[key]
            else:
                values = cPickle.loads(self.data.cdbObject.get(key))

        return values

    def set_values(self, key, val):
        self.check_key(key)
        self.data.modified = True
        self.data.addList[key] = val

    def del_key(self, key):
        retVal = 0

        if self.has_key(key):
            self.data.modified = True
            retVal = 1
            if key in self.data.addList:
                del self.data.addList[key]
            else:
                self.data.delList.append(key)

        return retVal

    def sync(self):
        # Deliberately empty; the real work happens in _data.realSync().
        pass

    def close(self):
        pass

if __name__ == "__main__":
    import portage
    uid = os.getuid()
    gid = os.getgid()
    portage_db_template.test_database(database, "/tmp", "sys-apps", portage.auxdbkeys, uid, gid)
|
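A cdb file cannot be modified in place, which is why the module collects writes in addList and delList and only rebuilds the file in realSync. Here is a minimal sketch of that idea in modern Python, with a plain dict standing in for the on-disk cdb file. The class name OverlayStore, its methods and the package keys are made up for illustration; it is not a line-by-line port of the module and not part of portage or python-cdb.

```python
class OverlayStore:
    """Read-only base mapping plus in-memory pending changes (illustrative only)."""

    def __init__(self, base=None):
        self.base = dict(base or {})   # stands in for the immutable cdb file
        self.added = {}                # pending inserts/updates (like addList)
        self.deleted = set()           # pending deletions (like delList)
        self.rebuilds = 0              # how often the "file" was rewritten

    def __contains__(self, key):
        # Pending changes take precedence over what is in the base.
        if key in self.added:
            return True
        if key in self.deleted:
            return False
        return key in self.base

    def get(self, key):
        if key in self.added:
            return self.added[key]
        if key in self.deleted:
            return None
        return self.base.get(key)

    def set(self, key, value):
        self.added[key] = value

    def delete(self, key):
        existed = key in self
        self.added.pop(key, None)
        if key in self.base:
            self.deleted.add(key)
        return existed

    def sync(self):
        """Rebuild the base once, applying all pending changes."""
        if not self.added and not self.deleted:
            return
        new_base = {k: v for k, v in self.base.items() if k not in self.deleted}
        new_base.update(self.added)
        self.base = new_base
        self.added, self.deleted = {}, set()
        self.rebuilds += 1


# Hypothetical keys, only to exercise the class.
store = OverlayStore({"app-misc/foo-1.0": "old entry"})
store.set("app-misc/bar-1.0", "new entry")
store.delete("app-misc/foo-1.0")
store.sync()
```

Reads always check the pending changes first, so callers see their own writes even though the base has not been rebuilt yet.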
Create /etc/portage/modules and fill in
Code: | portdbapi.auxdbmodule = portage_db_cdb.database
eclass_cache.dbmodule = portage_db_cdb.database
|
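Each line of that file maps a portage subsystem name to a database class. As a rough illustration of the file format only, the sketch below reads such lines into a dict. parse_modules is a hypothetical helper, not portage's own parser, and the handling of "#" comments is an assumption.

```python
def parse_modules(text):
    """Parse 'name = module.Class' lines into a dict, skipping blank lines."""
    mapping = {}
    for line in text.splitlines():
        # Assumption: anything after '#' is a comment.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected 'key = value', got %r" % line)
        mapping[key.strip()] = value.strip()
    return mapping


example = """\
portdbapi.auxdbmodule = portage_db_cdb.database
eclass_cache.dbmodule = portage_db_cdb.database
"""
modules = parse_modules(example)
```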
Now you'll have to regenerate the portage cache with
Now try some searches with emerge, especially with --searchdesc:
Code: | emerge --searchdesc python |
It should be much faster than before. You can compare the performance by switching back to the normal db module. Just run
Code: | mv /etc/portage/modules /etc/portage/__modules |
I hope you enjoy your accelerated portage.
Last edited by tobidope on Wed Feb 16, 2005 8:26 pm; edited 6 times in total |
|
steveb Advocate
Joined: 18 Sep 2002 Posts: 4564
|
Posted: Sat Dec 04, 2004 12:18 am Post subject: |
|
|
nice |
|
rojaro l33t
Joined: 06 May 2002 Posts: 732
|
Posted: Sat Dec 04, 2004 11:34 am Post subject: |
|
|
Oh yes ... this is really nice, this should be included in portage! |
|
TheCoop Veteran
Joined: 15 Jun 2002 Posts: 1814 Location: Where you least expect it
|
Posted: Sat Dec 04, 2004 12:41 pm Post subject: |
|
|
ooooooh this is good. prod the devs to put it in |
|
PrakashP Veteran
Joined: 27 Oct 2003 Posts: 1249 Location: C.C.A.A., Germania
|
Posted: Sat Dec 04, 2004 1:22 pm Post subject: |
|
|
I like this. Painless and good results. |
|
tobidope n00b
Joined: 17 Aug 2003 Posts: 24 Location: Germany
|
Posted: Sat Dec 04, 2004 5:42 pm Post subject: |
|
|
You can also benchmark it against the portage_db_anydbm module by putting
Code: | portdbapi.auxdbmodule = portage_db_anydbm.database
eclass_cache.dbmodule = portage_db_anydbm.database
|
into /etc/portage/modules. In my opinion this is a bit slower than cdb, but you don't have to code the db update by hand. On the other hand, the db files are up to twice as big as the equivalent cdb files. |
|
AlterEgo Veteran
Joined: 25 Apr 2002 Posts: 1619
|
Posted: Sat Dec 04, 2004 6:08 pm Post subject: |
|
|
Cool stuff! Works great for me. Thanks! |
|
John-Boy Guru
Joined: 23 Jun 2004 Posts: 442 Location: Desperately seeking moksha in all the wrong places
|
Posted: Sat Dec 04, 2004 6:13 pm Post subject: |
|
|
Nice |
|
PrakashP Veteran
Joined: 27 Oct 2003 Posts: 1249 Location: C.C.A.A., Germania
|
Posted: Sat Dec 04, 2004 7:13 pm Post subject: |
|
|
BTW, where are the cdb files kept? In /usr/portage/metadata? |
|
tobidope n00b
Joined: 17 Aug 2003 Posts: 24 Location: Germany
|
Posted: Sat Dec 04, 2004 7:33 pm Post subject: |
|
|
Quote: | BTW, where are the cdb files kept? In /usr/portage/metadata? |
They are kept in /var/cache/edb/dep/usr/portage. The metadata cache is generated by rsyncing. |
|
GentooBox Veteran
Joined: 22 Jun 2003 Posts: 1168 Location: Denmark
|
Posted: Sat Dec 04, 2004 7:35 pm Post subject: |
|
|
Can I see a benchmark before switching to cdb... please |
|
matroskin Apprentice
Joined: 21 Jan 2003 Posts: 214
|
Posted: Sat Dec 04, 2004 8:28 pm Post subject: benchmark |
|
|
normal
% time emerge -S mozilla > /dev/null
real 2m39.952s
user 0m21.770s
sys 0m4.580s
cdb
% time emerge -S mozilla > /dev/null
real 0m23.169s
user 0m19.770s
sys 0m2.190s |
|
UberLord Retired Dev
Joined: 18 Sep 2003 Posts: 6835 Location: Blighty
|
Posted: Sat Dec 04, 2004 8:36 pm Post subject: |
|
|
Is there any speed increase updating the cache after syncing? |
|
PrakashP Veteran
Joined: 27 Oct 2003 Posts: 1249 Location: C.C.A.A., Germania
|
Posted: Sat Dec 04, 2004 9:20 pm Post subject: |
|
|
@tobidope
Is the other stuff in /var/cache/edb/dep/usr/portage actually needed after using cdb? I see files with .cpickle and various dirs with files which occupy a hunk of space... |
|
tobidope n00b
Joined: 17 Aug 2003 Posts: 24 Location: Germany
|
Posted: Sat Dec 04, 2004 9:22 pm Post subject: |
|
|
Quote: | Is there any speed increase updating the cache after syncing? |
I can give you no exact figures but yes, I think it's a lot faster. You can easily benchmark it by doing a Code: | time emerge metadata | with and without the cdb-backend. I won't do this on the machine I'm working on at the moment. It's a Code: | tobias@oldie tobias $ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 3
model name : Pentium II (Klamath)
stepping : 4
cpu MHz : 300.810
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov mmx
bogomips : 591.87
| with a 5 GB hard disk. It takes forever with the portage_db_flat module. |
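If you want repeatable numbers instead of a single run of time, a small harness like the following can help. It is only a sketch: time_command is a made-up helper, and the demo times a harmless Python no-op. On a Gentoo box you would pass something like ["emerge", "metadata"] instead, once with and once without the cdb backend.

```python
import subprocess
import sys
import time


def time_command(argv, runs=1):
    """Return the wall-clock seconds of each run of argv (output discarded)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return timings


# Harmless stand-in command so the example runs anywhere.
timings = time_command([sys.executable, "-c", "pass"], runs=2)
print(["%.3f s" % t for t in timings])
```

Running each variant more than once matters here, because the second run usually benefits from a warm disk cache.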
|
tobidope n00b
Joined: 17 Aug 2003 Posts: 24 Location: Germany
|
Posted: Sat Dec 04, 2004 9:32 pm Post subject: |
|
|
Quote: | @tobidope
Is the other stuff in /var/cache/edb/dep/usr/portage actually needed after using cdb? I see files with .cpickle and various dirs with files which occupy a hunk of space... |
You can delete all directories and files under /var/cache/edb/dep/usr/portage except for the files ending in .cdb. A bit less cruft on your system.
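Before deleting anything it may be worth listing what would go. The sketch below only reports the top-level entries that are not .cdb files and removes nothing. find_stale_cache_entries is a hypothetical helper, and the file names in the demo are invented and created in a temporary directory.

```python
import os
import tempfile


def find_stale_cache_entries(cache_dir):
    """List top-level entries of cache_dir that are not *.cdb files."""
    stale = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path) and name.endswith(".cdb"):
            continue  # keep the cdb databases
        stale.append(path)
    return stale


# Demo on a throwaway directory instead of the real cache.
with tempfile.TemporaryDirectory() as demo:
    open(os.path.join(demo, "sys-apps.cdb"), "w").close()
    open(os.path.join(demo, "example.cpickle"), "w").close()
    os.mkdir(os.path.join(demo, "sys-apps"))
    stale_names = [os.path.basename(p) for p in find_stale_cache_entries(demo)]

print(stale_names)
```

On a real system you would point it at /var/cache/edb/dep/usr/portage and review the list before removing the entries by hand.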
|
PrakashP Veteran
Joined: 27 Oct 2003 Posts: 1249 Location: C.C.A.A., Germania
|
Posted: Sat Dec 04, 2004 9:39 pm Post subject: |
|
|
Hmm, it seems that after an emerge sync they got removed automatically. |
|
s4kk3 Apprentice
Joined: 15 Oct 2004 Posts: 232 Location: Finland
|
Posted: Sat Dec 04, 2004 9:46 pm Post subject: |
|
|
Very nice. It took almost 3 minutes more to complete emerge --searchdesc python without cdb. This is the kind of speedup I love |
|
tobidope n00b
Joined: 17 Aug 2003 Posts: 24 Location: Germany
|
Posted: Sat Dec 04, 2004 9:47 pm Post subject: |
|
|
@PrakashKC
And? Has the update of the portage cache been faster? I hope so, but the PC I'm working with at the moment is rather slow. Both syncs (portage_db_flat and portage_db_cdb) took a long time, and I forgot to put a time in front of the emerge sync. |
|
PrakashP Veteran
Joined: 27 Oct 2003 Posts: 1249 Location: C.C.A.A., Germania
|
Posted: Sat Dec 04, 2004 9:58 pm Post subject: |
|
|
I have not timed it actually. It seemed slower with cdb, but it could be just a subjective impression because of the % rising slowly whereas with default portage you don't see the %...
But searching is now really fast - now competing with portage-c, though not quite as fast. |
|
UberLord Retired Dev
Joined: 18 Sep 2003 Posts: 6835 Location: Blighty
|
Posted: Sat Dec 04, 2004 11:01 pm Post subject: |
|
|
Default portage
Code: |
cd /var/cache/edb/dep/usr
rm * -Rf
time emerge metadata
real 0m40.575s
user 0m26.260s
sys 0m9.592s
time emerge -S mozilla
real 1m46.003s
user 1m36.215s
sys 0m8.721s
time emerge -upDv world
real 0m14.678s
user 0m13.289s
sys 0m1.130s
|
python-cdb powered portage
Code: |
cd /var/cache/edb/dep/usr
rm * -Rf
time emerge metadata
real 2m18.022s
user 0m26.538s
sys 0m31.505s
time emerge -S mozilla
real 1m39.624s
user 1m32.628s
sys 0m5.418s
time emerge -upDv world
real 0m13.856s
user 0m12.986s
sys 0m0.727s
|
As you can see, it takes just over one and a half minutes extra to update the cache using cdb, for a gain of roughly 6 seconds when searching and 1 second when updating world.
Now, that's on an AMD64 3500 with 1 GB of memory and blazingly fast IDE disks formatted with reiserfs.
I did notice that when emerging metadata, the CPU isn't being utilized all that much - maybe there is a bottleneck somewhere in the code? Anyway, it's nice, has potential, but I don't think I'm going to use it. |
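The differences can be worked out directly from the real times posted above. This is a small sketch for doing that arithmetic; parse_time is a made-up helper that only understands the "XmY.ZZZs" format printed by the shell's time.

```python
import re


def parse_time(text):
    """Convert a shell 'time' value such as '2m18.022s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError("unrecognised time value: %r" % text)
    return int(match.group(1)) * 60 + float(match.group(2))


# 'real' times copied from the two runs above: (default, cdb)
real = {
    "emerge metadata": ("0m40.575s", "2m18.022s"),
    "emerge -S mozilla": ("1m46.003s", "1m39.624s"),
    "emerge -upDv world": ("0m14.678s", "0m13.856s"),
}

for command, (default, with_cdb) in real.items():
    delta = parse_time(with_cdb) - parse_time(default)
    print("%-20s %+8.3f s" % (command, delta))
```

A positive number means the cdb run took longer, a negative one means it was faster.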
|
HackingM2 Apprentice
Joined: 26 Jul 2004 Posts: 245 Location: Cambridge, England
|
Posted: Sat Dec 04, 2004 11:57 pm Post subject: |
|
|
UberLord wrote: | Default portage
Code: |
cd /var/cache/edb/dep/usr
rm * -Rf
time emerge metadata
real 0m40.575s
user 0m26.260s
sys 0m9.592s
time emerge -S mozilla
real 1m46.003s
user 1m36.215s
sys 0m8.721s
time emerge -upDv world
real 0m14.678s
user 0m13.289s
sys 0m1.130s
|
python-cdb powered portage
Code: |
cd /var/cache/edb/dep/usr
rm * -Rf
time emerge metadata
real 2m18.022s
user 0m26.538s
sys 0m31.505s
time emerge -S mozilla
real 1m39.624s
user 1m32.628s
sys 0m5.418s
time emerge -upDv world
real 0m13.856s
user 0m12.986s
sys 0m0.727s
|
As you can see, it takes just over one and a half minutes extra to update the cache using cdb, for a gain of roughly 6 seconds when searching and 1 second when updating world.
Now, that's on an AMD64 3500 with 1 GB of memory and blazingly fast IDE disks formatted with reiserfs.
I did notice that when emerging metadata, the CPU isn't being utilized all that much - maybe there is a bottleneck somewhere in the code? Anyway, it's nice, has potential, but I don't think I'm going to use it. |
I would agree for high-spec machines.
I have tried this on a Pentium Pro 200 with 1 GB of RAM and 5xSCSI RAID and experienced a massive speed-up (less than 20% of the previous times), and on a quad P4 3 GHz with 2 GB of RAM and 4xSCSI RAID with a slow-down.
It seems to me that this just trades CPU for disk I/O, which is weird - I would have expected the reverse, as both machines have fast disk arrays. |
|
PrakashP Veteran
Joined: 27 Oct 2003 Posts: 1249 Location: C.C.A.A., Germania
|
Posted: Sun Dec 05, 2004 9:28 am Post subject: |
|
|
@UberLord
I think there is something wrong with your system:
time emerge -S mozilla
real 0m21.519s
user 0m15.361s
sys 0m1.082s
with cdb.
AthlonXP 2.2 GHz, 1 GB RAM, / on RAID0 |
|
tobidope n00b
Joined: 17 Aug 2003 Posts: 24 Location: Germany
|
Posted: Sun Dec 05, 2004 10:23 am Post subject: |
|
|
I think the problem with the really slow cache update lies within portage.py. After each key insertion a sync is made. This is no problem with portage_db_flat.py, because there the sync method is defined as Code: | def sync(self):
    pass
| But with my module or portage_db_anydbm.py an I/O-bound operation is called on every sync. |
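To see why a sync after every insertion hurts a backend that has to rewrite its whole file, the toy model below counts how many records get written. RebuildCounter and fill are invented for this illustration; they only model the cost and are not the real portage or cdb code.

```python
class RebuildCounter:
    """Toy store that rewrites every record whenever sync() finds pending data."""

    def __init__(self):
        self.records = {}
        self.pending = {}
        self.records_written = 0

    def set(self, key, value):
        self.pending[key] = value

    def sync(self):
        if not self.pending:
            return
        self.records.update(self.pending)
        self.pending = {}
        # The whole file is rewritten, so every record counts.
        self.records_written += len(self.records)


def fill(store, n, sync_every_insert):
    """Insert n keys and return the total number of records written."""
    for i in range(n):
        store.set("pkg-%d" % i, i)
        if sync_every_insert:
            store.sync()
    store.sync()  # final sync picks up anything still pending
    return store.records_written


eager = fill(RebuildCounter(), 100, sync_every_insert=True)
lazy = fill(RebuildCounter(), 100, sync_every_insert=False)
print(eager, lazy)
```

With 100 keys the sync-per-insert variant writes 5050 records and the deferred variant writes 100, which is the quadratic versus linear difference that deferring the rebuild avoids.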
|
Leffe Tux's lil' helper
Joined: 07 Apr 2004 Posts: 145 Location: Sweden
|
Posted: Sun Dec 05, 2004 11:06 am Post subject: |
|
|
Hm, I wonder which is faster, SQLite or CDB? |
|