Gentoo Forums
[HOWTO] Speeding up portage with cdb -- UPDATE
tobidope
n00b


Joined: 17 Aug 2003
Posts: 24
Location: Germany

PostPosted: Fri Dec 03, 2004 11:50 pm    Post subject: [HOWTO] Speeding up portage with cdb -- UPDATE Reply with quote

[HOWTO] Speeding up portage with cdb -- UPDATE

Changelog

  • 19.12.2004 -- New portage_db_cdb.py module. It's a workaround to speed up building the database after an emerge sync. Hopefully portage will never run on Jython, because I need the destructor. On my ancient computer emerge metadata now takes 10 minutes instead of 50!
  • 19.12.2004 -- Second try. Note to myself: "Even tiny changes can break your system."
  • 04.02.2005 -- Deleted the advice to use emerge --rege
  • 16.02.2005 -- Added some lines about copyright

When it comes to speed, the database backend of portage is rather slow because it is implemented as a thin abstraction over the filesystem. It is possible to replace it with a MySQL backend, but that is not ideal if you don't want to run a full-blown RDBMS. My solution is tiny by comparison, both in the code and in the packages you need:

  1. You'll need the python interface python-cdb for DJB's cdb
    Code:

    emerge python-cdb

  2. Create /usr/lib/portage/pym/portage_db_cdb.py and put this in it:
    Code:
    # Copyright 2004, 2005 Tobias Bell <tobias.bell@web.de>
    #
    # This program is free software; you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation; either version 2 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program; if not, write to the Free Software
    # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

    import portage_db_template
    import os
    import os.path
    import cPickle
    import cdb


    class _data(object):

        def __init__(self, path, category, uid, gid):
            self.path = path
            self.category = category
            self.uid = uid
            self.gid = gid
            self.addList = {}
            self.delList = []
            self.modified = False
            self.cdbName = os.path.normpath(os.path.join(
                self.path, self.category) + ".cdb")
            self.cdbObject = None

        def __del__(self):
            if self.modified:
                self.realSync()

            self.closeCDB()

        def realSync(self):
            if self.modified:
                self.modified = False
                newDB = cdb.cdbmake(self.cdbName, self.cdbName + ".tmp")
               
                for key, value in iter(self.cdbObject.each, None):
                    if key in self.delList:
                        if key in self.addList:
                            newDB.add(key, cPickle.dumps(self.addList[key], cPickle.HIGHEST_PROTOCOL))
                            del self.addList[key]
                    elif key in self.addList:                   
                        newDB.add(key, cPickle.dumps(self.addList[key], cPickle.HIGHEST_PROTOCOL))
                        del self.addList[key]
                    else:
                        newDB.add(key, value)
                   

                self.closeCDB()

                for key, value in self.addList.iteritems():
                    newDB.add(key, cPickle.dumps(value, cPickle.HIGHEST_PROTOCOL))
               
                newDB.finish()
                del newDB
               
                self.addList = {}
                self.delList = []

                self.openCDB()

        def openCDB(self):
            prevmask = os.umask(0)
           
            if not os.path.exists(self.path):
                os.makedirs(self.path, 02775)
                os.chown(self.path, self.uid, self.gid)
               
            if not os.path.isfile(self.cdbName):
                maker = cdb.cdbmake(self.cdbName, self.cdbName + ".tmp")
                maker.finish()
                del maker
                os.chown(self.cdbName, self.uid, self.gid)
                os.chmod(self.cdbName, 0664)

            os.umask(prevmask)
               
            self.cdbObject = cdb.init(self.cdbName)

        def closeCDB(self):
            if self.cdbObject:
                self.cdbObject = None


    class _dummyData:
        cdbName = ""

        def realSync():
            pass
        realSync = staticmethod(realSync)


    _cacheSize = 4
    _cache = [_dummyData()] * _cacheSize


    class database(portage_db_template.database):   

        def module_init(self):
            self.data = _data(self.path, self.category, self.uid, self.gid)

            for other in _cache:
                if other.cdbName == self.data.cdbName:
                    self.data = other
                    break
            else:
                self.data.openCDB()
                _cache.insert(0, self.data)           
                _cache.pop().realSync()
               
        def has_key(self, key):
            self.check_key(key)
            retVal = 0

            if self.data.cdbObject.get(key) is not None:
                retVal = 1

            if self.data.modified:
                if key in self.data.delList:
                    retVal = 0
                if key in self.data.addList:
                    retVal = 1
               
            return retVal

        def keys(self):
            myKeys = self.data.cdbObject.keys()

            if self.data.modified:
                for k in self.data.delList:
                    myKeys.remove(k)
                for k in self.data.addList.iterkeys():
                    if k not in myKeys:
                        myKeys.append(k)
                       
            return myKeys

        def get_values(self, key):
            values = None
           
            if self.has_key(key):
                if key in self.data.addList:
                    values = self.data.addList[key]
                else:
                    values = cPickle.loads(self.data.cdbObject.get(key))

            return values
       
        def set_values(self, key, val):
            self.check_key(key)
            self.data.modified = True
            self.data.addList[key] = val

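        def del_key(self, key):
            retVal = 0

            if self.has_key(key):
                self.data.modified = True
                retVal = 1
                # Drop a pending insert, if any.
                if key in self.data.addList:
                    del self.data.addList[key]
                # The key may also exist in the cdb file itself (for example
                # after set_values on an existing key), so mark it as deleted
                # there too; otherwise has_key would still find it.
                if (self.data.cdbObject.get(key) is not None
                        and key not in self.data.delList):
                    self.data.delList.append(key)

            return retVal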
                       
        def sync(self):
            pass
       
        def close(self):
            pass


    if __name__ == "__main__":
        import portage
        uid = os.getuid()
        gid = os.getgid()
        portage_db_template.test_database(database,"/tmp", "sys-apps", portage.auxdbkeys, uid, gid)

  3. Create /etc/portage/modules and put these lines in it:
    Code:
    portdbapi.auxdbmodule = portage_db_cdb.database
    eclass_cache.dbmodule = portage_db_cdb.database

  4. Now you'll have to regenerate the portage cache with
    Code:
    emerge metadata

Now try some searches with emerge, especially with --searchdesc :wink:
Code:
emerge --searchdesc python
This should be much faster than before. You can compare the performance by switching back to the normal
db module. Just run
Code:
mv /etc/portage/modules /etc/portage/__modules
I hope you enjoy your accelerated portage.
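
A note on how the module works, as far as the code above shows: a cdb file cannot be changed in place, so writes are collected in memory (addList, delList) and the file is rebuilt once, when the data object is flushed. Below is a minimal sketch of that pattern in Python 3 (the module itself targets the Python 2 of the time). A plain dict stands in for the cdb file. The class name OverlayStore and its methods are made up for illustration and are not part of portage or python-cdb.

```python
class OverlayStore:
    """Read-only base store plus in-memory pending changes."""

    def __init__(self, base):
        self.base = dict(base)   # stands in for the on-disk cdb file
        self.added = {}          # pending inserts/updates (like addList)
        self.deleted = set()     # pending deletions (like delList)
        self.rebuilds = 0        # how often the "file" was rewritten

    def __contains__(self, key):
        if key in self.added:
            return True
        return key in self.base and key not in self.deleted

    def get(self, key):
        if key in self.added:
            return self.added[key]
        if key in self.base and key not in self.deleted:
            return self.base[key]
        return None

    def set(self, key, value):
        self.added[key] = value

    def delete(self, key):
        if key not in self:
            return False
        self.added.pop(key, None)
        if key in self.base:
            self.deleted.add(key)
        return True

    def flush(self):
        """Rebuild the base once, applying all pending changes."""
        if not (self.added or self.deleted):
            return
        new_base = {k: v for k, v in self.base.items()
                    if k not in self.deleted}
        new_base.update(self.added)
        self.base = new_base
        self.added.clear()
        self.deleted.clear()
        self.rebuilds += 1


# Hypothetical keys and values, just to exercise the class.
store = OverlayStore({"sys-apps/portage": "first"})
store.set("sys-apps/portage", "updated")
store.set("dev-python/python-cdb", "second")
store.delete("sys-apps/portage")
store.flush()
```

Three changes cost a single rebuild here, which is the whole point of deferring the write until the object goes away.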

Last edited by tobidope on Wed Feb 16, 2005 8:26 pm; edited 6 times in total
steveb
Advocate


Joined: 18 Sep 2002
Posts: 4564

PostPosted: Sat Dec 04, 2004 12:18 am    Post subject: Reply with quote

nice ;)
rojaro
l33t


Joined: 06 May 2002
Posts: 732

PostPosted: Sat Dec 04, 2004 11:34 am    Post subject: Reply with quote

Oh yes ... this is really nice, this should be included in portage!
_________________
A mathematician is a machine for turning coffee into theorems. ~ Alfred Renyi (*1921 - †1970)
TheCoop
Veteran


Joined: 15 Jun 2002
Posts: 1814
Location: Where you least expect it

PostPosted: Sat Dec 04, 2004 12:41 pm    Post subject: Reply with quote

ooooooh this is good. prod the devs to put it in :P
_________________
95% of all computer errors occur between chair and keyboard (TM)

"One World, One web, One program" - Microsoft Promo ad.
"Ein Volk, Ein Reich, Ein Führer" - Adolf Hitler

Change the world - move a rock
PrakashP
Veteran


Joined: 27 Oct 2003
Posts: 1249
Location: C.C.A.A., Germania

PostPosted: Sat Dec 04, 2004 1:22 pm    Post subject: Reply with quote

I like this. Painless and good results. :)
tobidope
n00b


Joined: 17 Aug 2003
Posts: 24
Location: Germany

PostPosted: Sat Dec 04, 2004 5:42 pm    Post subject: Reply with quote

You can also benchmark it against the portage_db_anydbm module by putting
Code:
portdbapi.auxdbmodule = portage_db_anydbm.database
eclass_cache.dbmodule = portage_db_anydbm.database

into /etc/portage/modules. In my experience this is a bit slower than cdb, but you
don't have to code the db update by hand. On the other hand, the db files are up to twice as big as the equivalent cdb files.
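
To show what "not coding the db update by hand" means: a dbm-style database lets you assign to a single key and takes care of the file itself. Here is a minimal Python 3 sketch using the standard dbm.dumb module and pickle. Python 3's dbm package is the renamed anydbm of Python 2, which I assume is what portage_db_anydbm builds on. The function name roundtrip, the file name, and the key and values are made up for illustration.

```python
import dbm.dumb
import os
import pickle
import tempfile


def roundtrip(path, key, values):
    """Store pickled values under key, reopen the file, read them back."""
    with dbm.dumb.open(path, "c") as db:
        # One assignment updates the database in place; no manual rebuild.
        db[key] = pickle.dumps(values, pickle.HIGHEST_PROTOCOL)
    with dbm.dumb.open(path, "r") as db:
        return pickle.loads(db[key])


with tempfile.TemporaryDirectory() as tmp:
    result = roundtrip(os.path.join(tmp, "sys-apps"), "some-package-1.0",
                       {"DESCRIPTION": "example", "SLOT": "0"})
```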
AlterEgo
Veteran


Joined: 25 Apr 2002
Posts: 1619

PostPosted: Sat Dec 04, 2004 6:08 pm    Post subject: Reply with quote

Cool stuff! Works great for me. Thanks!
John-Boy
Guru


Joined: 23 Jun 2004
Posts: 442
Location: Desperately seeking moksha in all the wrong places

PostPosted: Sat Dec 04, 2004 6:13 pm    Post subject: Reply with quote

Nice :D
PrakashP
Veteran


Joined: 27 Oct 2003
Posts: 1249
Location: C.C.A.A., Germania

PostPosted: Sat Dec 04, 2004 7:13 pm    Post subject: Reply with quote

BTW, where are the cdb files kept? In /usr/portage/metadata?
tobidope
n00b


Joined: 17 Aug 2003
Posts: 24
Location: Germany

PostPosted: Sat Dec 04, 2004 7:33 pm    Post subject: Reply with quote

Quote:
BTW, where are the cdb files kept? In /usr/portage/metadata?

They are kept in /var/cache/edb/dep/usr/portage. The cache in /usr/portage/metadata is the one that arrives with the rsync.
GentooBox
Veteran


Joined: 22 Jun 2003
Posts: 1168
Location: Denmark

PostPosted: Sat Dec 04, 2004 7:35 pm    Post subject: Reply with quote

Can I see a benchmark before switching to cdb... please :)
_________________
Encrypt, lock up everything and duct tape the rest
matroskin
Apprentice


Joined: 21 Jan 2003
Posts: 214

PostPosted: Sat Dec 04, 2004 8:28 pm    Post subject: benchmark Reply with quote

normal
% time emerge -S mozilla > /dev/null
real 2m39.952s
user 0m21.770s
sys 0m4.580s

cdb
% time emerge -S mozilla > /dev/null
real 0m23.169s
user 0m19.770s
sys 0m2.190s
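
For anyone who wants the ratio rather than the raw figures, here is a small Python 3 sketch that converts the "real" times above to seconds and divides them. The helper name to_seconds is made up for this example.

```python
import re


def to_seconds(stamp):
    """Convert a `time` value such as '2m39.952s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError("unexpected time format: %r" % stamp)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


normal = to_seconds("2m39.952s")    # real time with the default backend
with_cdb = to_seconds("0m23.169s")  # real time with cdb
speedup = normal / with_cdb
print("wall-clock speedup: %.1fx" % speedup)
```

That works out to roughly a 6.9x wall-clock speedup. The user time barely changes (21.77s against 19.77s), which suggests that most of the gain on this machine is time no longer spent waiting on the disk.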
UberLord
Retired Dev


Joined: 18 Sep 2003
Posts: 6835
Location: Blighty

PostPosted: Sat Dec 04, 2004 8:36 pm    Post subject: Reply with quote

Is there any speed increase updating the cache after syncing?
PrakashP
Veteran


Joined: 27 Oct 2003
Posts: 1249
Location: C.C.A.A., Germania

PostPosted: Sat Dec 04, 2004 9:20 pm    Post subject: Reply with quote

@tobidope

Is the other stuff in /var/cache/edb/dep/usr/portage actually needed after using cdb? I see files with .cpickle and various dirs with files which occupy a hunk of space...
tobidope
n00b


Joined: 17 Aug 2003
Posts: 24
Location: Germany

PostPosted: Sat Dec 04, 2004 9:22 pm    Post subject: Reply with quote

Quote:
Is there any speed increase updating the cache after syncing?

I can give you no exact figures but yes, I think it's a lot faster. You can easily benchmark it by doing a
Code:
time emerge metadata
with and without the cdb-backend. I won't do this on the machine I'm working on at the moment. It's a
Code:
tobias@oldie tobias $ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 3
model name      : Pentium II (Klamath)
stepping        : 4
cpu MHz         : 300.810
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov mmx
bogomips        : 591.87
with a 5 GB hard disk. It takes forever with the portage_db_flat module.
tobidope
n00b


Joined: 17 Aug 2003
Posts: 24
Location: Germany

PostPosted: Sat Dec 04, 2004 9:32 pm    Post subject: Reply with quote

Quote:
@tobidope

Is the other stuff in /var/cache/edb/dep/usr/portage actually needed after using cdb? I see files with .cpickle and various dirs with files which occupy a hunk of space...


You can delete all directories and files under /var/cache/edb/dep/usr/portage
except for the files ending in *.cdb. A bit less cruft on your system :o
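
If you would rather look before you delete, here is a dry-run sketch in Python 3 that only lists what is left over and removes nothing. It assumes the *.cdb files sit directly in the cache directory, which is what the cdbName path in the module suggests. The function name stale_entries is made up for this example.

```python
import os


def stale_entries(cache_dir):
    """Return top-level entries of cache_dir that are not *.cdb files."""
    stale = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path) and name.endswith(".cdb"):
            continue  # keep the cdb databases
        stale.append(path)
    return stale


if __name__ == "__main__":
    cache_dir = "/var/cache/edb/dep/usr/portage"
    if os.path.isdir(cache_dir):
        for path in stale_entries(cache_dir):
            print("would delete:", path)
```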
PrakashP
Veteran


Joined: 27 Oct 2003
Posts: 1249
Location: C.C.A.A., Germania

PostPosted: Sat Dec 04, 2004 9:39 pm    Post subject: Reply with quote

Hmm, it seems that after an emerge sync they got removed by themselves. :)
s4kk3
Apprentice


Joined: 15 Oct 2004
Posts: 232
Location: Finland

PostPosted: Sat Dec 04, 2004 9:46 pm    Post subject: Reply with quote

Very nice. Without cdb, emerge --searchdesc python took almost 3 minutes longer to complete. This is the kind of speedup I love :twisted:
_________________
My own filemanager project
tobidope
n00b


Joined: 17 Aug 2003
Posts: 24
Location: Germany

PostPosted: Sat Dec 04, 2004 9:47 pm    Post subject: Reply with quote

@PrakashKC

And? Was the update of the portage cache faster? I hope so, but the PC
I'm working on at the moment is rather slow. Both syncs (portage_db_flat and portage_db_cdb) took a long time, and I forgot to put a time in front of the emerge sync.
PrakashP
Veteran


Joined: 27 Oct 2003
Posts: 1249
Location: C.C.A.A., Germania

PostPosted: Sat Dec 04, 2004 9:58 pm    Post subject: Reply with quote

I have not timed it actually. It seemed slower with cdb, but it could be just a subjective impression because of the % rising slowly whereas with default portage you don't see the %...

But searching is now really fast - now competing with portage-c, though not quite as fast. ;)
UberLord
Retired Dev


Joined: 18 Sep 2003
Posts: 6835
Location: Blighty

PostPosted: Sat Dec 04, 2004 11:01 pm    Post subject: Reply with quote

Default portage
Code:

cd /var/cache/edb/dep/usr
rm * -Rf
time emerge metadata
real    0m40.575s
user    0m26.260s
sys     0m9.592s

time emerge -S mozilla
real    1m46.003s
user    1m36.215s
sys     0m8.721s

time emerge -upDv world
real    0m14.678s
user    0m13.289s
sys     0m1.130s


python-cdb powered portage
Code:

cd /var/cache/edb/dep/usr
rm * -Rf
time emerge metadata
real    2m18.022s
user    0m26.538s
sys     0m31.505s

time emerge -S mozilla
real    1m39.624s
user    1m32.628s
sys     0m5.418s

time emerge -upDv world
real    0m13.856s
user    0m12.986s
sys     0m0.727s


As you can see, it takes just over one and a half minutes extra to update the cache using cdb, for a gain of about 6 seconds when searching and under 1 second when updating world.

Now, that's on an AMD64 3500 with 1 GB of memory and blazingly fast IDE disks formatted with reiserfs.

I did notice that when emerging metadata, the CPU isn't being utilized all that much - maybe there is a bottleneck somewhere in the code? Anyway, it's nice, has potential, but I don't think I'm going to use it.
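
The trade-off in these figures can be put as a break-even count: how many searches it takes to win back the extra cache update time. Here is a small Python 3 sketch using only the "real" times posted above; the variable names are made up for this example.

```python
# Wall-clock ("real") times from the benchmark above, in seconds.
default = {"metadata": 40.575, "search": 106.003, "world": 14.678}
cdb = {"metadata": 138.022, "search": 99.624, "world": 13.856}

extra_metadata = cdb["metadata"] - default["metadata"]  # cost per cache rebuild
search_gain = default["search"] - cdb["search"]         # saving per search
world_gain = default["world"] - cdb["world"]            # saving per world update

# Searches needed per cache rebuild before cdb pays for itself.
break_even_searches = extra_metadata / search_gain
print("break-even after %.1f searches" % break_even_searches)
```

On this machine the rebuild costs about 97 seconds more and a search saves about 6.4 seconds, so cdb only pays off after roughly 15 searches per cache update. On slower hardware, going by the other figures in this thread, the balance looks very different.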
HackingM2
Apprentice


Joined: 26 Jul 2004
Posts: 245
Location: Cambridge, England

PostPosted: Sat Dec 04, 2004 11:57 pm    Post subject: Reply with quote

UberLord wrote:
[benchmark figures snipped]
Anyway, it's nice, has potential, but I don't think I'm going to use it.

I would agree for machines of high spec.

I have tried this on a Pentium Pro 200 with 1 GB of RAM and 5xSCSI RAID and experienced a massive speed-up (less than 20% of previous times), and on a quad P4 3 GHz with 2 GB of RAM and 4xSCSI RAID with a slow-down.

It seems to me that this is just a trade-off for disk IO over CPU, which is weird - I would have expected the reverse as both machines have fast disk arrays.
PrakashP
Veteran


Joined: 27 Oct 2003
Posts: 1249
Location: C.C.A.A., Germania

PostPosted: Sun Dec 05, 2004 9:28 am    Post subject: Reply with quote

@UberLord

I think there is something wrong with your system:

time emerge -S mozilla
real 0m21.519s
user 0m15.361s
sys 0m1.082s

with cdb.

AthlonXP 2.2 GHz, 1 GB RAM, / on RAID0
tobidope
n00b


Joined: 17 Aug 2003
Posts: 24
Location: Germany

PostPosted: Sun Dec 05, 2004 10:23 am    Post subject: Reply with quote

I think the problem with the really slow update after
Code:
emerge sync
or
Code:
emerge metadata
lies within portage.py: it calls sync after every key insertion. That is no problem with portage_db_flat.py, because there the sync method is defined as
Code:
def sync(self):
    pass
But with my module or portage_db_anydbm.py, an I/O-bound operation is called each time.
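
To get a feel for why a sync per insertion hurts a format that has to be rewritten as a whole, here is a toy model in Python 3. It is not portage code: RebuildStore and fill are made-up names, and the store simply counts how many records a full rewrite would copy.

```python
class RebuildStore:
    """Toy store whose sync() rewrites every record, like a cdb rebuild."""

    def __init__(self):
        self.records = {}
        self.records_written = 0

    def set(self, key, value):
        self.records[key] = value

    def sync(self):
        # A full rebuild copies all existing records into a new file.
        self.records_written += len(self.records)


def fill(store, count, sync_every_insert):
    """Insert count records, syncing either after each one or once at the end."""
    for i in range(count):
        store.set("pkg-%d" % i, i)
        if sync_every_insert:
            store.sync()
    if not sync_every_insert:
        store.sync()
    return store.records_written


eager = fill(RebuildStore(), 1000, sync_every_insert=True)
deferred = fill(RebuildStore(), 1000, sync_every_insert=False)
print(eager, deferred)
```

With 1000 insertions the eager variant copies 1 + 2 + ... + 1000 = 500500 records, the deferred one just 1000. The work grows quadratically in one case and linearly in the other.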
Leffe
Tux's lil' helper


Joined: 07 Apr 2004
Posts: 145
Location: Sweden

PostPosted: Sun Dec 05, 2004 11:06 am    Post subject: Reply with quote

Hm, I wonder which is faster, SQLite or CDB :)