Script to cut rsync file list by up to 95%

This forum covers all Gentoo-related software not officially supported by Gentoo. Ebuilds/software posted here might harm the health and stability of your system(s), and are not supported by Gentoo developers. Bugs/errors caused by ebuilds from overlays.gentoo.org are covered by this forum, too.

doppelganger
Tux's lil' helper
Posts: 84
Joined: Wed Jun 30, 2004 10:42 pm

update

Post by doppelganger » Fri Aug 26, 2005 4:42 pm

Well, I re-ran emerge --sync and everything seemed to work fine the second time. I guess the first run creates the files that the second --sync uses. <SOLVED>

ZoeF
n00b
Posts: 17
Joined: Sat Jul 02, 2005 7:17 am

Post by ZoeF » Sun Aug 28, 2005 4:24 pm

Thank you!

I love this script. My rsyncs are less than a tenth of what they used to be.

gentop
l33t
Posts: 639
Joined: Mon Nov 29, 2004 6:03 pm

Post by gentop » Tue Oct 18, 2005 9:53 am

Hi,

qpkg is deprecated now. So if you run prlock.py and qpkg is not found, edit the script and replace

Code: Select all

pkgl = os.popen('/bin/bash /usr/bin/qpkg -nc -I','r').readlines() 
with

Code: Select all

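pkgl=[]
for i in os.listdir('/var/db/pkg'):
        for j in os.listdir('/var/db/pkg/%s' % (i)):
                # Strip the version: "category/name-1.2.3" -> "category/name"
                m = re.search(r'^(.*/.*)-[0-9].*$', "%s/%s" % (i,j))
                # Skip entries without a version suffix and avoid duplicates
                if m and m.group(1) not in pkgl:
                        pkgl.append(m.group(1))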
If you still want to use qpkg, look at /usr/lib/gentoolkit/bin/qpkg. So you can also change

Code: Select all

pkgl = os.popen('/bin/bash /usr/bin/qpkg -nc -I','r').readlines()
to

Code: Select all

pkgl = os.popen('/bin/bash /usr/lib/gentoolkit/qpkg -nc -I','r').readlines()
But this is deprecated!
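
For readers on a current Python, here is a minimal sketch of the same /var/db/pkg scan written as a standalone Python 3 function. The function name installed_packages and the vdb parameter are made up for illustration; the regular expression is the one from the replacement code above.

```python
import os
import re

# Same idea as the replacement snippet: keep everything before the
# "-<digit>" that starts the version suffix.
VERSION_RE = re.compile(r'^(.*/.*)-[0-9].*$')


def installed_packages(vdb='/var/db/pkg'):
    """Return sorted, de-duplicated 'category/package' names found under vdb."""
    found = set()
    for category in os.listdir(vdb):
        cat_dir = os.path.join(vdb, category)
        if not os.path.isdir(cat_dir):
            continue  # ignore stray files next to the category directories
        for entry in os.listdir(cat_dir):
            m = VERSION_RE.match('%s/%s' % (category, entry))
            if m:  # skip entries that do not look like name-version
                found.add(m.group(1))
    return sorted(found)
```

Using a set removes the need for the index()/append() dance, and two installed versions of the same package collapse into one entry.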

//gentop

_hephaistos_
Advocate
Posts: 2694
Joined: Wed Apr 07, 2004 5:58 am
Location: salzburg, austria

Post by _hephaistos_ » Tue Oct 18, 2005 5:33 pm

@gentop: the path should be: /usr/lib/gentoolkit/bin/qpkg

but thanks for the replace code!

cheers
-l: signature: command not found

Yonathan
l33t
Posts: 662
Joined: Wed Jan 05, 2005 12:22 pm

Post by Yonathan » Thu Dec 29, 2005 3:27 pm

When I add two packages with:
prlock A
prlock B
I can't find A in rsync_excludes afterwards, so I have to re-add A manually to rsync_...

Why? Does anyone have a solution for this problem?

yona
Athlon XP+ 2400 Thunderbird,
Abit NF7
1536MB DDR (266),
Radeon 9200 (256mb)
gentoo 2.6.19-r5

indanet
n00b
Posts: 54
Joined: Sun Sep 05, 2004 11:34 pm

Post by indanet » Fri Dec 30, 2005 8:47 pm

Hi!
Yonathan wrote:When I add two packages with:
prlock A
prlock B
I can't find A in rsync_excludes afterwards, so I have to re-add A manually to rsync_...
I can't reproduce your problem. Here's my version of the script:
[removed, see below]
Last edited by indanet on Sat Dec 31, 2005 11:52 am, edited 2 times in total.

Gentree
Watchman
Posts: 5350
Joined: Tue Jul 01, 2003 12:51 am
Location: France, Old Europe

Post by Gentree » Sat Dec 31, 2005 12:39 am

Nice idea. I've been wanting to sort out some form of rsync_excludes for a long time, but I could never get it to work even with a trivial exclude file. I never worked out why not, and never got any help. :(

One fault is that it assumes the default portage directories. You should probably read $PORTDIR etc. from make.conf and use those values instead.

Equally, you check for the presence of rsync_excludes and abort if it is not at the standard location. It would be better to read what is there and use it (heck, you've read this line in already, just parse out the filename).

Great idea anyway; it should seriously lighten the load on the mirrors. 8)
Linux, because I'd rather own a free OS than steal one that's not worth paying for.
Gentoo because I'm a masochist
AthlonXP-M on A7N8X. Portage ~x86

indanet
n00b
Posts: 54
Joined: Sun Sep 05, 2004 11:34 pm

Post by indanet » Sat Dec 31, 2005 11:57 am

Gentree wrote:One fault is that it assumes the default portage directories. You should probably read $PORTDIR etc. from make.conf and use those values instead.

Equally, you check for the presence of rsync_excludes and abort if it is not at the standard location. It would be better to read what is there and use it (heck, you've read this line in already, just parse out the filename).
This version of the script searches make.conf as requested and checks if the environment variables $PORTDIR or $RSYNC_EXCLUDES are set.

prlock.py

Code: Select all

#!/usr/bin/python
#
# prlock.py (Portage Rsync Lockdown)
# This script creates an exclude list for portage's rsync. The exclude list
# excludes all but the installed packages (found in /var/db/pkg).
# Specify 'branch-name/package-name' arguments to unlock additional packages
# that are not yet installed.
# Remember to edit make.conf and make the dir /etc/portage
#
# *WARNING*
# If you use this script and find yourself wanting to add a new package...
# You MUST either add it to the exclude list (by passing it as an arg to
# prlock) or comment out the exclude list entirely in make.conf.
# Then do a sync
# THEN add the package
# If you don't you will add a package that may be out of date!
#
# Eg.....
#     > prlock.py dev-python/wxPython
# This will restrict the rsync to only the installed packages AND wxPython
# All other packages will not be updated.
#
# Nick Fisher prlock@nickdafish.com
#
import os, re, sys

# Quit if an arg that doesn't look like a package is encountered.
for arg in sys.argv[1:]:
    if not re.match(r'[-\w]+/[-\w]+', arg):
        print("Argument doesn't look like it is a package....\n%s\n...aborting." % arg)
        sys.exit(1)

# Try to get the portage dir and exclude file from the environment
excl = os.environ.get('RSYNC_EXCLUDES', '')
portdir = os.environ.get('PORTDIR', '')

# Extract the paths of the exclude file and the portage dir from make.conf
# (both patterns now tolerate any amount of trailing whitespace)
re_excl = re.compile(r'^\s*RSYNC_EXCLUDEFROM="?(?P<excl>(/[-\w]+)+)"?\s*$')
re_portdir = re.compile(r'^\s*PORTDIR="?(?P<portdir>(/[-\w]+)+)"?\s*$')
for line in open('/etc/make.conf','r').readlines():
    if not len(excl):
        res = re_excl.match(line)
        if res: excl = res.group('excl')
    if not len(portdir):
        res = re_portdir.match(line)
        if res: portdir = res.group('portdir')

if not len(excl):
    print('WARNING! Could not get value of RSYNC_EXCLUDEFROM! Using default...')
    excl = '/etc/portage/rsync_excludes'
    print('See make.conf(5) for help on setting the variable')
    print('Portage will not exclude without it!')

if not len(portdir):
    portdir = '/usr/portage'

# Check that we will be able to write the file
#print('Using %s as portage directory' % portdir)
if not os.path.isdir(portdir):
    print("%s does not exist!\nPlease make the dir and try again." % portdir)
    sys.exit(1)
#print('Using %s as exclude file' % excl)
if not os.path.isfile(excl):
    print("%s does not exist!\nPlease make the file and try again." % excl)
    sys.exit(1)

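# Find all the installed packages
pkgl=[]
for i in os.listdir('/var/db/pkg'):
    for j in os.listdir('/var/db/pkg/%s' % (i)):
        # Strip the version: "category/name-1.2.3" -> "category/name"
        m = re.search(r'^(.*/.*)-[0-9].*$', "%s/%s" % (i,j))
        # Skip entries without a version suffix and avoid duplicates
        if m and m.group(1) not in pkgl:
            pkgl.append(m.group(1))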

# Add the args
for arg in sys.argv[1:]:
    pkgl.append(arg)
    print('Adding package from cmd line %s' % arg)

# Add the branches to list and format
fmt = re.compile(r'[\w-]*/')
list=[]
for line in pkgl:
    branch = fmt.search(line).group()
    # One rule for the branch itself, one for its metadata cache
    for rule in ('+ '+branch+'\n', '+ metadata/cache/'+branch+'\n'):
        if rule not in list:
            list.append(rule)

# Add the package dir and all subdirs/files to list
for pkg in pkgl:
    pkg = pkg.strip()
    list.append('+ '+pkg+'/\n')
    list.append('+ '+pkg+'/**\n')
    list.append('+ metadata/cache/'+pkg+'*\n')

# Append the directives to allow access to the metadata
list.append('+ metadata/\n')
list.append('+ metadata/*\n')
list.append('+ metadata/cache/\n')
list.append('+ metadata/cache/*\n')

# Allow sync for eclasses
# I don't know how I could exclude them well
list.append('+ eclass/\n')
list.append('+ eclass/**\n')

# Allow sync for files
# These shouldn't be excluded
list.append('+ files/\n')
list.append('+ files/**\n')

# Allow sync for libsidplay
# I haven't a clue what this branch is about
list.append('+ libsidplay/\n')
list.append('+ libsidplay/**\n')

# Allow sync for licenses
# Shouldn't be updated that often
list.append('+ licenses/\n')
list.append('+ licenses/**\n')

# Allow sync for packages
# Dunno what this branch is about
# Include for safety's sake
list.append('+ packages/\n')
list.append('+ packages/**\n')

# Allow sync for profiles
# Should smarten this up at some point
list.append('+ profiles/\n')
list.append('+ profiles/**\n')

# Allow sync for scripts
list.append('+ scripts/\n')
list.append('+ scripts/**\n')

# Sort the list for readability
list.sort()

# Add the include all files and exclude everything else
list.append('+ /*\n- *')

# Write the pkg list out to the exclude file
open(excl,'w').write(''.join(list))
I started to write a version which uses the gentoolkit and portage packages, which has colorized output. Maybe I'll be able to finish that in the next few days...
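
The lookup order used in the script (environment first, then make.conf, then a built-in default) can be sketched as one small Python 3 helper. This is only an illustration under a few assumptions: read_setting is a made-up name, the same variable name is used for the environment and for make.conf, and the last matching assignment in make.conf wins, whereas the script above stops at the first match.

```python
import os
import re


def read_setting(name, conf_text, default):
    """Look up name in the environment, then in make.conf text, else default."""
    value = os.environ.get(name, '')
    if value:
        return value
    # NAME=value, NAME="value" or NAME='value', optionally followed by a comment.
    pattern = re.compile(
        r'^\s*%s=(["\']?)(?P<val>[^"\'\s#]+)\1\s*(#.*)?$' % re.escape(name))
    for line in conf_text.splitlines():
        m = pattern.match(line)
        if m:
            value = m.group('val')  # assumption: a later assignment overrides
    return value or default
```

A call such as read_setting('PORTDIR', open('/etc/make.conf').read(), '/usr/portage') would then replace the two separate regular expressions, and paths containing dots are accepted as well.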

Best regards
indanet

Gentree
Watchman
Posts: 5350
Joined: Tue Jul 01, 2003 12:51 am
Location: France, Old Europe

Post by Gentree » Sat Dec 31, 2005 12:42 pm

indanet wrote:This version of the script searches make.conf as requested and checks if the environment variables $PORTDIR or $RSYNC_EXCLUDES are set.
That was fast, nice work.

Now to see if I can get rsync_excludes to work at all here; so far it's rsync=rs-ache. :wink:

IQgryn
l33t
Posts: 764
Joined: Mon Sep 05, 2005 10:54 pm
Location: WI, USA

Suggestions

Post by IQgryn » Tue Jan 10, 2006 6:16 am

I have a couple of suggestions:
  • Allow people to add their own rules. I have a few ideas on how to do this:
    • Add options (perhaps flags) to insert the contents of another file before and/or after the prlock output.
    • Instead of deleting and recreating the existing excludes file, only modify it (and perhaps have a flag to delete and re-create)
    • If there's an existing file, move it to rsync_excludes.old, and output to rsync_excludes.new. After the new file is created, run

      Code: Select all

      sdiff -o rsync_excludes rsync_excludes.old rsync_excludes.new
  • Allow the line that includes all files and folders in the root directory of the portage tree to be left out (probably another flag). I know that this makes the script less general, but it irks me that there's a bunch of empty folders lying around. :roll: Another option would be to specifically exclude all the top-level folders that aren't used, which would be a little harder, but would still keep the generality.
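
The first suggestion (user rules placed before and/or after the prlock output) could look roughly like the Python 3 sketch below. The names read_rules, assemble, before_file and after_file are hypothetical and not part of prlock.py. Because rsync acts on the first rule that matches, the sketch keeps the two closing catch-all rules at the very end, so "after" rules still land in front of them.

```python
import os

# The closing rules that prlock.py writes last.
CATCH_ALL = ['+ /*', '- *']


def read_rules(path):
    """Return the lines of an optional rules file, or [] if it is absent."""
    if path and os.path.isfile(path):
        with open(path) as handle:
            return [line.rstrip('\n') for line in handle]
    return []


def assemble(generated, before_file=None, after_file=None):
    """User rules, generated rules, more user rules, then the catch-all."""
    return (read_rules(before_file) + list(generated)
            + read_rules(after_file) + CATCH_ALL)
```

The generated argument is expected to hold the prlock rules without the catch-all pair.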

indanet
n00b
Posts: 54
Joined: Sun Sep 05, 2004 11:34 pm

Post by indanet » Sat Jan 14, 2006 7:42 pm

This would be great:
If a package is added on the command line, automatically add its dependencies. (This is probably not possible because the ebuild needed to look them up is missing. But maybe I'm wrong.)

kernelcowboy
Guru
Posts: 391
Joined: Sat Feb 14, 2004 8:34 pm
Location: New Plymouth, New Zealand

Post by kernelcowboy » Fri Mar 10, 2006 7:51 am

Very cool script. As a virtual server user over at Linode.com, I will make good use of it.

There seems to be a bit of a trap. Please follow along, and let me know if there's a way out...

Code: Select all

emerge X
emerge: there are no ebuilds to satisfy "blah/A".
./prlock.py blah/A
emerge sync
emerge X
emerge: there are no ebuilds to satisfy "blah/B".
./prlock.py blah/B
emerge sync

>>>
>>> Timestamps on the server and in the local repository are the same.
>>> Cancelling all further sync action. You are already up to date.
>>>
If I wait a bit, I can sync again, but this message is concerning me:

Code: Select all

Please note: common gentoo-netiquette says you should not sync more
than once a day.  Users who abuse the rsync.gentoo.org rotation
may be added to a temporary ban list.
What is the best way to deal with this?

IQgryn
l33t
Posts: 764
Joined: Mon Sep 05, 2005 10:54 pm
Location: WI, USA

Post by IQgryn » Fri Mar 10, 2006 9:31 am

If you have another machine which keeps the full tree, you could set it up as a portage mirror and sync with it as often as you need (it would only sync with the master servers every 24 hours). Check that out at http://gentoo-wiki.com/HOWTO_Local_Rsync_Mirror. Otherwise, as long as you don't do it too often, you won't be banned (keep it to an average of 7 a week, or maybe 10 in one week if you lay off the next). If you know you'll need to make a lot of changes, download a tarball of the portage tree and extract it somewhere else on your machine. You can use rsync to update your tree from another folder on the same machine (just be sure to specify the excludes file as emerge sync normally would). Let me know if any of that is confusing; I'm running on less sleep than I prefer. :?
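
As a rough illustration of the last option, a local copy can be driven from Python 3 by building the rsync argument list and handing it to subprocess. The helper name local_sync_command and the example paths are assumptions; --archive, --delete and --exclude-from are ordinary rsync options, but the exact set emerge uses on a given system may differ.

```python
def local_sync_command(source, portdir, excludes):
    """Build an rsync argument list that copies source into portdir."""
    return [
        'rsync',
        '--archive',                      # keep permissions, times, symlinks
        '--delete',                       # drop files that vanished upstream
        '--exclude-from=%s' % excludes,   # same filter file emerge would use
        source.rstrip('/') + '/',         # trailing slash: copy the contents
        portdir.rstrip('/') + '/',
    ]
```

Running it would be a matter of subprocess.call(local_sync_command('/usr/portage.full', '/usr/portage', '/etc/portage/rsync_excludes')). Note that --delete on its own leaves files matched by the exclude list untouched on the receiving side.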

kernelcowboy
Guru
Posts: 391
Joined: Sat Feb 14, 2004 8:34 pm
Location: New Plymouth, New Zealand

Post by kernelcowboy » Fri Mar 10, 2006 8:15 pm

All good ideas, thanks!

I backed up my portage tree locally before running rm -rf and prlock, but unfortunately I didn't sync it beforehand. Long story, but I'm out of luck there.

Is it possible to just copy bits of the tree into place? For example, my laptop is up to date. Instead of making it a portage mirror, can I just copy ebuilds over manually and stick them in the tree on the server?

Oyarsa
n00b
Posts: 73
Joined: Mon Jul 01, 2002 6:01 pm
Location: Mars

How useful is this script really?

Post by Oyarsa » Sat Apr 01, 2006 10:35 pm

Is anyone out there still using this script and finding it useful? It just seems to create problems for me.
For example: I perform an emerge sync and then try to emerge -u system. I often get error messages due to missing or ~x86 masked packages which are new dependencies of my existing installed packages. This requires a re-sync without the rsync_excludes file, which loads all those ebuilds I was trying to keep off my machine in the first place. How are folks getting around this?
Dew knot trussed yore spell chequer two fined awl ewer miss steaks.

kernelcowboy
Guru
Posts: 391
Joined: Sat Feb 14, 2004 8:34 pm
Location: New Plymouth, New Zealand

Re: How useful is this script really?

Post by kernelcowboy » Sat Apr 01, 2006 11:11 pm

Oyarsa wrote:Is anyone out there still using this script and finding it useful?
I do. Yes, I like it. I use it in a virtual server environment; there are lots of good reasons to use it.
Oyarsa wrote:It just seems to create problems for me.
For example: I perform an emerge sync and then try to emerge -u system. I often get error messages due to missing or ~x86 masked packages which are new dependencies of my existing installed packages. This requires a re-sync without the rsync_excludes file, which loads all those ebuilds I was trying to keep off my machine in the first place. How are folks getting around this?
Is the Warning part of the docs at the top of script not addressing this for you?

If you are updating the system, you should probably not exclude any of the dependencies unless you really really know what you're doing. :P

Oyarsa
n00b
Posts: 73
Joined: Mon Jul 01, 2002 6:01 pm
Location: Mars

Re: How useful is this script really?

Post by Oyarsa » Sun Apr 02, 2006 1:56 am

kernelcowboy wrote:If you are updating the system, you should probably not exclude any of the dependencies unless you really really know what you're doing. :P
That's the whole point. If I'm not going to update my system, I have no need to sync in the first place. In that case I might as well get rid of portage altogether.

My understanding is that the purpose of this script is to cut down on the number of files transferred by only considering packages that are actually installed on the machine. Where I seem to be running into trouble is when updates to installed packages list new packages as dependencies and the ebuilds for these new dependencies are out of date or missing altogether. What I don't understand is how this script can be useful if this is a common occurrence. I get the impression from your post that I need to manually edit the rsync_excludes file that is created and remove the excludes that can potentially cause problems. Is this correct? Maybe I'm missing something -- do I need to run the script again after every sync?

kernelcowboy
Guru
Posts: 391
Joined: Sat Feb 14, 2004 8:34 pm
Location: New Plymouth, New Zealand

Re: How useful is this script really?

Post by kernelcowboy » Sun Apr 02, 2006 2:21 am

Oyarsa wrote:
kernelcowboy wrote:If you are updating the system, you should probably not exclude any of the dependencies unless you really really know what you're doing. :P
That's the whole point. If I'm not going to update my system, I have no need to sync in the first place. In that case I might as well get rid of portage altogether.

My understanding is that the purpose of this script is to cut down on the number of files transferred by only considering packages that are actually installed on the machine. Where I seem to be running into trouble is when updates to installed packages list new packages as dependencies and the ebuilds for these new dependencies are out of date or missing altogether. What I don't understand is how this script can be useful if this is a common occurrence.
Is the system a server with a specific responsibility? I think the idea is: build up the server and, when you're happy, run this script to reduce the storage requirement. Since it's a server, you probably shouldn't update it too often. If it works, don't fix it. (I'm sure some people will disagree.) Of course, if a feature or bug fix is available, you'll update, but probably just that package. Do it deeply, so you keep things square. I do this in a virtual server environment, so I keep bandwidth and disk usage down, two things I pay for. The other virtual users on the box probably benefit too. Otherwise, I don't see a need for this script. Clearly it takes a bit of work, but if the benefit is real dollars, it's likely worth it.
Oyarsa wrote:I get the impression from your post that I need to manually edit the rsync_excludes file that is created and remove the excludes that can potentially cause problems. Is this correct? Maybe I'm missing something -- do I need to run the script again after every sync?
I didn't write this script, nor do I really understand Python much at all. But my understanding is this: you run it, it looks at your current install and creates a list that excludes all packages you don't currently need. Following the rest of the script's instructions, you'll have set up emerge sync to only consider the packages you care about.

When you need something new, you have the script remove it from the exclude list.

Code: Select all

prlock.py dev-python/wxPython 
is how this is done, easily. (assuming wxPython is the package you needed, but didn't already have.)

Sometimes I find that I have to emerge sync too often with this script. I try to limit this. If you try to emerge a package with dependencies, you may only see the first dependency. You do a prlock.py theDep, then emerge sync, then emerge again. You'll then see the next set of dependencies, and you'll need to run prlock.py again. But when you emerge sync again, you'll get told off for abusing the servers.

I haven't worked out a good way to fix this yet. I've been copying ebuilds from another system, and just emerge sync'ing later to get it all proper. Seems to work. Perhaps there's a way to get portage to tell you all the dependencies you'll need to get, and if we cross-reference that through prlock.py somehow (perhaps I'll learn Python :)), we can get a list of stuff to emerge. That way, we only need to emerge sync once. (Provided you don't think up something else you want too soon.)
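
One thing that makes this loop more painful: going by the version of prlock.py posted earlier in this thread, the exclude file is rebuilt from the installed packages plus the arguments of the current run only, so a package unlocked in an earlier run but not yet installed drops out again. A minimal Python 3 sketch of a remedy is to keep the unlocked names in a small side file. The function name remember_unlocked and the store path are hypothetical, not part of prlock.py.

```python
import os


def remember_unlocked(new_packages, store):
    """Merge new 'category/package' names into store; return the full list."""
    known = set()
    if os.path.isfile(store):
        with open(store) as handle:
            known = {line.strip() for line in handle if line.strip()}
    known.update(new_packages)
    merged = sorted(known)
    with open(store, 'w') as handle:
        handle.write('\n'.join(merged) + '\n')
    return merged
```

The returned list could be appended to pkgl in place of the bare command-line arguments, so that running the script for blah/A and later for blah/B keeps both unlocked until they are installed.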

lost+found
Guru
Posts: 514
Joined: Mon Nov 15, 2004 6:56 pm
Location: North~Sea~Coa~s~~t~~~

Script that creates rsync excludes for Portage. (REPOST)

Post by lost+found » Thu Jun 01, 2006 6:17 pm

Hi, I made this shell script to use the rsync_excludes feature of Portage some time ago. It still works for me. But be warned: emerge will complain sooner rather than later about missing ebuilds!
CHEEEEEEEEEERS.

Code: Select all

#!/bin/sh

# WARNING: Use rsync_excludes at your own risk. Make backups. :-)

# This simple script creates an /etc/portage/rsync_excludes file,
# for only syncing Portage (i.e. emerge --sync) to packages you
# have been interested in. It checks the contents of /var/db/pkg/
# for that. You will get a speed gain and can free some space,
# at the cost of Emerge complaining sooner or later about missing
# ebuilds. Duh, add them manually to your existing rsync_excludes
# file. To make it work, empty your /usr/portage directory (your
# distfiles and packages can stay) or create another dir, and
# use PORTDIR="/your-dir" in /etc/make.conf. Always add
# RSYNC_EXCLUDEFROM="/etc/portage/rsync_excludes" to /etc/make.conf,
# and set your profile and architecture below:
#
# Like in: /usr/portage/profiles/$PROFILE/$ARCH/
PROFILE="default-linux"
ARCH="x86"
#
# Good luck!


# Portage default location for the rsync_excludes file.
OUTPUT="/etc/portage/rsync_excludes"


# Backup existing rsync_excludes.
if [ -e "$OUTPUT" ]; then
    mv "$OUTPUT" "$OUTPUT.old" || exit
fi

# Create an empty rsync_excludes.
touch $OUTPUT || exit

# Add the categories/packages in /var/db/pkg/*/* (ebuilds etc.)
for i in /var/db/pkg/*/*; do
    CAT="$(echo "$i" | sed 's:/var/db/pkg/\([^ ]*\)/.*:\1:')"
    PKG="$(echo "$i" | sed 's:/var/db/pkg/.*/\([^ ]*\)-[0123456789].*:\1:')"
    # Add the category line only once (exact, fixed-string match).
    grep -qxF "+ $CAT/" "$OUTPUT" || echo "+ $CAT/" >> "$OUTPUT"
    # printf instead of "echo -e", which plain /bin/sh need not support.
    printf '+ %s/\n+ %s/**\n' "$CAT/$PKG" "$CAT/$PKG" >> "$OUTPUT"
done

# I think we need these...
echo "+ eclass/" >> $OUTPUT
echo "+ eclass/**" >> $OUTPUT
echo "+ metadata/" >> $OUTPUT
echo "+ metadata/*/" >> $OUTPUT

# Add metadata for the categories/packages in /var/db/pkg/*/*
for i in /var/db/pkg/*/*; do
    CAT="$(echo "$i" | sed 's:/var/db/pkg/\([^ ]*\)/.*:\1:')"
    PKG="$(echo "$i" | sed 's:/var/db/pkg/.*/\([^ ]*\)-[0123456789].*:\1:')"
    # Add the cache category line only once (exact, fixed-string match).
    grep -qxF "+ metadata/cache/$CAT/" "$OUTPUT" || echo "+ metadata/cache/$CAT/" >> "$OUTPUT"
    echo "+ metadata/cache/$CAT/$PKG-*" >> "$OUTPUT"
done

# ...and these too.
echo "- metadata/cache/**" >> $OUTPUT
echo "+ metadata/dtd/**" >> $OUTPUT
echo "+ metadata/glsa/**" >> $OUTPUT
echo "+ metadata/*" >> $OUTPUT
echo "+ profiles/" >> $OUTPUT
echo "+ profiles/base/" >> $OUTPUT
echo "+ profiles/base/**" >> $OUTPUT
echo "+ profiles/desc/" >> $OUTPUT
echo "+ profiles/desc/**" >> $OUTPUT
echo "+ profiles/$PROFILE/" >> $OUTPUT
echo "+ profiles/$PROFILE/$ARCH/" >> $OUTPUT
echo "+ profiles/$PROFILE/$ARCH/**" >> $OUTPUT
echo "- profiles/$PROFILE/*/" >> $OUTPUT
echo "+ profiles/$PROFILE/*" >> $OUTPUT
echo "+ profiles/updates/" >> $OUTPUT
echo "+ profiles/updates/**" >> $OUTPUT
echo "- profiles/*/" >> $OUTPUT
echo "+ profiles/*" >> $OUTPUT
echo "+ scripts/" >> $OUTPUT
echo "+ scripts/**" >> $OUTPUT
echo "- **/" >> $OUTPUT
echo "+ *" >> $OUTPUT
Notes:
Do a second emerge --sync, if you get this error:

Code: Select all

>>> Updating Portage cache: 
 Traceback (most recent call last): 
   File "/usr/bin/emerge", line 2705, in ? 
     oldcat = portage.catsplit(cp_list[0])[0] 
 IndexError: list index out of range
Just correct your symlink if you get this error:

Code: Select all

!!! ARCH is not set... Are you missing the /etc/make.profile symlink? 
 !!! Is the symlink correct? Is your portage tree complete?
You can even delete the contents of /var/cache/edb, as current Portage versions will recreate the cache from your smaller Portage tree ("emerge --metadata"). Never touch /var/db/pkg b.t.w.!!!

Since Portage 2.1:

Code: Select all

WARNING: usage of RSYNC_EXCLUDEFROM is deprecated, use PORTAGE_RSYNC_EXTRA_OPTS instead
/etc/make.conf change:

Code: Select all

PORTAGE_RSYNC_EXTRA_OPTS="--exclude-from=/etc/portage/rsync_excludes"
Script doesn't work with overlays.
Last edited by lost+found on Sat Nov 18, 2006 3:31 pm, edited 2 times in total.

bur
Apprentice
Posts: 229
Joined: Fri Feb 20, 2004 10:17 pm

Keeping the portage tree small by using a whitelist

Post by bur » Wed Jun 28, 2006 12:12 am

When syncing the portage tree, all available packages are updated, even though you will probably never use most of them. That wastes time and disk space on your box, and resources and traffic on the rsync servers. I found this thread on the German board, which inspired me to try to make a whitelist that defines which parts of the portage tree will be updated when syncing and which won't. In the end I managed to shrink my /usr/portage/ directory from 121,224 files (396 MB) to 71,385 files (136 MB).

First you need to tell emerge that you want to use an exclude list for rsync:

Code: Select all

PORTAGE_RSYNC_EXTRA_OPTS="--exclude-from=/etc/portage/rsync_excludes"

Now for the important part, we need to define what should be synced. To do this first find out what packages are installed on your PC.

Code: Select all

buren ~ # ls /var/db/pkg/
app-admin       dev-java     media-libs    perl-core    x11-base
app-arch        dev-lang     media-sound   sys-apps     x11-drivers
[...]
This will print a list of all parts of the portage tree from which you emerged packages. Some of the directories might be empty; you should delete them before proceeding. Now, for each non-empty directory in /var/db/pkg, add an entry to /etc/portage/rsync_excludes like this:

Code: Select all

+ app-admin**
+ app-arch**
The '+' tells rsync (which emerge runs for the sync) to include these parts of the portage tree. The '**' is a wildcard that matches everything, including slashes. '*' would stop matching at the first '/', resulting in a broken tree.
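
The difference between '*' and '**', and the fact that the first matching rule decides, can be tried out with a toy model in Python 3. This is a deliberately simplified sketch and not rsync's real matcher: the names rule_to_regex, matches and decide are made up, directory-only patterns with a trailing slash are not modelled, and neither is the fact that rsync never descends into an excluded directory.

```python
import re


def rule_to_regex(pattern):
    """Translate a pattern: '**' matches anything, '*' stops at a '/'."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith('**', i):
            out.append('.*')
            i += 2
        elif pattern[i] == '*':
            out.append('[^/]*')
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile(''.join(out))


def matches(pattern, path):
    """Unanchored patterns are tried against every trailing part of path."""
    regex = rule_to_regex(pattern)
    if pattern.startswith('/'):
        return regex.fullmatch('/' + path) is not None
    parts = path.split('/')
    return any(regex.fullmatch('/'.join(parts[i:])) for i in range(len(parts)))


def decide(rules, path):
    """Return 'include' or 'exclude'; the first matching rule wins."""
    for rule in rules:
        rule = rule.strip()
        if not rule or rule.startswith('#'):
            continue
        if rule.startswith('+ '):
            action, pattern = 'include', rule[2:]
        elif rule.startswith('- '):
            action, pattern = 'exclude', rule[2:]
        else:
            action, pattern = 'exclude', rule  # bare pattern means exclude
        if matches(pattern, path):
            return action
    return 'include'  # nothing matched: the file is transferred
```

With the rules '+ app-admin**' followed by 'app-**', decide() reports app-admin/sudo as included and app-office/abiword as excluded. Swapping the two rules, or writing '+ app-admin*' with a single star, makes app-admin/sudo come out as excluded in this model.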


Now that we have included, or whitelisted, all packages we want updated, we need to blacklist everything else. This is done by adding every part of the tree to the blacklist. The '+' entries overrule the blacklist, but only if they come before the blacklist entries, because rsync acts on the first rule that matches.

Code: Select all

app-**
dev-**
games-**
gnome-**
gnustep-**
kde-**
mail-**
media-**
net-**
perl-**
rox-**
sci-**
sec-**
sys-**
www-**
x11-**
xfce-**
If we just left it at that, the tree would still get corrupted, as the blacklist above would exclude important parts of 'eclass' and other vital parts of portage. So we need to whitelist those system-specific parts, too:

Code: Select all

+ eclass**
+ licenses**
+ profiles**
+ scripts**
+ virtual**
Important: Like before, these '+'-entries need to be placed above the blacklisting. Best to put it at the top of the file. So your rsync_excludes should look like this:

Code: Select all

#important parts of portage
+ eclass**
+ licenses**
+ profiles**
+ scripts**
+ virtual**

#whitelist all used parts of the tree
+ app-admin**
+ app-arch**
+ app-benchmarks**
+ app-crypt**
+ app-editors**
+ app-i18n**
+ app-misc**
+ app-portage**
+ app-shells**
+ app-text**
+ dev-db**
+ dev-java**
+ dev-lang**
+ dev-libs**
+ dev-perl**
+ dev-python**
+ dev-tex**
+ dev-util**
+ kde-base**
+ media-fonts**
+ media-gfx**
+ media-libs**
+ media-sound**
+ media-video**
+ net-analyzer**
+ net-dns**
+ net-firewall**
+ net-ftp**
+ net-libs**
+ net-misc**
+ net-nds**
+ net-print**
+ perl-core**
+ sys-apps**
+ sys-boot**
+ sys-devel**
+ sys-fs**
+ sys-kernel**
+ sys-libs**
+ sys-process**
+ www-client**
+ x11-apps**
+ x11-base**
+ x11-drivers**
+ x11-libs**
+ x11-misc**
+ x11-proto**
+ x11-terms**
+ x11-wm**

#blacklist everything
app-**
dev-**
games-**
gnome-**
gnustep-**
kde-**
mail-**
media-**
net-**
perl-**
rox-**
sci-**
sec-**
sys-**
www-**
x11-**
xfce-**

Now each time you want to emerge a package from a part of the tree you haven't used before, simply add it to the file. For example, if I were to emerge net-im/gaim, I would add '+ net-im**' to rsync_excludes and sync the tree. If you unmerge a package and it was the only one in its category, you can remove that category from the whitelist. Example: I only have ethereal in net-analyzer, so if I unmerge ethereal, I can remove the '+ net-analyzer**' line. You can also delete the specific directory from /usr/portage, in this case with 'rm -r /usr/portage/net-analyzer'.

It can happen that a new version of a package requires a dependency that resides in a blacklisted part of the tree. In that case portage will complain, specifying which package is missing. Just add it to the whitelist section, resync and start the emerge again.

If you want an even smaller tree, instead of using the rather generic whitelist entries I use, you can specify the exact packages you use. For example, instead of '+ www-client**' you would use '+ www-client/mozilla**' and '+ www-client/opera**' if you had Firefox and Opera installed. In that case the category directory itself must also be whitelisted with '+ www-client/', or rsync will never descend into it.


The method described works well for me, but if you're unsure, you should back up your current portage tree and delete the backup only once everything turns out to work well:

Code: Select all

mv /usr/portage /usr/portage.full
If the slim tree turns out to be broken for whatever reason, you can go back to the old "full" tree:

Code: Select all

rm -r /usr/portage
mv /usr/portage.full /usr/portage

Any comments or ideas how to improve this are welcome. :)

think4urs11
Bodhisattva
Posts: 6659
Joined: Wed Jun 25, 2003 9:51 pm
Location: above the cloud

Post by think4urs11 » Wed Jun 28, 2006 6:00 am

in addition to the above: http://forums.gentoo.org/viewtopic-t-55031.html

partly dupes it.
Nothing is secure / Security is always a trade-off with usability / Do not assume anything / Trust no-one, nothing / Paranoia is your friend / Think for yourself

bur
Apprentice
Posts: 229
Joined: Fri Feb 20, 2004 10:17 pm

Post by bur » Thu Jun 29, 2006 1:27 am

I was also thinking about using a script to set up the rsync_excludes file. This would make keeping it up-to-date much easier, also generating it as a whitelist that specifically whitelists single packages instead of whole "branches" (do you call it that? i mean things like kde-base, app-arch, sys-apps,...).
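
A per-package whitelist of that kind could be generated along the lines of the following Python 3 sketch. It rests on several assumptions: package_whitelist and SYSTEM_DIRS are made-up names, the system directories are the five whitelisted earlier in this thread, a closing '**' rule blacklists everything else, and no metadata/cache rules are produced.

```python
# Directories whitelisted as "important parts of portage" earlier in the thread.
SYSTEM_DIRS = ['eclass', 'licenses', 'profiles', 'scripts', 'virtual']


def package_whitelist(packages):
    """Build rule lines that whitelist single 'category/package' entries."""
    rules = ['+ %s**' % name for name in SYSTEM_DIRS]
    seen = set()
    for pkg in sorted(set(packages)):
        category = pkg.split('/', 1)[0]
        if category not in seen:
            seen.add(category)
            rules.append('+ %s/' % category)  # let rsync enter the category
        rules.append('+ %s/' % pkg)           # the package directory itself
        rules.append('+ %s/**' % pkg)         # and everything below it
    rules.append('**')                        # blacklist everything else
    return rules
```

Writing '\n'.join(package_whitelist(names)) to the excludes file gives one category line per branch plus two lines per package, similar to what prlock.py emits for the ebuild directories.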

curtis119
Bodhisattva
Posts: 2160
Joined: Mon Mar 10, 2003 4:41 pm
Location: Toledo, Ohio,USA, North America, Earth, SOL System, Milky Way, The Universe, The Cosmos, and Beyond.

Post by curtis119 » Thu Jun 29, 2006 2:25 am

I merged the above three posts from a duplicate.
Gentoo: it's like wiping your ass with silk.

lost+found
Guru
Posts: 514
Joined: Mon Nov 15, 2004 6:56 pm
Location: North~Sea~Coa~s~~t~~~

Re: Keeping the portage tree small by using a whitelist

Post by lost+found » Thu Jun 29, 2006 9:25 am

bur wrote:... Example: I only have ethereal in net-analyzer, so if I unmerge ethereal, I can remove the '+ net-analyzer**' line. You can also delete the specific directory from /usr/portage - in this case 'rm -r /usr/portage/net-analyzer'...
If you remove parts from the rsync_excludes file, you should *always* remove them from your Portage tree manually as well (ebuilds and metadata). Keeping parts of the Portage tree that are no longer updated is not wise: Portage may reuse the outdated ebuilds without warning.
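
Finding such leftovers can be automated. The Python 3 sketch below only lists candidate directories and deletes nothing. The name stale_categories is made up, and the test for "is this a category" is a plain heuristic (the name contains a hyphen), which is an assumption: it misses categories such as virtual, and the matching metadata/cache directories are not covered either.

```python
import os


def stale_categories(portdir, whitelisted):
    """List category directories in portdir that are no longer whitelisted."""
    keep = set(whitelisted)
    stale = []
    for name in sorted(os.listdir(portdir)):
        path = os.path.join(portdir, name)
        # Heuristic: categories look like app-admin or x11-libs, while
        # eclass, profiles and friends carry no hyphen.
        if os.path.isdir(path) and '-' in name and name not in keep:
            stale.append(path)
    return stale
```

The result can be reviewed by hand before anything is removed with rm -r.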

polyacryl
n00b
Posts: 50
Joined: Sun Sep 14, 2003 7:42 pm

script to generate simple /etc/portage/rsync_excludes

Post by polyacryl » Mon Jul 10, 2006 8:38 am

Hello.

I wrote a simple script to generate the rsync_excludes file. As long as you want to mask whole categories (the stuff in /var/db/pkg/) rather than single packages, it should be adequate.

Code: Select all

#!/bin/zsh
#
# generates /etc/portage/rsync_excludes
# comments to asdf@uni-koblenz.de

db=(/var/db/pkg/*)
ex=/etc/portage/rsync_excludes

# remove empty directories in /var/db/pkg/
rmdir $db 2>/dev/null

# whitelist system stuff
cat << EOF > $ex
+ eclass**
+ licenses**
+ profiles**
+ scripts**
+ virtual**

EOF

# whitelist used categories
for i in $db
do
        # skip the empty directories that rmdir just removed
        [[ -d $i ]] && echo "+ ${i:t}**" >> $ex
done

# blacklist everything else
cat << EOF >> $ex

**
EOF