Gentoo Forums
Script to cut rsync file list by up to 95%

Gentoo Forums Forum Index Unsupported Software
NickDaFish
Tux's lil' helper


Joined: 12 Sep 2002
Posts: 112
Location: Boston, USA

Posted: Mon May 19, 2003 10:04 pm    Post subject: Script to cut rsync file list by up to 95%

I have a few gentoo servers that have only a few packages above the base install. Whenever I'd sync them, I'd watch as they chewed up bandwidth and CPU grabbing tons of files they didn't and wouldn't need.
So I decided to build an exclude list, thinking it would be easier than making an include list.... Long story short, I ended up writing this script...

Code:

#!/usr/bin/python
#
# prlock.py (Portage Rsync Lockdown)
# This script creates an exclude list for portage's rsync. The exclude list
# excludes all but the installed packages (found with qpkg).
# Specify 'branch-name/package-name' arguments to unlock additional packages
# that are not yet installed.
# Remember to edit make.conf and make the dir /etc/portage
#
# *WARNING*
# If you use this script and find yourself wanting to add a new package...
# You MUST either add it to the exclude list (by passing it as an arg to
# prlock) or comment out the exclude list entirely in make.conf.
# Then do a sync
# THEN add the package
# If you don't, you will add a package that may be out of date!
#
# E.g.
#     > prlock.py dev-python/wxPython
# This will restrict the rsync to only the installed packages AND wxPython.
# All other packages will not be updated.
#
# Nick Fisher prlock@nickdafish.com
#
import os, re, sys

# Quit if an arg that doesn't look like a package is encountered.
# ('[\w-]', not '[\w|-]': inside [] a '|' just matches a literal pipe.)
for arg in sys.argv[1:]:
    if not re.match(r'[\w-]+/[\w-]+', arg):
        print 'Argument doesn\'t look like a package....\n', arg, '\n', '...aborting.'
        sys.exit(1)

# Check that make.conf is ready for the exclude
for line in open('/etc/make.conf', 'r').readlines():
    if re.match('RSYNC_EXCLUDEFROM=/etc/portage/rsync_excludes', line): break
else:
    print 'WARNING! RSYNC_EXCLUDEFROM=/etc/portage/rsync_excludes was not found in /etc/make.conf'
    print 'Make sure that it isn\'t commented out. Portage will not exclude without it!'

# Check that we will be able to write the file
if not os.path.isdir('/etc/portage'):
    print "/etc/portage/ does not exist!\nPlease make the dir and try again."
    sys.exit(1)

# Run qpkg to find all the installed packages
pkgl = os.popen('/bin/bash /usr/bin/qpkg -nc -I', 'r').readlines()

# Add the args
for arg in sys.argv[1:]:
    pkgl.append(arg)
    print 'Adding package from cmd line ', arg

# Strip trailing newlines and drop blank lines
pkgl = [p.strip() for p in pkgl if p.strip()]

# Add each branch (and its metadata cache dir) to the list once
fmt = re.compile(r'[\w-]*/')
rules = []
for pkg in pkgl:
    m = fmt.search(pkg)
    if m is None: continue  # no 'branch/' prefix found, nothing to add
    for rule in ('+ ' + m.group() + '\n', '+ metadata/cache/' + m.group() + '\n'):
        if rule not in rules: rules.append(rule)

# Add the package dir and all subdirs/files to the list
for pkg in pkgl:
    rules.append('+ ' + pkg + '/\n')
    rules.append('+ ' + pkg + '/**\n')
    rules.append('+ metadata/cache/' + pkg + '*\n')

# Append the directives to allow access to the metadata
rules.append('+ metadata/\n')
rules.append('+ metadata/*\n')
rules.append('+ metadata/cache/\n')
rules.append('+ metadata/cache/*\n')

# Allow sync for eclasses
# I don't know how I could exclude them well
rules.append('+ eclass/\n')
rules.append('+ eclass/**\n')

# Allow sync for files
# These shouldn't be excluded
rules.append('+ files/\n')
rules.append('+ files/**\n')

# Allow sync for libsidplay
# I haven't a clue what this branch is about
rules.append('+ libsidplay/\n')
rules.append('+ libsidplay/**\n')

# Allow sync for licenses
# Should be updated that often
rules.append('+ licenses/\n')
rules.append('+ licenses/**\n')

# Allow sync for packages
# Dunno what this branch is about
# Include for safety's sake
rules.append('+ packages/\n')
rules.append('+ packages/**\n')

# Allow sync for profiles
# Should smarten this up at some point
rules.append('+ profiles/\n')
rules.append('+ profiles/**\n')

# Allow sync for scripts
rules.append('+ scripts/\n')
rules.append('+ scripts/**\n')

# Sort the list for readability
rules.sort()

# Include all remaining top-level entries, then exclude everything else
rules.append('+ /*\n- *')

# Write the rules out to the exclude file
open('/etc/portage/rsync_excludes', 'w').write(''.join(rules))



If you ever need to add a new package, remember to add it to the exclude list first by passing it as an arg to prlock. Then sync again to get all the latest files.
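The generated file has both '+ dev-python/' and '+ dev-python/wxPython/**' lines, and '+ /*' and '- *' come last. Both follow from how rsync reads filter rules: the first matching rule wins, and rsync never descends into a directory that was excluded. The Python 3 sketch below models that under simplifying assumptions. It handles only `*`, `**`, a leading `/` and a trailing `/`. The helper names are hypothetical and are not part of prlock or rsync. It is an illustration, not a reimplementation of rsync.

```python
import re


def pattern_to_regex(pattern):
    """Compile a simplified rsync filter pattern into a regex.

    Assumed subset of rsync's rules: '**' matches anything (including '/'),
    '*' matches within a single path component, a leading '/' anchors the
    pattern at the top of the tree, a trailing '/' is handled by the caller,
    and an unanchored pattern may match any trailing run of path components.
    """
    anchored = pattern.startswith('/')
    body = pattern.strip('/')
    out, i = [], 0
    while i < len(body):
        if body.startswith('**', i):
            out.append('.*')
            i += 2
        elif body[i] == '*':
            out.append('[^/]*')
            i += 1
        else:
            out.append(re.escape(body[i]))
            i += 1
    prefix = '' if anchored else '(?:.*/)?'
    return re.compile(prefix + ''.join(out))


def first_match(rules, path, is_dir):
    """Return '+' or '-' from the first rule that matches (first match wins)."""
    for rule in rules:
        sign, pattern = rule.split(' ', 1)
        if pattern.endswith('/') and not is_dir:
            continue  # a trailing '/' limits the rule to directories
        if pattern_to_regex(pattern).fullmatch(path):
            return sign
    return '+'  # no rule matched: rsync transfers the entry


def is_synced(rules, path, is_dir=False):
    """True if path would be transferred under the given rules."""
    parts = path.split('/')
    for depth in range(1, len(parts)):
        # rsync never descends into an excluded directory
        if first_match(rules, '/'.join(parts[:depth]), True) == '-':
            return False
    return first_match(rules, path, is_dir) == '+'


# A cut-down list in the shape prlock writes: sorted includes, then '+ /*', '- *'
rules = ['+ dev-python/', '+ dev-python/wxPython/', '+ dev-python/wxPython/**',
         '+ licenses/', '+ licenses/**', '+ /*', '- *']

print(is_synced(rules, 'dev-python/wxPython/wxPython-2.4.2.4.ebuild'))  # True
print(is_synced(rules, 'dev-python/foo/foo-1.0.ebuild'))               # False
```

In this model '+ /*' matches every top-level entry, so each category directory is still created even though its contents are excluded. It also shows why the explicit '+ licenses/**' line matters. Without it, licenses/GPL-2 matches nothing but '- *' and is dropped.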

Things to note:
Remember to make the /etc/portage dir
Remember to uncomment the 'RSYNC_EXCLUDEFROM' line in make.conf
Remember to localise rsync as laid out in the GWN

Also, please note that I'm not, and do not profess to be, a Portage guru or a Python guru. In fact, this is the first Python script I have used in production. If you have any notes, bugs or comments about the script, please message me and I will do an edit.... let's keep this thread clean.
I'm not really sure that this script is 100% safe. I've been using it for a while, I've had no trouble, and I see no reason why I should. If anyone does spot a problem, please let me know!

Happy light weight rsyncing :wink:

Nick

EDIT: Added includes for portage dirs that should always be rsynced.


Last edited by NickDaFish on Mon Jun 07, 2004 3:34 pm; edited 2 times in total
jmpnz
n00b


Joined: 19 May 2003
Posts: 23

Posted: Tue Mar 09, 2004 11:20 pm

qpkg has moved. Change this line:


Code:
pkgl = os.popen('/bin/bash /usr/portage/app-admin/gentoolkit/files/scripts/qpkg -nc -I','r').readlines()


to

Code:
pkgl = os.popen('/bin/bash /usr/bin/qpkg -nc -I','r').readlines()


Also it would be nice if the script would remove the directories no longer needed. Is there an emerge or gentoolkit way to do that?
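There may not be a ready-made tool for this. A rough Python 3 sketch of the reporting half follows. It assumes the tree's profiles/categories file lists one category per line. prlock keeps profiles/** in sync, so that file should be present. The sketch also takes the same 'category/package' names prlock would keep. `stale_package_dirs` is a hypothetical helper. It only lists candidates and deletes nothing.

```python
import os


def stale_package_dirs(portdir, keep):
    """List 'category/package' dirs under portdir that are not in keep.

    keep: iterable of 'category/package' strings, e.g. the installed
    packages plus any extra names passed to prlock.py (assumption).
    """
    keep = set(keep)
    with open(os.path.join(portdir, 'profiles', 'categories')) as f:
        categories = [line.strip() for line in f if line.strip()]
    stale = []
    for category in categories:
        cat_path = os.path.join(portdir, category)
        if not os.path.isdir(cat_path):
            continue  # category listed but not present locally
        for package in sorted(os.listdir(cat_path)):
            atom = category + '/' + package
            if os.path.isdir(os.path.join(cat_path, package)) and atom not in keep:
                stale.append(atom)
    return stale
```

Review the output before removing anything, for example with shutil.rmtree. The matching metadata/cache entries are not covered by this sketch.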
NickDaFish
Tux's lil' helper


Joined: 12 Sep 2002
Posts: 112
Location: Boston, USA

Posted: Tue Mar 09, 2004 11:25 pm

Thanks for that.... I've edited the script to reflect the change. :wink:
jmglov
Retired Dev


Joined: 03 Aug 2002
Posts: 23
Location: Yokohama, Japan

Posted: Fri Jun 04, 2004 1:42 am    Post subject: Re: Script to cut rsync file list by up to 95%

NickDaFish wrote:
Long story short I ended up writing this script...


And quite a nice script it is! May I suggest one change?

Change this:

NickDaFish wrote:
Code:

# Add the args
for arg in sys.argv[1:]:
    pkgl.append(arg)


To this:

Code:

# Add the args
for arg in sys.argv[1:]:
    pkgl.append(arg)
    print 'Adding package ',arg,'\n';


Note that I have suggested that your script be added to gentoolkit:

https://bugs.gentoo.org/show_bug.cgi?id=47715

Cheers!
_________________
Josh Glover <jmglov@gentoo.org>
Gentoo Developer (http://dev.gentoo.org/~jmglov/)
NickDaFish
Tux's lil' helper


Joined: 12 Sep 2002
Posts: 112
Location: Boston, USA

Posted: Mon Jun 07, 2004 3:42 pm    Post subject: Re: Script to cut rsync file list by up to 95%

jmglov wrote:
May I suggest one change?

Change this:

NickDaFish wrote:
Code:

# Add the args
for arg in sys.argv[1:]:
    pkgl.append(arg)


To this:

Code:

# Add the args
for arg in sys.argv[1:]:
    pkgl.append(arg)
    print 'Adding package ',arg,'\n';



I don't *think* the
Code:
,'\n';
is really required.... You've been doing too much C :wink:
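`print` already ends its output with a newline. An explicit '\n' as the last item therefore only adds a blank line, plus a trailing space before it. The sketch below uses Python 3 `print()` syntax. The Python 2 `print` statement in the script also appends a newline unless it ends with a comma, so the same holds there. `captured` is a hypothetical helper that shows exactly what gets written.

```python
import io
from contextlib import redirect_stdout


def captured(*args):
    """Return exactly what print(*args) writes to stdout."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        print(*args)
    return buf.getvalue()


arg = 'dev-python/wxPython'
print(repr(captured('Adding package', arg)))        # 'Adding package dev-python/wxPython\n'
print(repr(captured('Adding package', arg, '\n')))  # 'Adding package dev-python/wxPython \n\n'
```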

jmglov wrote:
Note that I have suggested that your script be added to gentoolkit:

https://bugs.gentoo.org/show_bug.cgi?id=47715

Cheers!

Cool..... I would be honored to have one of my scripts in the distro :D

However, someone who knows what's going on in the portage tree should have a bit of a look and make sure I'm not doing something daft (missing important branches or something). I built a lot of it through trial and error.....
ruth
Retired Dev


Joined: 07 Sep 2003
Posts: 640
Location: M / AN / BY / GER

Posted: Mon Jun 07, 2004 6:52 pm

hi,
i really _love_ your script... ;)
works great...
thanks a lot...

rootshell
_________________
"The compiler has tried twice to abort and cannot do so; therefore, compilation will now terminate."
-- IBM PL/I (F) error manual
soth
Apprentice


Joined: 12 Sep 2003
Posts: 207

Posted: Tue Nov 23, 2004 8:14 pm

Splendid idea!
Though, I seem to be doing something wrong:


Code:


root@raziel ~/shellscripts # ./prlock.py                                                           
./prlock.py: line 30: syntax error near unexpected token `if'
./prlock.py: line 30: `    if not re.match('[\w|-]+/[\w|-]+',arg):'
root@raziel ~/shellscripts # nano prlock.py                                                       
root@raziel ~/shellscripts # ./prlock.py                                                           
./prlock.py: line 35: syntax error near unexpected token `('
./prlock.py: line 35: `for line in open('/etc/make.conf','r').readlines():'
root@raziel ~/shellscripts # ./prlock.py                                                           
root@raziel ~/shellscripts # nano prlock.py                                                       
root@raziel ~/shellscripts # ./prlock.py                                                           
./prlock.py: line 42: syntax error near unexpected token `('
./prlock.py: line 42: `if not os.path.isdir('/etc/portage'):'
root@raziel ~/shellscripts # nano prlock.py                                                       
root@raziel ~/shellscripts # ./prlock.py                                                           
./prlock.py: line 47: syntax error near unexpected token `('
./prlock.py: line 47: `pkgl = os.popen('/bin/bash /usr/bin/qpkg -nc -I','r').readlines()'



I tried to comment out the stuff it was complaining about and got the above, which leads me to think I missed something, which I cannot figure out.
would love to use it, but maybe I'm too stwuped =/
_________________
- Never argue with an idiot. They just drag you down to your level and beat you with experience.

Join the adopt an unanswered post initiative today
soth
Apprentice


Joined: 12 Sep 2003
Posts: 207

Posted: Tue Nov 23, 2004 9:04 pm

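Geh. Now I got it to work.
I had pasted an extra newline at the top of the script by mistake, so #!/usr/bin/python was no longer the first line and the shell ran the file as a bash script, hence those syntax errors...
Stupid me.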

Now another question:

should it not allow the syntax
RSYNC_EXCLUDEFROM="/etc/portage/rsync_excludes"

not just
RSYNC_EXCLUDEFROM=/etc/portage/rsync_excludes
which seems to be the case?

My bad if I'm in error.
Lovely script. I now sync only ~7800 files on my largest system =D
_________________
- Never argue with an idiot. They just drag you down to your level and beat you with experience.

Join the adopt an unanswered post initiative today
NickDaFish
Tux's lil' helper


Joined: 12 Sep 2002
Posts: 112
Location: Boston, USA

Posted: Wed Nov 24, 2004 9:17 pm

soth wrote:

should it not allow the syntax
RSYNC_EXCLUDEFROM="/etc/portage/rsync_excludes"

not just
RSYNC_EXCLUDEFROM=/etc/portage/rsync_excludes
which seems to be the case?

In the example make.conf that comes with portage there are no quotes.... and none are really needed. If you want to quote RSYNC_EXCLUDEFROM then alter the script :wink:
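For anyone who does want quotes, here is a sketch of a more tolerant check. It assumes the setting starts its own line in make.conf and is either unquoted or wrapped in matching single or double quotes. A line starting with '#' still counts as commented out. `EXCLUDE_LINE` and `exclude_is_enabled` are hypothetical names. In the script itself, the `re.match(...)` test in the make.conf loop would become `EXCLUDE_LINE.match(line)`.

```python
import re

# Unquoted, or wrapped in matching single or double quotes; (?P=q) makes the
# closing quote match the opening one. A leading '#' means commented out.
EXCLUDE_LINE = re.compile(
    r'''\s*RSYNC_EXCLUDEFROM=(?P<q>["']?)/etc/portage/rsync_excludes(?P=q)\s*(?:#.*)?$''')


def exclude_is_enabled(make_conf_text):
    """True if some line of make.conf enables the prlock exclude file."""
    return any(EXCLUDE_LINE.match(line) for line in make_conf_text.splitlines())


print(exclude_is_enabled('RSYNC_EXCLUDEFROM="/etc/portage/rsync_excludes"'))  # True
```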

soth wrote:

Lovely script. I now sync only ~7800 files on my largest system =D

Cool! Another happy customer :D
soth
Apprentice


Joined: 12 Sep 2003
Posts: 207

Posted: Wed Nov 24, 2004 10:09 pm

Ah, you are right indeed. It's only URLs, commands and variables that have quotes. Though since portage says it's OK, I thought scripts should comply, but I should prolly just shaddup and thank you again.

I'm very grateful. Keep up the good work. Thanks again! =)
_________________
- Never argue with an idiot. They just drag you down to your level and beat you with experience.

Join the adopt an unanswered post initiative today
wesblake
n00b


Joined: 25 Jun 2004
Posts: 52
Location: Sacramento

Posted: Thu Nov 25, 2004 2:16 am    Post subject: Dependency question

OK, cool script. I'm about to run it on my laptop, but I have one question about how dependencies might work. In the past, I've noticed that some packages I already have installed pick up different or new dependencies when I update to a newer version. Will these still show when I do emerge -pu world? If they at least still show, I don't mind having to add them later with prlock.py package-name. I'm mostly concerned that they might not show, so the package I have would be updated but broken because of unknown new dependencies that were excluded.
Thanks.
_________________
<kow`> "There are 10 types of people in the world... those who understand binary and those who don't."
soth
Apprentice


Joined: 12 Sep 2003
Posts: 207

Posted: Thu Nov 25, 2004 8:08 am

I'm not quite sure if I understand your question right, but as I understand it, portage is quite aware of the dependencies of the packages. It will tell you if something's missing at install time. Then you have to include that in your copy of portage and sync again, I guess. That's the downside of this approach...
_________________
- Never argue with an idiot. They just drag you down to your level and beat you with experience.

Join the adopt an unanswered post initiative today
wesblake
n00b


Joined: 25 Jun 2004
Posts: 52
Location: Sacramento

Posted: Mon Nov 29, 2004 8:40 pm

Well, I think that answered my question, but here's a better example just in case. Let's say:
I have Package A version 1.0 depending on packages X and Y.
I run this script, and package Z is not emerged, so it is excluded.
Then Package A version 2.0 comes out, which now depends on package Z.
Since package Z was excluded, will I now have any issue emerging the new package A?

From what you said, it sounds like portage will still list it with emerge -pu packageA because it is a dependency, whether it is on the exclude list or not.
_________________
<kow`> "There are 10 types of people in the world... those who understand binary and those who don't."
soth
Apprentice


Joined: 12 Sep 2003
Posts: 207

Posted: Mon Nov 29, 2004 9:07 pm

Well, that can happen without an upgrade I guess =)

Emerge will fail and puke...
What to do is rerun the script like this:

prlock.py Z
emerge sync
emerge -Duva A
_________________
- Never argue with an idiot. They just drag you down to your level and beat you with experience.

Join the adopt an unanswered post initiative today
Insanity5902
Veteran


Joined: 23 Jan 2004
Posts: 1228
Location: Fort Worth, Texas

Posted: Tue Nov 30, 2004 4:37 am

I was wondering the same thing, and Soth, your answer doesn't really answer his question.

Of course, either the build will fail or, when the compile is done, the program is going to have some major issues.

I think he was more or less wondering how gracefully portage will let you know that a new package needs to be installed but is excluded.

Thinking of how portage works, it will fail when it builds the dependency list. Since you won't have an .ebuild file for package Z, when it checks that all of package A's dependencies can be satisfied, it will abort with an error about not finding an ebuild to satisfy package Z. The least descriptive message portage will give is this:

Code:
jaguar dev-util # emerge -vp iverilog

These are the packages that I would merge, in order:

Calculating dependencies -
emerge: there are no ebuilds to satisfy "dev-util/gperf".

Or, if you just emerge something directly without checking it first, you get this:
Code:

jaguar dev-util # emerge iverilog
Calculating dependencies -
emerge: there are no ebuilds to satisfy "dev-util/gperf".

(I ran the above test by moving the gperf folder to gperf-old so portage couldn't find an ebuild for gperf.)

Both are very useful in finding out what went wrong. At its most descriptive, it would be like what you get on an x86 machine when something is masked to ~x86 only, or hardmasked: some red text telling you that package Z is excluded by such-and-such file.
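That message names the missing package, so a small helper can turn emerge's output into the prlock.py command to rerun. This sketch only recognises plain 'category/package' names in the quoted message shown above. Versioned atoms are skipped rather than guessed at. prlock.py rebuilds the whole list from the installed packages plus its arguments. Any packages you unlocked earlier but have not installed yet must therefore be passed again. `prlock_command` and its `already_unlocked` parameter are hypothetical.

```python
import re

# Only plain 'category/package' names are recognised (assumption: versioned
# atoms such as '>=dev-libs/foo-1.2' are skipped rather than guessed at).
NO_EBUILD = re.compile(r'there are no ebuilds to satisfy "([\w+.-]+/[\w+.-]+)"')


def prlock_command(emerge_output, already_unlocked=()):
    """Build the prlock.py command line for packages emerge could not find.

    prlock.py regenerates the whole list from qpkg plus its arguments, so
    extra packages unlocked earlier must be passed again (already_unlocked).
    """
    missing = NO_EBUILD.findall(emerge_output)
    if not missing:
        return None
    atoms = sorted(set(missing) | set(already_unlocked))
    return 'prlock.py ' + ' '.join(atoms)
```

After running the suggested command, sync again before retrying the emerge, as described earlier in the thread.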
_________________
Join the adopt an unanswered post initiative today
Insanity5902
Veteran


Joined: 23 Jan 2004
Posts: 1228
Location: Fort Worth, Texas

Posted: Tue Nov 30, 2004 4:58 am

Well slap me silly, this shit is awesome.

before
Quote:
Number of files: 101987

after
Quote:
Number of files: 8068


(with distfiles moved out of folder)
before
Quote:
jaguar ~ # du ./portage -sh
116M ./portage

after
Quote:
jaguar usr # du ./portage/ -sh
18M ./portage/


all I can say is WOW.


What I ended up doing was moving distfiles to another dir so it doesn't get deleted. It doesn't really matter; it just saves you from having to re-download files again. After moving distfiles I did a
Code:
rm -rf /usr/portage/*
then ran
Code:
 emerge sync
.

After the emerge (which btw, took less then a minute) I ran a
Code:
emerge -vauD world
. This took a while b/c everything was being generated for the first time, a few QA comments popped up, I ran it again and it was fast as heck.

You look inside /usr/portage now and you see the directory structure, but the package folders and the ebuilds are gone.
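The same clean-out can be done without moving distfiles away and back. This Python 3 sketch deletes every top-level entry of the tree except the names in `keep`. It assumes distfiles lives inside the portage directory, as it did here. Like the shell glob `/usr/portage/*`, it leaves hidden entries alone. `clear_portage_tree` is a hypothetical helper. It deletes for real, so double-check the path before running it.

```python
import os
import shutil


def clear_portage_tree(portdir, keep=('distfiles',)):
    """Delete every top-level entry of portdir except the names in keep.

    Hidden entries are skipped, as the shell glob portdir/* would skip them.
    Returns the sorted list of names that were removed.
    """
    removed = []
    for name in sorted(os.listdir(portdir)):
        if name in keep or name.startswith('.'):
            continue
        path = os.path.join(portdir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)   # real directory: remove it and its contents
        else:
            os.remove(path)       # plain file or symlink
        removed.append(name)
    return removed
```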

What blows my mind is the 8,000 files compared to the 100,000 files. Can't get any better than this. This should be included in gentoolkit, even if under beta status. The amount of overhead this will save the rsync servers is amazing.

Thanks again, a 92.1% clean up here!
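The percentages follow directly from the numbers above. The disk figures come from rounded `du -h` output, so that result is only approximate.

```python
def reduction(before, after):
    """Percentage of the original that no longer has to be synced."""
    return 100.0 * (before - after) / before


print('files: %.1f%%' % reduction(101987, 8068))  # files: 92.1%
print('disk:  %.1f%%' % reduction(116, 18))       # disk:  84.5%
```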
_________________
Join the adopt an unanswered post initiative today
soth
Apprentice


Joined: 12 Sep 2003
Posts: 207

Posted: Tue Nov 30, 2004 8:19 am

I see. So the question really was about what error message you will get when you want to emerge a package that's no longer covered in its entirety by the slimmed version of portage?
That's sorta' what I meant by puke. I should have been more explicit I guess :oops:


oh, well.
_________________
- Never argue with an idiot. They just drag you down to your level and beat you with experience.

Join the adopt an unanswered post initiative today
Insanity5902
Veteran


Joined: 23 Jan 2004
Posts: 1228
Location: Fort Worth, Texas

Posted: Tue Nov 30, 2004 1:09 pm

I knew what you meant by puke, but an emerge can puke in many different ways:P
_________________
Join the adopt an unanswered post initiative today
soth
Apprentice


Joined: 12 Sep 2003
Posts: 207

Posted: Tue Nov 30, 2004 1:17 pm

Yes. You have a point.
But my sig is bolder than yours...
8O
_________________
- Never argue with an idiot. They just drag you down to your level and beat you with experience.

Join the adopt an unanswered post initiative today
jinxos
n00b


Joined: 13 Jan 2004
Posts: 32
Location: Athens, Greece

Posted: Tue Nov 30, 2004 4:05 pm    Post subject: UNDO actions?

OK, nice script and everything, but how do you "UNDO" the changes if you (for whatever reason) must sync the whole portage tree again?

will
Code:
cd /usr/portage && rm -rf *
suffice (after having commented out the RSYNC_EXCLUDEFROM directive in make.conf)?

J.
Insanity5902
Veteran


Joined: 23 Jan 2004
Posts: 1228
Location: Fort Worth, Texas

Posted: Tue Nov 30, 2004 4:08 pm

To undo the change, all you have to do is comment out the EXCLUDE line in your make.conf file and then run an emerge sync. That will fully repopulate your portage directory with all the packages.
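If you would rather script that step, here is a sketch that comments out any active RSYNC_EXCLUDEFROM line in the text of make.conf and leaves everything else untouched. `disable_exclude` is a hypothetical helper. Reading and writing /etc/make.conf, and running emerge sync afterwards, are left to you.

```python
import re


def disable_exclude(make_conf_text):
    """Return make.conf text with any active RSYNC_EXCLUDEFROM line commented out."""
    out = []
    for line in make_conf_text.splitlines(True):  # True keeps the line endings
        if re.match(r'\s*RSYNC_EXCLUDEFROM=', line):
            line = '#' + line
        out.append(line)
    return ''.join(out)
```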
_________________
Join the adopt an unanswered post initiative today
schotter
Guru


Joined: 30 Nov 2004
Posts: 497
Location: Germany, Bavaria, Bayreuth, Pottenstein, Tüchersfeld

Posted: Wed Feb 09, 2005 8:52 pm

I wanted to run the script but I don't have the file qpkg. gentoolkit-dev is emerged, so what else do I need?
Pythonhead
Developer


Joined: 16 Dec 2002
Posts: 1801
Location: Redondo Beach, Republic of Calif.

Posted: Thu Feb 10, 2005 8:43 pm

schotter wrote:
I wanted to run the script but I don't have the file qpkg. gentoolkit-dev is emerged, so what else do I need?


qpkg belongs to gentoolkit, not gentoolkit-dev
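If qpkg is not available, the list of installed packages can also be read from portage's installed-package database. This sketch assumes /var/db/pkg holds one category/package-version directory per installed package. It strips the version with a simplified regex based on Gentoo's version format (1.2.3, 1.2b, _rc2, -r1 and so on), so unusual versions may slip through. `installed_packages` is hypothetical. It is a stand-in, not a reimplementation of qpkg, and qpkg's exact output format is not reproduced.

```python
import os
import re

# Simplified Gentoo version suffix, anchored at the end of the directory name
VERSION = re.compile(r'-\d+(\.\d+)*[a-z]?(_(alpha|beta|pre|rc|p)\d*)*(-r\d+)?$')


def installed_packages(vdb='/var/db/pkg'):
    """Return sorted 'category/package' names found in the package database."""
    found = set()
    for category in os.listdir(vdb):
        cat_path = os.path.join(vdb, category)
        if not os.path.isdir(cat_path):
            continue
        for entry in os.listdir(cat_path):
            m = VERSION.search(entry)  # leftmost match that reaches the end
            if m:
                found.add(category + '/' + entry[:m.start()])
    return sorted(found)
```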
tierra
n00b


Joined: 22 Jun 2003
Posts: 13

Posted: Tue Apr 26, 2005 7:30 am

Just thought I would bump this thread since I've found it very useful considering how big portage has grown in the last year. Nice work Nick.
doppelganger
Tux's lil' helper


Joined: 30 Jun 2004
Posts: 84

Posted: Fri Aug 26, 2005 4:23 pm    Post subject: Nice script..but having an error on emerge --sync

Well, the script works fine for creating the rsync_excludes file. I moved my portage dir to portage.org, created a new portage dir, and ran an emerge --sync to recreate the portage structure. At the end of the sync I am getting this error:

>>> Updating Portage cache:
Traceback (most recent call last):
  File "/usr/bin/emerge", line 2705, in ?
    oldcat = portage.catsplit(cp_list[0])[0]
IndexError: list index out of range

It appears that emerge is looking for a previous list from portage. I'm new to Python, so 2700 lines of Python code is quite overwhelming for me right now. Any help would be appreciated.
All times are GMT
Page 1 of 2