Script to cut rsync file list by up to 95%

This forum covers all Gentoo-related software not officially supported by Gentoo. Ebuilds/software posted here might harm the health and stability of your system(s), and are not supported by Gentoo developers. Bugs/errors caused by ebuilds from overlays.gentoo.org are covered by this forum, too.

Post by NickDaFish » Mon May 19, 2003 10:04 pm

I have a few Gentoo servers that have only a few packages above the base install. Whenever I synced them I'd watch as they chewed up bandwidth and CPU grabbing tons of files they didn't and wouldn't need.
So I decided to build an exclude list, thinking it would be easier than making an include list... Long story short, I ended up writing this script...

Code: Select all

#!/usr/bin/python
#
# prlock.py (Portage Rsync Lockdown)
# This script creates an exclude list for portage's rsync. The exclude list
# excludes all but the installed packages (found with qpkg).
# Specify 'branch-name/package-name' arguments to unlock additional packages
# that are not yet installed.
# Remember to edit make.conf and make the dir /etc/portage
#
# *WARNING*
# If you use this script and find yourself wanting to add a new package...
# You MUST either add it to the exclude list (By passing it as an arg to
# prlock) or comment out the exclude list entirely in make.conf.
# Then do a sync
# THEN add the package
# If you don't you will add a package that may be out of date!
#
# Eg.....
#     > prlock.py dev-python/wxPython
# This will restrict the rsync to only the installed packages AND wxPython
# All other packages will not be updated.
#
# Nick Fisher prlock@nickdafish.com
#
import os, re, sys

# Quit if an arg that doesn't look like a package is encountered.
for arg in sys.argv[1:]:
    if not re.match('[\w|-]+/[\w|-]+',arg):
        print 'Line doesn\'t look like it is a package....\n',arg,'\n','...aborting.'
        sys.exit(1)

# Check that the make.conf is ready for the exclude
for line in open('/etc/make.conf','r').readlines():
    if re.match('RSYNC_EXCLUDEFROM=/etc/portage/rsync_excludes', line): break
else:
    print 'WARNING! RSYNC_EXCLUDEFROM=/etc/portage/rsync_excludes was not found in /etc/make.conf'
    print 'Make sure that it isn\'t commented out. Portage will not exclude without it!'

# Check that we will be able to write the file
if not os.path.isdir('/etc/portage'):
    print "/etc/portage/ does not exist!\nPlease make the dir and try again." 
    sys.exit(1)

# Run qpkg to find all the installed packages
pkgl = os.popen('/bin/bash /usr/bin/qpkg -nc -I','r').readlines()

# Add the args
for arg in sys.argv[1:]:
    pkgl.append(arg)
    print 'Adding package from cmd line ',arg

# Add the branches to list and format
fmt = re.compile(r'[\w|-]*/')
list=[]
for line in pkgl:
    m = fmt.search(line)
    if not('+ '+m.group()+'\n' in list): list.append('+ '+m.group()+'\n')
    if not('+ metadata/cache/'+m.group()+'\n' in list): list.append('+ metadata/cache/'+m.group()+'\n')

# Add the package dir and all subdirs/files to list
for x in range(len(pkgl)):
    list.append('+ '+pkgl[x].strip()+'/\n')
    list.append('+ '+pkgl[x].strip()+'/**\n')
    list.append('+ metadata/cache/'+pkgl[x].strip()+'*\n')

# Append the directives to allow access to the metadata
list.append('+ metadata/\n')
list.append('+ metadata/*\n')
list.append('+ metadata/cache/\n')
list.append('+ metadata/cache/*\n')

# Allow sync for eclasses
# I don't know how I could exclude them well
list.append('+ eclass/\n')
list.append('+ eclass/**\n')

# Allow sync for files
# These shouldn't be excluded
list.append('+ files/\n')
list.append('+ files/**\n')

# Allow sync for libsidplay
# I haven't a clue what this branch is about
list.append('+ libsidplay/\n')
list.append('+ libsidplay/**\n')

# Allow sync for licenses
# Shouldn't be updated that often
list.append('+ licenses/\n')
list.append('+ licenses/**\n')

# Allow sync for packages
# Dunno what this branch is about
# Include for safety's sake
list.append('+ packages/\n')
list.append('+ packages/**\n')

# Allow sync for profiles
# Should smarten this up at some point
list.append('+ profiles/\n')
list.append('+ profiles/**\n')

# Allow sync for scripts
list.append('+ scripts/\n')
list.append('+ scripts/**\n')

# Sort the list for readability
list.sort()

# Add the include all files and exclude everything else
list.append('+ /*\n- *')

# Write the pkg list out to the exclude file
open('/etc/portage/rsync_excludes','w').writelines(''.join(list))

If you ever need to add a new package remember to add it first to the exclude list by passing it as an arg to prlock. Then sync again to get all the latest files.
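For anyone reading along on a newer Python, here is a minimal Python 3 sketch of the rule-building part of the script above. It is not a drop-in replacement: build_rules is a name made up for this example, it takes the package list as an argument instead of calling qpkg, and it leaves out the fixed includes for eclass, profiles and the other top-level directories.

```python
def build_rules(packages):
    """Return rsync filter lines that include only the given
    'category/package' entries and exclude everything else."""
    rules = set()
    for pkg in packages:
        pkg = pkg.strip()
        category = pkg.split('/', 1)[0]
        # The category directory itself must be included, otherwise
        # rsync never descends far enough to see the package.
        rules.add('+ %s/' % category)
        rules.add('+ metadata/cache/%s/' % category)
        # The package directory and everything below it.
        rules.add('+ %s/' % pkg)
        rules.add('+ %s/**' % pkg)
        rules.add('+ metadata/cache/%s*' % pkg)
    # The metadata directories have to be reachable as well.
    rules.update(['+ metadata/', '+ metadata/cache/'])
    # Sorted for readability; the catch-all lines must come last.
    return sorted(rules) + ['+ /*', '- *']


if __name__ == '__main__':
    print('\n'.join(build_rules(['dev-python/wxPython', 'sys-apps/portage'])))
```

The order matters: rsync uses the first rule that matches, so the closing "- *" only catches what none of the include lines claimed.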

Things to note:
Remember to make the /etc/portage dir
Remember to uncomment the 'RSYNC_EXCLUDEFROM' in the make.conf
Remember to localise rsync as laid out in the GWN

Also please note that I'm not, and do not profess to be, a Portage guru or a Python guru. In fact this is the first Python script I have used in production. If you have any notes, bugs or comments about the script please message me and I will do an edit... let's keep this thread clean.
I'm not really sure that this script is 100% safe. I've been using it for a while, I've had no trouble and I see no reason why I should. If anyone does spot a problem please let me know!

Happy light weight rsyncing :wink:

Nick

EDIT: Added includes for portage dirs that should always be rsynced.
Last edited by NickDaFish on Mon Jun 07, 2004 3:34 pm, edited 2 times in total.

Post by jmpnz » Tue Mar 09, 2004 11:20 pm

qpkg has moved. Change this line:

Code: Select all

pkgl = os.popen('/bin/bash /usr/portage/app-admin/gentoolkit/files/scripts/qpkg -nc -I','r').readlines()
to

Code: Select all

pkgl = os.popen('/bin/bash /usr/bin/qpkg -nc -I','r').readlines()
Also it would be nice if the script would remove the directories no longer needed. Is there an emerge or gentoolkit way to do that?
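
One way to approach the cleanup question, as a rough Python 3 sketch. The name stale_package_dirs is made up for this example, and it assumes the tree is laid out as category/package under the given root and that you pass in the same 'category/package' list the script works from. It only reports directories, it deletes nothing.

```python
import os

def stale_package_dirs(tree_root, keep):
    """List 'category/package' directories under tree_root that are
    not in keep. Only categories that appear in keep are scanned, so
    top-level directories like eclass/ or profiles/ are left alone."""
    keep = set(p.strip() for p in keep)
    categories = set(p.split('/', 1)[0] for p in keep)
    stale = []
    for category in sorted(categories):
        cat_dir = os.path.join(tree_root, category)
        if not os.path.isdir(cat_dir):
            continue
        for name in sorted(os.listdir(cat_dir)):
            if not os.path.isdir(os.path.join(cat_dir, name)):
                continue
            candidate = category + '/' + name
            if candidate not in keep:
                stale.append(candidate)
    return stale
```

Categories with no installed package at all are not reported by this sketch, so it is conservative rather than complete.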

Post by NickDaFish » Tue Mar 09, 2004 11:25 pm

Thanks for that.... I've edited the script to reflect the change. :wink:

Post by jmglov » Fri Jun 04, 2004 1:42 am

NickDaFish wrote:Long story short I ended up writing this script...
And quite a nice script it is! May I suggest one change?

Change this:
NickDaFish wrote:

Code: Select all

# Add the args
for arg in sys.argv[1:]:
    pkgl.append(arg)
To this:

Code: Select all

# Add the args
for arg in sys.argv[1:]:
    pkgl.append(arg)
    print 'Adding package ',arg,'\n';
Note that I have suggested that your script be added to gentoolkit:

http://bugs.gentoo.org/show_bug.cgi?id=47715

Cheers!
Josh Glover <jmglov@gentoo.org>
Gentoo Developer (http://dev.gentoo.org/~jmglov/)

Post by NickDaFish » Mon Jun 07, 2004 3:42 pm

jmglov wrote:May I suggest one change?

Change this:
NickDaFish wrote:

Code: Select all

# Add the args
for arg in sys.argv[1:]:
    pkgl.append(arg)
To this:

Code: Select all

# Add the args
for arg in sys.argv[1:]:
    pkgl.append(arg)
    print 'Adding package ',arg,'\n';
I don't *think* the

Code: Select all

,'\n';
is really required.... You've been doing too much C :wink:
jmglov wrote:Note that I have suggested that your script be added to gentoolkit:

http://bugs.gentoo.org/show_bug.cgi?id=47715

Cheers!
Cool..... I would be honored to have one of my scripts in the distro :D

However someone who knows what's going on in the portage tree should have a bit of a look and make sure I'm not doing something daft (missing important branches or something). I built a lot of it through trial and error.....

Post by ruth » Mon Jun 07, 2004 6:52 pm

hi,
i really _love_ your script... ;)
works great...
thanks a lot...

rootshell
"The compiler has tried twice to abort and cannot do so; therefore, compilation will now terminate."
-- IBM PL/I (F) error manual

Post by soth » Tue Nov 23, 2004 8:14 pm

Splendid idea!
Though, I seem to be doing something wrong:

Code: Select all


root@raziel ~/shellscripts # ./prlock.py                                                           
./prlock.py: line 30: syntax error near unexpected token `if'
./prlock.py: line 30: `    if not re.match('[\w|-]+/[\w|-]+',arg):'
root@raziel ~/shellscripts # nano prlock.py                                                        
root@raziel ~/shellscripts # ./prlock.py                                                           
./prlock.py: line 35: syntax error near unexpected token `('
./prlock.py: line 35: `for line in open('/etc/make.conf','r').readlines():'
root@raziel ~/shellscripts # ./prlock.py                                                           
root@raziel ~/shellscripts # nano prlock.py                                                        
root@raziel ~/shellscripts # ./prlock.py                                                           
./prlock.py: line 42: syntax error near unexpected token `('
./prlock.py: line 42: `if not os.path.isdir('/etc/portage'):'
root@raziel ~/shellscripts # nano prlock.py                                                        
root@raziel ~/shellscripts # ./prlock.py                                                           
./prlock.py: line 47: syntax error near unexpected token `('
./prlock.py: line 47: `pkgl = os.popen('/bin/bash /usr/bin/qpkg -nc -I','r').readlines()'

I tried to comment out the stuff it was complaining about and got the above, which leads me to think I missed something, which I cannot figure out.
would love to use it, but maybe I'm too stwuped =/
- Never argue with an idiot. They just drag you down to your level and beat you with experience.

Join the adopt an unanswered post initiative today

Post by soth » Tue Nov 23, 2004 9:04 pm

Geh. Now I got it to work.
I found an extra newline at the top of the script I pasted in there by mistake...
Stupid me.
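
The bash-style "syntax error near unexpected token" messages in the earlier post are what you get when the #! line is not the very first thing in the file: no interpreter line is found, so the shell tries to run the Python source itself. A small Python 3 sketch of a check for that; shebang_on_first_line is a name made up for this example.

```python
def shebang_on_first_line(text):
    """True if the script text starts with '#!' at position zero.
    A leading blank line or space is enough to break this."""
    return text.startswith('#!')


good = "#!/usr/bin/python\nimport os, re, sys\n"
bad = "\n#!/usr/bin/python\nimport os, re, sys\n"
print(shebang_on_first_line(good), shebang_on_first_line(bad))
```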

Now another question;

should it not allow the syntax
RSYNC_EXCLUDEFROM="/etc/portage/rsync_excludes"

not just
RSYNC_EXCLUDEFROM=/etc/portage/rsync_excludes
which seems to be the case?

My bad if I'm in error.
Lovely script. I now sync only ~7800 files on my largest system =D

Post by NickDaFish » Wed Nov 24, 2004 9:17 pm

soth wrote: should it not allow the syntax
RSYNC_EXCLUDEFROM="/etc/portage/rsync_excludes"

not just
RSYNC_EXCLUDEFROM=/etc/portage/rsync_excludes
which seems to be the case?
In the example make.conf that comes with portage there are no quotes.... and none are really needed. If you want to quote RSYNC_EXCLUDEFROM then alter the script :wink:
soth wrote: Lovely script. I now sync only ~7800 files on my largest system =D
Cool! Another happy customer :D
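
For anyone who does want the quoted form accepted, here is a sketch of what altering the check could look like, in Python 3. The pattern and the name has_exclude_setting are assumptions for this example, not part of the posted script. It accepts the path bare, in double quotes, or in single quotes, and still rejects commented-out lines.

```python
import re

# Path bare, or wrapped in matching single or double quotes.
EXCLUDE_RE = re.compile(
    r'''^\s*RSYNC_EXCLUDEFROM=(["']?)/etc/portage/rsync_excludes\1\s*$'''
)

def has_exclude_setting(lines):
    """True if any line sets RSYNC_EXCLUDEFROM to the expected file."""
    return any(EXCLUDE_RE.match(line) for line in lines)
```

The backreference \1 makes sure an opening quote is closed by the same kind of quote, so a half-quoted line does not slip through.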

Post by soth » Wed Nov 24, 2004 10:09 pm

Ah, you are right indeed. It's only URLs, commands and variables that have quotes. Since portage accepts the quoted form I thought scripts should too, but I should prolly just shaddup and thank you again.

I'm very grateful. Keep up the good work. Thanks again! =)

Dependency question


Post by wesblake » Thu Nov 25, 2004 2:16 am

Ok, cool script. I'm about to run it on my laptop, but I have one question about how dependencies might work. In the past, I've noticed that some packages I already have installed will have different or new dependencies when I update to a newer version. Will these still show when I do emerge -pu world? If they will at least still show, I don't mind having to add them later with prlock.py package-name. I'm mostly concerned that they might not show, so the package I have will be updated but broken because of some unknown new dependencies that were excluded.
Thanks.
<kow`> "There are 10 types of people in the world... those who understand binary and those who don't."

Post by soth » Thu Nov 25, 2004 8:08 am

I'm not quite sure I understand your question right, but as I understand it, portage is quite aware of the dependencies of the packages. It will tell you at install time if something's missing. Then you have to include that in your copy of portage and sync again, I guess. That being the downside of this approach...

Post by wesblake » Mon Nov 29, 2004 8:40 pm

Well, I think that answered my question, but here's a better example just in case. Let's say:
I have Package A version 1.0 depending on packages X and Y.
I run this script, and since package Z is not emerged it ends up excluded.
Then Package A version 2.0 comes out which now depends on package Z.
Since package Z was excluded, will I now have any issue emerging the new package A?

It sounds like from what you said that because it is a dependency, portage will still list it with emerge -pu packageA whether it is on the exclude list or not.
<kow`> "There are 10 types of people in the world... those who understand binary and those who don't."

Post by soth » Mon Nov 29, 2004 9:07 pm

Well, that can happen without an upgrade I guess =)

Emerge will fail and puke...
what to do is to rerun the script like

prlock.py Z
emerge sync
emerge -Duva A
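
One catch with the commands above: the script aborts on arguments that don't look like category/package, so a bare package name won't do. A Python 3 sketch of that kind of check follows. The name looks_like_package is made up, and the pattern is a tightened variant of the one in the script (anchored at both ends, no stray '|' in the character class), so treat it as an assumption rather than the script's exact behaviour.

```python
import re

# Word characters plus '+', '.' and '-' on both sides of one slash.
PACKAGE_RE = re.compile(r'^[\w+.-]+/[\w+.-]+$')

def looks_like_package(arg):
    """True for arguments shaped like 'category/package'."""
    return PACKAGE_RE.match(arg) is not None
```

So the first command would be something like prlock.py dev-util/gperf rather than just the package name.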

Post by Insanity5902 » Tue Nov 30, 2004 4:37 am

I was wondering the same thing, and Soth, your answer doesn't really answer his question.

Of course either the build will fail or, when the compile is done, the program is going to have some major issues.

I think he was more or less wondering how gracefully portage will let you know that a new package needs to be installed but is excluded.

Thinking of how portage works, it will fail at the point where it builds the dependency list. Since you won't have an .ebuild file for package Z, when it checks the dependencies it will give an error about not finding an ebuild to satisfy package Z. The least descriptive message portage will give is this

Code: Select all

jaguar dev-util # emerge -vp iverilog

These are the packages that I would merge, in order:

Calculating dependencies -
emerge: there are no ebuilds to satisfy "dev-util/gperf".
or if you just directly emerge something without checking it first you would get this

Code: Select all

jaguar dev-util # emerge iverilog
Calculating dependencies -
emerge: there are no ebuilds to satisfy "dev-util/gperf".
(the above test was completed by moving the gperf folder to gpef-old so portage couldn't find an ebuild for gperf)

Both are very useful in finding out what went wrong. The most descriptive it would get is what you see when you run an x86 machine and something is masked to only ~x86, or something is hard-masked: you will get some red text telling you that package Z is excluded by such-and-such file.
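
Since the message names the missing package, it can be picked out mechanically and handed to prlock.py. A Python 3 sketch, under the assumption that the message has exactly the wording shown in the output above; missing_atoms is a made-up name.

```python
import re

MISSING_RE = re.compile(r'there are no ebuilds to satisfy "([^"]+)"')

def missing_atoms(emerge_output):
    """Return the names quoted in 'no ebuilds to satisfy' messages."""
    return MISSING_RE.findall(emerge_output)


sample = '''Calculating dependencies -
emerge: there are no ebuilds to satisfy "dev-util/gperf".'''
print(missing_atoms(sample))
```

If the quoted dependency carries a version operator or version suffix, that would have to be stripped before passing it on; the sketch does not attempt it.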

Post by Insanity5902 » Tue Nov 30, 2004 4:58 am

Well slap me silly, this shit is awesome.

before
Number of files: 101987
after
Number of files: 8068
(with distfiles moved out of folder)
before
jaguar ~ # du ./portage -sh
116M ./portage
after
jaguar usr # du ./portage/ -sh
18M ./portage/
all I can say is WOW.


What I ended up doing was moving distfiles to another dir so it doesn't get deleted. It doesn't really matter, it just saves you from having to re-download files again. After moving distfiles I did a

Code: Select all

rm -rf /usr/portage/*
then ran

Code: Select all

 emerge sync
.

After the emerge (which btw, took less than a minute) I ran a

Code: Select all

emerge -vauD world
. This took a while b/c everything was being generated for the first time, a few QA comments popped up, I ran it again and it was fast as heck.

You look inside /usr/portage now and you see the directory structure, but the package folders and the ebuilds are gone.

What blows my mind is the 8,000 files compared to the 100,000 files. Can't get any better than this. This should be included in the gentoolkit even if under beta status. The amount of overhead this will save the rsync servers is amazing.

Thanks again, a 92.1% clean up here!
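
The percentage follows from the file counts quoted in this post; the same sum for the du figures is included for comparison. Python 3, with reduction_percent as a made-up helper name.

```python
def reduction_percent(before, after):
    """Percentage by which 'after' is smaller than 'before'."""
    return 100.0 * (before - after) / before


files = reduction_percent(101987, 8068)   # file counts before and after
size = reduction_percent(116, 18)         # megabytes, from du -sh
print('files: %.1f%%, size: %.1f%%' % (files, size))
```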

Post by soth » Tue Nov 30, 2004 8:19 am

I see. So the question really was about what error message you will get when you want to emerge a package that's no longer covered in its entirety by the slimmed version of portage?
That's sorta' what I meant by puke. I should have been more explicit I guess :oops:


oh, well.

Post by Insanity5902 » Tue Nov 30, 2004 1:09 pm

I knew what you meant by puke, but an emerge can puke in many different ways:P

Post by soth » Tue Nov 30, 2004 1:17 pm

Yes. You have a point.
But my sig is bolder than yours...
8O

UNDO actions?


Post by jinxos » Tue Nov 30, 2004 4:05 pm

Ok, nice script and everything, but how do you "UNDO" the changes if you (for whatever reason) must sync the whole portage tree again?

will

Code: Select all

cd /usr/portage && rm -rf *
suffice (after having commented out the RSYNC_EXCLUDEFROM directive from make.conf)?

J.

Post by Insanity5902 » Tue Nov 30, 2004 4:08 pm

To undo the change, all you would have to do is comment out the RSYNC_EXCLUDEFROM line from your make.conf file and then run an emerge sync. That will then fully populate your PORTDIR with all the packages.
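
Commenting the line out can be scripted too. A Python 3 sketch that works on the text of make.conf rather than on the file itself, so nothing is written to disk; disable_exclude is a made-up name.

```python
def disable_exclude(conf_text):
    """Comment out any active RSYNC_EXCLUDEFROM line in conf_text."""
    out = []
    for line in conf_text.splitlines(True):  # True keeps the line endings
        if line.lstrip().startswith('RSYNC_EXCLUDEFROM='):
            line = '#' + line
        out.append(line)
    return ''.join(out)
```

Lines that are already commented out are left untouched, so running it twice gives the same result as running it once.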

Post by schotter » Wed Feb 09, 2005 8:52 pm

I wanted to run the script but I don't have the file qpkg. gentoolkit-dev is emerged, so what else do I need?

Post by Pythonhead » Thu Feb 10, 2005 8:43 pm

schotter wrote:I wanted to run the script but I don't have the file qpkg. gentoolkit-dev is emerged, so what else do I need?
qpkg belongs to gentoolkit, not gentoolkit-dev
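
A script could check for the tool up front instead of failing later. A Python 3 sketch using shutil.which from the standard library; require_tool is a made-up name and the hint text is only an example.

```python
import shutil

def require_tool(name, hint):
    """Return the full path of an executable on PATH, or raise
    with a hint about which package provides it."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError('%s not found on PATH; %s' % (name, hint))
    return path
```

For this thread the call would be along the lines of require_tool('qpkg', 'it belongs to gentoolkit').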

Post by tierra » Tue Apr 26, 2005 7:30 am

Just thought I would bump this thread since I've found it very useful considering how big portage has grown in the last year. Nice work Nick.

Nice script..but having an error on emerge --sync


Post by doppelganger » Fri Aug 26, 2005 4:23 pm

Well, the script works fine for creating the rsync_excludes file. I moved my portage dir to portage.org and created another portage dir. I then ran an emerge --sync to recreate the portage structure. At the end of the sync I am getting this error

>>> Updating Portage cache:
Traceback (most recent call last):
  File "/usr/bin/emerge", line 2705, in ?
    oldcat = portage.catsplit(cp_list[0])[0]
IndexError: list index out of range

It appears that emerge is looking for a previous list from portage. I'm new to Python, so 2700 lines of Python code is quite overwhelming for me right now. Any help would be appreciated.