Carlo Developer
Joined: 12 Aug 2002 Posts: 3356
|
Posted: Sun Feb 08, 2004 10:46 pm Post subject: |
|
|
@TGL: your script is nice, but like the other scripts it doesn't work with some ebuilds.
On my box it wants to delete these files, even though the corresponding ebuilds are merged.
Code: | [ 389.23 K ] /usr/portage/distfiles/OOo-Thesaurus-snapshot.zip
[ 8.10 M ] /usr/portage/distfiles/Python-2.3.3.tgz
[ 2.40 M ] /usr/portage/distfiles/j2sdk-1_4_2-bin-scsl.zip
[ 3.27 M ] /usr/portage/distfiles/j2sdk-1_4_2-mozilla_headers-unix.zip
[ 46.99 M ] /usr/portage/distfiles/j2sdk-1_4_2-src-scsl.zip
[ 794.31 K ] /usr/portage/distfiles/j2sdk-sec-1_4_2-src-scsl.zip
[ 2.61 M ] /usr/portage/distfiles/koffice-i18n-de-1.3.tar.bz2 |
The koffice-i18n and j2sdk ebuilds use some code to decide dynamically which tarballs Portage should fetch and merge. What puzzles me is the Python tarball.
Carlo _________________ Please make sure that you have searched for an answer to a question after reading all the relevant docs. |
|
|
|
TGL Bodhisattva
Joined: 02 Jun 2002 Posts: 1978 Location: Rennes, France
|
Posted: Sun Feb 08, 2004 11:10 pm Post subject: |
|
|
Arggg... Okay, then I guess using the cache was in fact a bad idea.
Can you try the original script by far, a few posts before mine? If it works better, I will just merge my interface code into his version.
Thanks. |
|
|
|
TGL Bodhisattva
Joined: 02 Jun 2002 Posts: 1978 Location: Rennes, France
|
Posted: Mon Feb 09, 2004 12:02 am Post subject: |
|
|
For Python-2.3.3, that's really weird. I don't see anything special about this package here. The others I can understand, but this one... Could you please try this: Code: | cat /var/cache/edb/dep/dev-lang/python-2.3.3 | Or maybe you don't have such a file at all? (That would be really strange, because it seems to be present in portage/metadata.)
TGL wrote: | If it works better, I will just merge my interface code into his version. |
It would give us something like this:
EDIT: removed the useless code.
Last edited by TGL on Mon Feb 09, 2004 11:18 pm; edited 1 time in total |
|
|
|
Carlo Developer
Joined: 12 Aug 2002 Posts: 3356
|
Posted: Mon Feb 09, 2004 12:52 am Post subject: |
|
|
far's and your latest script do not list the koffice-i18n tarball.
Code: | cat /var/cache/edb/dep/dev-lang/python-2.3.3 |
That's a strange one. The real tarball is .tgz, but portage recorded a .tar.bz2
Similarly for the sun-j2sdk* data: the tarballs are not listed in the cache file. The same goes for older versions.
Carlo _________________ Please make sure that you have searched for an answer to a question after reading all the relevant docs. |
|
|
|
TGL Bodhisattva
Joined: 02 Jun 2002 Posts: 1978 Location: Rennes, France
|
Posted: Mon Feb 09, 2004 1:45 am Post subject: |
|
|
Carlo wrote: | far's and your latest script do not list the koffice-i18n tarball. |
Hmmm, I think I've understood: there is no space between the filename and the closing ")" in the SRC_URI, but I assumed there was always one. I have to fix the regexp then... Probably something like this would be enough: Code: | file_regexp = re.compile(r'([a-zA-Z0-9_,\.\-\+]*)[\s\)]') |
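To illustrate the missing-space case, here is a minimal sketch. The SRC_URI value is made up for the example, and `old_regexp` is only a guess at what a whitespace-only pattern might have looked like; neither is taken from the actual script. The sketch also uses `+` instead of `*` so that empty matches are skipped.

```python
import re

# Hypothetical stricter pattern: a file name must be followed by whitespace.
old_regexp = re.compile(r'([a-zA-Z0-9_,.+-]+)\s')
# Relaxed pattern: a file name may be followed by whitespace or a closing ")".
new_regexp = re.compile(r'([a-zA-Z0-9_,.+-]+)[\s)]')

# Made-up SRC_URI with no space between the file name and the closing ")".
src_uri = "linguas_de? ( mirror://kde/stable/koffice-1.3/src/koffice-i18n-de-1.3.tar.bz2)"

print(old_regexp.findall(src_uri))
print(new_regexp.findall(src_uri))
```

Only the relaxed pattern picks up the tarball name here, because the stricter one never sees whitespace after it.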
Carlo wrote: | That's a strange one. The real tarball is .tgz, but portage recorded a .tar.bz2 |
Okay, I've read the changelog and it turns out that the SRC_URI changed at some point (from .gz to .bz2). So this was not an error: the old file would not have been used anyway, and the .bz2 would have been fetched in any case.
Carlo wrote: | Similarly for the sun-j2sdk* data: the tarballs are not listed in the cache file. The same goes for older versions. |
For these, I would say ebuild bug. The files should be listed in SRC_URI, with RESTRICT="fetch", like in dev-lang/sun-jdk for instance. But they are not, so the files are referenced neither in the digest nor in the cache. Missing from the cache is okay, it hurts nothing but my script, but missing from the digest is a bad thing: digests are there for good reasons. I will open a bug report.
And the last one, the thesaurus: which package did it come from? |
|
|
|
Carlo Developer
Joined: 12 Aug 2002 Posts: 3356
|
Posted: Mon Feb 09, 2004 2:07 am Post subject: |
|
|
TGL wrote: | Carlo wrote: | That's a strange one. The real tarball is .tgz, but portage recorded a .tar.bz2 |
Okay, I've read the changelog and it turns out that the SRC_URI changed at some point (from .gz to .bz2). So this was not an error: the old file would not have been used anyway, and the .bz2 would have been fetched in any case. |
O.k., but it's still a bit strange. It means that if you want to make sure that all tarballs of the merged ebuilds are available locally, you have to 'fetch world' after syncing.
TGL wrote: | For these, I would say ebuild bug. The files should be listed in SRC_URI, with RESTRICT="fetch", like in dev-lang/sun-jdk for instance. But they are not, so the files are referenced neither in the digest nor in the cache. Missing from the cache is okay, it hurts nothing but my script, but missing from the digest is a bad thing: digests are there for good reasons. I will open a bug report. |
So for now ebuilds can easily circumvent the digest tests. That rings a security bell! This check should be part of an automatic test when committing ebuilds.
TGL wrote: | And the last one, the thesaurus: which package did it come from? |
It's not part of the official tree (binary German OpenOffice). It has the same problem of a missing space before the closing parenthesis.
Carlo _________________ Please make sure that you have searched for an answer to a question after reading all the relevant docs. |
|
|
|
wolf31o2 Retired Dev
Joined: 31 Jan 2003 Posts: 628 Location: Mountain View, CA
|
Posted: Mon Feb 09, 2004 5:01 pm Post subject: |
|
|
Carlo wrote: |
So for now ebuilds can easily circumvent the digest tests. That rings a security bell! This check should be part of an automatic test when committing ebuilds. | Actually, it is a part of the tests. If you look at this closely, you'll realize that there is no way for an automated script to be able to tell that a file is missing from the digest if it is also missing from SRC_URI. _________________ Ex-Gentoo Developer
Catalyst/Genkernel Development Lead
http://wolf31o2.org |
|
|
|
Carlo Developer
Joined: 12 Aug 2002 Posts: 3356
|
Posted: Mon Feb 09, 2004 5:53 pm Post subject: |
|
|
wolf31o2: I never had a deep look at the Portage sources, but if it is possible to download files without performing a digest check, then it is a security hole in my eyes.
Think about a hacked mirror. Everyone who installs an affected ebuild could install a cleverly wrapped rootkit without noticing that something is wrong.
wolf31o2 wrote: | If you look at this closely, you'll realize that there is no way for an automated script to be able to tell that a file is missing from the digest if it is also missing from SRC_URI. |
Sure. The whole ebuild has to be checked for possibly malicious code.
Carlo _________________ Please make sure that you have searched for an answer to a question after reading all the relevant docs. |
|
|
|
wolf31o2 Retired Dev
Joined: 31 Jan 2003 Posts: 628 Location: Mountain View, CA
|
Posted: Mon Feb 09, 2004 6:20 pm Post subject: |
|
|
You are not understanding the problem at all. If a file is not listed in the SRC_URI it is NOT downloaded. This means there is NO automated method for getting this file, aside from entering a manual wget command in the ebuild or something similar. Because of this, there is NO WAY that the digest portion of ebuild can KNOW that the file even EXISTS.
Think of it like this:
SRC_URI=""
RESTRICT=fetch
Will download NO FILES AT ALL... therefore, NOTHING will be in the digest
Now, you simply put an einfo "Download blah.blah.tar.bz2 from http://blah.blah.org and place it in ${DISTDIR}" and the fetch portion is taken care of.
Later, the ebuild is doing an unpack on a file that DOESN'T EXIST in the eyes of portage. It has been MANUALLY added to ${DISTDIR}, and not digested.
THIS IS A BUG IN THE EBUILD AND SHOULD BE CORRECTED BY THE MAINTAINER
It is not a bug in portage. Portage is functioning EXACTLY as it should.
See the difference? _________________ Ex-Gentoo Developer
Catalyst/Genkernel Development Lead
http://wolf31o2.org |
|
|
|
tecknojunky Veteran
Joined: 19 Oct 2002 Posts: 1937 Location: Montréal
|
Posted: Sun Feb 15, 2004 12:27 am Post subject: |
|
|
I made a detailed post in this thread and Moz decided to die on me. I lost my post, so I'm pissed. So I'll be rather blunt and summarize.
The scripts I tried in this thread seemed unsafe. So I made this one, based on the A= value found in the "environment.bz2" file that each ebuild leaves in its /var/db/pkg/<category>/<package> folder.
WARNING: I have discovered that the "environment.bz2" file is a later addition to Portage, since I have a box on which ebuilds are missing that file. That means this script won't be able to know which tarballs are needed by those ebuilds.
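As a rough illustration of reading the A= value, here is a minimal Python sketch. It assumes the saved environment is a bzip2-compressed text file containing a shell-style `A="..."` line; the function name `needed_tarballs` is hypothetical, and a package without that file simply yields an error the caller has to handle.

```python
import bz2
import shlex

def needed_tarballs(env_path):
    """Return the file names listed in the A=... line of a saved environment."""
    with bz2.open(env_path, "rt") as env:
        for line in env:
            if line.startswith("A="):
                # shlex removes the shell quoting; the value is a
                # whitespace-separated list of file names.
                value = " ".join(shlex.split(line[2:]))
                return value.split()
    # No A= line found: nothing is known about this package's tarballs.
    return []
```

Unlike an `eval` of the first line, this only parses the one assignment and never executes anything from the file.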
The script follows: Code: | script superseded by the one in the post below |
The script first does some sanity checks to ensure that all the tools it needs are available and that the folders configured in make.conf exist. It then aggregates the file names from the "environment.bz2" files and saves them, sorted, in /tmp/need. Next it saves the listing of /usr/portage/distfiles into /tmp/have. Finally it diffs those two files and uses sed to keep the entries that begin with a +, which should, in theory, be the superfluous files.
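The need/have comparison described above boils down to a set difference. A minimal sketch, with a hypothetical function name and illustrative file names borrowed from this thread:

```python
def superfluous(needed, present):
    """Return the names in `present` that are absent from `needed`, sorted."""
    return sorted(set(present) - set(needed))

# Illustrative data only, not the output of a real scan.
need = ["Python-2.3.3.tar.bz2", "X430src-1.tgz"]
have = ["Python-2.3.3.tar.bz2", "Python-2.3.3.tgz", "X430src-1.tgz"]

print(superfluous(need, have))
```

Working on sets avoids any dependence on the two listings being sorted the same way, which a line-based diff quietly relies on.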
I did not include any file manipulation in this script: no pretend parameter, no remove, nothing, just output. To actually delete the files, one could do Code: | for FILE in `scriptname`; do rm /usr/portage/distfiles/$FILE; done |
I can't use this as-is myself because my /usr/portage/distfiles is NFS shared among the boxes on my LAN. I'm thinking of running the script on each box in turn to aggregate the final list of unneeded files and then perform the actual deletions.
Further, there seem to be no entries in the "environment.bz2" files for X430src-4.tgz through X430src-7.tgz. I wondered how I got those files, since I obviously only needed X430src-1.tgz through X430src-3.tgz. So I had some doubts. [ANSWER]My /usr/portage/distfiles is NFS shared. One box has xfree-4.3.0 and the other has xfree-4.99.902. It seems the latter doesn't use X430src-4.tgz through X430src-7.tgz.[/ANSWER]
This script runs in about 10 seconds on my box. I have no idea whether that is good performance or not.
Any comments are most welcome.
[edited=Sun Feb 15, 2004 02:41]
Corrected a bug that included the /usr/portage/distfiles/cvs-src folder. Now the script ensures that only files get listed.
I'm working on a method to run this script on several machines that share the distfiles folder over NFS.
[/edited] _________________ (7 of 9) Installing star-trek/species-8.4.7.2::talax.
Last edited by tecknojunky on Sat Feb 28, 2004 7:48 pm; edited 1 time in total |
|
|
|
tecknojunky Veteran
Joined: 19 Oct 2002 Posts: 1937 Location: Montréal
|
Posted: Sun Feb 15, 2004 10:46 am Post subject: |
|
|
Here's an updated version for people who share their Portage tree through NFS. Code: | #!/bin/sh
# Alpha code!!!

# Print a message on stderr and abort, so that stdout only ever carries file names.
bye () {
    echo "$1" >&2
    exit 1
}

# Check for the required tools and resolve the Portage directories.
sanity_check () {
    for TOOL in bzcat sort diff comm sed hostname
    do
        [ "`which $TOOL 2> /dev/null`" ] || bye "I need $TOOL!"
    done
    ME="`hostname`"
    [ "$ME" ] || bye "I don't know who I am."
    # Read each setting from make.conf, falling back to the usual default.
    eval `sed '/^PORTDIR=/!d;q' /etc/make.conf`
    [ "$PORTDIR" ] || PORTDIR=/usr/portage
    eval `sed '/^DISTDIR=/!d;q' /etc/make.conf`
    [ "$DISTDIR" ] || DISTDIR="$PORTDIR/distfiles"
    STALLSDIR="$DISTDIR/stalls"
    [ -d "$STALLSDIR" ] || mkdir "$STALLSDIR"
    eval `sed '/^PKGDIR=/!d;q' /etc/make.conf`
    [ "$PKGDIR" ] || PKGDIR="$PORTDIR/packages"
    DBDIR=/var/db/pkg
    TMP=/tmp
    for FOLDER in "$PORTDIR" "$DISTDIR" "$PKGDIR" "$DBDIR" "$TMP" "$STALLSDIR"
    do
        [ -d "$FOLDER" ] || bye "$FOLDER not found"
    done
}

# List the tarballs needed by every installed package.
ineed() {
    for ENVFILE in "$DBDIR"/*/*/environment.bz2
    do
        [ -f "$ENVFILE" ] || continue
        # Assumes the first line of the saved environment is the A="..." assignment.
        A=""
        eval `bzcat "$ENVFILE" | sed '1q'`
        for TARBALL in $A
        do
            echo "$TARBALL"
        done
    done
}

# List the regular files present in DISTDIR (sub-folders are skipped).
ihave() {
    for ENTRY in `ls "$DISTDIR"`
    do
        [ -f "$DISTDIR/$ENTRY" ] && echo "$ENTRY"
    done
}

# Combine the per-host lists: keep only the files that no host needs.
iterate() {
    DONTNEEDS=`ls "$STALLSDIR"/*.dont.need 2> /dev/null`
    [ "$DONTNEEDS" ] || bye "Nothing to do"
    ITERATION=0
    for DONTNEED in $DONTNEEDS
    do
        ITERATION="$(( $ITERATION + 1 ))"
        if [ $ITERATION -eq 1 ]
        then
            sort "$DONTNEED" > "$STALLSDIR/$ITERATION.diff"
        else
            # comm -12 keeps the lines common to both sorted lists. The context
            # lines of diff -u only cover 3 lines around each change, so they
            # would miss most of the common entries.
            sort "$DONTNEED" | comm -12 "$STALLSDIR/$(( $ITERATION - 1 )).diff" - > "$STALLSDIR/$ITERATION.diff"
        fi
    done
    mv "$STALLSDIR/$ITERATION.diff" "$STALLSDIR/dont.need"
    rm -f "$STALLSDIR"/*.diff
    cat "$STALLSDIR/dont.need"
}

# Record which distfiles this host does not need.
scan() {
    echo "Scanning..."
    ineed | sort > "$TMP/need"
    ihave | sort > "$TMP/have"
    # Lines prefixed with a single + exist only in "have".
    diff -uN "$TMP/need" "$TMP/have" | sed '/^+[^+]/!d;s/^.//' > "$STALLSDIR/$ME.dont.need"
    echo "$ME doesn't need these files:"; echo
    cat "$STALLSDIR/$ME.dont.need"
}

sanity_check
case "$1" in
    "--diff")
        iterate
        ;;
    *)
        scan
        ;;
esac |
Usage: genitor.sh [--diff]
There are two steps to follow:
1- Perform a scan on each box.
2- Perform a diff
This script does nothing to the files; it only processes lists. The final result is a list of tarballs that all the boxes identify as orphaned.
My setup is as follows. I have an LDAP root user with a home folder located at /home/root/bin that is also NFS mounted. Putting the script there makes it available on all the boxes. You can simply make copies on each box if you don't have an NFS shared /home folder.
On each box, you perform a scan simply by calling the script with no parameters. The script first checks whether the "stalls" folder exists under the distfiles folder and creates it if not. Then it outputs the unneeded files to stdout and to <hostname>.dont.need in the "stalls" folder. At the end, you should have a list of unneeded files for each of your boxes. Here's the result in my folder: Code: | manitou bin # ls -l /usr/portage/distfiles/stalls
total 20
-rw-r--r-- 1 root root 7797 Feb 15 2004 fiston.dont.need
-rw-r--r-- 1 root root 6478 Feb 15 00:16 manitou.dont.need
| Running the script again will simply scan again and overwrite the file. The scanning process takes about 10 seconds on each of my boxes, which are a P3 500 MHz and a Celeron 600 MHz, so I find the performance acceptable.
Once that is done, call the script with the --diff parameter. It will read every *.dont.need file and output to stdout the list of files that none of the boxes need. The output is also saved in a plain "dont.need" file in the "stalls" folder.
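The --diff step amounts to intersecting the per-host lists: a file is only safe to drop if every box reported it as unneeded. A minimal sketch, assuming each list has already been read into memory; the function name and the file names are made up for the example, and only the host names come from the listing above.

```python
def unneeded_everywhere(per_host):
    """Return the names that appear in every host's list, sorted."""
    lists = [set(names) for names in per_host.values()]
    if not lists:
        return []
    return sorted(set.intersection(*lists))

# Illustrative reports, one per host.
reports = {
    "fiston": ["a-1.0.tar.gz", "b-2.0.tar.bz2", "c-3.0.tgz"],
    "manitou": ["b-2.0.tar.bz2", "c-3.0.tgz", "d-4.0.zip"],
}

print(unneeded_everywhere(reports))
```

A file that only one box considers unneeded, such as `a-1.0.tar.gz` here, stays out of the result because another box may still depend on it.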
To actually delete the files, you can feed that output into a for loop: Code: | # for FILE in `./genitor.sh --diff`; do rm /usr/portage/distfiles/$FILE; done | An equivalent would be Code: | # ./genitor.sh --diff
# for FILE in `cat /usr/portage/distfiles/stalls/dont.need`; do rm /usr/portage/distfiles/$FILE; done |
Alternatively, you could insert an rm into the script. I'm planning to do that later, when I'm sure this script is tried, tested and true. I will also add cleaning up of the "stalls" folder.
Certainly, there are things in that script that could be done differently. Looking at it, I see things that are not quite right, but that's the artist side of me (not). It's been a while since I coded in bash.
Also, bear in mind that, as stated in my previous post, this script is based on the assumption that Portage created an environment.bz2 file in /var/db/pkg. One of my boxes has not been updated for a long while and many of those environment.bz2 files are missing. I'm currently running emerge -U world on that box. If you don't mind redownloading files that were mistakenly deleted, then there is no harm done. If you're on dialup or have monthly capped downloads, then be careful. In the long run, every newly merged ebuild (re-emerged or updated) will create the environment.bz2 file and, before long, all installed ebuilds should have one. This script shall then be a working one... the author modestly hopes. _________________ (7 of 9) Installing star-trek/species-8.4.7.2::talax. |
|
|
|
Yen Tux's lil' helper
Joined: 19 Oct 2003 Posts: 107 Location: Lummen, Belgium
|
Posted: Fri Feb 27, 2004 10:19 pm Post subject: |
|
|
You should add something to exclude packages such as linux (linux-kernel), freetype, ...
Grtz Yen |
|
|
|
tecknojunky Veteran
Joined: 19 Oct 2002 Posts: 1937 Location: Montréal
|
Posted: Fri Feb 27, 2004 10:42 pm Post subject: |
|
|
Yen wrote: | You should add something to exclude packages such as linux (linux-kernel), freetype, ...
Grtz Yen | What do you mean?
Right now the script is aimed at those who have the Portage tree NFS mounted. The script works in two moves.
1st move: execute it on each box to compile the list of distfiles that the ebuilds installed on that box do not need. The lists end up in a "stalls" folder under whatever DISTDIR is set to in /etc/make.conf.
2nd move: with --diff, the script combines all the lists and prints to stdout the unneeded files (files that no installed ebuild on any box refers to).
You could redirect the output to a file and edit it to remove any file names you'd like to keep. _________________ (7 of 9) Installing star-trek/species-8.4.7.2::talax. |
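Instead of editing the list by hand, the file names to keep could be filtered out with shell-style patterns. This is only a sketch of that idea, not part of the script: the function name, the patterns and the file names are all hypothetical.

```python
from fnmatch import fnmatchcase

def drop_protected(candidates, keep_patterns):
    """Return the candidates that match none of the keep patterns."""
    return [
        name for name in candidates
        if not any(fnmatchcase(name, pattern) for pattern in keep_patterns)
    ]

# Illustrative input: a list of deletion candidates and the patterns to protect.
candidates = ["linux-2.6.2.tar.bz2", "freetype-2.1.5.tar.bz2", "foo-1.0.tar.gz"]
keep = ["linux-*", "freetype-*"]

print(drop_protected(candidates, keep))
```

Anything matching a keep pattern is removed from the deletion list, so it survives the later rm loop.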
|
|
|
Yen Tux's lil' helper
Joined: 19 Oct 2003 Posts: 107 Location: Lummen, Belgium
|
Posted: Sat Feb 28, 2004 2:26 pm Post subject: |
|
|
Well, I have kernels 2.6.2 and 2.6.3, so if I run the script it will delete 2.6.2. Is there any way to exclude the Linux kernel and other packages? |
|
|
|
wolf31o2 Retired Dev
Joined: 31 Jan 2003 Posts: 628 Location: Mountain View, CA
|
Posted: Sat Feb 28, 2004 4:29 pm Post subject: |
|
|
Use TGL's script instead... in fact... we're working to get his script added to portage in gentoolkit.... it is VERY nice since it ONLY removes packages not in portage OR your overlay... _________________ Ex-Gentoo Developer
Catalyst/Genkernel Development Lead
http://wolf31o2.org |
|
|
|
tecknojunky Veteran
Joined: 19 Oct 2002 Posts: 1937 Location: Montréal
|
Posted: Sat Feb 28, 2004 7:43 pm Post subject: |
|
|
Yen wrote: | Well, I have kernels 2.6.2 and 2.6.3, so if I run the script it will delete 2.6.2. Is there any way to exclude the Linux kernel and other packages? | Ah, you're talking about one of the other scripts found here, because mine does not do that. In fact it does not remove anything; it merely lists tarballs that are not installed on any system. So if you have kernels 2.6.2 and 2.6.3 on the same system, they will not be listed as files to delete (unless you have an old Portage version that does not create the environment.bz2 file).
wolf31o2 wrote: | Use TGL's script instead... | I tried just about every script found in this thread, but they all do something fishy at some point and would delete things they should not. Care to point to THE script that works? _________________ (7 of 9) Installing star-trek/species-8.4.7.2::talax. |
|
|
|
Yen Tux's lil' helper
Joined: 19 Oct 2003 Posts: 107 Location: Lummen, Belgium
|
Posted: Sat Mar 06, 2004 12:27 am Post subject: |
|
|
wolf31o2 wrote: | Use TGL's script instead... in fact... we're working to get his script added to portage in gentoolkit.... it is VERY nice since it ONLY removes packages not in portage OR your overlay... |
Where can I find this script? |
|
|
|
TGL Bodhisattva
Joined: 02 Jun 2002 Posts: 1978 Location: Rennes, France
|
Posted: Sat Mar 06, 2004 12:45 am Post subject: |
|
|
@Yen: the url was hidden a few posts above.
Here it is again: bug #33877 |
|
|
|
calhoun Tux's lil' helper
Joined: 14 Nov 2003 Posts: 91 Location: Point Pleasant, NJ
|
Posted: Sat Mar 06, 2004 7:39 pm Post subject: |
|
|
bump |
|
|
|
Yen Tux's lil' helper
Joined: 19 Oct 2003 Posts: 107 Location: Lummen, Belgium
|
Posted: Sun Mar 07, 2004 7:14 pm Post subject: |
|
|
It will be great to see this script in the next release of gentoolkit. |
|
|
|
eeknay Guru
Joined: 07 Jul 2003 Posts: 402 Location: EndOfTheRainbow
|
Posted: Thu Mar 18, 2004 2:25 pm Post subject: |
|
|
small question, how do i run these scripts? _________________ Linda: "The holiday season is time of celebration for most but it is also the time to remember the tragic suffering of the less fortunate."
Morbo: "Earthlings do not yet know the meaning of suffering." |
|
|
|
Carlo Developer
Joined: 12 Aug 2002 Posts: 3356
|
Posted: Fri Mar 19, 2004 1:27 pm Post subject: |
|
|
eeknay: You have to change the access permissions. man chmod
Carlo _________________ Please make sure that you have searched for an answer to a question after reading all the relevant docs. |
|
|
|
Hydraulix Guru
Joined: 12 Dec 2003 Posts: 447
|
Posted: Sat Mar 20, 2004 11:41 pm Post subject: |
|
|
eeknay wrote: | small question, how do i run these scripts? |
All I did was copy and paste the script, name it something like cleanup and save it in my home directory. Then I just ran ./cleanup and voilà.
If there's a better way let me know but I'm sticking with this since it works. _________________ It is the fate of operating systems to become free.
- Neal Stephenson |
|
|
|
øxygen Apprentice
Joined: 09 Mar 2004 Posts: 236 Location: Bergheim, Germany
|
Posted: Sun Mar 21, 2004 12:03 pm Post subject: |
|
|
Well, a second common mistake: phpBB adds a " " after every line break, and when a line of the script ends with a \ the script seems to be broken. Just delete the spaces at the end of the lines. |
|
|
|
lesc n00b
Joined: 27 Apr 2003 Posts: 26 Location: Edmonton, Alberta
|
Posted: Sun Mar 21, 2004 7:26 pm Post subject: Cleaning Out Stale Distfiles |
|
|
Wow, great script. The script cleaned out 168 files for a total of 770 megs. I checked the accuracy of the info the script provided and the deleted files vs. current files were accurate.
Thanks for sharing the script.
diarmid@shaw.ca _________________ Linux Registered User # 187441 |
|
|
|
|