Gentoo Forums
Cleaning out stale distfiles
474
l33t
l33t


Joined: 19 Apr 2002
Posts: 714

PostPosted: Tue May 21, 2002 8:35 pm    Post subject: Cleaning out stale distfiles Reply with quote

This script will clean out any tarballs in your /usr/portage/distfiles directory where a newer one is proven to exist. As the presence of a newer tarball for a given piece of software usually implies that you have emerged a newer version, it also implies that you don't need the tarball for the older version sitting around on your disk simply wasting space! No more ... they will be cleaned up.

Notes: this script runs in a "pretend" mode by default ;-), where files that would be deleted are displayed, but not actually deleted. To clean out the files, pass the --nopretend parameter and those old source files will be wiped up quicker than a flash! It only operates on the following types of file: .tar.gz, .tar.bz2, .tgz. Of course, I won't be held responsible if it doesn't work as expected!!! But I've tested it and it works just fine for me, and of course it will not delete files unless you explicitly tell it to. If you find it useful, do check back because (as usual) I will edit the post when I make improvements to the script. Code follows:
Code:
#!/usr/bin/perl -w
use strict;

my $lastname = 0;
my $lastversion;
my $lastext;
my @stalefiles;
my @files;

# Insert your exclusions here with trailing '-'
my %maskedfiles = (
   'X420src-' => 1,
   'gcc-' => 1,
   'freetype-' => 1
);

# Root check
if ($< != 0) {
   print "You must be root to run this script.\n";
   exit 1;
}

# Determine sources present on the system
print "Determining available tarballs in /usr/portage/distfiles ...\n";
opendir(DIR, "/usr/portage/distfiles");
@files = sort(readdir(DIR));
closedir(DIR);

# Grab names/versions, checking each time whether current distfile
# has been superseded. Push anonymous array ref containing required
# info into @stalefiles array.
print "Determining stale versions ...\n";
foreach (@files) {
   my $name;
   my $version;
   # Only operate on tarballs   
   if (/(.+?\-)([0-9r\.\-]+)(\.tar\.gz|\.tgz|\.tar\.bz2)/s) {
      $name = $1;
      $version = $2;
      next if ($maskedfiles{$name}); # Ignore "masked" files
      if ($lastname && $name eq $lastname) {
         # NOTE: 'gt' compares strings, so "1.8.8" counts as newer than "1.8.10"
         if ($version gt $lastversion) {
            push (@stalefiles, [$name, $version, $lastversion, $lastext]);
         }
      }
      $lastname = $name;
      $lastversion = $version;
      $lastext = $3;
   }
}

if (@stalefiles == 0) {
   print "\nNo stale distfiles have been detected on your system!\n";
   exit 0;
}

if ($ARGV[0] && $ARGV[0] eq '--nopretend') {
   # User requested deletion so here goes ...
   print "*Nopretend* mode, deleting stale files:\n\n";
   foreach (@stalefiles) {
      my ($name, $version, $lastversion, $ext) = @{$_};
      unlink('/usr/portage/distfiles/' . $name . $lastversion . $ext);
      print "Deleted: $name$lastversion in favour of $name$version\n";
   }
}
else
{
   # Safe mode (default)!
   print "\7*Pretend* mode, will only pretend to delete files.\nTo actually delete the files, reinvoke with the --nopretend parameter.\n\n";
   foreach (@stalefiles) {
      my ($name, $version, $lastversion, $ext) = @{$_};
      print "Would delete: $name$lastversion in favour of $name$version\n";
   }
}
1;


Last edited by 474 on Wed May 22, 2002 11:49 am; edited 1 time in total
Jeevz
Bodhisattva
Bodhisattva


Joined: 15 Apr 2002
Posts: 195
Location: Boston, MA

PostPosted: Tue May 21, 2002 10:10 pm    Post subject: Reply with quote

Great script, thanks!
TheWart
Guru
Guru


Joined: 10 May 2002
Posts: 432
Location: Nashville,TN - USA

PostPosted: Tue May 21, 2002 10:18 pm    Post subject: Reply with quote

thanks a lot, will save some time for me!
_________________
Face it, we are all noobs.

On the box it said it was designed for Win XP or better, so why won't it work with Linux?
Guest






PostPosted: Tue May 21, 2002 11:50 pm    Post subject: Reply with quote

Thanks for the nice script :)
Reimplement it in Python and it could become part of emerge. Something like 'emerge --cleandistfiles' would be fine.
slik
n00b
n00b


Joined: 18 Apr 2002
Posts: 48
Location: Alberta, Canada

PostPosted: Wed May 22, 2002 7:00 am    Post subject: Reply with quote

Some bugs?

1.8.10 is the newer one here:
Would delete: AfterStep-1.8.10 in favour of AfterStep-1.8.8

All of these are needed for X:
Would delete: X420src-1 in favour of X420src-2
Would delete: X420src-2 in favour of X420src-3

freetype 1.3.1 is still needed:
Would delete: freetype-1.3.1 in favour of freetype-2.0.8

gcc 2.95.3 is still the default compiler:
Would delete: gcc-2.95.3 in favour of gcc-3.0.3

There may be other baddies... If I used this, though, my poor dialup would be working overtime.
474
l33t
l33t


Joined: 19 Apr 2002
Posts: 714

PostPosted: Wed May 22, 2002 12:00 pm    Post subject: Thanks for the comments Reply with quote

Quote:
Some bugs?

Not really, just things I haven't done yet ;-). The way it checks version numbers is flawed (it needs to break into major/minor numbers and calculate properly), and there are various other things to be done. Basically, it needs more intelligence - I will do it when I get the time, and the script will get a bit larger. I trust you will find the quick hack introduced into the script useful, which allows you to make it ignore certain files - that should tide you over until the next revision.
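
For anyone curious what "break into major/minor numbers" might look like, here is a minimal sketch, not the revised script itself. The name compare_versions is made up for illustration, and it assumes versions consist only of digits separated by '.' or '-', with an optional "rN" revision component.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compare two version strings component by component.
# Returns -1, 0 or 1, in the same way as <=> and cmp.
sub compare_versions {
   my ($left, $right) = @_;
   my @l = split /[.\-]/, $left;
   my @r = split /[.\-]/, $right;
   while (@l || @r) {
      # A missing component counts as 0, so "1.8" equals "1.8.0"
      my $x = @l ? shift @l : 0;
      my $y = @r ? shift @r : 0;
      s/^r// for ($x, $y);   # treat a revision such as "r2" as the number 2
      # Compare numerically when both parts are digits, else as strings
      my $result = ($x =~ /^\d+$/ && $y =~ /^\d+$/)
         ? $x <=> $y
         : $x cmp $y;
      return $result if $result;
   }
   return 0;
}

# Sort oldest first using the numeric comparison
my @versions = sort { compare_versions($a, $b) } qw(1.8.10 1.8.8 1.8.9);
print join(' ', @versions), "\n";
```

With a comparison like this, 1.8.10 sorts after 1.8.8 and 1.8.9, which a plain string sort gets wrong.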

Quote:
Reimplement it in Python and it could become part of emerge

Thanks, but I think Perl is a much better language than Python - I don't usually touch Python. On the other hand, it's a simple program (for now) so how hard could it be?
474
l33t
l33t


Joined: 19 Apr 2002
Posts: 714

PostPosted: Fri May 24, 2002 8:41 pm    Post subject: Soon to be improved ... Reply with quote

Ah, coming closer to making this work properly now. I knocked up a proper version checking system for my new script which unmerges old builds lying around on the system. Just sending this post to let the watchers know that the script here will soon be adapted so it works perfectly (without that nasty masking kludge)!
Vlad
Apprentice
Apprentice


Joined: 09 Apr 2002
Posts: 264
Location: San Diego, California

PostPosted: Mon Jun 10, 2002 9:11 am    Post subject: Reply with quote

Wow - what a great script. You just saved me a couple hundred megs of hard drive space, not to mention how long it would have taken me to manually sort through those files! Thanks a ton!
masseya
Bodhisattva
Bodhisattva


Joined: 17 Apr 2002
Posts: 2602
Location: Baltimore, MD

PostPosted: Wed Jun 19, 2002 5:59 am    Post subject: Reply with quote

This is a really neat script. I'm basically posting here so I'll get an email when you post the version that you're currently working on. I did a cursory search and I have a few distfiles that would be eliminated with your current script that I wouldn't like to get rid of just yet. :)
_________________
if i never try anything, i never learn anything..
if i never take a risk, i stay where i am..
474
l33t
l33t


Joined: 19 Apr 2002
Posts: 714

PostPosted: Wed Jun 19, 2002 4:27 pm    Post subject: Help would help!!! Reply with quote

Thanks for the comments. I would appreciate it if people could send me a list of those packages (by private message, no need to clutter up this area with postings) which should be protected from deletion (some have been discussed here such as the XFree86 collection, freetype and so on). I would like to classify them into two categories:

* Packages which have logically grouped files (i.e. X420src-1, X420src-2, X420src-3). In this case we don't mind if an older version is deleted (e.g. XFree 4.1) but the script needs to treat the collection of files as one.

* Packages which should be left because some packages still depend on them (freetype 1.x, gcc 2.95 and so on).

Ideally, I want to make the script more SLOT-aware so that it can automatically determine if a distfile is still relied upon by other builds present on a user's system, and avoid deleting it. That might take a little while though ... if anyone understands the slots system very well then by all means get in touch. I think perhaps a workable way of doing it would be to scan through the files in /var/cache/edb/dep looking for occurrences of a package which *must* be equal to version so and so. If it is present then leave the distfile alone. Then again, that might be flawed and perhaps there is a much better way ...
Vann
Guru
Guru


Joined: 04 Aug 2002
Posts: 357

PostPosted: Tue Aug 20, 2002 6:16 pm    Post subject: Emerge Reply with quote

Wouldn't it be better to just find what packages are unprotected and delete those distfiles? I know emerge -cp package can list multiple versions of package as "protected." It seems the functionality to determine if something is safe to remove is already built into emerge, so why duplicate it?
474
l33t
l33t


Joined: 19 Apr 2002
Posts: 714

PostPosted: Tue Aug 20, 2002 8:08 pm    Post subject: Reply with quote

I think you're missing the point. Firstly, the script is about removing distfiles, not about any form of package management per se. The objective originally was this: to remove distfiles that are superseded by those of later versions (as a result of newer versions of software). Such distfiles are completely useless in that they will never be used again unless, for some reason, you want to specifically build an older version of the package which still has an ebuild available. As it stands, it has absolutely nothing to do with what you have emerged or not (as evidenced by the complete absence of the emerge command in the entire script).

As for the idea posed in my last comment, I'll explain with an example:

1. Let's say you install cool_prog-r1 which has dependencies on aux_prog-1.0 and another_lib-1.5 to build
2. Later on your Portage tree has updated ebuilds for aux_prog-1.5 and another_lib-2.0. You emerge them. In the meantime the older versions remain on your system because cool_prog-r1 has *explicit* dependencies (expressed in the ebuild) on the older versions and the newer ones would break this particular package. This is what I understand SLOTs are for; otherwise Portage would simply break all kinds of software.
3. Because you emerged the newer versions of aux_prog and another_lib the newer distfiles are present, as well as the old ones. Fine.
4. You run my script to clean out older distfiles. It sees that aux_prog-1.0 is older than aux_prog-1.5 and that another_lib-1.5 is older than another_lib-2.0 so it deletes the older versions.
5. You decide you want to recompile cool_prog-r1 with some different optimisations or settings. Now Portage has to go and download the older versions of aux_prog and another_lib for *nothing*!!! It's just a waste of bandwidth.

Do you see what I'm getting at now? The idea is a bandwidth-saving measure. Because we often like to recompile our packages and because many packages are dependent on very specific versions of other packages, the idea is merely to avoid deleting distfiles which would be used if you recompiled a build you already have on your system. IMO, the rationale behind this is sensible: if you've already compiled something, chances are that you may want to do it again in the future (not necessarily because a new version came out). Whereas the script as it stands will just delete any older version of a distfile blindly.
Quote:
It seems the functionality to determine if something is safe to remove

Like I said before, my script doesn't remove packages. It removes the files containing the source code from which they were built. This comment is moot.

So the logic would be:
1) Aha, I see a distfile that is older
2) But wait a moment, the user has a package currently emerged which just happens to have a DEPEND or RDEPEND line referencing a package to which this older distfile is related. The user might want to rebuild it at a later stage and not appreciate the emerge process having to get that distfile again - so let's skip that one.
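
Roughly, those two steps could be sketched like this. The %pinned hash and the unpinned sub are made-up names for illustration only: %pinned stands in for whatever would be extracted from the DEPEND/RDEPEND lines of installed packages, and how to gather that from Portage is the hard part and is not shown. Only aux_prog-1.0 is pinned here, so both outcomes are visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: "name-version" stems that some installed package
# explicitly depends on. Collecting these is NOT shown here.
my %pinned = ('aux_prog-1.0' => 1);

# Candidates in the shape the script builds: [name, newer, older, ext]
my @stalefiles = (
   ['aux_prog-',    '1.5', '1.0', '.tar.gz'],
   ['another_lib-', '2.0', '1.5', '.tar.bz2'],
);

# Step 2: drop every candidate whose older version is still pinned
sub unpinned {
   my ($pinned, @candidates) = @_;
   return grep { !$pinned->{ $_->[0] . $_->[2] } } @candidates;
}

foreach my $entry (unpinned(\%pinned, @stalefiles)) {
   my ($name, $version, $lastversion, $ext) = @{$entry};
   print "Would delete: $name$lastversion$ext\n";
}
```

Here only another_lib-1.5.tar.bz2 is reported; aux_prog-1.0.tar.gz is skipped because something still depends on it.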

None of this functionality is in Portage. Portage has no facility to remove old distfiles. The script is meant to save you from tediously going through the distfiles by hand to free up disk space, trying to guess which ones are completely redundant, and judging by some of the comments I got, it is obviously something useful. I cannot see how emerge -cp is of any use whatsoever in trying to achieve this goal, as it doesn't list dependencies. It only lists packages which are safe for removal because newer versions have been emerged. I'm not interested in that, because I am not trying to rewrite Portage here ....

Furthermore, why would you want to delete a distfile just because it isn't emerged on your system at the time? I have plenty of downloaded distfiles which aren't installed on my Gentoo box (I regularly use the -f parameter). That doesn't mean I won't do it tomorrow, or that I won't want to burn these distfiles on a CD to save time at another location. And it wouldn't be very nice if you used a shared distfiles directory from a server (as I do). I have Gnome distfiles in my share now which I don't intend to install on my main box yet, but maybe I want to from another. Having said that, it could be useful in some cases as an optional parameter. The problem is then you have to reliably map distfile names <-> package names (they're not necessarily the same and one package might have more than one distfile related to it - such as X). Those are the sort of problems I was hinting at. If you know how to do this cleanly and efficiently then I would like to know ...
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 20048

PostPosted: Thu Oct 17, 2002 5:14 am    Post subject: Reply with quote

Would it take much to alter this script to run a check on /usr/portage/packages/All ?
_________________
Quis separabit? Quo animo?
474
l33t
l33t


Joined: 19 Apr 2002
Posts: 714

PostPosted: Thu Oct 17, 2002 9:48 am    Post subject: Good idea Reply with quote

Quote:
Would it take much to alter this script to run a check on /usr/portage/packages/All ?

No, not at all. A nice idea in fact. And I still need to put that better version checking algorithm in the version posted here! Will try to get done today.
474
l33t
l33t


Joined: 19 Apr 2002
Posts: 714

PostPosted: Fri Nov 01, 2002 4:10 pm    Post subject: Reply with quote

kanusplusplus,

a nasty kludge it may be, but changing
Code:
opendir(DIR, "/usr/portage/distfiles");

to
Code:
opendir(DIR, "/usr/portage/packages/All");

and
Code:
unlink('/usr/portage/distfiles/' . $name . $lastversion . $ext);

to
Code:
unlink('/usr/portage/packages/All/' . $name . $lastversion . $ext);


should work. I've started work on my proposed reworkings, and it's growing into something a little more sophisticated than I had anticipated. Hopefully, that'll do for the meantime.
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 20048

PostPosted: Fri Nov 01, 2002 4:48 pm    Post subject: Reply with quote

Thanks, I'll give it a whirl.
_________________
Quis separabit? Quo animo?
dreamer3
Guru
Guru


Joined: 24 Sep 2002
Posts: 553

PostPosted: Fri Nov 15, 2002 10:53 am    Post subject: Reply with quote

kerframil wrote:
The problem is then you have to reliably map distfile names <-> package names (they're not necessarily the same and one package might have more than one distfile related to it - such as X). Those are the sort of problems I was hinting at. If you know how to do this cleanly and efficiently then I would like to know ...

You could easily use /usr/portage/category/package/files/digest-pack-ver to accomplish this. For each version of the package it contains the actual LIST of the files associated with that package. No more guessing which files XFree 4.2 consists of...
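
A minimal sketch of pulling the file names out of one digest, assuming each line holds four whitespace-separated fields (hash type, hash, file name, size). The sub name files_from_digest and the sample contents, including the placeholder hashes and sizes, are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the distfile names listed in the text of
# one digest file. Lines without exactly four fields are skipped.
sub files_from_digest {
   my ($text) = @_;
   my @names;
   foreach my $line (split /\n/, $text) {
      my @fields = split ' ', $line;
      next unless @fields == 4;   # blank or malformed line
      push @names, $fields[2];    # third field is the file name
   }
   return @names;
}

# Made-up digest contents; hashes and sizes are placeholders
my $sample = <<'END';
MD5 00000000000000000000000000000000 X420src-1.tgz 100
MD5 11111111111111111111111111111111 X420src-2.tgz 200
MD5 22222222222222222222222222222222 X420src-3.tgz 300
END

print "$_\n" for files_from_digest($sample);
```

Every name collected this way for an installed version belongs to that package, so a multi-file package like X can be kept or removed as one unit.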
474
l33t
l33t


Joined: 19 Apr 2002
Posts: 714

PostPosted: Fri Nov 15, 2002 4:59 pm    Post subject: Reply with quote

Quote:
You could easily use /usr/portage/category/package/files/digest-pack-ver to accomplish this. For each version of the package it contains the actual LIST of the files associated with that package.


Simple, effective, functional = brilliant. Thanks! :-)
progster
Apprentice
Apprentice


Joined: 16 Jul 2002
Posts: 271

PostPosted: Wed Jan 15, 2003 8:06 pm    Post subject: Reply with quote

Very good script! I'm also posting here to get a notice when you post the new version of your script :oops:

~Progster
474
l33t
l33t


Joined: 19 Apr 2002
Posts: 714

PostPosted: Fri Jan 17, 2003 2:26 pm    Post subject: Reply with quote

progster wrote:
Very good script! I'm also posting here to get a notice when you post the new version of your script :oops:

~Progster

Yup, I've been emailed about this too! I started, but I'm afraid it's on hold at the moment, lost the blasted code from before so I had to start again. I will get this done one day (hopefully before the second coming) ... :roll:
snowmoon
n00b
n00b


Joined: 05 Jun 2002
Posts: 64
Location: Albany,NY USA

PostPosted: Fri Jan 17, 2003 3:27 pm    Post subject: Reply with quote

Jumping here on the end, but I had a much simpler idea...

First mkdir /usr/portage/old
Then mv /usr/portage/distfiles /usr/portage/old

Add the following lines to your make.conf

FETCHCOMMAND="curl -o \${DISTDIR}/\${FILE} \${URI}"

GENTOO_MIRRORS="file:/usr/portage/old http://www.ibiblio.org/pub/Linux/distributions/gentoo"

Then do an emerge -f --deep world
emerge -f --deep system

That gets 99% of them. I'm sure there is an even better way with qpkg, but I haven't learned enough about it to know. You need to use curl since wget does not understand file: locations.
snowmoon
n00b
n00b


Joined: 05 Jun 2002
Posts: 64
Location: Albany,NY USA

PostPosted: Fri Jan 17, 2003 3:29 pm    Post subject: Reply with quote

Forgot the obvious...

When done, comment out extra stuff in make.conf and rm -rf /usr/portage/old

Cheers
jefftang
n00b
n00b


Joined: 04 Jan 2003
Posts: 30

PostPosted: Sun Jan 19, 2003 7:10 am    Post subject: Here's my take: Reply with quote

I decided to write my own script to do this same thing. It uses qpkg to list all the installed packages then checks each package's digest file to see what files should be there.

Code:

#!/usr/bin/perl
use Getopt::Std;
getopts('p');
$distdir = '/usr/portage/distfiles';

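# 'or' rather than '||': with '||' the die binds to the string and never fires
open(PACKAGES, "qpkg -v -I -nc |") or die "Can't list packages\n";
while(<PACKAGES>) {
  chomp;
  $package = $_;
  # Skip lines that don't look like category/program-version
  next unless $package =~ m|(.*)/(.*)-(\d.*)|;
  $category = $1;
  $program = $2;
  $version = $3;
  $digest = "/usr/portage/$category/$program/files/digest-$program-$version";
 
  # Skip packages whose digest file cannot be opened
  open (DIGEST, "<$digest") or next;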
  while(<DIGEST>) {
    chomp;
    ($hashtype, $hash, $file, $size) = split;
    $files{$file}=1;
  }
  close(DIGEST);
}

opendir(DIR,"$distdir")|| die "can't open $distdir";
while ($file = readdir DIR) {
  next unless -f "$distdir/$file";
  if  (! $files{$file}) {
    if ($opt_p) {
      print "Would erase $distdir/$file\n";
    }
    else {
      unlink "$distdir/$file";
    }
  }
}
robm
n00b
n00b


Joined: 20 Jan 2003
Posts: 10
Location: Boston

PostPosted: Mon Jan 20, 2003 2:48 pm    Post subject: mod for PORTAGE_OVERLAY, etc. Reply with quote

Great script! Here's a mod that handles /etc/make.conf's new PORTDIR_OVERLAY and non-default locations of PORTDIR and DISTDIR.

Code:

#!/usr/bin/perl
use Getopt::Std;

my %makeconf;
$makeconf{"PORTDIR"} = '/usr/portage';
$makeconf{"DISTDIR"} = '/usr/portage/distfiles';

sub get_make_conf {
  my ($var);

  open (MAKECONF, "</etc/make.conf");
  while(<MAKECONF>) {
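    if (/^\s*(\w+)=(.+)$/) {
      my ($key, $value) = ($1, $2);
      # strip matching quotes around the value, e.g. DISTDIR="/some/dir"
      $value =~ s/^(["'])(.*)\1\s*$/$2/;
      $makeconf{$key}=$value;
    }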
  }
  close(MAKECONF);

  # simple hack to expand the shell variables
  foreach $var (keys %makeconf) {
    my ($sub);
    while ($makeconf{$var}=~/"?\$\{(\w+)\}"?/) {
      $sub=$1;
      $makeconf{$var}=~s/"?\$\{$sub\}"?/$makeconf{$sub}/;
    }
  }
}

getopts('p');

get_make_conf();

print "PORTDIR = ".$makeconf{"PORTDIR"}."\n";
print "DISTDIR = ".$makeconf{"DISTDIR"}."\n";
print "PORTDIR_OVERLAY = ".$makeconf{"PORTDIR_OVERLAY"}."\n";

open(PACKAGES,"qpkg -v -I -nc |") || die "Can't list packages\n";
while(<PACKAGES>) {
  chomp;
  $package = $_;
  $package =~ m|(.*)/(.*)-(\d.*)|;
  $category = $1;
  $program = $2;
  $version = $3;
  $digest = $makeconf{"PORTDIR"}."/$category/$program/files/digest-$program-$version";
  if ($makeconf{"PORTDIR_OVERLAY"} && ! -f $digest) {
    $digest = $makeconf{"PORTDIR_OVERLAY"}."/$category/$program/files/digest-$program-$version";
  }
  if (-f $digest) {
    open (DIGEST, "<$digest");
    while(<DIGEST>) {
      chomp;
      ($hashtype, $hash, $file, $size) = split;
      $files{$file}=1;
    }
    close(DIGEST);
  }
}
if (scalar(keys(%files))==0) {
  die "sanity check: no package files found.  This can't be right.\n";
}

opendir(DIR,$makeconf{"DISTDIR"})|| die "can't open ".$makeconf{"DISTDIR"};
while ($file = readdir DIR) {
  next unless -f $makeconf{"DISTDIR"}."/$file";
  if  (! $files{$file}) {
    if ($opt_p) {
      print "Would erase ".$makeconf{"DISTDIR"}."/$file\n";
    }
    else {
      unlink $makeconf{"DISTDIR"}."/$file";
    }
  }
}
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 20048

PostPosted: Thu Jan 30, 2003 9:14 pm    Post subject: Reply with quote

kerframil wrote:
kanusplusplus,

a nasty kludge it may be, but changing
Code:
opendir(DIR, "/usr/portage/distfiles");

to
Code:
opendir(DIR, "/usr/portage/packages/All");

and
Code:
unlink('/usr/portage/distfiles/' . $name . $lastversion . $ext);

to
Code:
unlink('/usr/portage/packages/All/' . $name . $lastversion . $ext);


should work. I've started work on my proposed reworkings, and it's growing into something a little more sophisticated than I had anticipated. Hopefully, that'll do for the meantime.
Not too bad. Posted on Nov 1st, and I just got around to trying this :D.

Another change needed is:
Code:
if (/(.+?\-)([0-9r\.\-]+)(\.tar\.gz|\.tgz|\.tar\.bz2)/s) {
to
Code:
if (/(.+?\-)([0-9r\.\-]+)(\.tbz2)/s) {
As far as I know, binary packages are only in .tbz2 format. Also, I added a variable to replace references to the actual directory:
Code:
my $targetdir = "/usr/portage/packages/All/";

print "Determining available tarballs in $targetdir ...\n";

opendir(DIR, $targetdir);

unlink($targetdir . $name . $lastversion . $ext);
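
In the same spirit, the extension list could be made a parameter as well, so one pattern builder serves both distfiles and binary packages. This is only a sketch: tarball_pattern is a made-up name, and unlike the original line it anchors the pattern at both ends of the file name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the filename pattern from a list of
# extensions, e.g. ('.tar.gz', '.tgz', '.tar.bz2') or just ('.tbz2').
sub tarball_pattern {
   my @extensions = @_;
   # quotemeta escapes the dots so they only match a literal '.'
   my $alternatives = join '|', map { quotemeta($_) } @extensions;
   return qr/^(.+?-)([0-9r.\-]+)($alternatives)$/;
}

my $pattern = tarball_pattern('.tbz2');
foreach my $file ('foo-1.0.tbz2', 'foo-1.0.tar.gz', 'README') {
   if (my ($name, $version, $ext) = $file =~ $pattern) {
      print "$file: name=$name version=$version ext=$ext\n";
   }
   else {
      print "$file: ignored\n";
   }
}
```

Only foo-1.0.tbz2 is picked up here; the other two names are ignored because they do not end in one of the requested extensions.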

_________________
Quis separabit? Quo animo?
All times are GMT