mass-deletion of distfiles/ -- what now?

mounty1
l33t
Posts: 955
Joined: Thu Jul 06, 2006 3:17 pm
Location: Queensland

Post by mounty1 » Mon Mar 04, 2024 9:54 am

I wanted to reduce the size of my system as the backup was taking longer each week so:

Code: Select all

# find distfiles/ -mtime +354 | xargs rm
Now, in the old days of Portage-by-rsync, that was OK because distfiles/ held just the packages you had ever downloaded, but now the command prints things like

Code: Select all

rm: distfiles/git3-src/proj_portage.git/objects/ae is a directory
rm: distfiles/git3-src/proj_portage.git/objects/63 is a directory
so:
  • What have I done? (Most likely trashed a git repo.) and
  • What should I do about it?
Michael Mounteney
NeddySeagoon
Administrator
Posts: 56085
Joined: Sat Jul 05, 2003 9:37 am
Location: 56N 3W

Post by NeddySeagoon » Mon Mar 04, 2024 10:10 am

mounty1,

The easy one first
What should I do about it?
Do nothing. The upstream git repo will be alive and well.

Some ebuilds fetch from git, by release, by tag or even by commit ID. The code will be fetched again if required.

You have removed your copy of the objects fetched from git. That's just like removing a tarball.
distfiles/ still holds all the packages you have ever downloaded; it's just that git is used as well as tarballs now.
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Zucca
Administrator
Posts: 4692
Joined: Thu Jun 14, 2007 10:31 pm
Location: Rasi, Finland

Post by Zucca » Mon Mar 04, 2024 10:27 am

Instead of

Code: Select all

find distfiles/ -mtime +354 | xargs rm
... I'd use eclean-dist. Run it with --help first to see the options, or read the eclean man page.
However, I'm not certain whether it cleans subdirectories such as git3-src.

EDIT:

Code: Select all

# @ECLASS_VARIABLE: EGIT3_STORE_DIR
# @USER_VARIABLE
# @DEFAULT_UNSET
# @DESCRIPTION:
# Storage directory for git sources.
#
# This is intended to be set by user in make.conf. Ebuilds must not set
# it.
#
# EGIT3_STORE_DIR=${DISTDIR}/git3-src
... so you could place sources fetched by git in another directory.
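For instance, a make.conf line along these lines would keep git-r3 clones out of DISTDIR altogether. The path is only an example (an assumption); any directory Portage can write to should work:

```shell
# /etc/portage/make.conf
# Example path (an assumption): keep git-r3 clones outside DISTDIR so that
# tarball clean-ups run inside DISTDIR never reach them.
EGIT3_STORE_DIR="/var/cache/git3-src"
```

Clones already under ${DISTDIR}/git3-src would then simply be fetched again into the new location when next needed.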
..: Zucca :..

Code: Select all

init=/sbin/openrc-init
-systemd -logind -elogind seatd
I am NaN! I am a man!
mounty1
l33t
Posts: 955
Joined: Thu Jul 06, 2006 3:17 pm
Location: Queensland

Post by mounty1 » Mon Mar 04, 2024 10:35 am

Zucca wrote: Instead of

Code: Select all

find distfiles/ -mtime +354 | xargs rm
... I'd use eclean-dist. Pass --help first to see the options or read the man page of eclean.
However I'm not certain if it cleans subdirectories like git3-src for example.
I will in future, but tools such as eclean-dist are imperfect in my situation: my local repository is shared (NFS-mounted) among several machines, and none of them is the source of truth for which packages are required, obsolete, etc.
Michael Mounteney
wjb
l33t
Posts: 681
Joined: Sun Jul 10, 2005 9:40 am
Location: Fife, Scotland

Post by wjb » Mon Mar 04, 2024 11:38 am

I wrote a Python script that cleans up a shared NFS distfiles directory. It runs on each PC, looking at the environment.bz2 files under /var/db/pkg to work out what each PC is using. One more run then compares what's in distfiles with what's actually needed and writes a bash script to delete the unwanted files.

It doesn't care about the file dates, only that they're for an installed package.

It's about 200 lines and needs some edits to make it less specific to my setup; say if you're interested.
Zucca
Administrator
Posts: 4692
Joined: Thu Jun 14, 2007 10:31 pm
Location: Rasi, Finland

Post by Zucca » Mon Mar 04, 2024 1:03 pm

mounty1 wrote: tools such as eclean-dist are imperfect in my situation since my local repository is shared (NFS-mounted) amongst several machines, and none is the source-of-truth of what packages are required, obsolete etc.
The closest equivalent to your find command is passing -t or --time-limit=<time> to eclean-dist.
That said, a carefully crafted find command can achieve almost the same result.
For example:

Code: Select all

find "$(portageq envvar DISTDIR)" -maxdepth 1 -mindepth 1 -type f -atime +180 -delete
... would delete distfiles which haven't been accessed in about half a year.
That would take files accessed over NFS into account, unless your filesystem doesn't support access times or is mounted with noatime. However, it can't take installed packages into account, so it isn't perfect either. Also, the access time can be updated by things other than Portage doing an install.

Another way: you could have each client create a list of the distfiles it thinks it doesn't need by running:

Code: Select all

eclean-dist --quiet --pretend | sed 's|^/.*/||'
... but that "will escalate very quickly" into a complex coding project: you'd need to compare the lists and remove a file only if every client thinks it is now useless.

It's a simple problem for one client, but when several clients access the same $DISTDIR over the network it gets complicated really quickly.
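A minimal bash sketch of that comparison, assuming each client has already saved its eclean-dist output, reduced to bare filenames as above, as one file per client in a shared directory (that layout and the function name are hypothetical). A distfile is reported only if it appears in every client's list, so any disagreement keeps the file:

```shell
#!/bin/bash
# files_unneeded_by_all DIR  (hypothetical helper)
# DIR holds one file per client, each listing (one per line) the bare
# distfile names that client's eclean-dist --pretend run would remove.
# Prints only the names present in every client's list.
files_unneeded_by_all() {
    local dir="$1" f n=0
    for f in "$dir"/*; do
        [ -f "$f" ] && n=$((n + 1))
    done
    [ "$n" -gt 0 ] || return 0
    # De-duplicate each list, then count how many lists mention each name;
    # keep a name only if its count equals the number of clients.
    for f in "$dir"/*; do
        [ -f "$f" ] && LC_ALL=C sort -u "$f"
    done | LC_ALL=C sort | uniq -c | awk -v n="$n" 'NF == 2 && $1 == n { print $2 }'
}
```

The result is still only a candidate list; nothing is deleted by the function itself.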

@wjb: What's your implementation, roughly?
Hu
Administrator
Posts: 24389
Joined: Tue Mar 06, 2007 5:38 am

Post by Hu » Mon Mar 04, 2024 3:49 pm

mounty1 wrote: I wanted to reduce the size of my system as the backup was taking longer each week so
If backup times are the only concern, you could just tell the backup tool not to archive this directory.
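For instance, if the backup happens to be a plain GNU tar archive (an assumption; most backup tools have an equivalent exclude option), the distfiles directory can be skipped with --exclude. The function name and the example path var/cache/distfiles are illustrative only:

```shell
#!/bin/bash
# backup_without_distfiles SRC OUT DISTDIR_REL  (hypothetical helper)
# Archives SRC into OUT with tar, skipping DISTDIR_REL (a path relative
# to SRC, e.g. var/cache/distfiles) and everything below it.
backup_without_distfiles() {
    local src="$1" out="$2" distdir_rel="$3"
    # --exclude must precede the "." operand; member names look like
    # "./var/cache/distfiles/...", hence the "./" prefix on the pattern.
    tar -C "$src" --exclude="./$distdir_rel" -cf "$out" .
}
```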

As regards the problem of deleting distfiles only when they are unnecessary, one approach I once experimented with was:
  • Create a shared directory via NFS, which all systems can see.
  • Point each system's DISTDIR to a unique subdirectory, say NFSMNT/$hostname.
  • Override FETCHCOMMAND so that, before downloading from the Internet, it would try to find the requisite file in a peer's subdirectory. If found, create a hard link to it. Otherwise, download.
This way, any given client can run eclean-dist whenever no one else is fetching. A file that no one needs will drop to a link count of 0, and be deleted. A file that this system does not need, but that others do, will be deleted from this system's $hostname directory, but the link count will not go to 0, so the file is not deleted.

This does not play nicely with git-r3, and further work could be needed there.
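That FETCHCOMMAND override might look roughly like this in bash. It is a sketch under assumptions, not Hu's actual code: every host's DISTDIR is assumed to be a sibling directory on the same NFS mount (NFSMNT/$hostname), the script name /usr/local/bin/peer-fetch is hypothetical, and wget is the fallback, as in Portage's default FETCHCOMMAND. Portage substitutes ${URI}, ${DISTDIR} and ${FILE} into FETCHCOMMAND; RESUMECOMMAND would presumably need the same treatment:

```shell
#!/bin/bash
# peer_fetch URI DISTDIR FILE  (hypothetical helper)
# Hypothetical make.conf wiring, with this file installed as
# /usr/local/bin/peer-fetch and ending in a line: peer_fetch "$@"
#   FETCHCOMMAND="/usr/local/bin/peer-fetch \"\${URI}\" \"\${DISTDIR}\" \"\${FILE}\""
#
# If a sibling host directory next to DISTDIR (same NFS mount) already has
# FILE, hard-link it instead of downloading; otherwise fall back to wget.
peer_fetch() {
    local uri="$1" distdir="${2%/}" file="$3" root peer
    root=$(dirname "$distdir")
    for peer in "$root"/*; do
        [ -d "$peer" ] || continue
        [ "$peer" = "$distdir" ] && continue
        if [ -f "$peer/$file" ]; then
            # Share the existing inode: the link count rises by one and
            # no data is copied. Hard links need both paths on one filesystem.
            ln -f -- "$peer/$file" "$distdir/$file" && return 0
        fi
    done
    # No peer has it yet: download, much as Portage's default FETCHCOMMAND does.
    wget -t 3 -T 60 -O "$distdir/$file" "$uri"
}
```

Because a hard link only adds a name to the existing inode, removing the file from one host's directory (e.g. via eclean-dist) leaves the data in place until the last host's link goes, which is the link-count behaviour described above.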
wjb
l33t
Posts: 681
Joined: Sun Jul 10, 2005 9:40 am
Location: Fife, Scotland

Post by wjb » Mon Mar 04, 2024 4:31 pm

@zucca

Code: Select all

# On each client
# Look through all the environment.bz2 files under /var/db/pkg/*/*
#  - Find line: declare -x A="<space delimited list>"
#  - Add all the files in this list to a set of in-use files
# Save the in-use set under distfiles, in a file named for the host
#
# Then, on one PC 
# - Load all the client file lists into a set, REQUIRED
# - Glob all the filenames in distfiles into a set, PRESENT
# - Find filenames in PRESENT but not in REQUIRED, into a list UNWANTED
# - Delete all the UNWANTED files.
It actually generates a bash script to do the deletions, allowing for manual review and tweaking.

Edit: on GitHub at wwjjbb/wjbtools
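For comparison, the same outline can be sketched in bash. This is not wjb's script (that one is Python); the function names, the hidden per-host list directory and the use of bzcat are assumptions. Like wjb's version, it only prints an rm script for review:

```shell
#!/bin/bash
# Sketch of a shared-distfiles cleaner following the outline above.
# All helper names and the list directory layout are hypothetical.

# Read an uncompressed ebuild environment on stdin and print the
# distfile names from its `declare -x A="..."` line, one per line.
inuse_from_env() {
    sed -n 's/^declare -x A="\(.*\)"$/\1/p' | tr ' ' '\n' | sed '/^$/d'
}

# Step 1, on each client: collect the in-use set from every installed
# package under VDB (normally /var/db/pkg) and save it as LISTDIR/<host>.
save_inuse_list() {
    local vdb="$1" listdir="$2" env
    for env in "$vdb"/*/*/environment.bz2; do
        [ -f "$env" ] && bzcat "$env" | inuse_from_env
    done | LC_ALL=C sort -u > "$listdir/$(uname -n)"
}

# PRESENT: names of top-level regular files in DISTDIR. Subdirectories
# such as git3-src, and dot-files/dot-dirs, are skipped by the glob and -f.
list_present() {
    local p
    for p in "$1"/*; do
        [ -f "$p" ] && printf '%s\n' "${p##*/}"
    done | LC_ALL=C sort -u
}

# Step 2, on one PC: UNWANTED = PRESENT - REQUIRED, where REQUIRED is the
# union of all client lists. Emits rm commands for review; deletes nothing.
emit_rm_script() {
    local distdir="$1" listdir="$2" f
    LC_ALL=C comm -23 <(list_present "$distdir") \
        <(cat "$listdir"/* | LC_ALL=C sort -u) |
    while IFS= read -r f; do
        printf 'rm -f -- %q\n' "$distdir/$f"
    done
}
```

Usage would be something like `save_inuse_list /var/db/pkg "$DISTDIR/.inuse"` on every client (the hidden .inuse directory is an assumption, chosen so the glob never treats the lists as distfiles), then `emit_rm_script "$DISTDIR" "$DISTDIR/.inuse" > purge.sh` on one machine, reading purge.sh before running it.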
Last edited by wjb on Mon Mar 04, 2024 10:19 pm, edited 1 time in total.
Zucca
Administrator
Posts: 4692
Joined: Thu Jun 14, 2007 10:31 pm
Location: Rasi, Finland

Post by Zucca » Mon Mar 04, 2024 5:23 pm

Hu wrote: This way, any given client can run eclean-dist whenever no one else is fetching. A file that no one needs will drop to a link count of 0, and be deleted. A file that this system does not need, but that others do, will be deleted from this system's $hostname directory, but the link count will not go to 0, so the file is not deleted.
This is such an elegant and simple solution.
Hu wrote: This does not play nicely with git-r3, and further work could be needed there.
Maybe the easiest fix is to move the git3-src directory out of $DISTDIR. But there may be other directories too.
mounty1
l33t
Posts: 955
Joined: Thu Jul 06, 2006 3:17 pm
Location: Queensland

Post by mounty1 » Tue Mar 05, 2024 8:05 am

NeddySeagoon wrote: Do nothing. The upstream git repo will be alive and well.
Thanks Neddy.
Michael Mounteney
mounty1
l33t
Posts: 955
Joined: Thu Jul 06, 2006 3:17 pm
Location: Queensland

Post by mounty1 » Tue Mar 05, 2024 8:09 am

Thanks everyone for the ideas. My shared-Portage system extends to NFS-mounted /etc/portage and /var/db/pkg on all machines, so I could write my own clean-up script, as wjb did. I just didn't know, until yesterday, that I ought to.
Michael Mounteney
Zucca
Administrator
Posts: 4692
Joined: Thu Jun 14, 2007 10:31 pm
Location: Rasi, Finland

Post by Zucca » Tue Mar 05, 2024 9:05 am

man eclean wrote:

Code: Select all

       By default, eclean will protect all distfiles or binary packages
       corresponding to some ebuilds available in the Portage tree. This
       is the safest mode, since it will protect whatever may still be
       useful, for instance to downgrade a package without downloading its
       sources for the second time, or to reinstall a package you unmerge
       by mistake without recompiling it. Sure, it’s also a mode in which
       your DISTDIR and PKGDIR will stay rather big (although still not
       growing infinitely). For the ’distfiles’, this mode is also quite
       slow because it requires some access to the whole Portage tree.

       If you use the --deep option, eclean will only protect files
       corresponding to some currently installed package (taking their
       exact version into account). It will save much more space, while
       still preserving sources files around for minor revision bumps,
       and binaries for reinstallation of corrupted packages. But it won’t
       keep files for less usual operations like downgrading or
       reinstalling an unmerged package.
So maybe running eclean without --deep would also get you close to what you want? In short: without --deep, eclean only removes distfiles that aren't used by any ebuild in any of your active repositories (overlays).
Adding --time-limit would additionally avoid deleting distfiles that are too new.
The problem with this approach is that if the machines sharing the NFS distdir have different repositories enabled, some distfiles might still be deleted even though they are still referenced by a repository that's active on a machine other than the one running eclean.

For a bullet-proof method, you will need to query all the other machines.
wjb
l33t
Posts: 681
Joined: Sun Jul 10, 2005 9:40 am
Location: Fife, Scotland

Post by wjb » Tue Mar 05, 2024 11:38 am

mounty1 wrote: I wanted to reduce the size of my system as the backup was taking longer each week
Oh. That can actually be sorted by using a deduplicating backup tool such as app-backup/borgbackup: where a chunk of data is already in the backup repository, it is simply referenced rather than stored again. On average this is much quicker than a traditional backup tool; every backup archive is effectively assembled as if incremental, but later behaves as a full backup.

There's also app-backup/borgmatic, a simplified layer on top of borg.

I don't actually bother backing up distfiles, as I figure it's easy enough for Portage to retrieve them from the mirrors (current ones, anyway). All I'm trying to do with distfiles is purge it when its partition fills up. Anyway, I've just put my script on GitHub.

Return to “Portage & Programming”

All times are UTC

© 2001–2026 Gentoo Foundation, Inc.
