Gentoo Forums
Filesystem cruft script: clean your system, save disk space!
ecatmur
Advocate

Joined: 20 Oct 2003
Posts: 3595
Location: Edinburgh

Posted: Tue Mar 23, 2004 5:09 pm    Post subject: Filesystem cruft script: clean your system, save disk space!

Over time, any system builds up cruft: files and directories that don't belong to any package.
This script tries to list all the cruft on a system with as few false positives as possible, to help you keep your system in good working order.
Note that plenty of packages drop extra files all over the place; any extra help is appreciated.

NOTE: This is not the whole script; I got bored of updating it in two places, so this is just a sample. Follow the link below for the actual script.
Code:
#!/bin/bash
#
# Author: Ed Catmur <ed@catmur.co.uk>
# Licensed under the GNU Public License, version 2.
#
# Copyright © Ed Catmur 2004
# Contains code believed to be copyright © Gentoo Technologies, Inc.

has_version() {
   if /usr/lib/portage/bin/portageq 'has_version' "${ROOT}" "$1"; then
      return 0
   else
      return 1
   fi
}

best_version() {
   if /usr/lib/portage/bin/portageq 'best_version' "${ROOT}" "$1"; then
      return 0
   else
      return 1
   fi
}

#
# name:   python_version
# desc:   run without arguments and it will export the version of python
#         currently in use as $PYVER
#
python_version() {
   local tmpstr
   python=${python:-/usr/bin/python}
   tmpstr="$(${python} -V 2>&1 )"
   export PYVER_ALL="${tmpstr#Python }"

   export PYVER_MAJOR=$(echo ${PYVER_ALL} | cut -d. -f1)
   export PYVER_MINOR=$(echo ${PYVER_ALL} | cut -d. -f2)
   export PYVER_MICRO=$(echo ${PYVER_ALL} | cut -d. -f3-)
   export PYVER="${PYVER_MAJOR}.${PYVER_MINOR}"
}

ROOT="/"

# Files and directory trees to omit, ordered alphabetically.
# If a package drops files or directories in more than one place, move its
# definitions to the appropriate stanza. ldconfig symlinks go in the last
# stanza. Put large lists of single files next to the CONTENTS listing code.
PRUNE="
   /boot
   /dev
   
$([ -h /etc/runlevels/boot/clock ]    && echo "/etc/adjtime")
$([ -d /proc/asound ]          && echo "/etc/asound.state")
$(has_version net-wireless/bluez-utils   && echo "/etc/bluetooth/link_key")
   /etc/config-archive
   $(echo /etc/cron.{hourly,daily,weekly,monthly})
   /etc/csh.env
   /etc/dnsdomainname
   $(echo /etc/env.d/??hostname)
   $(echo /etc/env.d/??locale)
$(has_version dev-java/java-config   && echo "/etc/env.d/20java")
$(has_version sys-devel/prelink      && echo "/etc/env.d/99prelink")
   /etc/gconf/gconf.xml.defaults
   /etc/gentoo-release
   /etc/group   /etc/group-
   /etc/gshadow   /etc/gshadow-
$(has_version x11-libs/gtk+      && echo "/etc/gtk-2.0/gtk.immodules")
   /etc/hostname
   /etc/hosts
   /etc/ioctl.save
   /etc/ld.so.cache
   /etc/ld.so.conf
   /etc/localtime
   /etc/make.conf
   /etc/make.profile
   /etc/modprobe.conf   /etc/modprobe.conf.old
   /etc/modprobe.devfs   /etc/modprobe.devfs.old
   /etc/modules.conf   /etc/modules.conf.old
   /etc/motd
   /etc/mtab
$(has_version x11-libs/pango      && echo "/etc/pango/pango.modules")
   /etc/passwd   /etc/passwd-
   /etc/portage
$(has_version net-dialup/ppp       && echo "/etc/ppp")
$(has_version sys-devel/prelink      && echo "/etc/prelink.cache")
   /etc/profile.env
   /etc/resolv.conf
   /etc/runlevels
...
$(has_version app-admin/syslog-ng   && echo "/var/run/syslog-ng.pid")
   /var/run/utmp
   $(echo /var/spool/cron/crontabs/*)
   $(echo /var/spool/cron/lastrun/cron.{hourly,daily,weekly,monthly})
   /var/tmp/distfiles
   /var/tmp/portage
   /var/tmp/portage-pkg
   /var/tmp/sync
"

# Packages which drop files or directories in more than one place go here,
# listed alphabetically by category/package (cp).
has_version app-text/docbook-sgml-dtd \
   && PRUNE="${PRUNE}
   $(cat /var/db/pkg/app-text/docbook-sgml-dtd-*/SLOT | sed 's:^:/etc/sgml/sgml-docbook-:; s:$:.cat:')
   /etc/sgml/sgml.cenv
   /etc/sgml/sgml.env"
...
has_version x11-misc/electricsheep \
   && PRUNE="${PRUNE}
   $(echo /usr/share/electricsheep-{frown,smile,splash-{0,1}}.tif)
   /var/cache/sheep"

# Packages which omit ldconfig symlinks (to test, delete the symlink and see
# if ldconfig recreates it). Specify at least to minor, these are ugly.
has_version '=gnome-extra/vfs-menu-applet-0.1*' \
   && PRUNE="${PRUNE}          /usr/lib/libvfsmenu-applet.0"
has_version '=net-fs/samba-3.0*' \
   && PRUNE="${PRUNE}          /usr/lib/libsmbclient.so.0"
has_version '=media-libs/xvid-1.0*' \
   && PRUNE="${PRUNE}          /usr/lib/libxvidcore.so.4"
has_version '=media-video/nvidia-glx-1.0*' \
   && PRUNE="${PRUNE}    /usr/X11R6/lib/libXvMCNVIDIA_dynamic.so.1"

# awk: filter out pyc and pyo files for which the corresponding .py exists
find / '(' -false $(echo $PRUNE | xargs -n 1 echo -or -path) \
   ')' -prune -or -print \
| sort \
| awk '/\.py$/ { py=$0; } $0 !~ "^"py"[co]$"' \
>/tmp/allfiles

(
   echo "/"
   # sed code stolen from qpkg
   cat /var/db/pkg/*/*/CONTENTS \
   | sed -e "s:\(^obj \)\(.*\)\( .*\)\{2\}$:\2:;
      s:\(^sym \)\(.*\)\( -> \)\(.*\)\( .*\)$:\2:;
      s:\(^dir \)\(.*\)$:\2:"
   # Generate cached man pages
   for manx in /usr/share/man/man*; do
      x=${manx#/usr/share/man/man}
      for manp in $manx/*; do
         p=${manp#$manx/};
         echo "/var/cache/man/cat$x/${p%.gz}.bz2"
      done
   done
   # The gnome-games ebuild doesn't install scores files that already
   # exist on the filesystem (silly!)
   has_version gnome-extra/gnome-games \
      && for game in $(
         cat /var/db/pkg/gnome-extra/gnome-games-*/CONTENTS \
         | grep '^obj /usr/bin/'\
         | sed "s:\(^obj \)/usr/bin/\(.*\)\( .*\)\{2\}$:\2:"
      ); do
         find /var/lib/games/${game}.*.scores /var/lib/games/${game}.scores 2>/dev/null
      done
) \
| sort \
| uniq \
>/tmp/portagefiles

comm -2 -3 /tmp/allfiles /tmp/portagefiles

Also download it here.
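
For anyone puzzled by the awk line in the sample, here is a minimal sketch of just that step, pulled out so it can be tried on a small list. The function name filter_compiled_python is made up for illustration and is not part of the script. It assumes the input is one path per line and sorts it in the C locale so that foo.py comes directly before foo.pyc and foo.pyo.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original script): read paths on stdin,
# sort them, and drop .pyc/.pyo entries whose .py source is also listed.
filter_compiled_python() {
   LC_ALL=C sort \
   | awk '/\.py$/ { py = $0 } $0 !~ "^" py "[co]$"'
}

# Example:
#   printf '%s\n' /a/x.pyc /a/x.py /a/y.pyo | filter_compiled_python
# keeps /a/x.py and /a/y.pyo; /a/x.pyc is dropped because /a/x.py exists.
```

Note that the remembered .py path is used as a regular expression, so a path containing characters such as + may fail to match its own .pyc; the worst case is an extra line in the output, not a missing one.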


Last edited by ecatmur on Sun Nov 27, 2005 10:47 am; edited 6 times in total

snakattak3
Guru

Joined: 11 Dec 2002
Posts: 468
Location: Seattle

Posted: Tue Mar 23, 2004 10:04 pm

Seems pretty straightforward. All you have to do is add files or directories to the ignore section for some of your own scripts or whatever. Is there a way to actually remove those files as well with this script? Maybe pass a --real flag or something to do the damage?
_________________
Ban Reality TV!
Adopt an Unanswered Post

ecatmur
Advocate

Joined: 20 Oct 2003
Posts: 3595
Location: Edinburgh

Posted: Tue Mar 23, 2004 10:45 pm

Well, I want to keep it generic so it can be used on any Gentoo system.
Most 'extra' things on my filesystem are in /home, or in /usr/local, or in /srv, all of which are ignored by the script.

Currently this is what is output:
Code:
/etc/bash_completion.d/glade-2
/etc/devfs.d/cups.conf
/etc/devfs.d/ide-scsi.conf
/etc/devfs.d/nvidia.conf
/etc/init.d/bluetooth.palm
/etc/modules.d/bluez
/etc/profile.d/xprint.sh
/usr/share/control-center-2.0/capplets/gstreamer-properties.desktop
/usr/share/pixmaps/pptout-small.png
/usr/share/pixmaps/skencil-logo-small.png
/usr/share/pixmaps/tiny-eyeicon.png

It is actually quite useful - I just added parsing of /var and it picked up a load of cached man pages I no longer have the originals of, which is nice. I think since I started writing it I've saved maybe 200MB of hard disk space, which isn't bad :D
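
On how directories end up ignored: the sample builds a find expression that prunes every path in PRUNE. The following is only a sketch of that idea with the paths passed as separate arguments instead of going through echo and xargs, so paths with spaces survive. The name list_unpruned is hypothetical, and -false is a GNU find extension, as in the sample.

```bash
#!/bin/bash
# Hypothetical helper: list everything under $1 except the trees given as
# the remaining arguments.
list_unpruned() (
   root=$1; shift
   for p in "$@"; do
      set -- "$@" -o -path "$p"   # append the test for this path...
      shift                        # ...and drop the raw path from the front
   done
   # "$@" now holds: -o -path A -o -path B ...
   find "$root" '(' -false "$@" ')' -prune -o -print
)

# Example: list_unpruned / /home /usr/local /srv /proc
```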

El_Presidente_Pufferfish
Veteran

Joined: 11 Jul 2002
Posts: 1179
Location: Seattle

Posted: Wed Mar 24, 2004 1:19 am

I get TONS of output, mainly from perl and python-2.2. How can I tell which files to delete, and which absolutely aren't false positives?

ecatmur
Advocate

Joined: 20 Oct 2003
Posts: 3595
Location: Edinburgh

Posted: Wed Mar 24, 2004 4:44 am

Basically, you can't - except by deleting it and seeing what breaks.
Generally files which are left over from upgrades will be safe to delete.
This is of course a work in progress as it has only been tested on a few systems, which is why I would appreciate help to eliminate false positives.

If you have genlop installed, you can use my strategy of looking at the mtime on offending files and then running genlop -l | grep "Jan 26 10" (for instance) to see if it landed just before a particular ebuild finished merging; if that ebuild is the latest version of a package on your system then my script needs a new entry, otherwise it's a holdover from an old version and is probably safe to delete.
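
The genlop strategy above can be sketched roughly as follows. The function names are invented for illustration. It assumes GNU date (for -r FILE) and that genlop -l prints timestamps in the usual "Jan 26 10:15:32" style; if your genlop formats dates differently, adjust the format string.

```bash
#!/bin/bash
# Hypothetical helper: turn a file's mtime into a grep pattern such as
# "Jan 26 10" (month, space-padded day, hour).
mtime_pattern() {
   LC_ALL=C date -r "$1" '+%b %e %H'
}

# Hypothetical helper: show merges logged in the same hour as the file's mtime.
merges_near() (
   pattern=$(mtime_pattern "$1") || exit 1
   genlop -l | grep -F -- "$pattern"
)

# Example: merges_near /etc/devfs.d/nvidia.conf
```

If the file's mtime falls right at the start of an hour, the merge may have finished in the previous hour, so it can be worth looking at the neighbouring lines of genlop -l as well.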

sminons
n00b

Joined: 28 Dec 2003
Posts: 49

Posted: Wed Mar 24, 2004 4:00 pm

Doesn't "emerge --depclean" clean away those dependency files which are not needed any more, thereby keeping the system from accumulating redundant files?

ecatmur
Advocate

Joined: 20 Oct 2003
Posts: 3595
Location: Edinburgh

Posted: Wed Mar 24, 2004 6:19 pm

No, depclean only removes redundant packages. This script lists files that don't belong to any package.
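
To make the difference concrete, the core of the script is a set difference: everything found on disk, minus everything some package claims. A minimal sketch of that step, assuming two input files with one path per line (the name list_unowned is made up; the sample itself writes to fixed names in /tmp, while this sketch uses mktemp):

```bash
#!/bin/bash
# Hypothetical helper: print lines of file $1 (all files found on disk) that
# do not appear in file $2 (files owned by installed packages).
list_unowned() (
   all=$(mktemp) && owned=$(mktemp) || exit 1
   trap 'rm -f "$all" "$owned"' EXIT
   # comm needs both inputs sorted with the same collation.
   LC_ALL=C sort -u "$1" > "$all"
   LC_ALL=C sort -u "$2" > "$owned"
   LC_ALL=C comm -23 "$all" "$owned"
)

# Example: list_unowned /tmp/allfiles /tmp/portagefiles
```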

wishkah
Guru

Joined: 09 May 2003
Posts: 441
Location: de

Posted: Wed Mar 24, 2004 6:51 pm

I think this is something that really should be implemented within emerge (if it isn't already, I didn't check). Simply go through ALL not installed packages and remove all existing files that are linked to them. Feature-request??
_________________
if only I could fill my heart with love...

ecatmur
Advocate

Joined: 20 Oct 2003
Posts: 3595
Location: Edinburgh

Posted: Thu Mar 25, 2004 7:31 pm

littleendian wrote:
Simply go through ALL not installed packages and remove all existing files that are linked to them.
Well, *that* wouldn't work - one problem of a source distro is that there's no way to know what a package will install without actually merging it.

But I'm waiting to get this a bit more finely tuned before getting it into portage, which is why I would appreciate people testing it. Remember, it doesn't actually do anything to your filesystem, so it is safe to run and see what it outputs.
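
Since the script only prints a list, one low-risk way to test it is to save the output and look at where the entries cluster before touching anything. A small sketch, assuming the output has been saved one path per line; summarize_cruft is a hypothetical name, not part of the script:

```bash
#!/bin/bash
# Hypothetical helper: count listed paths per two-level directory prefix
# (e.g. /etc/devfs.d, /usr/share), most frequent first.
# Reads the named files, or stdin if none are given.
summarize_cruft() {
   awk -F/ 'NF >= 3 { print "/" $2 "/" $3; next } { print }' "$@" \
   | sort | uniq -c | sort -rn
}

# Example:
#   ./cruft.sh > cruft.txt      # file name chosen for illustration
#   summarize_cruft cruft.txt | head
```

A prefix with thousands of entries is more likely a missing ignore rule than real cruft, which makes it a good candidate to report here.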


Last edited by ecatmur on Sun Mar 28, 2004 3:40 pm; edited 1 time in total

lupine313
n00b

Joined: 12 Nov 2003
Posts: 35

Posted: Fri Mar 26, 2004 2:18 am

After running this, in addition to a whole lotta stuff, it's telling me to delete almost all of my .conf files and pretty much all of nessus... I've had this system up for exactly 2 days now... I doubt this is accurate?

~jeff~

ecatmur
Advocate

Joined: 20 Oct 2003
Posts: 3595
Location: Edinburgh

Posted: Fri Mar 26, 2004 3:49 am

It depends what you mean by accurate.

If it's listing .conf files, that's a potential problem... can you post or pm me what they are, and which packages they pertain to?
If it's listing part of nessus, then I imagine nessus is being very ill-behaved and dumping files all over your filesystem. I say this because the nessus ebuild lacks a postinst section, implying any extraneous files are created by nessus when it is run.

I could be wrong, though... but I won't know unless you tell me which files it lists in error.

aethyr
Veteran

Joined: 06 Apr 2003
Posts: 1085
Location: NYC

Posted: Sun Mar 28, 2004 3:28 am

Nice script. I've been complaining about cruft for a long time:
http://forums.gentoo.org/viewtopic.php?t=50929

Like I said back then, I hope we get some kind of "officially maintained" cruft tool soon.

sminons
n00b

Joined: 28 Dec 2003
Posts: 49

Posted: Sun Mar 28, 2004 8:35 am

ecatmur wrote:

If it's listing .conf files, that's a potential problem... can you post or pm me what they are, and which packages they pertain to?


I ran the above script, and I got a few config files in the list, which is of concern. They are:
1. all config files in webmin
2. /etc/lilo.conf (I am using lilo)
3. /etc/nessus/nessusd.conf
4. /etc/mplayer.conf
5. /etc/proftpd/proftpd.conf

I got all these files by running a filter for config files on the result. I wonder whether some of the other files listed by the script depend on some active package on my system.

ecatmur
Advocate

Joined: 20 Oct 2003
Posts: 3595
Location: Edinburgh

Posted: Sun Mar 28, 2004 3:44 pm

OK... sorted lilo, nessus and proftpd. I don't have webmin so I've just put in a rule to ignore the whole of /etc/webmin and /var/webmin - if this can be improved on let me know.

/etc/mplayer.conf should be regarded as belonging to mplayer, AFAICT - is your installation of mplayer up to date?

aethyr
Veteran

Joined: 06 Apr 2003
Posts: 1085
Location: NYC

Posted: Mon Mar 29, 2004 2:24 am

Quote:
OK... sorted lilo, nessus and proftpd. I don't have webmin so I've just put in a rule to ignore the whole of /etc/webmin and /var/webmin - if this can be improved on let me know.


Is this a bug with the script or a bug with the ebuilds?

Don't most ebuilds install files in /etc/ with knowledge of where they came from? I think rather than make the script exclude these programs, you should use the script to report bugs on the files that don't keep track of what they dump in /etc/ no?

mhodak
Veteran

Joined: 15 Nov 2003
Posts: 1206

Posted: Mon Mar 29, 2004 3:41 am

Great script, ecatmur. I have just run it and got rid of quite a lot of stuff left over by portage, especially in /etc. I still do not understand why /etc is not cleaned after uninstalling a package. It can be useful sometimes if you tweak config files, uninstall the package and then realize you want it back, but most of the time it just leaves garbage, especially when upgrading.

Anyway, the false positives:
/etc/hosts.allow
/etc/hosts.deny
/etc/kernels
/var/log

The first two files are really important; they should not be removed. /etc/kernels is the directory where genkernel stores the configs of kernels it built. I think most people want to keep that.
Your script also lists log files in the /var/log directory; I would not consider those files cruft.

ecatmur
Advocate

Joined: 20 Oct 2003
Posts: 3595
Location: Edinburgh

Posted: Mon Mar 29, 2004 4:14 am

Thanks.

I'm trying to treat /var/log on a per-package basis - for instance, /var/log/cups or /var/log/samba are cruft if you have removed cups or samba from your system.
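
A rough sketch of what per-package treatment can look like, in the same style as the stanzas in the sample. The real script uses has_version via portageq; to keep this self-contained, pkg_installed below is a simplified stand-in that only looks for a matching directory under /var/db/pkg, and both function names are made up. The atom net-print/cups is my assumption for the cups package; net-fs/samba is taken from the sample.

```bash
#!/bin/bash
# Simplified stand-in for has_version (assumption: a package category/name is
# installed if /var/db/pkg/category/name-<version> exists).
pkg_installed() (
   for d in "${VDB:-/var/db/pkg}/$1"-[0-9]*; do
      [ -d "$d" ] && exit 0
   done
   exit 1
)

# Hypothetical helper: ignore the given paths only while the package is
# installed, so they show up as cruft once it has been unmerged.
prune_if_installed() {
   if pkg_installed "$1"; then
      shift
      PRUNE="${PRUNE} $*"
   fi
}

# Example:
#   PRUNE=""
#   prune_if_installed net-print/cups /var/log/cups
#   prune_if_installed net-fs/samba   /var/log/samba
```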

mhodak
Veteran

Joined: 15 Nov 2003
Posts: 1206

Posted: Mon Mar 29, 2004 5:07 am

ecatmur wrote:
Thanks.

I'm trying to treat /var/log on a per-package basis - for instance, /var/log/cups or /var/log/samba are cruft if you have removed cups or samba from your system.


OK, it finds these:
/var/log/auth.log
/var/log/daemon.log
/var/log/debug
/var/log/emerge_fix-db.log
/var/log/genkernel.log
/var/log/kern.log
/var/log/mail.err
/var/log/mail.info
/var/log/mail.log
/var/log/mail.warn
/var/log/messages.0
/var/log/privoxy/jarfile
/var/log/privoxy/privoxy.log
/var/log/syslog
/var/log/user.log
/var/run/apmd.pid

All of these files are active, except /var/log/mail*, but I think those are due to net-mail/ssmtp being installed on my system (it does not run, but it is a requirement for at). I know about these files: /var/log/emerge_fix-db.log is the log file created when running fixpackages (needed for people with many binary packages), and /var/log/privoxy/* are due to privoxy being run. I do not know about the other files, but they are being used. I am using the sysklogd-1.4.1-r10 log daemon.

Also, most of the files above exist with .0.bz2, .1.bz2, ..., .6.bz2 extensions. These are probably created by the /etc/cron.daily/syslog.cron script. Your script should probably take this into account.

The script also lists files in /var/run/, such as /var/run/gpm.pid; these files contain the PIDs of daemons running on my system. Therefore the /var/run/ directory should probably be excluded.
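
For the rotated logs, something along these lines could generate the names to ignore for each active log file. This is only a sketch based on the .0 and .0.bz2 through .6.bz2 names seen above; rotated_names is a hypothetical helper and the default of 6 rotations is taken from that listing, not from any sysklogd documentation.

```bash
#!/bin/bash
# Hypothetical helper: print a log file name plus its rotated variants
# NAME.0, NAME.0.bz2, ..., NAME.<max>, NAME.<max>.bz2 (max defaults to 6).
rotated_names() (
   base=$1; max=${2:-6}; i=0
   echo "$base"
   while [ "$i" -le "$max" ]; do
      echo "$base.$i"
      echo "$base.$i.bz2"
      i=$((i + 1))
   done
)

# Example: PRUNE="${PRUNE} $(rotated_names /var/log/syslog)"
```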

mhodak
Veteran

Joined: 15 Nov 2003
Posts: 1206

Posted: Mon Mar 29, 2004 9:04 am

Found a couple of more false positives:
/usr/bin/links
/usr/bin/texi2html

Here is what qpkg says
Code:

qpkg -f /usr/bin/links
net-www/links *

qpkg -f /usr/bin/texi2html
app-text/tetex *


Both seem to be symbolic links

Code:

ls -l /usr/bin/links
lrwxrwxrwx    1 root     root            6 Nov  7 15:14 /usr/bin/links -> links2*

ls -l /usr/bin/texi2html
lrwxrwxrwx    1 root     root           15 Feb 22 15:54 /usr/bin/texi2html -> texi2html-1.56k*


Also the script picks up fonts.list in font directories, which certainly is not cruft.

ecatmur
Advocate

Joined: 20 Oct 2003
Posts: 3595
Location: Edinburgh

Posted: Mon Mar 29, 2004 2:28 pm

aethyr wrote:
Quote:
OK... sorted lilo, nessus and proftpd. I don't have webmin so I've just put in a rule to ignore the whole of /etc/webmin and /var/webmin - if this can be improved on let me know.


Is this a bug with the script or a bug with the ebuilds?
The script can always be improved. IIRC about Webmin, it does loads of weird stuff so it'd be hard to keep track of what is actually supposed to be in /etc/webmin and /var/webmin. However if someone could write a piece of code (pref. bash, though perl or python is OK too) to list the files that are supposed to be there that'd be useful.
Quote:

Don't most ebuilds install files in /etc/ with knowledge of where they came from? I think rather than make the script exclude these programs, you should use the script to report bugs on the files that don't keep track of what they dump in /etc/ no?
If I did that, I'd have no time to actually maintain the script (or do anything else). Some programs spew files around without good reason, but for quite a lot there's no real alternative; also of course there are config files the admin has to create - these belong to the package but aren't in its contents list.

ecatmur
Advocate

Joined: 20 Oct 2003
Posts: 3595
Location: Edinburgh

Posted: Mon Mar 29, 2004 4:26 pm

OK, it can handle sysklogd now, plus weekly log rotation.

Better handling of pidfiles - it guesses pidfiles for started services both by appending .pid to their service name and by grepping the service file for start-stop-daemon. Still, there may be some pidfiles I've missed - but for instance, kdm drops a pidfile there which you don't want to remain there after deinstalling kde, so I'm not excluding the whole of /var/run.
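
The two pidfile guesses can be sketched like this. It is not the code from the full script, just an illustration: guess_pidfiles is a made-up name, the list of started services has to be supplied by the caller, and the grep only catches a literal --pidfile PATH argument to start-stop-daemon (a path held in a variable such as ${PIDFILE}, or the short -p form, is not resolved).

```bash
#!/bin/bash
# Hypothetical helper: $1 is the init script directory (e.g. /etc/init.d),
# remaining arguments are names of started services.
guess_pidfiles() (
   initd=$1; shift
   for svc in "$@"; do
      # Guess 1: service name plus .pid
      echo "/var/run/${svc}.pid"
      # Guess 2: whatever the init script passes to --pidfile
      [ -f "${initd}/${svc}" ] || continue
      grep -o -- '--pidfile[= ][^ ]*' "${initd}/${svc}" \
      | sed 's/^--pidfile[= ]//'
   done | sort -u
)

# Example: guess_pidfiles /etc/init.d gpm syslog-ng
```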

Unfortunately I don't have any fonts.list files on my system so I don't know where they turn up. Could you post where they are found on your system?

ecatmur
Advocate

Joined: 20 Oct 2003
Posts: 3595
Location: Edinburgh

Posted: Mon Mar 29, 2004 4:39 pm

Odd about the links and texi2html symlink false positives; my script uses similar code to qpkg so it should have picked those up... ? n/m.
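
For checking a single path by hand without qpkg, a lookup along these lines should behave much the same way. It is only a sketch: owners_of is a hypothetical name, and it assumes CONTENTS lines of the form "obj PATH ...", "sym PATH -> ..." or "dir PATH" with no spaces in PATH.

```bash
#!/bin/bash
# Hypothetical helper: print category/package-version for every installed
# package whose CONTENTS lists path $1. $2 optionally overrides /var/db/pkg.
owners_of() (
   vdb=${2:-/var/db/pkg}
   awk -v p="$1" '
      $2 == p {
         n = split(FILENAME, part, "/")
         print part[n-2] "/" part[n-1]
      }' "$vdb"/*/*/CONTENTS 2>/dev/null \
   | sort -u
)

# Example: owners_of /usr/bin/links
```

If this prints a package for a path that the cruft listing still reports, the path is recorded in CONTENTS and the mismatch is on the script's side.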

corrs_fan
Tux's lil' helper

Joined: 16 Sep 2002
Posts: 78
Location: Giffnock, East Renfrewshire

Posted: Sat Apr 10, 2004 6:35 pm

You got me thinking here: how does portage deal with emerge unmerge currently?

Doesn't it keep a sort of diff file log for things that are created as emerge goes about the file system?

Next Q: if it does, why the heck doesn't emerge unmerge remove these things then ?!?
_________________
Some say "The glass is half empty",
I usually say "Eh, There was a Glass.."

mhodak
Veteran

Joined: 15 Nov 2003
Posts: 1206

Posted: Sun Apr 11, 2004 12:54 am

corrs_fan wrote:
You got me thinking here: how does portage deal with emerge unmerge currently?

Doesn't it keep a sort of diff file log for things that are created as emerge goes about the file system?

Next Q: if it does, why the heck doesn't emerge unmerge remove these things then ?!?

The great majority of the files a script like this catches are files that were created after the install, i.e. files that are needed during operation of the software package. For example, a game you install will create a file containing high scores for all users on a machine. The same goes for any file you change after install: unmerge looks at timestamps and removes only files whose timestamps are consistent with the package install time. I do not like this behavior very much, because if you, for example, touch a binary or any other file, it will be left behind after unmerge.
Then there is the special case of kernel sources, where you have to perform the compile explicitly, and since the compile produces new files, unmerging the sources will not remove the directory containing them; this has to be done manually.

So, I think it is mainly because the packaging system can only monitor files belonging to a package when installing it; any files created during runs are files of unknown origin for the packaging system.
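
To see whether a particular file would be skipped for the timestamp reason described above, you can compare its current mtime with the one recorded in the package's CONTENTS file. A minimal sketch, assuming the recorded mtime is the last field of the file's obj line and that GNU stat is available; mtime_matches is a made-up name and this checks the timestamp only, nothing else portage may look at.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the file's current mtime (seconds since the
# epoch) equals the recorded value; return 2 if the file cannot be examined.
mtime_matches() (
   actual=$(stat -c %Y "$1") || exit 2
   [ "$actual" = "$2" ]
)

# Example, with the recorded value copied from the last field of the
# matching obj line in CONTENTS:
#   mtime_matches /usr/bin/foo 1080000000 || echo "touched since merge"
```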

manywele
l33t

Joined: 12 Jul 2003
Posts: 706
Location: Basking in the Zen glow of Jerry

Posted: Mon Apr 12, 2004 5:09 am

ecatmur

I like the script. You just helped me clean about half a gig of stuff off my drive. But it's generating a list of over 16,000 files which do not need deleting, especially a lot of stuff in /etc, some stuff in /usr/bin, lots and lots of webmin and usermin stuff and lots of stuff in /var.