Gentoo Forums
Find space wasters (list packages by size)
numerodix
l33t

Joined: 18 Jul 2002
Posts: 743
Location: nl.eu

Posted: Sat Jun 24, 2006 2:30 pm    Post subject: Find space wasters (list packages by size)

I don't know if this has been done before; I haven't seen any threads about it. I thought it might be useful to know how much disk space each installed package occupies, just to get an idea of how big they are.

My script reads the CONTENTS file of every installed package, adds up the sizes of the files that belong to it, and prints a listing of packages sorted by size.

Code:

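#!/usr/bin/env python3
#
# Author: Martin Matusiak <numerodix@gmail.com>
# Licensed under the GNU General Public License, version 2.
#
# Lists installed packages by the total size of the files they own.

import os

pkgdb = "/var/db/pkg"

sizes = {}

for c in os.listdir(pkgdb):
   catdir = os.path.join(pkgdb, c)
   if not os.path.isdir(catdir):
      continue
   for p in os.listdir(catdir):
      cont = os.path.join(catdir, p, "CONTENTS")
      if not os.path.isfile(cont):
         continue

      size = 0
      # surrogateescape keeps odd bytes in file names from aborting the run
      with open(cont, errors="surrogateescape") as fd:
         for s in fd:
            # An "obj" entry reads: obj <path> <md5> <mtime>.
            # Split from the right so paths containing spaces stay intact.
            fields = s.rstrip("\n").rsplit(" ", 2)
            if len(fields) == 3 and fields[0].startswith("obj "):
               path = fields[0][4:]
               if os.path.isfile(path):
                  size += os.path.getsize(path)

      sizes[os.path.join(c, p)] = size

# Smallest packages first, sizes right-aligned in an 11-character column
for name, size in sorted(sizes.items(), key=lambda kv: kv[1]):
   print(str(size).rjust(11), " ", name)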


The output looks like this:

Code:
          0   virtual/perl-libnet-1.19
          0   virtual/perl-Storable-2.15
          0   virtual/libstdc++-3.3
          0   virtual/ghostscript-0
          0   dev-java/ant-1.6.2-r6
          0   virtual/perl-Test-Simple-0.62
          0   virtual/perl-Digest-MD5-2.36
          0   virtual/libintl-0
          0   virtual/perl-MIME-Base64-3.07
          0   virtual/libiconv-0
         86   kde-base/kde-env-3-r4
        393   kde-base/kdebase-pam-6
        889   sys-apps/coldplug-20040920-r1
       1160   sys-apps/hotplug-base-20040401
...
  215529358   dev-lang/ghc-6.4.1-r3
  219531427   sys-kernel/gentoo-sources-2.6.16-r9
  313704725   app-office/openoffice-bin-2.0.2


Hope I'm not the only one who thinks it's useful :)
_________________
undvd - ripping dvds should be as simple as unzip
toralf
Developer

Joined: 01 Feb 2004
Posts: 3922
Location: Hamburg

Posted: Sat Jun 24, 2006 2:44 pm

What about
Code:
equery --nocolor --quiet list | cut -f4 -d' ' | xargs -n 1 -i{} equery --nocolor size ={} | sort -u
?
numerodix
l33t

Joined: 18 Jul 2002
Posts: 743
Location: nl.eu

Posted: Sat Jun 24, 2006 3:46 pm

That looks good yeah, I didn't know equery had a function for package size.

But my method seems to be somewhat quicker: 2m2.168s on my system with 600+ packages, whereas your one-liner took 25m42.282s. It also sorts the packages alphabetically rather than by size:

Code:
app-admin/eselect-1.0: total(56), inaccessible(0), size(244824)
app-admin/eselect-opengl-1.0.3: total(11), inaccessible(0), size(477527)
app-admin/gamin-0.1.7: total(35), inaccessible(0), size(351271)
app-admin/gnomesu-0.3.1: total(25), inaccessible(0), size(249078)


And on top of that, the reported sizes don't match:

Code:
./pkgsize.py | grep eselect
     101245   app-admin/eselect-1.0


I can't be bothered to add up all the files for eselect, but with dev-java/systray4j-2.4, for instance, if I add up the files manually I get 115657 bytes (about 112.95 KiB), whereas equery says it's 220.95 KiB.

So I wouldn't call them equivalent. :?
John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10589
Location: Somewhere over Atlanta, Georgia

Posted: Sat Jun 24, 2006 6:13 pm

Ahem. Size of all installed packages.

- John


Last edited by John R. Graham on Sat Jun 24, 2006 9:15 pm; edited 1 time in total
numerodix
l33t

Joined: 18 Jul 2002
Posts: 743
Location: nl.eu

Posted: Sat Jun 24, 2006 6:24 pm

john_r_graham wrote:
Ahem. Size of all installed packages.

- John

A thread about UT2003? :?:
John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10589
Location: Somewhere over Atlanta, Georgia

Posted: Sat Jun 24, 2006 9:17 pm

Drat. Used [Post=] instead of [Topic=]. Fixed my prior post.

- John
numerodix
l33t

Joined: 18 Jul 2002
Posts: 743
Location: nl.eu

Posted: Sat Jun 24, 2006 10:21 pm

Right, I imagine that's what toralf had in mind. But once again, using equery takes quite a bit longer, 23m13.720s here, and the numbers aren't the same either.

equery:
Code:
     13177 sys-apps/coldplug-20040920-r1
     12681 kde-base/kdebase-pam-6
     12374 kde-base/kde-env-3-r4


my script:
Code:
        889   sys-apps/coldplug-20040920-r1
        393   kde-base/kdebase-pam-6
         86   kde-base/kde-env-3-r4

John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10589
Location: Somewhere over Atlanta, Georgia

Posted: Sun Jun 25, 2006 9:04 pm

On my other post on this topic, nixnut let me know about a new tool:
nixnut wrote:
qsize (part of portage-utils) will tell you the size of the installed packages: qsize -aSm

It's written in C and is fast! When I tried it, it ran about 40 times faster than my script.

- John
numerodix
l33t

Joined: 18 Jul 2002
Posts: 743
Location: nl.eu

Posted: Sun Jun 25, 2006 11:42 pm

Yep, that one is fast indeed. I didn't bother with awk because I don't think it adds much time at all, so the output isn't exactly in the format it should be, but here are both runs:

Code:
time ( qlist -I --nocolor --quiet | xargs qsize -b | sort )
app-admin/eselect-1.0: 45 files, 11 non-files, 101464 bytes
app-admin/eselect-opengl-1.0.3: 3 files, 8 non-files, 375127 bytes
app-admin/gamin-0.1.7: 25 files, 10 non-files, 199567 bytes
..
real    0m4.285s
user    0m3.072s
sys     0m0.352s


Whereas my script seems to edge it anyhow :)

Code:
time pkgsize.py
..
   54710595   sys-devel/gcc-3.4.6-r1
  117815381   x11-base/xorg-x11-6.8.2-r8
  219531427   sys-kernel/gentoo-sources-2.6.16-r9

real    0m2.132s
user    0m1.324s
sys     0m0.564s


But this was never supposed to be a contest anyway, so while portage-utils with some awk magic gets the job done quickly, I might as well use mine now that I have it.

And the discrepancies in file sizes apparently come down to a rounding error (?), because if I tell qsize to give me the numbers in bytes, they match what my script prints.
dundas
Guru

Joined: 16 Dec 2004
Posts: 317
Location: China, Earth

Posted: Mon Jun 26, 2006 2:45 am

Code:
qlist -I --nocolor --quiet | xargs qsize -b | sort


This still puts things in alphabetical order. Is there any way to show them sorted by exact size?

I don't know awk yet, sorry.
_________________
Appreciate Gentoo: Best Devs, Best Forums. YOU could help too: Help Answer
John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10589
Location: Somewhere over Atlanta, Georgia

Posted: Mon Jun 26, 2006 3:16 am

Why, sure:
Code:
qsize -ab | awk '{printf "%10u %s\n", $6, substr($1, 1, length($1)-1)}' | sort -k1,1nb
or even
Code:
qsize -ab | perl -ne '/^(.*):.*, (\d*) bytes/; printf "%10u %s\n", $2, $1;' | sort -k1,1nb

- John

(Note: Command lines edited from their original to avoid a subtle locale issue that affects some users. The issue was discussed in following posts. - JRG)
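
For anyone who would rather read Python than awk or perl, the same parse-and-sort idea can be sketched roughly as follows. It assumes qsize lines shaped like the sample quoted earlier in this thread ("name: N files, M non-files, SIZE bytes"). The names LINE_RE and sort_by_size are made up for this illustration, and the sample list stands in for real `qsize -ab` output.

```python
import re

# Assumed line shape, taken from the qsize output quoted in this thread:
#   app-admin/eselect-1.0: 45 files, 11 non-files, 101464 bytes
LINE_RE = re.compile(r"^(?P<pkg>.+?): .*, (?P<size>\d+) bytes\s*$")


def sort_by_size(lines):
    """Return (size, package) pairs sorted numerically, smallest first."""
    pairs = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # lines that do not look like a package entry are skipped
            pairs.append((int(m.group("size")), m.group("pkg")))
    return sorted(pairs)


# Sample lines copied from the thread; real input would come from qsize.
sample = [
    "app-admin/eselect-1.0: 45 files, 11 non-files, 101464 bytes",
    "app-admin/eselect-opengl-1.0.3: 3 files, 8 non-files, 375127 bytes",
    "app-admin/gamin-0.1.7: 25 files, 10 non-files, 199567 bytes",
]

for size, pkg in sort_by_size(sample):
    # Same layout as the awk/perl printf: size right-aligned in 10 columns
    print("%10d %s" % (size, pkg))
```

Because the sizes are converted to integers before sorting, the result does not depend on the locale at all.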


Last edited by John R. Graham on Tue Jul 04, 2006 9:41 am; edited 1 time in total
dundas
Guru

Joined: 16 Dec 2004
Posts: 317
Location: China, Earth

Posted: Mon Jun 26, 2006 4:27 am

Thanks, but I still get output like the following, which is not ordered by size.


Code:
# qsize -ab | awk '{printf "%10u %s\n", $6, substr($1, 1, length($1)-1)}' | sort -k1
         0 virtual/ghostscript-0
         0 virtual/libiconv-0
         0 virtual/libintl-0
         0 virtual/libstdc++-3.3
         0 virtual/perl-MIME-Base64-3.07
         0 virtual/perl-Storable-2.15
         0 virtual/perl-Test-Simple-0.62
   1000937 app-shells/bash-3.1_p16
   1004574 media-libs/fontconfig-2.2.3
   1004624 media-libs/gd-2.0.33
     10094 app-text/build-docbook-catalog-1.2
  10102092 media-libs/alsa-lib-1.0.11
  10125525 dev-libs/libxml2-2.6.23
   1015554 dev-libs/cyrus-sasl-2.1.21-r2
    101958 x11-libs/startup-notification-0.8
  10232482 media-plugins/live-2005.11.11
   1034341 sys-fs/reiserfsprogs-3.6.19
    104807 sys-libs/libcap-1.10-r5
     10616 x11-themes/hicolor-icon-theme-0.8
    106226 media-libs/libdvdcss-1.2.9
   1072694 sys-libs/timezone-data-2006a
   1084309 net-misc/curl-7.15.1-r1
  10885519 dev-python/pygtk-2.8.2
   1091556 sys-fs/e2fsprogs-1.38-r1
   1093550 dev-libs/libpcre-6.3
    110780 x11-apps/ttmkfdir-3.0.9-r3






Code:
# qsize -ab | perl -ne '/^(.*):.*, (\d*) bytes/; printf "%10u %s\n", $2, $1;' | sort -k1
         0 virtual/ghostscript-0
         0 virtual/libiconv-0
         0 virtual/libintl-0
         0 virtual/libstdc++-3.3
         0 virtual/perl-MIME-Base64-3.07
         0 virtual/perl-Storable-2.15
         0 virtual/perl-Test-Simple-0.62
   1000937 app-shells/bash-3.1_p16
   1004574 media-libs/fontconfig-2.2.3
   1004624 media-libs/gd-2.0.33
     10094 app-text/build-docbook-catalog-1.2
  10102092 media-libs/alsa-lib-1.0.11
  10125525 dev-libs/libxml2-2.6.23
   1015554 dev-libs/cyrus-sasl-2.1.21-r2
    101958 x11-libs/startup-notification-0.8
  10232482 media-plugins/live-2005.11.11
   1034341 sys-fs/reiserfsprogs-3.6.19
    104807 sys-libs/libcap-1.10-r5
     10616 x11-themes/hicolor-icon-theme-0.8
    106226 media-libs/libdvdcss-1.2.9
   1072694 sys-libs/timezone-data-2006a
   1084309 net-misc/curl-7.15.1-r1
  10885519 dev-python/pygtk-2.8.2
   1091556 sys-fs/e2fsprogs-1.38-r1
   1093550 dev-libs/libpcre-6.3
    110780 x11-apps/ttmkfdir-3.0.9-r3

numerodix
l33t

Joined: 18 Jul 2002
Posts: 743
Location: nl.eu

Posted: Mon Jun 26, 2006 10:37 am

dundas: change the flag on sort to numerical sorting, i.e.:

Code:
qsize -ab | perl -ne '/^(.*):.*, (\d*) bytes/; printf "%10u %s\n", $2, $1;' | sort -n

John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10589
Location: Somewhere over Atlanta, Georgia

Posted: Mon Jun 26, 2006 12:37 pm

dundas wrote:
Thanks, but I still get output like the following, which is not ordered by size.
...

Hmmm. Works here as originally written. After a little research, I found that this is a subtle locale issue. I found the following in the sort info page:
Quote:
If you use a non-POSIX locale (e.g., by setting `LC_ALL' to `en_US'), then `sort' may produce output that is sorted differently than you're accustomed to. In that case, set the `LC_ALL' environment variable to `C'.
I was able to reproduce your issue by setting my LC_ALL environment variable to "en_US". (It is normally set to "C" in my installation). Both solutions appear to work (i.e., adding the -n option or prefixing the whole thing with LC_ALL=C). Obviously, I still have a lot to learn about locales.
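
A rough way to see why the locale matters, sketched in Python rather than with sort itself. The assumption here (mine, not something the sort documentation quoted above states) is that the en_US collation effectively skips the leading blanks, while the C locale compares raw bytes, where a space sorts before any digit. The function names are invented for the illustration, and the rows are copied from dundas's output.

```python
# Rows in the "%10u %s" layout produced by the awk/perl one-liners.
rows = [
    "   1000937 app-shells/bash-3.1_p16",
    "     10094 app-text/build-docbook-catalog-1.2",
    "  10102092 media-libs/alsa-lib-1.0.11",
    "    101958 x11-libs/startup-notification-0.8",
]


def c_locale_order(lines):
    # Plain code-point comparison, similar in spirit to LC_ALL=C.
    # Leading spaces sort before digits, so right-aligned numbers of
    # equal width come out in numeric order.
    return sorted(lines)


def blank_ignoring_order(lines):
    # Crude stand-in for a collation that ignores leading blanks
    # (an assumption about en_US, not a faithful emulation).
    return sorted(lines, key=str.lstrip)


def numeric_order(lines):
    # What "sort -n" on the first field is meant to achieve.
    return sorted(lines, key=lambda line: int(line.split()[0]))


for label, func in [("C-like", c_locale_order),
                    ("blank-ignoring", blank_ignoring_order),
                    ("numeric", numeric_order)]:
    print(label, [line.split()[0] for line in func(rows)])
```

With these rows the C-like order agrees with the numeric order, while the blank-ignoring order puts 1000937 ahead of 10094, the same pattern seen in the misordered listing.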

With numeric sorting, the key should really be limited to the numeric field, otherwise the results can occasionally be quite strange:
Code:
sort -n -b -k1,1
or
Code:
sort -k1,1nb
Without the extra ",1", the key field extends to the end of the line. The documentation also states that there are failures in obscure cases in numeric sorts without the -b option, so I included it.

- John
All times are GMT