Gentoo Forums
TIP: Compressing portage tree using squashfs
adsmith
Veteran

Joined: 26 Sep 2004
Posts: 1386
Location: NC, USA

Posted: Sun Nov 13, 2005 12:35 am    Post subject: TIP: Compressing portage tree using squashfs

Here is a quick overview of how to compress the portage tree using squashfs. I won't cover every little detail, as I presume you are comfortable configuring and compiling your kernel, using emerge, and dealing with simple bash scripts.

Why?
Wow, the portage tree is huge! It takes around 500MB, which is a big chunk of the storage space on many systems. Now (unless you're crazy), you only sync your portage tree once a week or so. In the meantime, it is essentially a read-only filesystem.

Thus, we get the idea to put it in a compressed file system. squashfs (http://squashfs.sf.net) gives us this option. By following this howto, you'll have a compressed portage tree which only takes 25MB, and is also quite fast.

How?

Getting SquashFS

KERNEL
You need a kernel which supports squashfs and loop-back file systems, either compiled in or as modules.
kernel config:
Code:

Device Drivers -> Block Devices -> Loopback device support <M>
File systems  ->  Miscellaneous Filesystems  -> SquashFS <M>

NOTE: Squashfs is not yet in the vanilla-sources kernel, but the patch is very easy to do yourself. Otherwise, gentoo-sources contains this patch already, so you may use that.
Of course, recompile your kernel, fix grub, and reboot into it.
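Once you have rebooted, it may be worth confirming that the running kernel really knows about squashfs before going further. The following is only a minimal sketch: it assumes /proc is mounted, and the helper name has_fs is made up for illustration (it is not part of any package).

```shell
# has_fs NAME [FILE]: succeed if filesystem NAME is listed in FILE
# (default: /proc/filesystems). Hypothetical helper for illustration.
has_fs() {
    local name=$1 list=${2:-/proc/filesystems}
    # Each line looks like "[nodev]<TAB>name"; compare against the last field.
    awk -v n="$name" '$NF == n { found = 1 } END { exit !found }' "$list"
}

# Usage (load the modules first if you built them as modules):
# modprobe loop; modprobe squashfs
# has_fs squashfs && echo "squashfs OK" || echo "squashfs missing"
# [ -b /dev/loop0 ] && echo "loop OK" || echo "no loop device"
```

The loop driver is not a filesystem, so it does not show up in /proc/filesystems; checking for a /dev/loop0 block device is one rough way to test for it.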

TOOLS
Next, you need to emerge the squashfs tools:
Code:

emerge squashfs-tools


PORTAGE CONFIG
Since squashfs is a read-only filesystem, we need to put distfiles somewhere else. I chose /var/tmp/distfiles:
/etc/make.conf
Code:

DISTDIR="/var/tmp/distfiles"



MAKING THE FILESYSTEM
Suppose you have a current live portage tree at /usr/portage, and that you have already moved distfiles to $DISTDIR listed above. The basic command to squash portage is this:
Code:

# remove any old image first; otherwise mksquashfs appends to it
rm -f /usr/portage.sqsh
mksquashfs /usr/portage /usr/portage.sqsh -check_data


After running this command, you'll have a squashed copy of your portage tree in the file /usr/portage.sqsh.
Now, what do we do with it?
Well, we want to mount this squashed file to /usr/portage.
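Before touching the live tree, you may want to check that the image really contains the same files. A rough sketch, assuming a scratch mount point /mnt/sqsh-check (my invention, use any empty directory) and a made-up helper name same_tree:

```shell
# same_tree A B: succeed only if directories A and B have identical contents.
# Hypothetical helper; it simply wraps a recursive diff.
same_tree() {
    diff -rq "$1" "$2" > /dev/null
}

# Usage (as root):
# mkdir -p /mnt/sqsh-check
# mount -t squashfs -o ro,loop /usr/portage.sqsh /mnt/sqsh-check
# same_tree /usr/portage /mnt/sqsh-check && echo "image matches live tree"
# umount /mnt/sqsh-check
```

The recursive diff reads every file on both sides, so expect it to take a little while on a full tree.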

To prepare for the next step, you need to move the live portage tree out of the way. I suggest doing this for now, so you have a backup in case you decide to abort this procedure.
Code:

# only empty the directory if the backup succeeded
tar cvzf /usr/portage-backup.tar.gz /usr/portage &&
rm -rf /usr/portage/*



SYSTEM CONFIG
If you compiled loop and squashfs as modules, you need to load them at boot time.
/etc/modules.autoload.d/kernel-2.6
Code:

loop
squashfs

For now, you can just modprobe them.

You also need to edit /etc/fstab so that the new filesystem is mounted properly.
/etc/fstab
Code:

/usr/portage.sqsh    /usr/portage     squashfs     ro,loop     0 0



Finally, to mount your shiny new compressed portage tree, do
Code:

mount /usr/portage


That's the basic idea.
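If you want a script to know whether the tree is currently mounted, one possible approach is to look it up in /proc/mounts. This is just a sketch; the function name is_mounted is hypothetical.

```shell
# is_mounted DIR [TABLE]: succeed if DIR appears as a mount point in TABLE
# (default: /proc/mounts). Hypothetical helper for illustration.
is_mounted() {
    local dir=$1 table=${2:-/proc/mounts}
    # Field 2 of each line in /proc/mounts is the mount point.
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$table"
}

# Usage:
# is_mounted /usr/portage || mount /usr/portage
```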

Updates??
Ah, but the portage tree doesn't sit still forever! At some point, you have to update it! How do we do this?
Obviously, since the /usr/portage filesystem is read-only, a simple "emerge --sync" won't work.

There are various ways to do this.

UPDATING WITH EMERGE
Here is a simple script which goes through the update process. A serious downside of this method is that you'll need the full 500MB available for a short time, since you can only sync an uncompressed copy. I'm sure it can be improved, but you get the idea.

/usr/local/sbin/emerge-sync-squash.sh
Code:

#!/bin/bash

## Where things go:
PORTDIR=/usr/portage
PORTSQSH=/usr/portage.sqsh
PORTTMP=/var/tmp/portage-tmpcopy

## First, make sure the squashed portage is mounted
mount -o remount $PORTDIR &> /dev/null || mount $PORTDIR

## make sure there is no old tmp copy in the way
rm -rf $PORTTMP

## Make an uncompressed copy
cp -ra $PORTDIR $PORTTMP

## sync to the uncompressed copy; stop here if the sync fails
FEATURES="-fixpackages" PORTDIR="$PORTTMP" emerge --nospinner --sync || exit 1

## uncomment to re-make the "q" database, if you use it (portage-utils)
#PORTDIR=$PORTTMP  q -r

## squash! and remount
umount $PORTDIR
rm -f $PORTSQSH
mksquashfs $PORTTMP $PORTSQSH -check_data
mount $PORTDIR

## cleanup
rm -rf $PORTTMP

##  Make sure local databases are up to date
emerge --nospinner --metadata
update-eix


That's it. As you can see, the idea is straightforward, but you need the full disk space available somewhere, if only for 5 minutes.
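Since the script above needs that space up front, it could check for it before copying anything. A minimal sketch, assuming roughly 500MB (512000 KB) is enough; the function names free_kb and enough_space are made up:

```shell
# free_kb DIR: kilobytes available on the filesystem that holds DIR.
free_kb() {
    # -P forces one line per filesystem; column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# enough_space DIR NEEDED_KB: succeed if at least NEEDED_KB are free.
enough_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Usage, near the top of emerge-sync-squash.sh:
# enough_space /var/tmp 512000 || { echo "not enough room in /var/tmp" >&2; exit 1; }
```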


UPDATING FROM SOMEONE ELSE
If you don't have that 500MB available (if you did, then this wouldn't be a very useful exercise!), then the only option is to fetch copies that are already compressed. Eventually, perhaps, it will be possible to produce a squashfs directly from a tarball, but this is not yet possible. So, you have to get the squashed image directly.

For a short while, I'll provide pre-squashed copies, updated every day or two, at
http://www.math.duke.edu/~adsmith/portage.sqsh
However, don't expect these to last forever. If my sysadmin complains about traffic, I'll have to take them down.

To update with these pre-squashed copies, simply use
/usr/local/sbin/emerge-sync-squash2.sh
Code:

#!/bin/bash

umount /usr/portage
wget http://www.math.duke.edu/~adsmith/portage.sqsh -O /usr/portage.sqsh
mount /usr/portage
emerge --metadata
update-eix


I'll think about how to distribute these using binary diffs, but for now, they're only 25MB, so it's no big deal.
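One weakness of the short script above is that a failed download overwrites the only image you have. A possible variation, sketched here under the assumption that you can spare another 25MB for a moment, downloads to a temporary name first; install_image and the .new suffix are my own inventions:

```shell
# install_image NEW TARGET: replace TARGET with NEW, but only if NEW is a
# non-empty regular file. Otherwise return non-zero and leave TARGET alone.
install_image() {
    local new=$1 target=$2
    [ -f "$new" ] && [ -s "$new" ] || return 1
    mv -f "$new" "$target"
}

# Usage (as root); the URL is the one from the script above:
# wget http://www.math.duke.edu/~adsmith/portage.sqsh -O /usr/portage.sqsh.new \
#   && umount /usr/portage \
#   && install_image /usr/portage.sqsh.new /usr/portage.sqsh
# mount /usr/portage
```

A non-empty file is of course no proof of a good image; this only guards against the most obvious failure.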


WHAT ABOUT SHARING THE TREE??
If you want to share the tree over a LAN, you have to be a bit more careful. NFS doesn't play well with squashfs, so you have to use network block devices (configure your kernel and emerge nbd) to export the /usr/portage.sqsh image file to your other systems.

Basically, you do this:

server:
Code:

nbd-server <port> /usr/portage.sqsh


client:
Code:

modprobe nbd
nbd-client <server> <port> /dev/nbd0
mount /dev/nbd0 /usr/portage




Enjoy!


UPDATE: A few pages ahead, several people are reporting success by using unionfs over squashfs to make a RW portage tree, which can then be re-squashed after a sync. I personally have not had success with this, but there are good instructions ahead.
[edit]: changed -noappend to 'rm file', so mksquashfs doesn't make portage a subdirectory.


Last edited by adsmith on Sat Mar 25, 2006 3:25 pm; edited 2 times in total

frostschutz
Advocate

Joined: 22 Feb 2005
Posts: 2977
Location: Germany

Posted: Sun Nov 13, 2005 1:25 am    Post subject: Re: TIP: Compressing portage tree using squashfs

adsmith wrote:
Why?
Wow, the portage tree is huge! It takes around 500MB, which is a huge chunk of the storage space on many systems.


500MB is huge :?:

I'm somewhat unsatisfied by this description. Hard disks are so huge nowadays (they broke like the 100GB barrier ages ago) that it seems hardly imaginable that 500MB would pose that much of a problem. The only application which would be more limited in terms of disk space that I can think of, would be a virtual server. And for a Gentoo installation/update on those I would go about in a completely different manner.

So can you give a real world example where SquashFSing the Portage tree would actually be useful? It has so many downsides (hard to update, requires a lot of free space anyway during update), plus the Portage tree isn't even the biggest part about Gentoo (my distfiles usually grows bigger than that, as well as the work directory during a compile).

If you need the free space only while you are not updating your system, and you are only updating like once a week, and you don't mind downloading a tarballed tree then, you could just delete the whole thing (and thus achieve an even better compression ratio than SquashFS without all the hassle). :lol:

adsmith
Veteran

Joined: 26 Sep 2004
Posts: 1386
Location: NC, USA

Posted: Sun Nov 13, 2005 4:22 am

Sure -- a good question.

Nowadays, when many desktop machines are amd64 mid-towers with 500 GB of SATA drives, we forget that not all machines have the same flexibility. Sometimes we can't or don't want to buy a new drive for an older machine which is otherwise still useful.

Case in point: Many small ("legacy") laptops have 8, 10 or 12GB drives. Basic system tools take up to 2GB; a window manager and X programs take another 1 or 2GB. Saving 0.5GB can be crucial! (As for distfiles, I clean it after every emerge, so it's never any bigger than a few dozen MB.)

Such a machine (like mine!) might only be updated every several months, when there is a day it can connect to a LAN with fast distcc hosts. However, a recent portage tree still has to be kept on the laptop in order for various system utilities to function properly.


So, Yes, this is a very specific application which isn't of much benefit for many people, but in some cases, I've found it to be quite useful.


Another point: In some sense, when slow hard drives are in use, this obviates the need for putting the portage tree on a separate reiser partition, et cetera, for speed. The whole 25MB tree can usually fit in RAM all at once, and squashfs is really quite fast, so basic emerge functions are pretty darn quick, even with slow hardware. This is a sort of half-assed, cock-eyed way of getting the databased portage tree we all hope for some day. ;) well, okay, hardly.....

adsmith
Veteran

Joined: 26 Sep 2004
Posts: 1386
Location: NC, USA

Posted: Sun Nov 13, 2005 4:23 am

Also, note that in a parallel forum, one of our fellow gentooers seems to have patched squashfs to read directly from tar.gz; hence, (once it really works) you can just grab the binary-diff portage snapshot and run from there. No extra space needed!

https://forums.gentoo.org/viewtopic-p-2873343.html#2873343

kuku
Tux's lil' helper

Joined: 23 Dec 2004
Posts: 142

Posted: Sun Nov 13, 2005 7:03 am

What do you think about using unionfs to make the squashed image writable, then syncing it and creating a new squashed image? (You don't need the extra space to unpack it every time you sync.)

adsmith
Veteran

Joined: 26 Sep 2004
Posts: 1386
Location: NC, USA

Posted: Sun Nov 13, 2005 1:43 pm

That's an interesting idea! I tried playing with unionfs a little bit for NFS purposes.

I'll give this a try later this afternoon.

adsmith
Veteran

Joined: 26 Sep 2004
Posts: 1386
Location: NC, USA

Posted: Sun Nov 13, 2005 2:58 pm

Okay, I've tried the unionfs idea.
The plan is this

1. start with a squashfs image, and mount it at /usr/portage-ro
2. mkdir /usr/portage-rw
3. mount -t unionfs -o dirs=/usr/portage-rw=rw:/usr/portage-ro=ro none /usr/portage
4. emerge --sync
5. remake squashfs image and remount.

This would be really slick if it worked, but I keep getting kernel Oopses on step 4. Is there a known problem with using rsync with unionfs??
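For reference, the five steps could be strung together roughly as below. This is only a sketch of the plan as listed, not a tested recipe (step 4 is exactly where the Oopses were reported). The image path /usr/portage.sqsh is taken from the first post; DRY_RUN, the function names, and the way step 5 swaps the image in are assumptions of mine.

```shell
# run CMD...: execute CMD, or just print it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# union_sync: walk through steps 1-5 of the plan above.
union_sync() {
    local ro=/usr/portage-ro rw=/usr/portage-rw union=/usr/portage
    local img=/usr/portage.sqsh
    run mount -t squashfs -o ro,loop "$img" "$ro" || return 1      # step 1
    run mkdir -p "$rw" || return 1                                 # step 2
    run mount -t unionfs -o "dirs=$rw=rw:$ro=ro" none "$union" || return 1  # step 3
    run emerge --sync || return 1                                  # step 4
    # step 5: squash the merged view into a new image, then swap it in
    run mksquashfs "$union" "$img.new" || return 1
    run umount "$union" || return 1
    run umount "$ro" || return 1
    run mv -f "$img.new" "$img" || return 1
    run rm -rf "$rw"    # the changes now live in the new image
}

# DRY_RUN=1 union_sync    # print the commands without running anything
```

Trying it with DRY_RUN=1 first shows the exact commands, which makes it easier to run them by hand and see which one misbehaves.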

makomk
n00b

Joined: 15 Jul 2005
Posts: 46
Location: Not all there

Posted: Sun Nov 13, 2005 3:47 pm

adsmith wrote:
Also, note that in a parallel forum, one of our fellow gentooers seems to have patched squashfs to read directly from tar.gz; hence, (once it really works) you can just grab the binary-diff portage snapshot and run from there. No extra space needed!

https://forums.gentoo.org/viewtopic-p-2873343.html#2873343


Unfortunately, that's not exactly how it works; I wasn't terribly clear. (Sorry about that - I was half-asleep at the time.) Basically, I patched mksquashfs to generate a squashfs filesystem directly from a portage tarball. Unfortunately, due to the way I did it, you have to use uncompressed tarballs (~200MB), otherwise it takes forever. (Of course, emerge-delta-webrsync chucks uncompressed tarballs around anyway!) If I understand how squashfs works correctly, it should be possible to do it efficiently from compressed tarballs, but it'd probably require rewriting mksquashfs (not fun)*.

On the other hand, my solution works now - I'm running portage from a squashfs generated using it at the moment. I've got a modified version of emerge-delta-webrsync called emerge-delta-squashfs which runs the modified mksquashfs instead of tarsync - quite neat actually. I don't have anywhere to host the patches, but anyone who's interested should feel free to PM or otherwise contact me, and I'd be happy to mail them (7K total).

Basically, you install emerge-delta-squashfs and the modified mksquashfs, set up your system as above (except don't emerge squashfs-tools, and replace /usr/portage.sqsh with /var/delta-webrsync/portage.squashfs everywhere), then run emerge-delta-squashfs every time you want to sync portage. Free space requirements are the same as emerge-delta-webrsync, minus the 500MB /usr/portage.

* Besides, it might be easier and better to find a way of generating deltas between squashfs filesystems.

adsmith
Veteran

Joined: 26 Sep 2004
Posts: 1386
Location: NC, USA

Posted: Sun Nov 13, 2005 3:52 pm

Quote:

... (once it really works) ...

Yeah, I got that picture. I guess I didn't convey it to others very clearly.

I'd love to play with the patches. You can guess my email address from the weblink I provide above, and I could host them there as well, if others want to play.

adsmith
Veteran

Joined: 26 Sep 2004
Posts: 1386
Location: NC, USA

Posted: Sun Nov 13, 2005 4:27 pm

Ah! Yet another way to do it!

We can use the "--compare-dest" option in rsync. To do this, simply add a corresponding line around line number 2434 of /usr/lib/portage/bin/emerge: e.g.,
Code:

            "--compare-dest='/usr/portage-ro'",

if the squashfs is mounted at /usr/portage-ro.

It's still a little shaky, but it looks promising.
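To see what --compare-dest does without patching emerge, you can call rsync by hand. A minimal sketch: the function name sync_changes and the /usr/portage-rw target are assumptions, and the mirror URL is only an example.

```shell
# sync_changes SRC RO DEST: copy into DEST only those files from SRC that are
# new or differ from the copies under RO (the mounted squashfs image).
sync_changes() {
    local src=$1 ro=$2 dest=$3
    mkdir -p "$dest" || return 1
    # RO should be an absolute path; rsync treats a relative --compare-dest
    # as relative to the destination directory.
    rsync -a --compare-dest="$ro" "$src/" "$dest/"
}

# Usage (as root), with a mirror of your choice:
# sync_changes rsync://rsync.gentoo.org/gentoo-portage /usr/portage-ro /usr/portage-rw
```

Note that DEST then holds only the changed and new files. Files that were removed upstream are still present in the read-only image, so something else has to take care of deletions.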

makomk
n00b

Joined: 15 Jul 2005
Posts: 46
Location: Not all there

Posted: Sun Nov 13, 2005 5:15 pm    Post subject: Re: TIP: Compressing portage tree using squashfs

frostschutz wrote:
So can you give a real world example where SquashFSing the Portage tree would actually be useful? It has so many downsides (hard to update, requires a lot of free space anyway during update), plus the Portage tree isn't even the biggest part about Gentoo (my distfiles usually grows bigger than that, as well as the work directory during a compile).

If you need the free space only while you are not updating your system, and you are only updating like once a week, and you don't mind downloading a tarballed tree then, you could just delete the whole thing (and thus achieve an even better compression ratio than SquashFS without all the hassle). :lol:


Not if the reason you need the extra 500MB of space is to *build* the packages in (some of them are pretty hefty - the main thing I had/have space problems with was emerges). Also, part of the reason I like my current experimental setup is that I just have to run emerge-delta-squashfs (and hope it works) instead of running emerge-delta-webrsync like I did before - easy.

mdeininger
Veteran

Joined: 15 Jun 2005
Posts: 1740
Location: Emerald Isles, observing Dublin's docklands

Posted: Sun Nov 13, 2005 7:14 pm

That's quite a nice one really, and I'd like to back you guys up on the space thing: the computer I have at work only has about 8 gigs of HD, and saving 500 megs of disk space really can be crucial :)

Albeit I got around that slightly differently. I did a normal install, copied everything to our server, and made the server create squashfs images that I put on my machine's hard drive with dd. Then I made the machine boot with an initrd that mounts the read-only image and creates three ramdisks: one for /tmp and two for /etc and /var. It then combines the /etc and /var from the image with the ramdisks using unionfs, so that every change goes to the ramdisk. Works like a charm: the image is 700MB including GNOME and some office stuff, and I get lots of disk space for my /home. Updates are rather fast too, since I can just chroot into the thing on the server. So if you're tight on disk space, you could try this method instead of only compressing /usr/portage ;). It might be me, but I also think it's faster than using reiserfs, at least on that old box...
_________________
"Confident, lazy, cocky, dead." -- Felix Jongleur, Otherland

( Twitter | Blog | GitHub )

electrofreak
l33t

Joined: 30 Jun 2004
Posts: 713
Location: Ohio, USA

Posted: Sun Nov 13, 2005 9:42 pm

My laptop only has a 2GB hard drive, so 500MB most definitely helps. Unfortunately, the portage tree doesn't actually take up 500MB, but only about 100-150MB; 'du -hs /usr/portage' isn't very accurate. However, even saving about 100MB still helps. But I don't really want to do this if it still needs the space to sync and such, so when the unionfs idea works, I may go ahead and do that.
_________________
Desktop: ABit AN8, Athlon64 X2 4400+ 939 2.75GHz, 2x1GB Corsair XMS DDR400, 2x160GB SATA RAID-0, 2x20"W, Vista Ultimate x64
Laptop: 15.4" MacBook Pro 2.4Ghz, 2x1GB RAM, 160GB, Mac OS X 10.5.1
Server: PIII 550Mhz, 3x128MB RAM, 160GB, Ubuntu Server 7.10

adsmith
Veteran

Joined: 26 Sep 2004
Posts: 1386
Location: NC, USA

Posted: Sun Nov 13, 2005 9:54 pm

While du doesn't reflect the size of the actual files, it does show how much disk space is allocated for those files. From the perspective of wasted disk space, it's correct.
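Both numbers can be seen side by side with GNU du. This is a small sketch assuming GNU coreutils (the --apparent-size option is not in every du); the function name tree_sizes is made up.

```shell
# tree_sizes DIR: print the allocated disk space and the summed file sizes
# for DIR, both in kilobytes. Needs GNU du for --apparent-size.
tree_sizes() {
    local dir=$1 alloc apparent
    alloc=$(du -sk "$dir" | cut -f1) || return 1
    apparent=$(du -sk --apparent-size "$dir" | cut -f1) || return 1
    echo "allocated=${alloc}K apparent=${apparent}K"
}

# Usage:
# tree_sizes /usr/portage
```

On a tree made of many tiny files, the allocated figure is typically much larger, because each file occupies at least one whole filesystem block.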

Between the websync script mentioned above, and some other ideas, we're working on better sync/update methods.

Cappo
n00b

Joined: 29 Jun 2004
Posts: 21

Posted: Sun Nov 20, 2005 10:16 pm

When I try to mount the squashfs partition, I get this error in the system log:

SQUASHFS error: zlib_fs returned unexpected result 0xfffffffd
SQUASHFS error: Unable to read cache block [1e85440:90f]
SQUASHFS error: Unable to read inode [1e85440:90f]
SQUASHFS error: Root inode create failed


Any idea what this means and how to fix it?

Cappo
n00b

Joined: 29 Jun 2004
Posts: 21

Posted: Sun Nov 20, 2005 11:32 pm

Never mind. After restarting my computer, the error message went away.

lazy_bum
l33t

Joined: 16 Feb 2005
Posts: 691

Posted: Thu Dec 01, 2005 11:32 am

Is it possible to do the same thing with kernel sources? (-:
_________________
roslin uberlay | grubelek

adsmith
Veteran

Joined: 26 Sep 2004
Posts: 1386
Location: NC, USA

Posted: Thu Dec 01, 2005 1:54 pm

errr.. how would you build them if the directory is read-only????

lazy_bum
l33t

Joined: 16 Feb 2005
Posts: 691

Posted: Thu Dec 01, 2005 2:27 pm

adsmith wrote:
errr.. how would you build them if the directory is read-only????

Well, my kernel is about 2 months old (or even more). Only a few modules, almost everything built in... so that's why I wonder about "squashing" the kernel (it's ~300 MB, a lot of small files).
_________________
roslin uberlay | grubelek

adsmith
Veteran

Joined: 26 Sep 2004
Posts: 1386
Location: NC, USA

Posted: Thu Dec 01, 2005 3:16 pm

I don't understand what you mean at all.

You can always delete the /usr/src/linux-* sources if you so desire, since that's only needed for building the kernel. The main kernel image, /boot/kernel-*, is already bz2-compressed. I'm not sure if the modules in /lib/modules/* are compressed or not, but if they aren't, then they can't be.

At the very least, you can "cd /usr/src/linux" and "make clean" to clean up everything but the original source.

yoyo
Bodhisattva

Joined: 04 Mar 2003
Posts: 4273
Location: Lyon - France

Posted: Fri Dec 09, 2005 8:39 am

Good tips imho.
But don't forget the "PKGDIR" which, like DISTDIR, needs to be mounted with read/write access.
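In /etc/make.conf that could look like the fragment below. DISTDIR is the value from the first post; the PKGDIR path is only an example, and any writable directory outside /usr/portage should do.

```shell
# /etc/make.conf (fragment)
# Both directories default to locations inside /usr/portage, which is now
# read-only, so point them somewhere writable.
DISTDIR="/var/tmp/distfiles"
PKGDIR="/var/tmp/packages"    # example path, pick your own
```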

My 0.02 cents.
_________________
La connaissance s'accroît quand on la partage.
JCB

bibi.skuk
Guru

Joined: 01 Aug 2005
Posts: 425

Posted: Fri Dec 09, 2005 10:37 pm

adsmith wrote:
I don't understand what you mean at all.

You can always delete the /usr/src/linux-* sources if you so desire, since that's only needed for building the kernel. The main kernel image, /boot/kernel-*, is already bz2-compressed. I'm not sure if the modules in /lib/modules/* are compressed or not, but if they aren't, then they can't be.

At the very least, you can "cd /usr/src/linux" and "make clean" to clean up everything but the original source.


But in order to build modules, and some apps, you need to have a /usr/src/linux/... maybe only the config file, but I don't know exactly.

electrofreak
l33t

Joined: 30 Jun 2004
Posts: 713
Location: Ohio, USA

Posted: Sat Dec 10, 2005 2:31 am

About compressing the linux sources: I actually tar my current sources up and always unmerge older sources. When I come across something that needs them to build, I simply untar them, let it build, then delete them (leaving the tar there still). Yeah, it's a little extra work.
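That tar/untar routine could be wrapped in two small helpers. A sketch only: the function names and the .tar.gz naming are assumptions, and linux-x.y.z stands for whatever your source directory is called.

```shell
# pack_tree PARENT NAME: archive PARENT/NAME as PARENT/NAME.tar.gz and
# delete the directory, but only if tar succeeded.
pack_tree() {
    local parent=$1 name=$2
    tar -C "$parent" -czf "$parent/$name.tar.gz" "$name" &&
        rm -rf "${parent:?}/${name:?}"
}

# unpack_tree PARENT NAME: restore the directory; the archive is kept.
unpack_tree() {
    local parent=$1 name=$2
    tar -C "$parent" -xzf "$parent/$name.tar.gz"
}

# Usage (as root):
# pack_tree /usr/src linux-x.y.z      # free the space
# unpack_tree /usr/src linux-x.y.z    # when something needs the sources
# rm -rf /usr/src/linux-x.y.z         # afterwards; the tarball stays
```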

If we could get a compressed portage tree that can be written to and updated as if it wasn't compressed, then, yeah, it'd be completely awesome to do the same thing with the linux sources.
_________________
Desktop: ABit AN8, Athlon64 X2 4400+ 939 2.75GHz, 2x1GB Corsair XMS DDR400, 2x160GB SATA RAID-0, 2x20"W, Vista Ultimate x64
Laptop: 15.4" MacBook Pro 2.4Ghz, 2x1GB RAM, 160GB, Mac OS X 10.5.1
Server: PIII 550Mhz, 3x128MB RAM, 160GB, Ubuntu Server 7.10

frank_einstien
n00b

Joined: 11 Apr 2005
Posts: 10

Posted: Wed Dec 21, 2005 4:32 pm

Hi guys
How do I use squashfs without recompiling my kernel? Just asking...
Thanks

drwook
Veteran

Joined: 30 Mar 2005
Posts: 1324
Location: London

Posted: Wed Dec 21, 2005 7:15 pm

Short answer is you don't ;)

Seriously though, compiling a kernel isn't a black art or anything. Give it a go if you haven't before.

All times are GMT
Page 1 of 4