Cyker Veteran
Joined: 15 Jun 2006 Posts: 1746
Posted: Mon Oct 12, 2009 11:35 pm Post subject: Compressing large similar files |
|
|
I've got a bunch of uncompressed disk images which are very large (ranging from ~4GB to >100GB) but very similar, and I want to compress and archive them.
They already compress very well (the >100GB ones can be 7zip'd down to ~30GB!), but I want to take advantage of the fact that they're very similar to compress them even more.
Does anyone have any suggestions of compressors which can do a better job of compressing them?
7z and rar with default options don't get compression ratios much better than if I compressed them all individually, but I haven't explored the documentation much, e.g. whether there is a way to interleave them...
yther Apprentice
Joined: 25 Oct 2002 Posts: 151 Location: Charlotte, NC (USA)
Posted: Mon Oct 12, 2009 11:59 pm Post subject: Re: Compressing large similar files |
|
|
Cyker wrote:
I've got a bunch of uncompressed disk images which are very large (ranging from ~4GB to >100GB) but very similar...
Those are so big, I don't know if concatenating them first with tar would be useful at all. However, if they truly are very similar (like one image was created based on another one) you could use something like xdelta to diff the derivatives against the original. Then you just compress the original and store the diffs alongside.
That said, it might be a fair amount of trouble to go through... it just depends on how tight you are on disk space and whether you consider it worth your time.
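For the record, the basic usage is roughly as below; I'm assuming the xdelta3 variant here, the file names are invented, and the exact switches are worth double-checking against the man page:

Code:
	# make a binary diff of a derivative image against the base image
	xdelta3 -e -s base.img derivative.img derivative.vcdiff

	# later: rebuild the derivative from the base image plus the diff
	xdelta3 -d -s base.img derivative.vcdiff derivative-restored.img

You then only need to keep (and compress) base.img and the .vcdiff files. There is also a -B switch to raise the source window size, which may matter for images this large.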
Cyker Veteran
Joined: 15 Jun 2006 Posts: 1746
Posted: Tue Oct 13, 2009 7:51 am Post subject: |
|
|
Yeah, the sheer size really is the problem (I never thought I'd be dealing with stuff that big just a few years ago!)
But your idea is genius!
I hadn't thought of diff'ing them off the base image; if it can deal with blocks that have moved as well and not just straight differences, this could shrink the images by several orders of magnitude!
I shall play with the xdelta thing and see how it goes; thanks!
(If anyone has any other ideas tho', please chime in!)
Cyker Veteran
Joined: 15 Jun 2006 Posts: 1746
Posted: Thu Oct 15, 2009 6:17 pm Post subject: |
|
|
Weeee! It shrank the images from just under a terabyte to about 40GB!
Cool beans!
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
Posted: Mon Oct 19, 2009 9:55 pm Post subject: |
|
|
A second option would probably be a 7zip solid archive, although I doubt it will be anywhere near as good as what you got.
_________________
John
My gentoo overlay
Instructions for overlay
Cyker Veteran
Joined: 15 Jun 2006 Posts: 1746
Posted: Tue Oct 20, 2009 4:21 pm Post subject: |
|
|
I had tried that, but alas 7zip's solid mode can't cope with files that big.
If I compress one of the 160GB images, it shrinks down to something like 34GB, but if I do two of them, it only goes from 320GB to ~64GB, i.e. no better than compressing them separately. The files are just too big for 7z's solid compression window/dictionary.
(And even if it could cope, it'd need more RAM than my compy has!!)
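For reference, the solid attempt was along these lines (file names made up; -md is the dictionary size and is where all the RAM goes):

Code:
	# solid 7z archive; -ms=on packs the files as one stream, -md sets the LZMA dictionary
	7z a -t7z -m0=lzma -mx=9 -md=64m -ms=on all-images.7z image-a.img image-b.img

Even pushing -md into the hundreds of megabytes is still tiny next to a 160GB image, so it never sees the matches between the two files.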
The xdelta was very smart tho'; I ran it on the base image and the first mod, and the xdelta diff is 7MB!
I ran through all the files and then 7zip'ed up the base image and all the diffs I'd ended up with. 7zip did its magic on the base image and the xdelta files and then... Small-Violin-Like-Instrument! 40GB!
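In case it helps anyone else, the whole run boiled down to something like this (names invented, switches from memory, so check them before relying on it):

Code:
	# diff every variant image against the base image
	for img in disk-*.img; do
	    xdelta3 -e -s base.img "$img" "$img.vcdiff"
	done

	# then pack the base image plus all the small diffs into one solid archive
	7z a -t7z -mx=9 -ms=on images.7z base.img disk-*.img.vcdiff

Restoring an image is just the reverse: extract the archive, then xdelta3 -d -s base.img disk-01.img.vcdiff disk-01.img.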
I wonder if there is a 7z/rar-grade archiver which uses diff'ing for its solid packing technology; it's definitely a lot less RAM-hungry than 7z's approach (as you bump up the solid dictionary size for 7z, the RAM requirements just get silly!)