Gentoo Forums
Compressing large similar files
Cyker
Veteran


Joined: 15 Jun 2006
Posts: 1746

Posted: Mon Oct 12, 2009 11:35 pm    Post subject: Compressing large similar files

I've got a bunch of uncompressed disk images which are very large (Ranging from ~4GB to >100GB) but very similar which I want to compress and archive.

They already compress very well (The >100GB ones can be 7zip'd down to ~30GB!), but I want to take advantage of the fact that they're very similar to compress them even more.

Does anyone have any suggestions of compressors which can do a better job of compressing them?

7z and rar with default options don't get compression ratios much better than if I compressed them all individually, but I haven't explored the documentation much, e.g. to see whether there is a way to interleave the files...
yther
Apprentice


Joined: 25 Oct 2002
Posts: 151
Location: Charlotte, NC (USA)

Posted: Mon Oct 12, 2009 11:59 pm    Post subject: Re: Compressing large similar files

Cyker wrote:
I've got a bunch of uncompressed disk images which are very large (Ranging from ~4GB to >100GB) but very similar...


Those are so big, I don't know if concatenating them first with tar would be useful at all. However, if they truly are very similar (like one image was created based on another one) you could use something like xdelta to diff the derivatives against the original. Then you just compress the original and store the diffs alongside.

However, this might be considered a lot of trouble to go through... just depends on how tight you are on disk space and if you consider it worth your time. :)
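
A minimal sketch of that workflow, assuming the `xdelta3` command-line tool is installed and that its `-e`/`-d`/`-s` options are what you'd use (the thread only says "something like xdelta", so the exact tool version is an assumption). The file names are hypothetical. It only builds and prints the commands by default; set `dry_run=False` to actually run them.

```python
import subprocess
from pathlib import Path
from typing import List


def encode_cmd(base: Path, derived: Path, delta: Path) -> List[str]:
    """Command that stores `derived` as a delta against `base`."""
    return ["xdelta3", "-e", "-s", str(base), str(derived), str(delta)]


def decode_cmd(base: Path, delta: Path, restored: Path) -> List[str]:
    """Command that rebuilds the derived image from `base` plus `delta`."""
    return ["xdelta3", "-d", "-s", str(base), str(delta), str(restored)]


def plan(base: Path, derived_images: List[Path], out_dir: Path) -> List[List[str]]:
    """One encode command per derived image, all diffed against the same base."""
    return [
        encode_cmd(base, img, out_dir / (img.name + ".xdelta"))
        for img in derived_images
    ]


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            # Requires xdelta3 on PATH; raises if the tool reports an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    base = Path("base.img")
    mods = [Path("mod1.img"), Path("mod2.img")]
    run_all(plan(base, mods, Path("deltas")))
```

Afterwards you would compress `base.img` and the `.xdelta` files with whatever archiver you like, and keep the base around, since every delta is useless without it.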
Cyker
Veteran


Joined: 15 Jun 2006
Posts: 1746

Posted: Tue Oct 13, 2009 7:51 am    Post subject:

Yeah, the sheer size really is the problem (I never thought I'd be dealing with stuff that big just a few years ago! :lol:)

But your idea is genius! :D

I hadn't thought of diff'ing them off the base image; if it can deal with blocks that have moved as well and not just straight differences, this could shrink the images by several orders of magnitude!

I shall play with the xdelta thing and see how it goes; Thanks! :D

(If anyone has any other ideas tho', please chime in!)
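
To illustrate what "dealing with blocks that have moved" can look like, here is a toy block-level delta in Python. This is not how xdelta itself works internally (real delta tools use more sophisticated matching); it is only a sketch under the assumption of a fixed 4096-byte block size, so it only spots blocks that moved to block-aligned offsets. The names `make_delta` and `apply_delta` are made up for the example.

```python
import hashlib

BLOCK = 4096  # assumed block size, chosen only for illustration


def make_delta(base: bytes, target: bytes, block: int = BLOCK) -> list:
    """Describe `target` as copies of base blocks plus literal new blocks."""
    # Index every block of the base image by its hash.
    index = {}
    for off in range(0, len(base), block):
        digest = hashlib.sha1(base[off:off + block]).digest()
        index.setdefault(digest, off)

    ops = []
    for off in range(0, len(target), block):
        chunk = target[off:off + block]
        src = index.get(hashlib.sha1(chunk).digest())
        # Verify the bytes too, so a hash collision can never corrupt output.
        if src is not None and base[src:src + block] == chunk:
            ops.append(("copy", src, len(chunk)))  # block exists in base, wherever it sits
        else:
            ops.append(("add", chunk))             # genuinely new data
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target from the base image and the delta operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, src, length = op
            out += base[src:src + length]
        else:
            out += op[1]
    return bytes(out)


if __name__ == "__main__":
    a, b, c, d = (bytes([i]) * BLOCK for i in (1, 2, 3, 4))
    new = bytes([9]) * BLOCK
    base = a + b + c + d
    target = c + a + new + d  # two blocks moved, one replaced
    ops = make_delta(base, target)
    print([op[0] for op in ops])
    assert apply_delta(base, ops) == target
```

Only the "add" entries carry real data, so a target that is mostly rearranged or unchanged base blocks yields a very small delta.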
Cyker
Veteran


Joined: 15 Jun 2006
Posts: 1746

Posted: Thu Oct 15, 2009 6:17 pm    Post subject:

Weeee! It shrank the images from just under a Terabyte to about 40GB :D

Cool beans!
drescherjm
Advocate


Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Mon Oct 19, 2009 9:55 pm    Post subject:

A second option would probably be a 7zip solid archive. However, I doubt it would compress anywhere near as well as what you got.
_________________
John
Cyker
Veteran


Joined: 15 Jun 2006
Posts: 1746

Posted: Tue Oct 20, 2009 4:21 pm    Post subject:

I had tried that but alas 7zip's solid mode can't cope with files that big :(

When I compressed one of the 160GB images on its own, it shrank to something like 34GB, but compressing two of them together only took 320GB down to ~64GB, so barely better than doing them separately. The files are just too big for 7z's solid compression window/dictionary/thingy. :(
(And if it did it'd need more RAM than my compy has!! :lol: )
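
The window effect can be reproduced in miniature with Python's `zlib`, whose DEFLATE format can only refer back 32 KiB. This is a sketch of the principle only, not a model of 7z (LZMA dictionaries are far larger, but the same limit applies once the files outgrow them). The sizes and the helper names `random_bytes` and `pair_ratio` are assumptions picked for the demo.

```python
import random
import zlib


def random_bytes(n: int, seed: int = 0) -> bytes:
    """Deterministic, essentially incompressible test data."""
    return random.Random(seed).getrandbits(8 * n).to_bytes(n, "little")


def pair_ratio(size: int) -> float:
    """Size of two identical copies compressed together, relative to one copy."""
    data = random_bytes(size)
    one = len(zlib.compress(data))
    two = len(zlib.compress(data + data))
    return two / one


if __name__ == "__main__":
    # 16 KiB: the second copy lies within the 32 KiB window of the first.
    print("16 KiB pair ratio: ", round(pair_ratio(16 * 1024), 2))
    # 256 KiB: the first copy has left the window long before the second starts.
    print("256 KiB pair ratio:", round(pair_ratio(256 * 1024), 2))
```

The small pair should come out close to the size of a single copy, while the large pair comes out close to twice that, because the compressor can no longer "see" the first copy when it reaches the second. That is the same thing that happens with two 160GB images in one solid archive.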

The xdelta was very smart tho'; I ran it on the base image and the first mod, and the xdelta diff is 7MB!

I ran through all the files and then 7zip'ed up the base image and all the diffs I'd ended up with. 7zip did its magic on the base image and the xdelta files and then... Small-Violin-Like-Instrument! 40GB! :D

I wonder if there is a 7z/rar grade archiver which uses diff'ing for its solid packing technology; It's definitely a lot less RAM hungry than 7z's (As you bump up the solid dictionary size for 7z, the RAM requirements just get silly!)
All times are GMT