Weird 7-zip behaviour

Cyker · Veteran Joined: 15 Jun 2006 Posts: 1746

Does anyone here use 7-zip for archiving e.g. historical copies of USB sticks?

I've been updating some old archives and had an old 7zip archive that was compressed to 2GB with 4 fairly similar copies of this USB stick.

I just decompressed it and added 3 more historical copies, and the final 7zip archive is now 8GB, which is only slightly (about 1.5GB) smaller than its uncompressed form!

I am using the same command that I used previously:

Cyker · Veteran Joined: 15 Jun 2006 Posts: 1746

After some digging, it seems that at some point 7zip stopped sorting files into a more optimal order before compressing them. Now I guess it just compresses them in whatever order it gets them from the filesystem.

I made a list of the files sorted by name and fed that into it, and the archive is now down to 1.9GB (from 8GB!)

Not sure why they took that out as it clearly has a massive effect on compression performance...

I wonder... Does anyone know of some sort of pre-processor floating around that e.g. groups files by type, name and maybe checksum and puts it in a list file I can feed to 7z?
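A minimal sketch of what such a pre-processor could look like, in Python. This is not what 7-Zip used to do internally, and the sort key (extension, then base name, then size) is only an assumption about what counts as "similar". The function names `similarity_order` and `write_list_file` are made up for illustration. No checksums are computed here; size is used as a cheap stand-in.

```python
import os

def similarity_order(root):
    """Return file paths under root, ordered so likely-similar files are adjacent.

    Assumed sort key (illustrative only): extension, lower-cased base name,
    size, then full path as a tie-breaker.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower()
            try:
                size = os.path.getsize(path)
            except OSError:
                size = -1  # unreadable entry: still listed, sorted first within its name
            entries.append((ext, name.lower(), size, path))
    entries.sort()
    return [entry[3] for entry in entries]

def write_list_file(paths, list_path):
    """Write one path per line, UTF-8, for use as a 7z list file."""
    with open(list_path, "w", encoding="utf-8", newline="\n") as fh:
        for path in paths:
            fh.write(path + "\n")
```

The resulting file can then be passed to 7z with its list-file syntax, e.g. `7z a archive.7z @filelist.txt`, plus whatever switches were already in use. Sorting on the base name rather than the full path is what puts the same file from each historical copy next to each other.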

steveL · Posted: Thu Aug 17, 2017 7:04 pm Post subject:

Cyker · Veteran Joined: 15 Jun 2006 Posts: 1746

Cheers Steve, I couldn't find one so have indeed bodged one up myself.

Still, that's a good suggestion - I might drop in there later and see if they can help make it a bit more elegant!

Ant P. · Watchman Joined: 18 Apr 2009 Posts: 6920

Use lrzip for stuff like this (or set the window size large enough).

Cyker · Veteran Joined: 15 Jun 2006 Posts: 1746

Oooooh where were you when I was asking about ways to compress disk images and VMs!!!!!

For this situation it isn't so great, as lrzip seems to have pretty poor support outside Linux, and I already solved the problem by pre-processing the file list first. (Both rar and 7zip are great at compressing huge numbers of similar files and have very good multi-platform support.)

However, I have been after something like lrzip for ages... I'll have to see how it does vs the xdelta/7zip kludge I have going at the moment for these very similar multi-gigabyte disk images! Thanks Ant!

Yamakuzure · Posted: Fri Aug 18, 2017 9:17 am Post subject: Re: Weird 7-zip behaviour

Cyker · Veteran Joined: 15 Jun 2006 Posts: 1746

I haven't, but TBH I doubt it'll make any difference.

The problem is that, at some point, 7zip stopped grouping similar files together before compressing. I don't know what order it takes them in now, but it is far less optimal than before when doing a solid-compress.

Pre-processing the file list so that similar files are together and feeding that to it fixes the problem.

And the 512MB dictionary? Maybe not always necessary, but I do it because I CAN!

(This is literally the only genuinely useful advantage I got from moving to 64-bits so I use the heck out of it!)

In this case 512MB is actually not enough - If I could push it to, say, 8GB then the fact that the files are not pre-sorted anymore wouldn't have been a problem!
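To illustrate the point about dictionary size, here is a rough Python sketch. It uses a deliberately simplified model, which is an assumption and not how the encoder really decides anything: a later copy of a file can only benefit from an earlier copy if the earlier one starts no further back in the solid stream than the dictionary size. The function name `copies_out_of_reach` and the toy sizes in the usage note are hypothetical.

```python
import os

def copies_out_of_reach(ordered_files, dict_size):
    """Count files whose previous same-named copy lies beyond the dictionary.

    ordered_files: list of (path, size_in_bytes) in the order they are archived.
    dict_size: dictionary (window) size in bytes.
    Simplified model (assumption): a copy is "reachable" only if the distance
    from its start back to the start of the previous copy is <= dict_size.
    """
    last_start = {}  # base name -> stream offset where the last copy began
    offset = 0
    missed = 0
    for path, size in ordered_files:
        key = os.path.basename(path).lower()
        if key in last_start and offset - last_start[key] > dict_size:
            missed += 1
        last_start[key] = offset
        offset += size
    return missed
```

With toy numbers, two copies of two 100-byte files archived copy by copy and a 150-byte dictionary leave both second copies out of reach, while the same files grouped by name leave none. The same arithmetic suggests why a dictionary bigger than one whole copy of the stick would make the ordering matter much less.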

Yamakuzure · Posted: Fri Aug 18, 2017 10:56 am Post subject: