Cyker Veteran
Joined: 15 Jun 2006 Posts: 1746
Posted: Mon Aug 07, 2017 5:15 pm Post subject: Weird 7-zip behaviour
Does anyone here use 7-zip for archiving e.g. historical copies of USB sticks?
I've been updating some old archives and had an old 7-Zip archive containing 4 fairly similar copies of this USB stick, compressed to 2GB.
I just decompressed it, added 3 more historical copies, and the final 7-Zip archive is now 8GB, which is only slightly (like 1.5GB) smaller than its uncompressed form!
I am using the same command that I used previously:
Code: 7z a -mx=9 -ms=on -md=512m <dest.7z> <sources>
The 3 new copies are at least 90% identical, and solid mode is explicitly on (the final archive confirms this), but the compression ratio I'm getting is awful, as if it were non-solid!
Has anyone else experienced this?
Last edited by Cyker on Mon Aug 07, 2017 7:02 pm; edited 1 time in total
Cyker Veteran
Joined: 15 Jun 2006 Posts: 1746
Posted: Mon Aug 07, 2017 7:01 pm
After some digging, it seems that at some point 7-Zip stopped sorting files into a more optimal order before compressing them. Now it apparently just compresses them in whatever order it gets them from the filesystem.
I made a file list of the files sorted by name and fed that in, and the archive is now down to 1.9GB (from 8GB!)
Not sure why they took that out, as it clearly has a massive effect on compression performance...
I wonder... does anyone know of some sort of pre-processor floating around that e.g. groups files by type, name and maybe checksum, and puts them in a list file I can feed to 7z?
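For what it's worth, a pre-processor along these lines can be bodged up in a few lines of shell. This is only a sketch under assumptions: the helper name `group_and_list` is made up, it groups by extension and then by path (no checksum pass), and the list file is newline-delimited, so it will misbehave on filenames that contain newlines.

```shell
#!/bin/sh
# Hypothetical pre-processor sketch: group files by extension, then by
# path, and write the result to a list file for 7z's @listfile syntax.
# Caveat: newline-delimited, so filenames containing newlines break it.
group_and_list() {
    src_dir=$1
    list_file=$2
    tab=$(printf '\t')
    # Emit "extension<TAB>path", sort by extension then path, keep path.
    find "$src_dir" -type f \
        | awk -F/ '{ n = $NF; ext = ""
                     if (n ~ /\./) { ext = n; sub(/.*\./, "", ext) }
                     print ext "\t" $0 }' \
        | sort -t "$tab" -k1,1 -k2,2 \
        | cut -f2- > "$list_file"
}

# Then compress in that order, e.g.:
# group_and_list /mnt/stick-copies filelist.txt
# 7z a -mx=9 -ms=on -md=512m dest.7z @filelist.txt
```

Grouping by extension keeps similar file types adjacent in the solid stream, which is roughly what the old sorting behaviour achieved.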
steveL Watchman
Joined: 13 Sep 2006 Posts: 5153 Location: The Peanut Gallery
Posted: Thu Aug 17, 2017 7:04 pm
Cyker wrote: "Does anyone know of some sort of pre-processor floating around that e.g. groups files by type, name and maybe checksum and puts it in a list file I can feed to 7z?"
I'd ask in #bash on IRC: chat.freenode.net or .org.
That kind of thing is a standard beginner's question in there; they'll help you knock up a quick script that you understand, and so can maintain, while also being robust against any filename.
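On the "robust against any filename" point: a NUL-delimited pipeline survives spaces and even newlines in names. The sketch below is an assumption-laden illustration, not a vetted tool - in particular, whether 7z preserves command-line argument order when forming solid blocks is an assumption here, and a newline-delimited @listfile cannot safely carry names that themselves contain newlines.

```shell
#!/bin/sh
# NUL-delimited listing: robust against spaces/newlines in filenames.
sorted_names() {
    find "$1" -type f -print0 | sort -z
}

# Hypothetical usage (assumes 7z honours the argument order it is given):
# sorted_names /mnt/stick-copies | xargs -0 7z a -mx=9 -ms=on dest.7z
```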
Cyker Veteran
Joined: 15 Jun 2006 Posts: 1746
Posted: Thu Aug 17, 2017 7:35 pm
Cheers Steve, I couldn't find one, so I have indeed bodged one up myself.
Still, that's a good suggestion - I might drop in there later and see if they can help make it a bit more elegant!
Ant P. Watchman
Joined: 18 Apr 2009 Posts: 6920
Posted: Thu Aug 17, 2017 10:45 pm
Use lrzip for stuff like this (or set the window size large enough).
Cyker Veteran
Joined: 15 Jun 2006 Posts: 1746
Posted: Fri Aug 18, 2017 8:55 am
Oooooh, where were you when I was asking about ways to compress disk images and VMs?!
For this situation it isn't so great, as lrzip seems to have pretty poor support outside Linux, and I already solved the problem by pre-processing the file list first (and both RAR and 7-Zip are great at compressing huge numbers of similar files, with very good multi-platform support).
However, I have been after something like lrzip for ages... I'll have to see how it does vs. the xdelta/7-Zip kludge I have going at the moment for these very similar multi-gigabyte disk images! Thanks Ant!
Yamakuzure Advocate
Joined: 21 Jun 2006 Posts: 2284 Location: Adendorf, Germany
Posted: Fri Aug 18, 2017 9:17 am Post subject: Re: Weird 7-zip behaviour
Cyker wrote: "Code: 7z a -mx=9 -ms=on -md=512m <dest.7z> <sources>"
Did you try this already?
Code: 7za a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on <dest.7z> <directory>
If you feed it a file list, it is taken as-is. If you feed it directories, their contents are used in their "natural" order. Maybe that has some effect?
Oh, and do you really think a dictionary size of 512M is needed? 32M should be more than enough.
Edit: I just tried to pack a 4.1GB VM.
With a dictionary size of 32M, the resulting archive is 1.5GB.
With a dictionary size of 512M, the resulting archive is 1.4GB - but packing took more than twice as long.
_________________
Important German:
- "Aha" - German reaction to pretend that you are really interested while giving no f*ck.
- "Tja" - German reaction to the apocalypse, nuclear war, an alien invasion or no bread in the house.
Last edited by Yamakuzure on Fri Aug 18, 2017 10:55 am; edited 1 time in total
Cyker Veteran
Joined: 15 Jun 2006 Posts: 1746
Posted: Fri Aug 18, 2017 10:00 am
I haven't, but TBH I doubt it'll make any difference.
The problem is that, at some point, 7zip stopped grouping similar files together before compressing. I don't know what order it takes them in now, but it is far less optimal than before when doing a solid-compress.
Pre-processing the file list so that similar files are together and feeding that to it fixes the problem.
And the 512MB dictionary? Maybe not always necessary, but I do it because I CAN!
(This is literally the only genuinely useful advantage I got from moving to 64 bits, so I use the heck out of it!)
In this case, 512MB is actually not enough - if I could push it to, say, 8GB, then the fact that the files are no longer pre-sorted wouldn't have been a problem!
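For anyone wanting to see where the ratio/time trade-off lands on their own data, a throwaway sweep like this is enough. It is a sketch with made-up names (`bench`, the bench-*.7z outputs); it assumes 7z is on the PATH and skips quietly if it is not, and the size list can be extended to 512m if you have the RAM.

```shell
#!/bin/sh
# Rough dictionary-size sweep: archive size and wall time per -md value.
bench() {
    dir=$1
    command -v 7z >/dev/null 2>&1 || { echo '7z not installed; skipping'; return 0; }
    for md in 16m 32m 64m; do
        start=$(date +%s)
        7z a -mx=9 -ms=on -md="$md" "bench-$md.7z" "$dir" >/dev/null
        end=$(date +%s)
        printf '%s: %s bytes, %ss\n' "$md" "$(wc -c < "bench-$md.7z")" "$((end - start))"
    done
}

# bench /path/to/testdata
```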
Yamakuzure Advocate
Joined: 21 Jun 2006 Posts: 2284 Location: Adendorf, Germany
Posted: Fri Aug 18, 2017 10:56 am
Cyker wrote: "If I could push it to, say, 8GB then the fact that the files are not pre-sorted anymore wouldn't have been a problem!"
...only that decompressing such an archive would then need an 8GB dictionary's worth of RAM.