Gentoo Forums
Weird 7-zip behaviour

Cyker
Veteran
Joined: 15 Jun 2006
Posts: 1746

Posted: Mon Aug 07, 2017 5:15 pm    Post subject: Weird 7-zip behaviour

Does anyone here use 7-zip for archiving e.g. historical copies of USB sticks?

I've been updating some old archives, including an old 7-zip archive that held 4 fairly similar copies of one USB stick and compressed down to 2GB.

I just decompressed it and added 3 more historical copies and the final 7zip archive is now 8GB, which is only slightly (Like, 1.5GB) smaller than its uncompressed form!

I am using the same command that I used previously:
Code:
7z a -mx=9 -ms=on -md=512m <dest.7z> <sources>


The 3 new copies are at least 90% identical and solid mode is explicitly on, and the final archive confirms this, but the compression ratio I'm getting is awful, like it is non-solid!

Has anyone else experienced this?


Last edited by Cyker on Mon Aug 07, 2017 7:02 pm; edited 1 time in total

Cyker
Veteran
Joined: 15 Jun 2006
Posts: 1746

Posted: Mon Aug 07, 2017 7:01 pm

After some digging, it seems that at some point 7zip stopped sorting files into a more optimal order before compressing them. Now I guess it just compresses them in whatever random order it gets them from the filesystem.

I made a file list of the files sorted by name and fed that into it, and the archive is now down to 1.9GB (from 8GB!)

Not sure why they took that out as it clearly has a massive effect on compression performance...

I wonder... Does anyone know of some sort of pre-processor floating around that e.g. groups files by type, name and maybe checksum and puts it in a list file I can feed to 7z?
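
A minimal sketch of such a pre-processor, assuming a POSIX shell with `find`, `awk`, `sort` and `cut`. The function name `make_sorted_list` and the sort keys (extension, then base name, then full path) are illustrative choices, not anything 7-zip prescribes. It also assumes no file name contains a tab or a newline, since the list is written one name per line.

```shell
#!/bin/sh
# make_sorted_list DIR
# Print every regular file under DIR, one path per line, ordered by
# extension, then base name, then full path, so that copies of the same
# file from different snapshots end up next to each other.
# Assumption: no path contains a tab or newline character.
make_sorted_list() {
    find "$1" -type f |
    awk -F/ '{
        base = $NF
        ext = ""
        # Extension = text after the last dot; dotfiles like ".bashrc" get none.
        if (match(base, /.\.[^.]+$/)) ext = substr(base, RSTART + 2)
        printf "%s\t%s\t%s\n", tolower(ext), base, $0
    }' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2 -k3,3 |
    cut -f3-
}
```

To use it, redirect the output to a file, e.g. `make_sorted_list /path/to/copies > filelist.txt`, and hand that to 7z as a list file: `7z a -mx=9 -ms=on -md=512m <dest.7z> @filelist.txt`. Grouping by checksum as well would need an extra key and is left out here.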

steveL
Watchman
Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Thu Aug 17, 2017 7:04 pm

Cyker wrote:
Does anyone know of some sort of pre-processor floating around that e.g. groups files by type, name and maybe checksum and puts it in a list file I can feed to 7z?
I'd ask in #bash on IRC: chat.freenode.net or .org
That kind of thing is a standard beginner's question in there; they'll help you to knock up a quick script that you understand, and so can maintain; while also being robust against any filename.

Cyker
Veteran
Joined: 15 Jun 2006
Posts: 1746

Posted: Thu Aug 17, 2017 7:35 pm

Cheers Steve, I couldn't find one so have indeed bodged up one myself :)

Still, that's a good suggestion - I might drop in there later and see if they can help make it a bit more elegant!

Ant P.
Watchman
Joined: 18 Apr 2009
Posts: 6920

Posted: Thu Aug 17, 2017 10:45 pm

Use lrzip for stuff like this (or set the window size large enough).

Cyker
Veteran
Joined: 15 Jun 2006
Posts: 1746

Posted: Fri Aug 18, 2017 8:55 am

Oooooh where were you when I was asking about ways to compress disk images and VMs!!!!! :D

For this situation it isn't so great, as lrzip seems to have pretty poor support outside Linux, and I already solved the problem by pre-processing the file list first. (Both rar and 7zip are also great at compressing huge numbers of similar files and have very good multi-platform support.)

However, I have been after something like lrzip for ages... I'll have to see how it does vs the xdelta/7zip kludge I have going at the moment for these very similar multi-gigabyte disk images! Thanks Ant! :)

Yamakuzure
Advocate
Joined: 21 Jun 2006
Posts: 2284
Location: Adendorf, Germany

Posted: Fri Aug 18, 2017 9:17 am    Post subject: Re: Weird 7-zip behaviour

Cyker wrote:
Code:
7z a -mx=9 -ms=on -md=512m <dest.7z> <sources>
Did you try this already?
Code:
7za a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on <dest.7z> <directory>
If you feed a file list, it is taken as is. If you feed directories, their content is used in their "natural" order. Maybe that has some effect?

Oh, and do you really think a dictionary size of 512M is needed? 32M should be more than enough.

Edit: I just tried packing a 4.1GB VM.
With a dictionary size of 32M, the resulting archive is 1.5GB.
With a dictionary size of 512M, the resulting archive is 1.4GB, but packing took more than twice as long.
_________________
Important German:
  1. "Aha" - German reaction to pretend that you are really interested while giving no f*ck.
  2. "Tja" - German reaction to the apocalypse, nuclear war, an alien invasion or no bread in the house.


Last edited by Yamakuzure on Fri Aug 18, 2017 10:55 am; edited 1 time in total

Cyker
Veteran
Joined: 15 Jun 2006
Posts: 1746

Posted: Fri Aug 18, 2017 10:00 am

I haven't, but TBH I doubt it'll make any difference.

The problem is that, at some point, 7zip stopped grouping similar files together before compressing. I don't know what order it takes them in now, but it is far less optimal than before when doing a solid-compress.

Pre-processing the file list so that similar files are together and feeding that to it fixes the problem.
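
For what it's worth, the 7-Zip method-switch documentation lists a `qs` parameter for the 7z format that sorts files by type in solid archives and is off by default, which would match the behaviour change described above. A hedged sketch, assuming a 7-Zip/p7zip build recent enough to accept `-mqs=on`; the wrapper name `archive_sorted_by_type` is made up for illustration:

```shell
# Hypothetical wrapper: same options as the original command, plus -mqs=on
# to ask 7-Zip to group files by type (extension) in the solid stream.
# Usage: archive_sorted_by_type dest.7z source...
archive_sorted_by_type() {
    dest=$1
    shift
    7z a -mx=9 -ms=on -mqs=on -md=512m "$dest" "$@"
}
```

Sorting by type only groups files by extension, so whether this alone gets back to the old ratio for a given set of files is untested here; the pre-sorted file list remains the approach that was actually confirmed to work.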

And the 512MB dictionary? Maybe not always necessary, but I do it because I CAN! :P
(This is literally the only genuinely useful advantage I got from moving to 64-bit, so I use the heck out of it! :P)
In this case 512MB is actually not enough - If I could push it to, say, 8GB then the fact that the files are not pre-sorted anymore wouldn't have been a problem!
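
A back-of-the-envelope check of why 512MB falls short here, using the rough figures from this thread. It assumes all 7 copies are about the same size and that each copy ends up as one contiguous run in the solid stream, neither of which is guaranteed; the variable names are only for illustration.

```shell
# Rough figures from the thread (all approximate).
total_mb=9500   # ~8GB archive was ~1.5GB smaller than the uncompressed data
copies=7        # 4 old copies + 3 new ones
dict_mb=512     # -md=512m

# Integer estimate of how far back the "same" file in the previous copy sits.
per_copy_mb=$(( total_mb / copies ))
echo "approx. size of one copy: ${per_copy_mb} MB"

# LZMA can only reference data that is still inside the dictionary window.
if [ "$per_copy_mb" -gt "$dict_mb" ]; then
    echo "matching data in the previous copy lies beyond the ${dict_mb} MB window"
else
    echo "matching data in the previous copy still fits in the ${dict_mb} MB window"
fi
```

With these numbers one copy comes out at roughly 1357 MB, well over the 512 MB window, which is why putting similar files next to each other matters so much more than raising the dictionary size.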

Yamakuzure
Advocate
Joined: 21 Jun 2006
Posts: 2284
Location: Adendorf, Germany

Posted: Fri Aug 18, 2017 10:56 am

Cyker wrote:
If I could push it to, say, 8GB then the fact that the files are not pre-sorted anymore wouldn't have been a problem!
...only that you would have an 8GB dictionary in your archive. :D

All times are GMT