Gentoo Forums
virtuals for gzip, bzip2
avx
Advocate

Joined: 21 Jun 2004
Posts: 2070

Posted: Wed Apr 04, 2012 11:41 pm    Post subject: virtuals for gzip, bzip2

I'd like to know whether there is a specific reason why there are currently no virtual/gzip and virtual/bzip2 packages, given that parallel implementations (pbzip2, pigz) exist for them. As most current machines are already multicore, I'd think a virtual would be a good option.

Or is there anything standing against it? As far as the project pages claim, they are compatible with the normal versions.
_________________
++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.

eccerr0r
Advocate

Joined: 01 Jul 2004
Posts: 3818
Location: USA

Posted: Thu Apr 05, 2012 5:57 am

It would be nice if upstream for gzip/bzip2/xz/... included the threading support as part of the regular distribution (with an option to disable threading).

Are the parallel versions completely identical replacements? Do they work properly in pipes, etc.? I'm not sure how well parallel compression works in pipes...
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed to be advocating?

avx
Advocate

Joined: 21 Jun 2004
Posts: 2070

Posted: Thu Apr 05, 2012 12:37 pm

AFAIK, parallel here only means using multiple CPUs/CPU cores when available, not processing more than one file in parallel. I've got both of them installed, replacing the originals, and haven't had a problem yet, except for this one, which is fixed by now (#312967).

eccerr0r
Advocate

Joined: 01 Jul 2004
Posts: 3818
Location: USA

Posted: Thu Apr 05, 2012 3:20 pm

Interesting. I was just wondering if you did something like

cat /dev/urandom |pbzip2 > file2.bz2

how that would behave. I don't think this is really parallelizable, except perhaps if it waited for enough data to come through the pipe before sending off a second thread to compress; then it would have to schedule the two threads to make sure the pieces stay in order. This is a different problem than

pbzip2 bigfile

where all pieces are ready...

It might just have a fallback mode, not sure...
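
The scheme described above (buffer a chunk from the pipe, hand it to a worker, emit the results in order) can be sketched in Python. This is only an illustration under stated assumptions, not a description of how pbzip2 is actually implemented: `parallel_bz2`, `read_chunks` and `CHUNK_SIZE` are hypothetical names, and each chunk is written as its own complete bzip2 stream, relying on Python's `bz2.decompress` accepting concatenated streams. Whether the threads really run side by side depends on the `bz2` module releasing the GIL while compressing, so treat the speed-up as something to measure, not a guarantee.

```python
import bz2
from collections import deque
from concurrent.futures import ThreadPoolExecutor

# Assumed chunk size for this sketch (roughly bzip2's largest block size).
CHUNK_SIZE = 900 * 1024


def read_chunks(stream, size=CHUNK_SIZE):
    """Yield chunks of up to `size` bytes until the stream is exhausted."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            return
        yield chunk


def parallel_bz2(stream, out, workers=4, size=CHUNK_SIZE):
    """Compress chunks concurrently and write the results in input order."""
    pending = deque()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk in read_chunks(stream, size):
            pending.append(pool.submit(bz2.compress, chunk))
            # Keep a bounded number of chunks in flight so an endless pipe
            # cannot fill memory; always retire the oldest chunk first so
            # the output order matches the input order.
            if len(pending) >= 2 * workers:
                out.write(pending.popleft().result())
        # Flush whatever is still queued once the input ends.
        while pending:
            out.write(pending.popleft().result())
```

Calling `parallel_bz2(sys.stdin.buffer, sys.stdout.buffer)` would let this sit at the end of a pipe; the bounded queue is what keeps a source like /dev/urandom from being read without limit.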

avx
Advocate

Joined: 21 Jun 2004
Posts: 2070

Posted: Thu Apr 05, 2012 6:10 pm

eccerr0r wrote:
Interesting. I was just wondering if you did something like

cat /dev/urandom |pbzip2 > file2.bz2

how that would behave. I don't think this is really parallelizable, except perhaps if it waited for enough data to come through the pipe before sending off a second thread to compress; then it would have to schedule the two threads to make sure the pieces stay in order. This is a different problem than

pbzip2 bigfile

where all pieces are ready...

It might just have a fallback mode, not sure...
According to `top`, pbzip2 uses ~320% CPU (8 logical cores: 4 physical, 4 HT) on your example, while normal bzip2 only uses ~70% (the other 30% of that core is taken up by `cat`).

eccerr0r
Advocate

Joined: 01 Jul 2004
Posts: 3818
Location: USA

Posted: Thu Apr 05, 2012 7:16 pm

Cool, sounds like bzip2/gzip (coreutils?) should include the multithreaded code.

jdhore
Developer

Joined: 13 Apr 2007
Posts: 106

Posted: Thu Apr 05, 2012 7:25 pm

The problem with this (AFAIK) is that for a .tar.bz2 or a .tar.gz to be compressed/decompressed in parallel, it has to have been compressed using one of the parallelizing tools, and I'd bet almost NONE of the tarballs on the internet are, so I'd say there's not much use in allowing pbzip2 to effectively replace bzip2 (or pigz to replace gzip).

Dr.Willy
Guru

Joined: 15 Jul 2007
Posts: 323
Location: NRW, Germany

Posted: Thu Apr 05, 2012 8:27 pm

jdhore wrote:
The problem with this (AFAIK) is that for a .tar.bz2 or a .tar.gz to be compressed/decompressed in parallel, it has to have been compressed using one of the parallelizing tools, and I'd bet almost NONE of the tarballs on the internet are, so I'd say there's not much use in allowing pbzip2 to effectively replace bzip2 (or pigz to replace gzip).

Where did you get this info from?
The German wiki says:
"Parallelization is possible, because bzip2 compresses the input stream in blocks that are independent of each other."
Wikipedia says:
"Motivated by the large CPU time required for compression, a modified version was created in 2003 called pbzip2 that supported multi-threading, giving almost linear speed improvements on multi-CPU and multi-core computers.[4] As of May 2010, this functionality has not been incorporated into the main project."

jdhore
Developer

Joined: 13 Apr 2007
Posts: 106

Posted: Thu Apr 05, 2012 9:29 pm

Dr.Willy wrote:
jdhore wrote:
The problem with this (AFAIK) is that for a .tar.bz2 or a .tar.gz to be compressed/decompressed in parallel, it has to have been compressed using one of the parallelizing tools, and I'd bet almost NONE of the tarballs on the internet are, so I'd say there's not much use in allowing pbzip2 to effectively replace bzip2 (or pigz to replace gzip).

Where did you get this info from?
The German wiki says:
"Parallelization is possible, because bzip2 compresses the input stream in blocks that are independent of each other."
Wikipedia says:
"Motivated by the large CPU time required for compression, a modified version was created in 2003 called pbzip2 that supported multi-threading, giving almost linear speed improvements on multi-CPU and multi-core computers.[4] As of May 2010, this functionality has not been incorporated into the main project."


I don't recall where I originally read this, but here's a reference to a test Wikimedia did of various bz2 implementations:

https://www.mediawiki.org/wiki/Dbzip2#cite_ref-0

See note #1: "pbzip2 can only parallel-decompress its own funky output files. Regular bzip2 streams must be processed on a single thread."

I would bet that pigz suffers the same problem.
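
To make the distinction concrete, here is a minimal Python sketch. It assumes the parallel tool writes its output as several complete bzip2 streams back to back, and that the decoder already knows where each one starts (here simply a Python list; `compress_as_streams` and `decompress_streams_parallel` are made-up names). Under those assumptions the concatenated file is still readable by an ordinary single-threaded decoder, while the individual streams can also be handed to separate workers.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def compress_as_streams(data, size):
    """Compress each slice of `data` as its own complete bzip2 stream."""
    return [bz2.compress(data[i:i + size]) for i in range(0, len(data), size)]


def decompress_streams_parallel(streams, workers=4):
    """Decompress independent streams concurrently, preserving their order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order the streams were given.
        return b"".join(pool.map(bz2.decompress, streams))


data = b"Gentoo virtual/bzip2 " * 100_000
streams = compress_as_streams(data, size=256 * 1024)

# A standard decoder reads the concatenation as one ordinary file...
assert bz2.decompress(b"".join(streams)) == data
# ...and the same pieces can also be decoded independently of each other.
assert decompress_streams_parallel(streams) == data
```

A file written as one single stream offers no such ready-made byte boundaries to split on, which would be consistent with the note quoted above.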

avx
Advocate

Joined: 21 Jun 2004
Posts: 2070

Posted: Thu Apr 05, 2012 10:44 pm

Even if this were true, how does this pose a problem? If I get that right, at worst it wouldn't be able to de-/compress in parallel in some situations, but it wouldn't fail to do the job. So there would still be a benefit to using these apps, even if it's only for archives you create yourself.

pjp
Administrator

Joined: 16 Apr 2002
Posts: 16102
Location: Colorado

Posted: Mon Apr 09, 2012 2:45 am

Moved from Gentoo Chat to Portage & Programming.
_________________
lolgov. 'cause where we're going, you don't have civil liberties.

In Loving Memory
1787 - 2008

i92guboj
Moderator

Joined: 30 Nov 2004
Posts: 9804
Location: Córdoba (Spain)

Posted: Mon Apr 09, 2012 1:55 pm

Not that I have looked deeply into this, I am just thinking aloud, but this could have some other use cases. At least in theory, having 7z installed would allow you to uncompress gzip and bzip2 files without problems. Wrapping the syntax to be compatible would be an extra concern, though. I also have no idea how 7z handles pipes, if at all. As said, just thinking aloud :lol:
_________________
Gentoo Handbook | My website

avx
Advocate

Joined: 21 Jun 2004
Posts: 2070

Posted: Mon Apr 09, 2012 2:22 pm

I already have p7zip installed, but haven't tried the pipe case yet.

It's just me thinking: why not go a step further and allow those applications as alternatives in the system, if they are more "future-proof"? So far I haven't found a problem, and they could potentially be beneficial; if they weren't, I don't see why they'd be available in Portage at all.

i92guboj
Moderator

Joined: 30 Nov 2004
Posts: 9804
Location: Córdoba (Spain)

Posted: Tue Apr 10, 2012 12:24 pm

In case anyone cares, I just upgraded my kernel patchlevel from 3.3.0 to 3.3.1 using this:

Code:
7z x -so ../patch-3.3.1.bz2 2>/dev/null | patch -p1


Much nicer than just

Code:
bzcat patch-3.3.1.bz2 | patch -p1


:lol:

But my point is that pipes seem to work the same, so using 7z to fill a hypothetical gzip or bzip2 virtual seems doable. Not straightforward, though, for many reasons. :roll:
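
As a rough illustration of the pipe behaviour any candidate for such a virtual would need, the sketch below streams bzip2 data from one file object to another in Python. `bzcat` here is a hypothetical helper name, not the real `bzcat` binary, and the sketch says nothing about how 7z works internally.

```python
import bz2
import shutil


def bzcat(src, dst, bufsize=64 * 1024):
    """Stream-decompress bzip2 data from file object `src` into `dst`."""
    # bz2.open accepts an already open binary file object; the data is
    # decompressed incrementally, so the whole file never sits in memory.
    with bz2.open(src, "rb") as plain:
        shutil.copyfileobj(plain, dst, bufsize)
```

Wired to `sys.stdin.buffer` and `sys.stdout.buffer`, something like this could sit in front of `patch -p1` in the same way as the commands above.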

avx
Advocate

Joined: 21 Jun 2004
Posts: 2070

Posted: Tue Apr 10, 2012 12:28 pm

Sure, 7z is another possible way, though not that easy to do. On the other hand, pigz/pbzip2 work as drop-in replacements.

All times are GMT