Author |
Message |
avx Advocate
Joined: 21 Jun 2004 Posts: 2152
|
Posted: Wed Apr 04, 2012 11:41 pm Post subject: virtuals for gzip, bzip2 |
|
|
I'd like to know whether there is a specific reason why there are currently no virtual/gzip and virtual/bzip2 packages, given that parallel implementations (pbzip2, pigz) exist for them. As most current architectures are already multicore, I'd think a virtual would be a good option.
Or is there anything standing against it? As far as the project pages claim, they are compatible with the normal versions. _________________ ++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>. |
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9677 Location: almost Mile High in the USA
|
Posted: Thu Apr 05, 2012 5:57 am Post subject: |
|
|
It would be nice if upstream for gzip/bzip2/xz/... included the threading stuff as part of the regular distribution (with an option to disable threading).
Do the parallel versions behave completely identically? Do they work properly in pipes, etc.? I'm not sure how well parallel compression works in pipes... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
avx Advocate
Joined: 21 Jun 2004 Posts: 2152
|
Posted: Thu Apr 05, 2012 12:37 pm Post subject: |
|
|
AFAIK, parallel here only means using multiple CPUs/CPU cores when available, not processing more than one file in parallel. I've got both of them installed, replacing the originals, and haven't had a problem yet, except for this one, which is fixed by now (#312967). |
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9677 Location: almost Mile High in the USA
|
Posted: Thu Apr 05, 2012 3:20 pm Post subject: |
|
|
Interesting. I was just wondering, if you did something like
cat /dev/urandom | pbzip2 > file2.bz2
how that would behave. I don't think this is really parallelizable, except perhaps if it waited for enough data to come through the pipe before sending off a second thread to compress; then it would have to schedule the two threads to make sure the pieces stay in order. This is a different problem than
pbzip2 bigfile
where all pieces are ready...
It might just have a fallback mode, not sure... |
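A minimal Python sketch of the scheme described above: read fixed-size chunks from a stream, compress them on several threads, and write the results back in input order. The names `read_chunks` and `parallel_bz2` and the chunk size are made up for illustration. This only shows that the idea is workable with the stock bz2 format and makes no claim about how pbzip2 is actually implemented.

```python
import bz2
import io
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 900_000  # assumed chunk size, chosen for illustration only


def read_chunks(stream, size=CHUNK_SIZE):
    """Yield successive chunks from a stream until EOF."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            return
        yield chunk


def parallel_bz2(src, dst, workers=4, size=CHUNK_SIZE):
    """Compress src to dst chunk by chunk, writing results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, even if a later chunk
        # finishes compressing first. It also reads the whole input up
        # front, so an endless source would need a bounded queue instead.
        for compressed in pool.map(bz2.compress, read_chunks(src, size)):
            dst.write(compressed)


data = bytes(range(256)) * 4000  # about 1 MB of sample input
out = io.BytesIO()
parallel_bz2(io.BytesIO(data), out, workers=4, size=100_000)

# The output is a series of complete bzip2 streams; the standard
# decoder reads them back as one piece.
restored = bz2.decompress(out.getvalue())
```

With a real pipe, `src` would be something like `sys.stdin.buffer`, where `read(size)` blocks until a full chunk has arrived or the input ends, which matches the "wait for enough data" step.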
|
avx Advocate
Joined: 21 Jun 2004 Posts: 2152
|
Posted: Thu Apr 05, 2012 6:10 pm Post subject: |
|
|
eccerr0r wrote: | Interesting. I was just wondering, if you did something like
cat /dev/urandom | pbzip2 > file2.bz2
how that would behave. I don't think this is really parallelizable, except perhaps if it waited for enough data to come through the pipe before sending off a second thread to compress; then it would have to schedule the two threads to make sure the pieces stay in order. This is a different problem than
pbzip2 bigfile
where all pieces are ready...
It might just have a fallback mode, not sure... |
According to `top`, pbzip2 uses ~320% CPU (8 cores: 4 real, 4 HT) on your example, while normal bzip2 only uses ~70% (the other 30% of one core is taken up by `cat`). |
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9677 Location: almost Mile High in the USA
|
Posted: Thu Apr 05, 2012 7:16 pm Post subject: |
|
|
Cool, sounds like bzip2/gzip (coreutils?) should include the multithreaded code. |
|
jdhore Retired Dev
Joined: 13 Apr 2007 Posts: 106
|
Posted: Thu Apr 05, 2012 7:25 pm Post subject: |
|
|
The problem with this (AFAIK) is that for a .tar.bz2 or a .tar.gz to be compressed/decompressed in parallel, it has to have been compressed using one of the parallelizing tools, and I'd bet almost NONE of the tarballs on the internet are. So I'd say there's not much use in allowing pbzip2 to effectively replace bzip2 (or pigz to replace gzip). |
|
Dr.Willy Guru
Joined: 15 Jul 2007 Posts: 547 Location: NRW, Germany
|
Posted: Thu Apr 05, 2012 8:27 pm Post subject: |
|
|
jdhore wrote: | The problem with this (AFAIK) is that for a .tar.bz2 or a .tar.gz to be compressed/decompressed in parallel, it has to have been compressed using one of the parallelizing tools, and I'd bet almost NONE of the tarballs on the internet are. So I'd say there's not much use in allowing pbzip2 to effectively replace bzip2 (or pigz to replace gzip). |
Where did you get this info from?
The German wiki says:
"Parallelization is possible, because bzip2 compresses the input stream in blocks that are independent of each other."
Wikipedia says:
"Motivated by the large CPU time required for compression, a modified version was created in 2003 called pbzip2 that supported multi-threading, giving almost linear speed improvements on multi-CPU and multi-core computers.[4] As of May 2010, this functionality has not been incorporated into the main project." |
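A small Python sketch of why independently compressed pieces are harmless for a stock decoder: two chunks are compressed separately, the results are concatenated, and the standard library decodes them as one file. This assumes a parallel tool writes its output as a series of complete bzip2 streams (or gzip members), which is an assumption here and not something taken from the pbzip2 or pigz sources. It demonstrates compatibility only and says nothing about whether an ordinary single-stream file can be decompressed in parallel.

```python
import bz2
import gzip

first = b"first chunk of data\n" * 1000
second = b"second chunk of data\n" * 1000

# Compress each chunk on its own, as separate worker threads might,
# then simply concatenate the compressed results.
bz_joined = bz2.compress(first) + bz2.compress(second)
gz_joined = gzip.compress(first) + gzip.compress(second)

# The standard decoders accept multi-stream / multi-member input
# and return the original data in order.
bz_restored = bz2.decompress(bz_joined)
gz_restored = gzip.decompress(gz_joined)
```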
|
jdhore Retired Dev
Joined: 13 Apr 2007 Posts: 106
|
Posted: Thu Apr 05, 2012 9:29 pm Post subject: |
|
|
Dr.Willy wrote: | jdhore wrote: | The problem with this (AFAIK) is that for a .tar.bz2 or a .tar.gz to be compressed/decompressed in parallel, it has to have been compressed using one of the parallelizing tools, and I'd bet almost NONE of the tarballs on the internet are. So I'd say there's not much use in allowing pbzip2 to effectively replace bzip2 (or pigz to replace gzip). |
Where did you get this info from?
The German wiki says:
"Parallelization is possible, because bzip2 compresses the input stream in blocks that are independent of each other."
Wikipedia says:
"Motivated by the large CPU time required for compression, a modified version was created in 2003 called pbzip2 that supported multi-threading, giving almost linear speed improvements on multi-CPU and multi-core computers.[4] As of May 2010, this functionality has not been incorporated into the main project." |
I don't recall where I originally read this, but here's a reference to a test Wikimedia did of various bz2 implementations:
https://www.mediawiki.org/wiki/Dbzip2#cite_ref-0
See note #1: ↑ pbzip2 can only parallel-decompress its own funky output files. Regular bzip2 streams must be processed on a single thread.
I would bet that pigz suffers the same problem. |
|
avx Advocate
Joined: 21 Jun 2004 Posts: 2152
|
Posted: Thu Apr 05, 2012 10:44 pm Post subject: |
|
|
Even if this were true, how does it pose a problem? If I understand correctly, at worst it wouldn't be able to compress/decompress in parallel in some situations, but it wouldn't fail to do the job. So there would still be a benefit to using these apps, even if only for archives you create yourself. |
|
pjp Administrator
Joined: 16 Apr 2002 Posts: 20067
|
Posted: Mon Apr 09, 2012 2:45 am Post subject: |
|
|
Moved from Gentoo Chat to Portage & Programming. _________________ Quis separabit? Quo animo? |
|
i92guboj Bodhisattva
Joined: 30 Nov 2004 Posts: 10315 Location: Córdoba (Spain)
|
Posted: Mon Apr 09, 2012 1:55 pm Post subject: |
|
|
Not that I have looked very deeply into this, I am just thinking aloud, but this could have some other use cases. At least in theory, having 7z installed would allow you to uncompress gzip and bzip2 files without problems. Wrapping the syntax to be compatible would be an extra concern, though. I also have no idea how 7z handles pipes, if at all. As I said, just thinking aloud. |
|
avx Advocate
Joined: 21 Jun 2004 Posts: 2152
|
Posted: Mon Apr 09, 2012 2:22 pm Post subject: |
|
|
I already have p7zip installed, but haven't tried the pipe case yet.
I'm just thinking: why not go a step further and allow those applications as alternatives in the system, if they are more "future-proof"? So far I haven't found a problem, and they could potentially be beneficial. If they weren't, I don't see why they'd be available in Portage at all. |
|
i92guboj Bodhisattva
Joined: 30 Nov 2004 Posts: 10315 Location: Córdoba (Spain)
|
Posted: Tue Apr 10, 2012 12:24 pm Post subject: |
|
|
In case anyone cares, I just upgraded my kernel patchlevel from 3.3.0 to 3.3.1 using this:
Code: | 7z x -so ../patch-3.3.1.bz2 2>/dev/null | patch -p1 |
Much nicer than just
Code: | bzcat ../patch-3.3.1.bz2 | patch -p1 |
But my point is that pipes seem to work the same, so using 7z to fill a hypothetical gzip or bzip2 virtual seems doable. Not straightforward, though, for many reasons. |
|
avx Advocate
Joined: 21 Jun 2004 Posts: 2152
|
Posted: Tue Apr 10, 2012 12:28 pm Post subject: |
|
|
Sure, 7z is another possible way, though not that easy to do. On the other hand, pigz/pbzip2 work as drop-in replacements. |
|