Gentoo Forums
tar decompression "feature"?
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9645
Location: almost Mile High in the USA

Posted: Wed Mar 15, 2017 12:35 am    Post subject: tar decompression "feature"?

I botched my typing while trying to unpack a tar.bz2 file:

Code:
/tmp $ file /usr/portage/distfiles/iputils-s20151218.tar.bz2
/usr/portage/distfiles/iputils-s20151218.tar.bz2: bzip2 compressed data, block size = 900k
/tmp $ tar xf /usr/portage/distfiles/iputils-s20151218.tar.bz2
/tmp $


What's wrong with this picture? Well, historically this should have failed.

It's one thing to pass along which compressor you want to use, but now autodetecting?

Get with the times, or more feature bloat?

[EDIT: This is not part of coreutils, tar is in its own package. Not sure why I thought it was in coreutils.]
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?


Last edited by eccerr0r on Wed Mar 15, 2017 4:48 pm; edited 1 time in total

Hu
Moderator

Joined: 06 Mar 2007
Posts: 21490

Posted: Wed Mar 15, 2017 12:58 am

You have an interesting definition of historical. :) As far as I know, this is purely a feature of tar, not coreutils.
NEWS: Release 1.15:
version 1.15 - Sergey Poznyakoff, 2004-12-20

* Compressed archives are recognised automatically, it is no longer
necessary to specify -Z, -z, or -j options to read them. Thus, you can
now run `tar tf archive.tar.gz'.
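
For anyone who wants to check this on their own machine, here is a minimal sketch. It assumes GNU tar 1.15 or later and a gzip binary on PATH; the directory and file names are made up for the demo.

```shell
#!/bin/sh
# Build a small gzip-compressed archive in a scratch directory.
work=$(mktemp -d) || exit 1
mkdir "$work/demo"
echo hello > "$work/demo/file.txt"
tar -czf "$work/demo.tar.gz" -C "$work" demo

# List it twice: once naming the compressor, once leaving detection to tar.
with_flag=$(tar -tzf "$work/demo.tar.gz")
without_flag=$(tar -tf "$work/demo.tar.gz")

if [ "$with_flag" = "$without_flag" ]; then
    echo "listings match"
else
    echo "listings differ"
fi
rm -rf "$work"
```

Note that this reads from a regular file, which tar can reopen or seek on; it says nothing about pipes or tape devices.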

eccerr0r

Posted: Wed Mar 15, 2017 5:35 am

I guess I never noticed it for over a decade. Alas, when I first used Linux (~1997), tar required the correct decompression flag... I have definitely run into the "can't find anything in tar file" and "skipping archive" errors when forgetting the decompression flag.

Then again if you specify the wrong flag, it will complain...

I suppose the main reason I always use the flag is that I'm forced to use it on real tape drives. I'm not sure whether you're allowed to rewind on tape streams when autodetecting compression...

(Wow... I can't believe it... I am almost a 20 year Linux veteran ... sheesh...)

John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10587
Location: Somewhere over Atlanta, Georgia

Posted: Wed Mar 15, 2017 10:19 am

eccerr0r wrote:
I suppose the main reason why I always use the flag is because I'm forced to use it on real tape drives, I'm not sure if you're allowed to rewind on tape streams when autodetecting compression.
Just FYI, you are allowed to reposition tape in a fairly flexible manner. Repositioning to the beginning of a file is quick: no full tape rewind required. Of course, in this particular use case, the block is almost certainly still in cache, hence the tape is unlikely to actually have to physically move.

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.

frostschutz
Advocate

Joined: 22 Feb 2005
Posts: 2977
Location: Germany

Posted: Wed Mar 15, 2017 11:11 am

eccerr0r wrote:
I guess I never noticed it for over a decade, alas when I first used Linux (~1997), it required the correct decompression flag...


What do you mean, a decade? Debian still required it until recently :lol:

Seriously though I don't mind features like this. It's not hard to implement and makes tar a lot more convenient.

I would also love an update to dd so it stops defaulting to a block size of 512 bytes, especially for a simple dd if= of= where the block size does not matter in the first place; it just pointlessly slows you down. It should be able to use 64K blocks internally without breaking backwards compatibility.

mv
Watchman

Joined: 20 Apr 2005
Posts: 6747

Posted: Wed Mar 15, 2017 11:57 am

frostschutz wrote:
I would also love an update to dd to stop using blocksize 512 bytes, especially for simple dd if= of= where blocksize does not matter in the first place, it just pointlessly slows you down.

No, it doesn't. During some data recovery sessions I ran several experiments with dd and other tools, and for some mysterious reason a block size of 512 was optimal. Perhaps kernel or hardware caching is still optimized for it. Blocks of 64k or even larger actually slowed things down.
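
This kind of disagreement is easy to measure rather than argue about. Below is a rough sketch, assuming GNU dd (whose last stderr line reports elapsed time and throughput). The helper name bench_bs and the 8 MiB scratch file are made up for illustration. A file of zeros in the page cache mostly measures syscall overhead, so results for a real or failing disk may look quite different.

```shell
#!/bin/sh
# bench_bs FILE BS: copy FILE with the given block size, print dd's
# summary line, and succeed only if the copy is byte-identical.
bench_bs() {
    src=$1
    bs=$2
    dst=$(mktemp) || return 1
    dd if="$src" of="$dst" bs="$bs" 2>&1 | tail -n 1
    cmp -s "$src" "$dst"
    rc=$?
    rm -f "$dst"
    return $rc
}

# Scratch input: 128 blocks of 64 KiB = 8 MiB of zeros (arbitrary size).
sample=$(mktemp) || exit 1
dd if=/dev/zero of="$sample" bs=65536 count=128 2>/dev/null

for bs in 512 4096 65536; do
    printf 'bs=%s: ' "$bs"
    bench_bs "$sample" "$bs" || echo "copy mismatch"
done
rm -f "$sample"
```

To test an actual device, pass it as the first argument instead of the scratch file, and run each size more than once so caching effects show up.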

eccerr0r

Posted: Wed Mar 15, 2017 4:34 pm

John R. Graham wrote:
Just FYI, you are allowed to reposition tape in a fairly flexible manner. Repositioning to the beginning of a file is quick: no full tape rewind required. Of course, in this particular use case, the block is almost certainly still in cache, hence the tape is unlikely to actually have to physically move.

- John

That's the thing: the code would have to read a bit, then rewind, and then feed the stream to the decompression program (as a naïve implementation).

Tapes aren't block devices, so reads are not cached. This requires a bit of trickery unless tar itself stores the few bytes it read and pipes them along (rather than throwing out the bytes it read and feeding the whole file to the decompression program). Depending on the tape, rewinding may be costly. Linear streamers might be easier than helical scan...

frostschutz wrote:
What do you mean a decade. Debian still required it until recently :lol:

Okay, so I wasn't smoking something when I remembered it being required for a while. Yes, I used Debian for a while too :o
Quote:
Seriously though I don't mind features like this. It's not hard to implement and makes tar a lot more convenient.

Whatever happened to "do something and do something well"? IMHO this isn't the "Unix mantra": the program "file" is supposed to detect file types, and one should do this via scripts (then again, that would force a need to mt rewind or mt bsf, as you're surely going to have to back up the stream). I guess in this case, since tar would have to buffer, this functionality would have to be done within tar... and run multithreaded as well, which is a fairly "new" programming style.

Then again, does it actually work with tape drives? I'll need to test it, unless someone has a real tape drive available. I do, but it's not hooked up... Actually, perhaps that's not necessary: this fails

Code:
$ cat openGPIB-0.13.tar.gz |tar tf -
tar: Archive is compressed. Use -z option
tar: Error is not recoverable: exiting now


Ahh, proof that tar is lazier than I thought and indeed uses the naïve implementation: it throws out all the bytes it read and tells the decompression program to start afresh, which you can't do on pipes or character devices. This is GNU tar 1.29.
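
For what it's worth, the "store the few bytes it read and pipe them along" idea can be sketched in plain shell. This only illustrates the approach and is not how GNU tar is implemented. untar_stream is a hypothetical helper, its magic-number table covers just gzip, bzip2, and xz, and it assumes those decompressors are installed.

```shell
#!/bin/sh
# untar_stream TAR-OPTIONS...: read an archive on stdin, sniff its
# compression from the first bytes, then replay those bytes to the
# decompressor so nothing is lost on a non-seekable stream.
untar_stream() {
    hdr=$(mktemp) || return 1
    # bs=1 makes each read() take one byte, so exactly 6 bytes (or fewer
    # at EOF) leave the stream and nothing else is swallowed.
    dd bs=1 count=6 of="$hdr" 2>/dev/null
    magic=$(od -An -tx1 "$hdr" | tr -d ' \n')
    case $magic in
        1f8b*)        filter='gzip -dc' ;;
        425a68*)      filter='bzip2 -dc' ;;
        fd377a585a00) filter='xz -dc' ;;
        *)            filter='cat' ;;    # assume an uncompressed archive
    esac
    # Replay the saved header, then the rest of stdin, into the filter.
    { cat "$hdr"; cat; } | $filter | tar "$@" -f -
    rc=$?
    rm -f "$hdr"
    return $rc
}
```

With that defined, something like `cat openGPIB-0.13.tar.gz | untar_stream -t` should list the archive without a -z flag. It is aimed at pipes; it is untested on real tape, where single-byte reads are unlikely to suit the drive's block size.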

John R. Graham

Posted: Wed Mar 15, 2017 5:48 pm

I have several tape drives. Can test this evening (EST5EDT) at the latest.

- John

szatox
Advocate

Joined: 27 Aug 2013
Posts: 3104

Posted: Wed Mar 15, 2017 7:02 pm

Code:
$ tar czf test.tgz testdir/
$ rmdir testdir/
$ cat test.tgz | tar x
tar: Archive is compressed. Use -z option
tar: Error is not recoverable: exiting now
$

It's gonna fail.
I wonder if John's mileage would vary ;)

John R. Graham

Posted: Wed Mar 15, 2017 7:08 pm

I intend to specify the tape device directly as a command line option. Theoretically an app could be smart enough to control a tape drive. After all, there is that 't' right there in the name.

- John

szatox

Posted: Wed Mar 15, 2017 7:34 pm

It could also be smart enough to buffer the header in memory by itself. Easier and more generic than controlling the tape.
Still, I'm curious about the actual results from your little test.

eccerr0r

Posted: Wed Mar 15, 2017 8:20 pm

I wish that floppy tapes still worked in Linux. It's the only linear tape drive that I have that I also have media for.

Right now I only have one linear tape drive that would work in Linux (SCSI DC-6525, 500MB), but I don't have any tapes for it. I have three SCSI helical scan drives (two 8mm (2.5GB) and one 4mm (DDS2)) that I do have tapes for, but I'll need to dig up a SCSI card... Needless to say these tapes are not used, as the density is way too low for my TB-sized array...

I do recall some inconsistencies with mt while controlling tapes, so with some luck some drives can actually take rewinds from tar (after all, there exists software that makes tape work like a very slow disk drive), but ultimately random access is not something tape is good at.

John R. Graham

Posted: Wed Mar 15, 2017 8:41 pm

Ugh. I don't. Travan: what an awful kludge. "Real" tape drives verify as they are writing with a separate read head (or at least gap) and separate read electronics, automatically re-writing until they get a good read back. Also, Travan's data rate is incompatible with modern disk sizes.

I've got two DLT drives at home and loads of media. At work, I've loaned out my development machine's controller as a spare so that my Ultrium 6 LTO drive is unavailable right now.

- John

John R. Graham

Posted: Thu Mar 16, 2017 12:01 am

Simplistic it is!
Code:
~ # tar -cjvf /dev/nst0 *
<snip>
~ # mt weof 2
~ # mt rewind
~ # tar -tvf /dev/nst0
bzip2: (stdin) is not a bzip2 file.
tar: Child died with signal 13
tar: Error is not recoverable: exiting now
~ # mt rewind
~ # tar -tjvf /dev/nst0
-rw-r--r-- root/root     40757 2011-11-19 11:01 1
-rwxr-xr-x root/root       745 2007-08-20 19:04 1.bash
-rw-r--r-- root/root      1486 2007-05-07 23:35 1.diff
<snip>
Interesting that my error message is different from eccerr0r's.

eccerr0r wrote:
Tapes aren't block devices, so it is not cached. This requires a bit of trickery unless tar itself stores the few bytes it read and pipes it along (and not just throw out all the bytes read feed the whole file to the decompression program). ...
Thanks for educating me on that.

Edit: Corrected trace to include omitted rewind.

- John


Last edited by John R. Graham on Thu Mar 16, 2017 2:17 am; edited 1 time in total

eccerr0r

Posted: Thu Mar 16, 2017 1:38 am

Ok, that really is kind of strange.

Indeed tar is doing something bad: it read a few bytes from /dev/nst0 and deemed it to be a bzip2 file. Then it ran bzip2 < /dev/nst0 without rewinding first, and bzip2 can't rewind either, so bzip2 got the wrong magic number. bzip2 then says it's not a bzip2 file, tar doesn't like that response and kills the pipe, bzip2 gets SIGPIPE (signal 13), and we're dead.
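
That failure mode can be reproduced without a tape drive, since a pipe is just as non-seekable. In this tiny sketch the first dd stands in for tar sniffing the magic number, and the second stands in for bzip2 starting to read afterwards. The byte string is only the opening of a bzip2 header, not a real archive.

```shell
#!/bin/sh
# "BZh" is the bzip2 magic; the bytes after it stand in for the payload.
seen=$(printf 'BZh91AY' | {
    dd bs=1 count=3 of=/dev/null 2>/dev/null   # sniffer consumes the magic
    dd bs=1 count=3 2>/dev/null                # next reader gets what's left
})
echo "second reader saw: $seen"
```

The second reader starts at "91A", so from its point of view the stream no longer begins with a bzip2 magic number.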

But in your case you ran tar twice. The first run failed, which should have picked up something from the stream and advanced the pointer, but the second run succeeded... which means it got the whole file. Unless the rewind was omitted from the trace, there was an implicit rewind from somewhere, or perhaps the kernel automatically pointed back to the beginning of the file? Weird... This doesn't make sense; it almost means that you could easily write tar to take advantage of this behavior to detect the compression.

John R. Graham

Posted: Thu Mar 16, 2017 2:12 am

The second one isn't strange at all. I included the correct compression option on the command line. Oh, wait. Somehow, I did omit a rewind. I'll edit it in above.

- John

eccerr0r

Posted: Thu Mar 16, 2017 2:19 am

Ah... Now it makes sense, thanks for updating, and now it's as expected :)