Gentoo Forums
wget not liking > 2GiB the first time around? [SOLVED]
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Wed Jan 12, 2022 9:46 pm    Post subject: wget not liking > 2GiB the first time around? [SOLVED]

Code:
~/www$ ls -l threegig.img
-rw-r--r-- 1 me me 3221225472 Jan 12 14:17 threegig.img
~/www$ mkdir tempcrap
~/www$ cd tempcrap
~/www/tempcrap$ wget http://127.0.0.1/~me/threegig.img
--2022-01-12 14:28:19--  http://127.0.0.1/~me/threegig.img
Connecting to 127.0.0.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2147483647 (2.0G)
Saving to: 'threegig.img'

threegig.img        100%[===================>]   2.00G  57.9MB/s    in 58s     

2022-01-12 14:33:22 (35.2 MB/s) - 'threegig.img' saved [2147483647/2147483647]

~/www/tempcrap$ wget -c http://127.0.0.1/~me/threegig.img
--2022-01-12 14:34:33--  http://127.0.0.1/~me/threegig.img
Connecting to 127.0.0.1:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 3221225472 (3.0G), 1073741825 (1.0G) remaining
Saving to: 'threegig.img'

threegig.img        100%[+++++++++++++======>]   3.00G  40.3MB/s    in 17s     

2022-01-12 14:34:50 (60.6 MB/s) - 'threegig.img' saved [3221225472/3221225472]

Why is 1GB missing the first time around?

Though this is a semi-contrived case, the problem is real. And there is no X-Y problem here, nor is this illicit traffic: Geofabrik does not offer bittorrent or rsync for downloading OSM extract data...

Another strange observation: if at least 1 byte is already downloaded and one continues with -c, wget detects the full file size... Weird.

-----------------

SOLVED: Do NOT use wget-1.21.1, it has a regression on 32-bit. Upgrade to 1.21.2.
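
For the record, the truncated Length in the first transcript, 2147483647, is exactly 2^31 - 1, the largest 32-bit signed integer, which is consistent with the size being clamped somewhere in a 32-bit code path. The arithmetic is easy to check in the shell:

Code:
$ echo $(( 2**31 - 1 ))
2147483647
$ # the real file size from ls -l, for comparison (3 GiB):
$ echo $(( 3 * 1024**3 ))
3221225472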
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?


Last edited by eccerr0r on Thu Jan 13, 2022 10:06 am; edited 2 times in total
pingtoo
l33t


Joined: 10 Sep 2021
Posts: 926
Location: Richmond Hill, Canada

Posted: Wed Jan 12, 2022 10:04 pm    Post subject: Re: wget not liking > 2GiB the first time around?

I think if the site/webserver decides not to give out the whole file at once, then there's not much wget can do to make it happen :D

eccerr0r wrote:
Code:
~/www/tempcrap$ wget http://127.0.0.1/~me/threegig.img
...
Length: 2147483647 (2.0G)   <--------------------------------------------- site/webserver decision
...
~/www/tempcrap$ wget -c http://127.0.0.1/~me/threegig.img
...
Length: 3221225472 (3.0G), 1073741825 (1.0G) remaining <------------------- site/webserver decision
...
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Wed Jan 12, 2022 10:57 pm

Well, it did cough up the whole file the second time around...

Also firefox slurps down the whole file in one go to the same server(s) - both geofabrik and my Apache server.
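
If you want to rule the server out entirely, a header-only request shows what Apache actually advertises, without downloading anything (a quick check, assuming curl is installed):

Code:
$ # HEAD request: print only the response headers, no body
$ curl -sI http://127.0.0.1/~me/threegig.img | grep -i '^content-length'
$ # a "Content-Length: 3221225472" here means the server is sending the full size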
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
pingtoo
l33t


Joined: 10 Sep 2021
Posts: 926
Location: Richmond Hill, Canada

Posted: Wed Jan 12, 2022 11:17 pm

eccerr0r wrote:
Well, it did cough up the whole file the second time around...

Also firefox slurps down the whole file in one go to the same server(s) - both geofabrik and my Apache server.


Your second round used the -c option, which tells the site/webserver to continue from where it left off.
man wget wrote:
Code:

       -c
       --continue
           Continue getting a partially-downloaded file.  This is useful when
           you want to finish up a download started by a previous instance of
           Wget, or by another program.  For instance:

                   wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z

           If there is a file named ls-lR.Z in the current directory, Wget
           will assume that it is the first portion of the remote file, and
           will ask the server to continue the retrieval from an offset equal
           to the length of the local file.

           Note that you don't need to specify this option if you just want
           the current invocation of Wget to retry downloading a file should
           the connection be lost midway through.  This is the default
           behavior.  -c only affects resumption of downloads started prior to
           this invocation of Wget, and whose local files are still sitting
           around.

           Without -c, the previous example would just download the remote
           file to ls-lR.Z.1, leaving the truncated ls-lR.Z file alone.

           If you use -c on a non-empty file, and the server does not support
           continued downloading, Wget will restart the download from scratch
           and overwrite the existing file entirely.

           Beginning with Wget 1.7, if you use -c on a file which is of equal
           size as the one on the server, Wget will refuse to download the
           file and print an explanatory message.  The same happens when the
           file is smaller on the server than locally (presumably because it
           was changed on the server since your last download
           attempt)---because "continuing" is not meaningful, no download
           occurs.

           On the other side of the coin, while using -c, any file that's
           bigger on the server than locally will be considered an incomplete
           download and only "(length(remote) - length(local))" bytes will be
           downloaded and tacked onto the end of the local file.  This
           behavior can be desirable in certain cases---for instance, you can
           use wget -c to download just the new portion that's been appended
           to a data collection or log file.

           However, if the file is bigger on the server because it's been
           changed, as opposed to just appended to, you'll end up with a
           garbled file.  Wget has no way of verifying that the local file is
           really a valid prefix of the remote file.  You need to be
           especially careful of this when using -c in conjunction with -r,
           since every file will be considered as an "incomplete download"
           candidate.

           Another instance where you'll get a garbled file if you try to use
           -c is if you have a lame HTTP proxy that inserts a "transfer
           interrupted" string into the local file.  In the future a
           "rollback" option may be added to deal with this case.

           Note that -c only works with FTP servers and with HTTP servers that
           support the "Range" header.


Maybe the site/webserver is more friendly to firefox? :-)
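
That last note about the "Range" header is testable independently of wget. A minimal probe with curl (it fetches only two bytes; the offsets below are just the 2 GiB boundary for this particular file):

Code:
$ # ask for 2 bytes starting at the 2 GiB boundary, dump the response headers,
$ # and throw away the body; a "206 Partial Content" status with
$ # "Content-Range: bytes .../3221225472" would mean Apache handles Range fine,
$ # pointing the finger back at the client
$ curl -s -D - -o /dev/null -r 2147483647-2147483648 http://127.0.0.1/~me/threegig.img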
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Thu Jan 13, 2022 12:43 am

Well, it appears so... now the question is... is Firefox right or is wget right?
They can't both be doing the right behavior....

Curl agrees with firefox. So net-misc/wget-1.21.1 has a bug, and it looks like it's only on 32-bit, as 64-bit works.

--

Looks like wget-1.21.2 has the bug fixed and is currently getting stabilized. The bug's been around for a while too, lol... wish I'd noticed it really was a bug earlier, instead of putting up with it, oh well.
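
For anyone wanting to verify the fix, comparing the downloaded copy against the original settles it; a minimal sketch, assuming the original is still sitting in ~/www as in the first post:

Code:
$ # byte-for-byte comparison; prints nothing and exits 0 when they match
$ cmp ~/www/threegig.img threegig.img && echo "files are identical"
$ # checksums should agree as well (also handy against a curl-fetched copy)
$ md5sum ~/www/threegig.img threegig.img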
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?