status and future(?) of http-replicator

mmogilvi
n00b
Posts: 64
Joined: Fri May 13, 2011 3:13 am


Post by mmogilvi » Fri Sep 13, 2019 11:44 pm

Overview:

http-replicator's primary purpose is to locally cache and distribute source packages for portage within a LAN environment. This can be useful if you have limited external internet bandwidth from inside that LAN, but:

IMHO, it also just seems like good manners to minimize the load you place on Gentoo and other third party servers and bandwidth that you aren't paying for.

For this "manners" reason, all else being equal, I would generally recommend setting it up on any LAN that has more than one Gentoo machine on it, even if your LAN has excellent external bandwidth.

However, it is not a maintained package (somewhat surprising considering the third party argument above), there are a number of unresolved issues (new and old) with it, and (update 2020-02-26) it is currently in the process of being removed from the main portage tree (last rites).

----
Versions

There are (were) actually two main versions of http-replicator (3 and 4) in tree, but both are dead upstream. 4 is basically a complete rewrite of 3 with no shared code, and to be brief, 4 definitely deserves its "alpha" version status. For more information about the differences, see some of the comments attached to [bug=676758]bug 676758[/bug].

Although each has some unique advantages, on balance I would usually recommend sticking with 3 (which feels much more like a well-tested, stable release) instead of 4 (which looks promising, but also looks "not complete" and does not look thoroughly audited for security). I'm surprised version 4 was stabilized at all. If it were up to me, I would leave 4 masked until/unless someone puts the effort into fixing the obvious issues and auditing it for less obvious issues. (Again, see [bug=676758]bug 676758[/bug].)

Version 3 also has some protection against filename collisions (below) and a couple of trivial "help out newbies" customizations from [bug=442874]bug 442874[/bug] that have not been incorporated into version 4.

However, version 3 has been removed from the tree completely for a second time:
https://gitweb.gentoo.org/repo/gentoo.g ... 7f3a062f0b

----
Flat Directory Collisions

The recommendation printed (via elog) by http-replicator 3's ebuild recently became outdated, as of [bug=174612]bug 174612[/bug]. Http-replicator 3 itself is fine, but the setting it recommends no longer works as intended. In detail:

Old: Since [bug=442874]bug 442874[/bug], the http-replicator-3 ebuild recommends customizing every client machine's FETCHCOMMAND so that if any package's ebuild specifies a customized local name for a file (to disambiguate files in the flat distfiles directory), then http-replicator can use the same customized name in its cache. (I.e., it allows distinguishing http://example.com/packageName/VERSION1/package.tar.bz2 from http://example.com/packageName/VERSION2/package.tar.bz2, where only the URL prefix changes and the filename is identical, when stored in a flat directory structure.)

Code:

FETCHCOMMAND="wget -t 3 -T 60 --passive-ftp -O \"\${DISTDIR}/\${FILE}\" --header=\"X-unique-cache-name: \${FILE}\" \"\${URI}\""

New: Recently, [bug=174612]bug 174612[/bug] changed how the FILE variable is defined when invoking FETCHCOMMAND. Portage now appends ".__download__" to the name while downloading, and renames the file only if the download was successful. This makes downloads atomic (which is definitely a useful feature), but it does not interact well with http-replicator and the previously recommended FETCHCOMMAND:
  • If the "X-unique-cache-name" header is specified, then http-replicator's cache now uses the ".__download__" suffix on the filename, since that is what it was told to use...
  • Files can end up downloaded and cached twice, due to the new name.
  • The "repcacheman" tool doesn't handle ".__download__" in a sensible fashion at all. I think it erases all the currently used ".__download__" files, and leaves a lot of non-".__download__" files present even though they won't be used by clients with the newer portage...
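
To make the interaction concrete, here is a minimal sketch of what ends up in the header in each case. The function names header_old and header_stripped are hypothetical, made up for illustration; only the ".__download__" suffix and the "X-unique-cache-name" header come from the discussion above. The "%" expansion is plain POSIX shell, so this only shows what happens when a real shell evaluates it.

```shell
#!/bin/sh
# Hypothetical helpers: show what would be sent as X-unique-cache-name.

# Old recommendation: the header carries ${FILE} verbatim.
header_old() {
  printf 'X-unique-cache-name: %s\n' "$1"
}

# Same header with the temporary suffix stripped
# (POSIX "remove smallest matching suffix" expansion).
header_stripped() {
  printf 'X-unique-cache-name: %s\n' "${1%.__download__}"
}

# What newer portage passes as FILE while a download is in progress:
header_old      "package-1.0.tar.bz2.__download__"
header_stripped "package-1.0.tar.bz2.__download__"
```

The first call shows the name the cache is told to use under the old recommendation; the second shows the final "flat distfiles" name that the cache should be using instead.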

Some possible fix options (or parts of fixes):
  • If you haven't customized FETCHCOMMAND, then http-replicator still works like before, including occasional name collisions in its cache directory. (Without patches, version 4 does this regardless of the client's configured FETCHCOMMAND.)
  • UPDATE 2019-10-13: The following doesn't quite work properly.

    Code:

    FETCHCOMMAND="wget -t 3 -T 60 --passive-ftp -O \"\${DISTDIR}/\${FILE}\" --header=\"X-unique-cache-name: \${FILE%.__download__}\" \"\${URI}\""

    The "%.__download__" part of the variable expansion is supposed to strip that suffix if present, but that only works when /bin/sh itself expands the variable (not with whatever portage is doing).

    The above only sort of works, by accident. It looks like the command is parsed by code in portage itself rather than by a shell, so the '%' character doesn't work as intended, and the entire surrounding "--header" option is turned into an empty string when passed to wget. wget then gives a confusing warning about "http://" having no hostname, but otherwise works. I didn't initially notice this because it was hidden by portage's automatic background downloading feature, and I didn't test as thoroughly as I should have.
  • Current recommendation: The following may be overkill, but I now recommend using a script similar to the following:

    Code:

    #!/bin/sh
    
    # Copyright (C) 2015-2019 Matthew Ogilvie  [mmogilvi+gnto / zoho dot com]
    #
    # This program is free software; you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation; either version 2 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program; if not, write to the Free Software
    # Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
    
    # Usage: fetcher [-c] -O FILE URL
    #   Wraps wget, and conditionally adds X-unique-cache-name header
    #   to HTTP request only if potentially useful.
    #
    #/etc/portage/make.conf:
    # FETCHCOMMAND="/etc/portage/fetcher -O \"\${DISTDIR}/\${FILE}\" \"\${URI}\""
    # RESUMECOMMAND="/etc/portage/fetcher -c -O \"\${DISTDIR}/\${FILE}\" \"\${URI}\""
    
    f_args="-t 3 -T 60 --passive-ftp"
    f_file=
    f_url=
    f_help=
    
    while [ x"$1" != x"" ] ; do
      case "$1" in
        -O )
          f_file="$2"
          shift
        ;;
        -c )
          f_args="$1 $f_args"
        ;;
        -* )
          echo "Unknown option: $1" 1>&2
          f_help=1
        ;;
        * )
          f_url="$1"
      esac
      shift
    done
    
    if [ x"$f_file" = x"" -o x"$f_url" = x"" ] ; then
      echo "Missing FILE and/or URL" 1>&2
      f_help=1
    fi
    
    if [ x"$f_help" != x"" ] ; then
      echo "Usage: $0 [-c] -O FILE URL" 1>&2
      exit 1
    fi
    
    extraHeader=
    if [ x"$http_proxy" != x"" ] ; then
      baseFile="$(basename "${f_file%.__download__}")"
      if [ x"$(basename "$f_url")" != x"$baseFile" ] ; then
        extraHeader="X-unique-cache-name: $baseFile"
      fi
    fi
    
    if [ x"$extraHeader" != x"" ] ; then
      echo "Adding '$extraHeader' to request" 1>&2
      exec wget $f_args -O "$f_file" --header="$extraHeader" "$f_url"
    else
      exec wget $f_args -O "$f_file" "$f_url"
    fi

    (This is slightly updated from the version in [bug]442874[/bug], comment 23.)

    Code:

    FETCHCOMMAND="/etc/portage/fetcher -O \"\${DISTDIR}/\${FILE}\" \"\${URI}\""
    RESUMECOMMAND="/etc/portage/fetcher -c -O \"\${DISTDIR}/\${FILE}\" \"\${URI}\""
  • Any user who has followed the previous recommendation should change their FETCHCOMMAND based on the new recommendation. Not sure how to help them accomplish this. Is it worth a news item for anyone with http-replicator-3 installed? Should the ebuild actively look for misconfigurations and strongly warn about them (but detecting that on the server doesn't help clients)?
  • Perhaps http-replicator itself (and maybe repcacheman) could strip the suffix. This requires less user involvement, but feels like an ugly hack.
  • Alternatively, perhaps portage should define additional variable(s), so that even if the suffix changes in the future (for example, to include a unique string like a PID), the FETCHCOMMAND can still access the final "flat distfiles" filename.
  • Perhaps portage should change the default FETCHCOMMAND, so it would not need to be manually customized by users. See [bug]442874[/bug], comment 17 for an untested patch to change portage's default FETCHCOMMAND.
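
As a rough illustration of the "look for misconfigurations" idea above, a client-side check could be as small as the following sketch. The function name fetchcommand_is_outdated is hypothetical, and the check only recognizes the exact header text from the old recommendation. It also assumes the value it is given still contains the literal, unexpanded ${FILE}, as it would after make.conf is read.

```shell
#!/bin/sh
# Hypothetical check: does a FETCHCOMMAND value still pass the raw ${FILE}
# (which now ends in ".__download__") as the cache name?
# Returns 0 if the outdated header is present, 1 otherwise.
fetchcommand_is_outdated() {
  case "$1" in
    *'X-unique-cache-name: ${FILE}'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use on a client (portageq ships with portage; this assumes that
# "portageq envvar" prints the value with ${FILE} left unexpanded):
#   if fetchcommand_is_outdated "$(portageq envvar FETCHCOMMAND)" ; then
#     echo "FETCHCOMMAND sends the .__download__ name to the proxy" 1>&2
#   fi
```

This only helps on the clients themselves, which matches the concern that detecting the problem on the server doesn't help clients.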
----
Python 2, repcacheman, and last rites:

UPDATE 2020-02-26:

Recently for [bug]705606[/bug], http-replicator has been scheduled for complete removal (last rites), mostly because of its unmaintained status and its dependency on python 2.

I've been running a copy of the http-replicator 3 ebuild that I saved to my personal local overlay some months ago. I suspect that http-replicator itself (probably both versions) will continue to work as long as python 2 is available at all, and I'm currently planning to continue using it as long as it works.

But the associated repcacheman utility is likely to stop working sooner: It looks like repcacheman hooks into portage python APIs, and I suspect that support for running portage under python 2 is likely to be dropped much sooner than the python 2 interpreter ebuild is dropped. Repcacheman's purpose is to set up/update the cache directory, and remove distfiles that are duplicates of any files that can be quickly re-downloaded from http-replicator's cache. (eclean from app-portage/gentoolkit is also useful, for a related but different purpose described below.)

Repcacheman's source is only 200 lines. See /usr/portage/net-proxy/http-replicator/files/http-replicator-3.0-repcacheman-0.44-r2, which is shared by both http-replicator 3 and 4. I suspect getting it to work under python 3 would probably be almost trivial for someone familiar with portage python APIs (and python itself). But I don't have that familiarity, and am having trouble summoning enough personal interest to actually dig into it.

In addition to using repcacheman, I also occasionally run something like "( export DISTDIR=/var/cache/http-replicator/ ; eclean -i distfiles )". This should work as long as http-replicator itself is working, although it doesn't clean up duplicate files that could easily be re-downloaded from http-replicator's cache.
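
For the duplicate-file part specifically, something like the following sketch might serve as a stopgap that doesn't depend on portage's python APIs. It is not how repcacheman works internally; the function name list_cached_duplicates is hypothetical, and it only lists candidates by comparing same-named files byte for byte. It never deletes or moves anything.

```shell
#!/bin/sh
# Hypothetical sketch, not repcacheman itself: print the names of files in a
# distfiles directory that are byte-for-byte identical to a same-named file
# in the http-replicator cache directory. Nothing is deleted.
list_cached_duplicates() {
  lcd_distdir="$1"
  lcd_cachedir="$2"
  for lcd_path in "$lcd_distdir"/* ; do
    # Skip subdirectories, and the unexpanded glob if the directory is empty.
    [ -f "$lcd_path" ] || continue
    lcd_name="$(basename "$lcd_path")"
    if [ -f "$lcd_cachedir/$lcd_name" ] &&
       cmp -s "$lcd_path" "$lcd_cachedir/$lcd_name" ; then
      printf '%s\n' "$lcd_name"
    fi
  done
}

# Example (paths as mentioned in this thread; adjust to your setup):
#   list_cached_duplicates /var/cache/distfiles /var/cache/http-replicator
```

Reviewing the printed list by hand before removing anything seems prudent, since this does no checksum verification against the Manifest files.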

Without repcacheman, http-replicator would be functionally fairly similar to using other proxy tools for caching distfiles. I don't know which of those other tools would be best (nice, simple, lightweight, and focused on sharing flat distfiles), as an alternative. There is some discussion about alternatives over in https://forums.gentoo.org/viewtopic-t-1 ... cator.html

Some of the open bugs for http-replicator are primarily concerned with repcacheman. For example, see [bug]504538[/bug].

----
Also note:
  • http-replicator-4 does not support X-unique-cache-name, although [bug]442874[/bug], comment 18 contains an untested patch for version 4. See the considerations (including some security considerations) mentioned there and in [bug]524208[/bug].
  • Last rites and alternatives are being discussed in https://forums.gentoo.org/viewtopic-t-1 ... cator.html
----

So ultimately, what is the future of http-replicator? Is there another simple/lightweight proxy server that can do the same thing? Is it just not worth it?

----
History of edits:
  • 2020-02-26: Add section about python 2, repcacheman, and last rites. Other scattered small edits.
  • 2019-10-13: Fixup recommended FETCHCOMMAND for version 3 so it actually works.
Last edited by mmogilvi on Thu Feb 27, 2020 6:25 am, edited 1 time in total.
Thistled
Guru
Posts: 572
Joined: Thu Jan 06, 2011 6:57 pm
Location: Scotland

Post by Thistled » Mon Feb 03, 2020 5:52 pm

Just want to say a big thanks for trying to deal with some of the problems with version 3 of http-replicator.
I know the devs are asking you to take on the proxy-maintenance of the package, and it's a big ask.

Is the problem I am seeing below:

Code:

[ IDLE ] Mon Feb  3 16:58:02 2020
[ BUSY ] Mon Feb  3 16:58:02 2020
[ 0001 ] Mon Feb  3 16:58:02 2020
  0001   Accepted request from [192.168.1.2]:37812
  0001   Waiting at 16: RECV(4,16:58:17)
  0001   Client sends GET http://pig2:8080/distfiles/libgweather-3.32.2.tar.xz HTTP/1.1
  0001   Switching to HttpProtocol
  0001   Cache position: libgweather-3.32.2.tar.xz
  0001   Requesting address info for pig2:8080
  0001   Connecting to [127.0.0.1]:8080
  0001   Waiting at 36: SEND(5,16:58:17)
[ 0002 ] Mon Feb  3 16:58:02 2020
  0001   Waiting at 39: RECV(5,16:58:17)
  0002   Accepted request from [127.0.0.1]:49830
  0002   Waiting at 16: RECV(6,16:58:17)
  0002   Client sends GET /distfiles/libgweather-3.32.2.tar.xz HTTP/1.1
  0002   Error: invalid url: /distfiles/libgweather-3.32.2.tar.xz
  0001   Switching to ExceptionResponse
  0001   Traceback (most recent call last):
  0001     File "/usr/lib/python-exec/python2.7/http-replicator", line 40, in Replicator
  0001       protocol.recv( server )
  0001     File "/usr/lib/python2.7/site-packages/Protocol.py", line 149, in recv
  0001       assert chunk, 'server closed connection before sending a complete message header'
  0001   AssertionError: server closed connection before sending a complete message header
  0001   Waiting at 52: SEND(4,16:58:17)
  0001   Transaction successfully completed
[ IDLE ] Mon Feb  3 16:58:02 2020

Is it because portage has changed the way it "does the business" these days, and is that why I am seeing this error?
Http-Replicator just refuses to send out source files even though they are there.
I am not a coder, but would I be right to suggest it's the Python code that is freaking out because /distfiles is being requested in the URL?
I'm lost.

[EDIT] To change to version 3 and not current incarnation
Whatever you do, do it properly!
mmogilvi
n00b
Posts: 64
Joined: Fri May 13, 2011 3:13 am


Post by mmogilvi » Tue Feb 04, 2020 4:49 am

@Thistled: some thoughts below. The third item below may be the most useful for you, but the others help provide context.

1. Despite your edits about version 3, that log snippet looks more likely to have come from version 4. Version 3 was removed from portage several months ago, although I've kept it around in my personal overlay.

2. Proxy vs SRC_URI: It looks like the SRC_URI is somehow pointing at the proxy server itself. Normally the SRC_URI should be a canonical URL, and the http_proxy setting should just redirect the request through the proxy. When I try to download dev-libs/libgweather, emerge prints the following to stdout:

Code:

>>> Fetching (4 of 4) dev-libs/libgweather-3.32.2-r1::gentoo
>>> Downloading 'http://mirrors.lug.mtu.edu/gentoo/distfiles/f1/libgweather-3.32.2.tar.xz'
--2020-02-03 20:29:24--  http://mirrors.lug.mtu.edu/gentoo/distfiles/f1/libgweather-3.32.2.tar.xz
Connecting to 127.0.0.1:8082... connected.
Proxy request sent, awaiting response... 200 OK
Length: 2716144 (2.6M) [application/octet-stream]
Saving to: ‘/var/cache/distfiles/libgweather-3.32.2.tar.xz.__download__’

/var/cache/distfile 100%[===================>]   2.59M   754KB/s    in 3.5s    

2020-02-03 20:29:28 (754 KB/s) - ‘/var/cache/distfiles/libgweather-3.32.2.tar.xz.__download__’ saved [2716144/2716144]

 * libgweather-3.32.2.tar.xz BLAKE2B SHA512 size ;-) ...                 [ ok ]

Note mirrors.lug.mtu.edu in the URL, even though wget connects to 127.0.0.1:8082 for the proxy (most of that output is printed by wget, not the proxy). Is your setup trying to use the same host for both the URL and the proxy, somehow? (I notice that the libgweather ebuild does not explicitly list a SRC_URI; presumably one of the inherited eclasses is defining it somehow.)

Perhaps you've tried to set GENTOO_MIRRORS to the same thing as http_proxy? Although full mirrors and partial proxies have some similarities, they aren't really the same thing.

3. It sounds like you downloaded libgweather-3.32.2.tar.xz manually, and installed it somewhere where you expect http-replicator to find it directly? Where did you put it? If it is in your main distfiles directory (traditionally /usr/portage/distfiles, although new installs (or adjusted old ones) now default to /var/cache/distfiles as above), portage should find it without downloading anything (neither proxied nor direct). If it is in /var/cache/http-replicator (or wherever /etc/conf.d/http-replicator points it), then http-replicator (either version) should find it regardless of the host and directory parts of the SRC_URI.

4. You haven't tried to remove the "-f" option when starting http-replicator, have you? In version 4, removing "-f" probably opens a security hole, doesn't interact well with "repcacheman", and doesn't avoid duplication if clients are trying to use different mirrors for the same files.
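
One way to check for the misconfiguration suspected in item 2 is to compare the two settings directly. The following is a minimal sketch only; the function name mirrors_point_at_proxy is hypothetical, and it assumes both values are plain "http://host:port" style strings as they would appear in make.conf. The "pig2:8080" value in the example comment is taken from the log above.

```shell
#!/bin/sh
# Hypothetical check: is any entry of a GENTOO_MIRRORS-style list the same
# host:port as the http_proxy setting (i.e. the proxy being used as a mirror)?
# Usage: mirrors_point_at_proxy "$http_proxy" "$GENTOO_MIRRORS"
# Returns 0 if a mirror entry points at the proxy, 1 otherwise.
mirrors_point_at_proxy() {
  mpp_proxy="${1%/}"     # drop one trailing slash, if any
  mpp_mirrors="$2"
  # Unquoted on purpose: mirror entries are separated by whitespace.
  for mpp_mirror in $mpp_mirrors ; do
    case "$mpp_mirror" in
      "$mpp_proxy"|"$mpp_proxy"/*) return 0 ;;
    esac
  done
  return 1
}

# Example, using the host from the log above:
#   mirrors_point_at_proxy "http://pig2:8080" "http://pig2:8080/" \
#     && echo "GENTOO_MIRRORS points at the proxy itself" 1>&2
```

If this reports a match, the proxy would be asked to fetch from itself, which would be consistent with the "invalid url: /distfiles/..." error in the log.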