http-replicator's primary purpose is to locally cache and distribute source packages for portage within a LAN environment. This can be useful if you have limited external internet bandwidth from inside that LAN, but:
IMHO, it also just seems like good manners to minimize the load you place on Gentoo and other third-party servers and bandwidth that you aren't paying for.
For this "manners" reason, all else being equal, I would generally recommend setting it up on any LAN that has more than one Gentoo machine on it, even if your LAN has excellent external bandwidth.
However, it is not a maintained package (somewhat surprising, considering the "good manners" argument above), it has a number of unresolved issues (new and old), and (update 2020-02-26) it is currently in the process of being removed from the main portage tree (last rites).
----
Versions
There are (were) actually two main versions of http-replicator (3 and 4) in tree, but both are dead upstream. 4 is basically a complete rewrite of 3 with no shared code, and to be brief, 4 definitely deserves its "alpha" version status. For more information about the differences, see some of the comments attached to [bug=676758]bug 676758[/bug].
Although each has some unique advantages, on balance I would usually recommend sticking with 3 (which feels much more like a well-tested, stable release) instead of 4 (which looks promising, but also looks incomplete and not thoroughly audited for security). I'm surprised version 4 was stabilized at all. If it were up to me, I would leave 4 masked until/unless someone puts in the effort to fix the obvious issues and audit it for less obvious ones. (Again, see [bug=676758]bug 676758[/bug].)
Version 3 also has some protection against filename collisions (below) and a couple of trivial "help out newbies" customizations from [bug=442874]bug 442874[/bug] that have not been incorporated into version 4.
However, version 3 has been removed from the tree completely for a second time:
https://gitweb.gentoo.org/repo/gentoo.g ... 7f3a062f0b
----
Flat Directory Collisions
The recommendation printed (via elog) by http-replicator 3's ebuild recently became outdated, as of [bug=174612]bug 174612[/bug]. Http-replicator 3 itself is fine; only the recommended client setting is outdated. In detail:
Old: Since [bug=442874]bug 442874[/bug], the http-replicator-3 ebuild recommends customizing every client machine's FETCHCOMMAND so that if any package's ebuild specifies a customized local name for a file (to disambiguate files in the flat distfiles directory), http-replicator can use the same customized name in its cache. (I.e., it allows distinguishing http://example.com/packageName/VERSION1/package.tar.bz2 from http://example.com/packageName/VERSION2/package.tar.bz2, where only the URL prefix differs and the filename is identical, when both are stored in a flat directory structure.)
Code:
FETCHCOMMAND="wget -t 3 -T 60 --passive-ftp -O \"\${DISTDIR}/\${FILE}\" --header=\"X-unique-cache-name: \${FILE}\" \"\${URI}\""
New: Since that change, the ${FILE} that portage passes to FETCHCOMMAND ends in a temporary ".__download__" suffix, which causes several problems with the old setting:
- If "X-unique-cache-name" is specified, then http-replicator's cache now uses the ".__download__" suffix on the filename, since that is what it was told to use...
- Files can end up downloaded and cached twice, due to the new name.
- The "repcacheman" tool doesn't handle ".__download__" in a sensible fashion at all. I think it erases all the currently-used ".__download__" files and leaves a lot of non-".__download__" files present, even though they won't be used by clients with the newer portage...
- If you haven't customized FETCHCOMMAND, then http-replicator still works like before, including occasional name collisions in its cache directory. (Without patches, version 4 does this regardless of the client's configured FETCHCOMMAND.)
- UPDATE 2019-10-13: The following doesn't quite work properly.
The "%.__download__" part of the variable expansion is supposed to strip out that suffix if present, but it only works from /bin/sh itself (not whatever portage is doing):
Code:
FETCHCOMMAND="wget -t 3 -T 60 --passive-ftp -O \"\${DISTDIR}/\${FILE}\" --header=\"X-unique-cache-name: \${FILE%.__download__}\" \"\${URI}\""
The above only sort of works, by accident. It looks like the command is parsed by code in portage itself rather than by a shell, so the '%' character doesn't work as intended, and the entire surrounding "--header" option is turned into an empty string when passed to wget. wget then gives a confusing warning about "http://" having no hostname, but otherwise works. I didn't notice this at first because it was hidden by portage's automatic background downloading feature, and I didn't test as thoroughly as I should have.
- Current recommendation: The following may be overkill, but I now recommend using a script similar to the following:
(This is slightly updated from the version in [bug]442874[/bug], comment 23.)
Code:
#!/bin/sh
# Copyright (C) 2015-2019 Matthew Ogilvie [mmogilvi+gnto / zoho dot com]
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

# Usage: fetcher [-c] -O FILE URL
# Wraps wget, and conditionally adds X-unique-cache-name header
# to HTTP request only if potentially useful.
#
#/etc/portage/make.conf:
# FETCHCOMMAND="/etc/portage/fetcher -O \"\${DISTDIR}/\${FILE}\" \"\${URI}\""
# RESUMECOMMAND="/etc/portage/fetcher -c -O \"\${DISTDIR}/\${FILE}\" \"\${URI}\""

f_args="-t 3 -T 60 --passive-ftp"
f_file=
f_url=
f_help=
while [ x"$1" != x"" ] ; do
  case "$1" in
    -O )
      f_file="$2"
      shift
      ;;
    -c )
      f_args="$1 $f_args"
      ;;
    -* )
      echo "Unknown option: $1" 1>&2
      f_help=1
      ;;
    * )
      f_url="$1"
  esac
  shift
done

if [ x"$f_file" = x"" -o x"$f_url" = x"" ] ; then
  echo "Missing FILE and/or URL" 1>&2
  f_help=1
fi
if [ x"$f_help" != x"" ] ; then
  echo "Usage: $0 [-c] -O FILE URL" 1>&2
  exit 1
fi

extraHeader=
if [ x"$http_proxy" != x"" ] ; then
  baseFile="$(basename "${f_file%.__download__}")"
  if [ x"$(basename "$f_url")" != x"$baseFile" ] ; then
    extraHeader="X-unique-cache-name: $baseFile"
  fi
fi

if [ x"$extraHeader" != x"" ] ; then
  echo "Adding '$extraHeader' to request" 1>&2
  exec wget $f_args -O "$f_file" --header="$extraHeader" "$f_url"
else
  exec wget $f_args -O "$f_file" "$f_url"
fi
Code:
FETCHCOMMAND="/etc/portage/fetcher -O \"\${DISTDIR}/\${FILE}\" \"\${URI}\""
RESUMECOMMAND="/etc/portage/fetcher -c -O \"\${DISTDIR}/\${FILE}\" \"\${URI}\""
- Any user that has followed the previous recommendation should change their FETCHCOMMAND based on the new recommendation. Not sure how to help them accomplish this. Is it worth a news item for anyone with http-replicator-3 installed? Should the ebuild actively look for misconfigurations and strongly warn about them (but detecting that on the server doesn't help clients)?
- Perhaps http-replicator itself (and maybe repcacheman) could strip the suffix. This requires less user involvement, but feels like an ugly hack.
- Alternatively, perhaps portage should define additional variable(s), so that FETCHCOMMAND can still access the final "flat distfiles" filename even if the suffix changes in the future (for example, to include a unique string such as a PID).
- Perhaps portage should change the default FETCHCOMMAND, so it would not need to be manually customized by users. See [bug]442874[/bug], comment 17 for an untested patch to change portage's default FETCHCOMMAND.
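The decision the fetcher script makes can be isolated into a small function for testing. This is a minimal sketch, not part of the original script: the function name `cache_header` is hypothetical, it omits the script's `$http_proxy` check, and it only prints the header line the wrapper would add (or nothing). It also shows the `${FILE%.__download__}` expansion behaving as intended when /bin/sh, rather than portage, performs it.

```shell
#!/bin/sh
# Hypothetical helper, extracted from the fetcher script's logic for illustration.
# Usage: cache_header FILE URL
# Prints the X-unique-cache-name header the wrapper would add, or nothing.
cache_header() {
    ch_file=$1  # the -O argument, i.e. "${DISTDIR}/${FILE}" (may end in .__download__)
    ch_url=$2   # the ${URI} argument
    # POSIX "%" expansion removes portage's temporary suffix, if present.
    ch_base=$(basename "${ch_file%.__download__}")
    # A header is only useful when the ebuild chose a local name that differs
    # from the last path component of the URL.
    if [ "$(basename "$ch_url")" != "$ch_base" ]; then
        printf 'X-unique-cache-name: %s\n' "$ch_base"
    fi
}
```

For example, `cache_header /var/cache/distfiles/package-1.tar.bz2.__download__ http://example.com/packageName/VERSION1/package.tar.bz2` prints `X-unique-cache-name: package-1.tar.bz2`, while the same URL with a local name of `package.tar.bz2` prints nothing, so that request is identical to one made without the wrapper. (`package-1.tar.bz2` is an illustrative renamed distfile, not taken from a real ebuild.)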
----
Python 2, repcacheman, and last rites:
UPDATE 2020-02-26:
Recently, in [bug]705606[/bug], http-replicator was scheduled for complete removal (last rites), mostly because it is unmaintained and depends on python 2.
I've been running a copy of the http-replicator 3 ebuild that I saved to my personal local overlay some months ago. I suspect that http-replicator itself (probably both versions) will continue to work as long as python 2 is available at all, and I'm currently planning to continue using it as long as it works.
But the associated repcacheman utility is likely to stop working sooner: it looks like repcacheman hooks into portage's python APIs, and I suspect that support for running portage under python 2 will be dropped much sooner than the python 2 interpreter ebuild is. Repcacheman's purpose is to set up/update the cache directory, and to remove distfiles that duplicate files which can be quickly re-downloaded from http-replicator's cache. (eclean from app-portage/gentoolkit is also useful, for a related but different purpose described below.)
Repcacheman's source is only 200 lines. See /usr/portage/net-proxy/http-replicator/files/http-replicator-3.0-repcacheman-0.44-r2, which is shared by both http-replicator 3 and 4. I suspect getting it to work under python 3 would probably be almost trivial for someone familiar with portage python APIs (and python itself). But I don't have that familiarity, and am having trouble summoning enough personal interest to actually dig into it.
In addition to using repcacheman, I also occasionally run something like "( export DISTDIR=/var/cache/http-replicator/ ; eclean -i distfiles )". This should keep working as long as http-replicator itself works, although it doesn't clean up duplicate files that could easily be re-downloaded from http-replicator's cache.
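The duplicate cleanup that eclean skips is the part of repcacheman's job described above. A rough, dry-run-only sketch of just the detection step, assuming both DISTDIR and the cache are flat directories: the function name `list_cached_duplicates` is hypothetical, and the default paths (/var/cache/distfiles for DISTDIR, /var/cache/http-replicator for the cache) are assumptions to adjust locally. It does not use portage's APIs and does not reproduce repcacheman; it only prints distfiles that have a byte-identical, same-named copy in the cache, and deletes nothing.

```shell
#!/bin/sh
# Hypothetical dry-run helper: list distfiles that are already duplicated,
# byte-for-byte, in http-replicator's cache. Nothing is deleted.
# Usage: list_cached_duplicates [DISTDIR] [CACHEDIR]
# Both default paths below are assumptions; pass your own as arguments.
list_cached_duplicates() {
    lcd_distdir=${1:-/var/cache/distfiles}
    lcd_cachedir=${2:-/var/cache/http-replicator}
    for lcd_f in "$lcd_distdir"/*; do
        # Skip non-regular files (and the unexpanded glob of an empty dir).
        [ -f "$lcd_f" ] || continue
        lcd_name=$(basename "$lcd_f")
        # Same flat name in the cache, and identical contents.
        if [ -f "$lcd_cachedir/$lcd_name" ] && cmp -s "$lcd_f" "$lcd_cachedir/$lcd_name"; then
            printf '%s\n' "$lcd_f"
        fi
    done
}
```

On the machine running http-replicator, something like `list_cached_duplicates /var/cache/distfiles /var/cache/http-replicator` would print candidates. Printing instead of deleting is deliberate: reviewing the list first is cheap, and anything removed from DISTDIR would be re-fetched through http-replicator the next time it is needed.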
Without repcacheman, http-replicator would be functionally fairly similar to using any other caching proxy for distfiles. I don't know which of those other tools would be the best alternative (ideally nice, simple, lightweight, and focused on sharing flat distfiles). There is some discussion about alternatives over in https://forums.gentoo.org/viewtopic-t-1 ... cator.html
Some of the open bugs for http-replicator are primarily concerned with repcacheman. For example, see [bug]504538[/bug].
----
Also note:
- http-replicator-4 does not support X-unique-cache-name, although [bug]442874[/bug], comment 18 contains an untested patch for version 4. See the considerations (including some security considerations) mentioned there and in [bug]524208[/bug].
- Last rites and alternatives are being discussed in https://forums.gentoo.org/viewtopic-t-1 ... cator.html
So ultimately, what is the future of http-replicator? Is there another simple/lightweight proxy server that can do the same thing? Is it just not worth it?
----
History of edits:
- 2020-02-26: Add section about python 2, repcacheman, and last rites. Other scattered small edits.
- 2019-10-13: Fixup recommended FETCHCOMMAND for version 3 so it actually works.

