Synopsis
The default behavior of Gentoo's Portage system is very powerful for single installs but quickly becomes redundant when more than two machines are involved. The problem becomes especially acute when you're dealing with an install base of dozens of machines or more. Running emerge over the Internet on every machine consumes unnecessary bandwidth, overloads Gentoo's mirrors and requires each of those machines to have Internet access you may not want it to have.
A more efficient internal infrastructure would be centered on a single point of access to the Gentoo mirrors. This Portage gateway server would be responsible for retrieving updates to the Portage Tree and maintaining a central repository of Gentoo packages (distfiles). In smaller networks all internal machines would draw from the Portage gateway. In larger networks access could be cascaded to secondary and tertiary servers to distribute load or handle complex network structures.
Some additional admin work is required, such as maintaining the subset of Gentoo packages for downstream clients, but there are benefits beyond the obvious bandwidth savings and relief for the Gentoo mirrors.
The Portage gateway admin has the ability to control which Gentoo packages (distfiles) are available inside the network. This would ensure that beta packages do not creep into production machines. Alternatively, two secondary servers could stem from the gateway: one masked and used by production machines, the other unmasked and used by development and test machines.
Getting a complete backup of the Gentoo packages installed throughout the network would be as simple as copying the ../distfiles/ directory on the Portage gateway server. This beats running to each machine trying to capture every package that has been downloaded.
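As a minimal sketch of that backup, assuming a POSIX shell with tar available, the helper below packs the distfiles directory into a dated tarball. The function name backup_distfiles and the destination directory are hypothetical; adjust both to your own layout.

```shell
#!/bin/sh
# backup_distfiles SRC_DIR DEST_DIR  (hypothetical helper, not part of Portage)
# Packs SRC_DIR into DEST_DIR/distfiles-YYYYMMDD.tar and prints the archive path.
backup_distfiles() {
    src="$1"
    dest="$2"
    archive="$dest/distfiles-$(date +%Y%m%d).tar"
    mkdir -p "$dest" || return 1
    # -C changes into the parent directory first, so the archive stores
    # relative paths ("distfiles/...") instead of absolute ones.
    tar -cf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    printf '%s\n' "$archive"
}
```

On a default layout you might call backup_distfiles /usr/portage/distfiles /mnt/backup (the destination is only an example). No compression is applied because most distfiles are already compressed tarballs.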
Only the Portage gateway server needs access to the Internet. This has many implications beyond the scope of this HowTo but the security benefits are obvious. The Portage gateway server can of course itself be behind a firewall.
Before We Begin
Your Portage gateway server is not meant to be a complete mirror of the 4500+ Gentoo packages available. While over time you will accumulate a comprehensive repository of source packages and their dependencies, there is no point in putting load on the mirrors for source you never intend to compile. Someone somewhere has to pay for the hosting of our Gentoo community. Let's not abuse this free service by hoarding data that will likely become obsolete before anyone uses it.
This applies to syncing your Portage tree as well. You might have noticed Gentoo's HowTo on setting up an rsync mirror. We will be using the same software for our gateway, but you must ignore Gentoo's recommendation to sync every 30 minutes: that policy is meant for public servers ONLY. There is no harm in automating the sync process for your internal machines and setting the frequency to whatever you like. However, leave the Portage gateway's emerge sync as a manual process and update only as needed. You should only need a sync to get new software, correct a bug or patch a security hole.
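For the internal machines, a cron entry is one way to automate the sync. The sketch below assumes a cron daemon that reads root's crontab (edited with crontab -e as root) and that emerge lives in /usr/bin; the 03:30 schedule is arbitrary. Do not install it on the gateway itself.

```
# Root crontab on an internal client, NOT on the Portage gateway.
# minute hour day-of-month month day-of-week command
# (Newer Portage versions spell the command "emerge --sync".)
30 3 * * * /usr/bin/emerge sync
```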
Scenarios
Setting up a Portage gateway and its supporting infrastructure is not very difficult, and there are several distinct approaches. The techniques from one setup can also be mixed with another. Several possibilities are detailed below, but only Setup #3 is expanded because it accomplishes everything outlined in the synopsis.
-- Setup #1: Proxy Cache --
If your internal Gentoo machines (Gentoo clients) are behind a proxy firewall, they can take advantage of the caching feature built into most proxy servers. The proxy reduces bandwidth usage by caching the results of recent HTTP or FTP requests, including Gentoo packages (distfiles). A recent emerge will therefore have cached any required Gentoo packages on the proxy server, and another emerge within a reasonable amount of time will download the cached copy rather than going to a Gentoo mirror. The problem with most proxy caches is that cached content has an expiry time. This means that if you don't emerge soon enough, Portage has to get yet another duplicate copy from the Gentoo mirrors. If you have access to your proxy server and can set the expiry times for cached content, this is a quick way to set up a pseudo Portage gateway. This setup is best for small networks.
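If the proxy is not transparent, the clients also have to send their downloads through it for the cache to help. Portage's example make.conf documents http_proxy and ftp_proxy settings for this purpose; as a sketch of the client side, with proxy.example.lan and port 3128 (Squid's default port) as placeholders:

```
# /etc/make.conf on each Gentoo client (sketch; host and port are placeholders).
# wget, Portage's default fetch tool, reads these standard proxy variables.
http_proxy="http://proxy.example.lan:3128"
ftp_proxy="http://proxy.example.lan:3128"
```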
-- Setup #2: NFS or Samba network shares --
Using a network file share for the Portage tree and distfiles is a great solution for small workgroups and college dorms. All it requires is mounting the share on each machine and editing its make.conf file so that Portage uses the share for its Portage tree and Gentoo packages. This offers good flexibility because everyone can update the Portage tree and/or add to the package repository as needed. However, cascading to multiple machines is difficult using shares. Originally I had stated in error that the Gentoo bootable CD-ROM did not support NFS and Samba shares. Revised [contributed by GTVincent]: when booted from the x86 1.4_RC4 CD-ROM, it is possible to start nfsmount and mount /mnt/gentoo/usr/portage/distfiles from another computer after untarring a stage file and creating the /usr/portage/distfiles directory, but before chrooting into /mnt/gentoo.
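As a sketch of the NFS variant, assuming the machine holding the shared tree is reachable under the hypothetical name portage-gw and your LAN is 192.168.1.0/24 (both placeholders), the server exports /usr/portage and each client mounts it over its own /usr/portage. no_root_squash is included because emerge runs as root and must be able to write into the share; weigh that against your own security policy.

```
# Server: /etc/exports (then reload the export table, e.g. with "exportfs -ra")
/usr/portage    192.168.1.0/24(rw,sync,no_root_squash)

# Each client: /etc/fstab
portage-gw:/usr/portage    /usr/portage    nfs    rw,hard    0 0
```

Mounting the share at /usr/portage means the default PORTDIR and DISTDIR already point at it; if you mount it elsewhere, set those two make.conf variables to the mount point instead.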
-- Setup #3: Rsync for both the Portage tree and Gentoo packages (distfiles) --
This setup aims to accomplish everything laid out in the discussion above. As you will see it provides flexibility and control in larger environments.
Portage uses two methods to keep the Portage tree up to date and to retrieve current Gentoo packages (distfiles): rsync is used for the tree and wget is used for the packages. I'm not aware of the motivations behind these protocol choices, but they work very well.
We will continue to use rsync for updating the Portage tree, but point your client machines at your designated Portage gateway. This will be accomplished using Gentoo's rsync daemon on the gateway. We will drop wget for Gentoo package (distfiles) retrieval and instead instruct the clients' emerge to use rsync with the Portage gateway. The bonus is that rsync is already in place, it's fast and it's configurable.
Procedures
OK, let's get into the nuts and bolts of the setup procedure. I'm assuming that you have two or more Gentoo machines and that one of them is built and connected to the Internet (this will be the Portage gateway). Remember that we want to accomplish two things: 1) have your clients (internal machines) update their Portage tree from your Portage gateway, and 2) have all your clients download the necessary Gentoo packages (distfiles) from the Portage gateway when they do an emerge.
1.0 Portage gateway setup
To serve the Portage tree and distfiles you need to be running the Gentoo rsync daemon. Gentoo conveniently provides this software in a Portage package (naturally). Do an emerge of the following package and wait for it to compile.
Code listing 1.1:
# emerge app-admin/gentoo-rsync-mirror
Code listing 1.2:
# nano /etc/rsync/rsyncd.conf
File listing 1.1:
#uid = nobody
#gid = nobody
use chroot = no
max connections = 10
pid file = /var/run/rsyncd.pid
motd file = /etc/rsync/rsyncd.motd
transfer logging = yes
log format = %t %a %m %f %b
syslog facility = local3
timeout = 300
#hosts allow = <your list>
[gentoo-portage]
#For replicating the Portage tree to internal clients
path = /usr/portage
comment = Gentoo Linux Portage tree mirror
exclude = distfiles
[gentoo-packages]
#For distributing Portage packages (distfiles) to internal clients
path = /usr/portage/distfiles
comment = Gentoo Linux Packages mirror
#[gentoo-x86-portage]
#This entry is for backward compatibility and is generally no longer required.
#path = /usr/portage
#comment = Old Gentoo Linux Portage tree

The first line of interest is motd file = /etc/rsync/rsyncd.motd. It points to the message of the day that is displayed to clients whenever they connect to the rsync daemon. The rsyncd.motd file is just text, so put anything you want in it (server name, IP, admin contact, etc.). As always, you can just edit it with nano.
I put the #hosts allow = <your list> line in to illustrate the various security settings you can tweak. This option lets you specify a range of addresses that are allowed to rsync with this machine; if a requesting machine isn't in this range, the request is denied. Check the man page (man rsyncd.conf) for more discussion of rsync security options.
If a default rsyncd.conf file was created when you emerged, then you would have noticed two blocks of options at the bottom of the file. These are rsync modules. They specify which directories to share and where they are located on the local machine. The sample file above has commented out one default module and added one new module.
The [gentoo-portage] module is responsible for sharing the Portage tree. It is important that the path is properly configured to reference the location of the local Portage tree. Just as important is the exclude property. If the distfiles directory is not excluded, then every time an internal machine syncs with the gateway the Gentoo packages will go along for the ride. This is not desirable in most circumstances.
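For example, a sketch of the global section with the access list filled in; 192.168.1.0/24 and 10.0.0.5 are placeholders for your own client addresses:

```
# Global section of /etc/rsync/rsyncd.conf (addresses are placeholders).
# Only clients on 192.168.1.0/24 plus the single host 10.0.0.5 may connect;
# all other hosts are refused.
hosts allow = 192.168.1.0/24 10.0.0.5
# Modules are read-only by default; stating it makes the intent explicit.
read only = yes
```

If you plan to test the daemon from the gateway itself, remember to allow 127.0.0.1 as well.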
The [gentoo-packages] module is responsible for sharing the Gentoo packages (distfiles). This module is not specified in the default rsyncd.conf files so you will have to create it. It is important that the path is properly configured to reference the location of the local Gentoo packages (distfiles) and not the Portage tree.
The [gentoo-x86-portage] module is there for backward compatibility. Your need for it will depend on how current your install base is; for this setup I've left it out.
Now that the rsync daemon is configured, we can set it to start when the machine boots. You may want to adjust the runlevel to suit your needs.
Code listing 1.3:
# rc-update add rsyncd default
Code listing 1.4:
# /etc/init.d/rsyncd start
Your Portage gateway is ready to go!
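Before moving on to the clients, it can be worth confirming that the daemon really offers both modules. The sketch below relies on rsync's module listing (rsync HOST:: prints each module's name and comment); the function name check_gateway_modules is hypothetical. Run it from a client, or from the gateway against localhost if 127.0.0.1 is permitted by hosts allow.

```shell
#!/bin/sh
# check_gateway_modules HOST  (hypothetical helper)
# Asks the rsync daemon on HOST for its module list and confirms that both
# modules from File listing 1.1 are advertised.
check_gateway_modules() {
    host="$1"
    modules="$(rsync "${host}::")" || return 1
    for m in gentoo-portage gentoo-packages; do
        # Each listed module line starts with its name followed by whitespace;
        # anchoring the match skips any motd text printed before the list.
        printf '%s\n' "$modules" | grep -q "^${m}[[:space:]]" || {
            echo "module $m not offered by $host" >&2
            return 1
        }
    done
    echo "$host offers gentoo-portage and gentoo-packages"
}
```

For example, check_gateway_modules localhost on the gateway, or check_gateway_modules with your gateway's IP or DNS name from a client.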
2.0 Internal Gentoo Machine Setup (client setup)
Note: I'm assuming that you can reach your Portage gateway (you should be able to ping it). Ideally you should have DNS set up properly in your /etc/resolv.conf file and a DNS server on your network. The Gentoo install guide details this.
OK, so you've just booted your client machine from your Gentoo CD-ROM, created your partitions, extracted stage 1, chrooted, etc., and you need to run emerge sync for the first time. If you built your Portage gateway from stage 1, then all (or most) of the Gentoo packages you need should already be on that machine. Let's go get them.
All you have to do is add two lines to your /etc/make.conf file, but before we do that we must prepare a couple of variables. Find and uncomment the following lines in your /etc/make.conf file:
File listing 2.1:
PORTDIR=/usr/portage
DISTDIR=${PORTDIR}/distfiles
OK, now let's tell your machine where to get the Portage tree. You can put the following line anywhere in the /etc/make.conf file, but grouping it with the other rsync options is ideal.
File listing 2.2:
SYNC="rsync://<your Portage gateway's IP or DNS here>/gentoo-portage"
-- rsync:// instructs your machine to use the rsync protocol.
-- <your Portage gateway's IP or DNS here> If you have DNS working, then put in the name of your server; otherwise use its IP. Do NOT include the less-than and greater-than signs (<>).
-- /gentoo-portage Recall that this is the name of the module you specified in the rsyncd.conf file on your Portage gateway. The module contains a path that points to the gateway's local Portage tree.
Exiting the file and running emerge sync right now would result in a successfully updated Portage tree, but let's finish the rest of the configuration.
Now we will tell Portage how to download files for an emerge. It is VERY important that you get the syntax right here. You can put the following line anywhere in the /etc/make.conf file, but grouping it with the other fetch commands is ideal. Note: regardless of how your browser displays it, it should all be on one line.
File listing 2.3:
FETCHCOMMAND="rsync -v rsync://<your Portage gateway's IP or DNS>/gentoo-packages/\${FILE} ${DISTDIR}"
-- The FETCHCOMMAND setting of Portage allows you to specify a wide variety of methods for retrieving Gentoo packages, in this case from your Portage gateway. Kudos to the Gentoo folks; the flexibility is great!
-- rsync -v tells Portage to use the rsync program to get the file. The -v (verbose) is optional, as are many other settings you could apply. Check the man page (man rsync) for more choices.
-- rsync:// tells rsync that it needs to reach across the network using the rsync protocol.
-- <your Portage gateway's IP or DNS> If you have DNS working, then put in the name of your server; otherwise use its IP. Do NOT include the less-than and greater-than signs (<>).
-- /gentoo-packages Recall that this is the name of the module you specified in the rsyncd.conf file on your Portage gateway. The module contains a path that points to the gateway's local Gentoo package directory.
-- /\${FILE} This variable contains the file name emerge is trying to obtain. Note the forward slash and backslash combination; this is important. The backslash keeps ${FILE} from being expanded when make.conf is read, so Portage can fill in the file name at fetch time.
-- ${DISTDIR} This variable tells rsync where to put the files on the local (client) machine. There should be a space between it and the ${FILE} variable.
-- Note the quotes around everything after the equals sign.
Save your file and exit.
At this point emerging any packages on your client machine will retrieve them from your Portage gateway.
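If an emerge cannot fetch a file, it helps to run the same transfer by hand. A minimal sketch, assuming the module name gentoo-packages from File listing 1.1 and the default DISTDIR; fetch_from_gateway is a hypothetical helper, not part of Portage:

```shell
#!/bin/sh
# fetch_from_gateway GATEWAY FILE [DISTDIR]  (hypothetical helper)
# Runs by hand the transfer that the FETCHCOMMAND line asks Portage to perform,
# so a typo in the gateway address or module name shows up directly.
fetch_from_gateway() {
    gateway="$1"
    file="$2"
    distdir="${3:-/usr/portage/distfiles}"
    rsync -v "rsync://${gateway}/gentoo-packages/${file}" "$distdir"
}
```

For instance, pass your gateway's address and the file name reported by the failed emerge. An error from rsync here points at the gateway or the module rather than at Portage.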
Final Thoughts
What if my package is not on the Portage gateway? If one of your client machines requests a package that is not available on the Portage gateway, the emerge will fail. No problem: make a note of the package you want and log on to the Portage gateway. Run emerge -f <needed package name>. This will retrieve the package onto the gateway without compiling it. Now run the emerge on the client machine again and all is good.
It may be helpful to enable all of the USE flags on the Portage gateway machine to ensure every dependency package is retrieved (can someone verify this?). Emerge ufed; it's a very handy tool for editing your USE flags. Tip [contributed by Me]: putting "cvs" in FEATURES in the server's make.conf should enable all USE flags, even when new ones get created.
On some occasions you may find that some files required by a client's emerge do not download when you perform the same emerge on the Portage gateway. I don't yet understand why this happens (maybe someone could enlighten me). Regardless, all you have to do is grab that file manually from the Internet using wget on the Portage gateway, or download it elsewhere and copy it into the Portage gateway's distfiles directory. The file name should be listed in the failed emerge's output.
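When several clients report missing packages at once, the fetch step can be batched. A sketch, assuming it runs as root on the Portage gateway; prefetch_packages is a hypothetical name, and it simply calls emerge -f once per package:

```shell
#!/bin/sh
# prefetch_packages PKG...  (hypothetical helper)
# Downloads (but does not build) each requested package on the gateway using
# "emerge -f". Keeps going after a failure, then reports every package that
# failed on stderr and returns non-zero.
prefetch_packages() {
    failed=""
    for pkg in "$@"; do
        if ! emerge -f "$pkg"; then
            failed="$failed $pkg"
        fi
    done
    if [ -n "$failed" ]; then
        echo "fetch failed for:$failed" >&2
        return 1
    fi
}
```

For example, prefetch_packages app-editors/nano sys-apps/less. Running one emerge per package means a single bad name does not stop the rest from being fetched.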
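A sketch of the manual download, assuming GNU wget and the default distfiles location; fetch_missing is a hypothetical helper. Portage checks downloaded files against the package's digest, so the file must be the exact upstream file named in the failed emerge's output.

```shell
#!/bin/sh
# fetch_missing URL [DISTDIR]  (hypothetical helper)
# Downloads URL into the gateway's distfiles directory under the file name
# taken from the URL, so client emerges can find it by that name.
fetch_missing() {
    url="$1"
    distdir="${2:-/usr/portage/distfiles}"
    # -c resumes a partial download; -P (--directory-prefix) sets the target directory.
    wget -c -P "$distdir" "$url"
}
```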
Of course over time you will accumulate a selection of packages that is comprehensive and customized to your needs. If you are supporting fifty client machines you only have to download a needed package once onto the gateway and all of those clients can emerge it without going to the Internet.
If you have cascaded your Portage gateway to multiple servers you have very good redundancy. If your Portage gateway dies just upgrade one of the secondary servers to gateway status. Keep in mind though that an infrastructure built around segregating packages would not be suitable for this.
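For a secondary server in such a cascade, one approach (a sketch, not the only way) is to point its own SYNC setting at the gateway exactly like a client, pull the gateway's distfiles in bulk, and then run the rsync daemon on it as in section 1.0 so its own clients can draw from it. mirror_from_gateway is a hypothetical helper; it assumes the upstream gateway exports the gentoo-packages module.

```shell
#!/bin/sh
# mirror_from_gateway UPSTREAM [PORTDIR]  (hypothetical helper)
# Copies the upstream gateway's gentoo-packages module into the local
# distfiles directory of a secondary server.
mirror_from_gateway() {
    upstream="$1"
    portdir="${2:-/usr/portage}"
    # The trailing slash on the source copies the module's contents rather than
    # creating a nested directory. No --delete, so files added locally are kept.
    rsync -av "rsync://${upstream}/gentoo-packages/" "${portdir}/distfiles/"
}
```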
If the network share method is working well in your environment then just add the Gentoo rsync daemon to support stage 1 installs. This would give you flexibility and complete support of stages 1-3.
That's it for this HowTo. I hope it relieves a few headaches and eases some bandwidth woes. A big thanks to the Gentoo people and the forum community!
-- Thank you to GTVincent and Me for their corrections and contributions.




