Gentoo Forums
HOWTO: Central Gentoo Mirror for your Internal Network
Grimthorn
n00b
Joined: 04 Jun 2003
Posts: 10

Posted: Sat Jun 07, 2003 11:04 pm    Post subject: HOWTO: Central Gentoo Mirror for your Internal Network

HOWTO: Central Gentoo Mirror for Internal Network (supports stages 1, 2 and 3)

Synopsis
The default behavior of Gentoo’s Portage system works very well for single installs but becomes wasteful once more than two machines are involved, and the problem is especially acute with an install base of dozens or more. Running emerge over the Internet on every machine consumes unnecessary bandwidth, overloads Gentoo’s mirrors and demands unwelcome Internet access for each of those machines.

A more efficient internal infrastructure would be centered on a single point of access to the Gentoo mirrors. This Portage “gateway” server would be responsible for retrieving updates to the Portage Tree and maintaining a central repository of Gentoo packages (distfiles). In smaller networks all internal machines would draw from the Portage gateway. In larger networks access could be cascaded to secondary and tertiary servers to distribute load or handle complex network structures.

Some additional admin work is required, such as maintaining the subset of Gentoo packages for downstream clients, but there are benefits beyond the obvious bandwidth savings and relief for the Gentoo mirrors.

The Gentoo gateway admin has the ability to control the available Gentoo packages (distfiles) inside the network. This would ensure that beta packages do not creep into production machines. Alternatively, two secondary servers could stem from the gateway: one masked and used by production machines, the other unmasked and used by the development and test machines.

Getting a complete backup of the Gentoo packages installed throughout the network would be as simple as copying the “../distfiles/” directory on the Portage gateway server. This beats running to each machine trying to capture every package that has been downloaded.

Only the Portage gateway server needs access to the Internet. This has many implications beyond the scope of this HowTo but the security benefits are obvious. The Portage gateway server can of course itself be behind a firewall.

Before We Begin
Your Portage gateway server is not meant to be a complete mirror of the 4500+ Gentoo packages available. While over time you will accumulate a comprehensive repository of source packages and their dependencies there is no point in putting load on the mirrors for source you never intend to compile. Someone somewhere has to pay for the hosting of our Gentoo community. Let’s not abuse this “free” service by hoarding data that will likely become obsolete before anyone uses it.

This applies to syncing your Portage Tree as well. You might have noticed Gentoo’s HowTo on setting up an rsync mirror. We will be using the same software for our gateway but you must ignore Gentoo’s recommendation to sync every 30 minutes. Gentoo’s policy to sync every 30 minutes is meant for public servers ONLY. There is no harm in automating the sync process for your internal machines and setting the frequency to whatever you like. However, leave the Portage gateway’s emerge sync as a manual process and update only as needed. You should only require a sync to get new software, correct a bug or patch a security hole.

Scenarios
Setting up a Portage gateway and its subsequent infrastructure is not very difficult. In fact there are several distinct approaches. Also, the techniques from one setup can be mixed with another. Several possibilities are detailed below but only number three is expanded because it accomplishes everything outlined in the synopsis.

-- Setup #1: Proxy Cache --
If your internal Gentoo machines (Gentoo clients) are behind a proxy firewall they can take advantage of the caching feature built into most proxy servers. The proxy firewall reduces bandwidth usage by caching the results of recent http or ftp requests, including Gentoo packages (distfiles). Therefore a recent emerge would have cached any required Gentoo packages on the proxy server. Doing another emerge within a reasonable amount of time would download the cached copy of the package rather than going to a Gentoo mirror. The problem with most proxy caches is that there is an expiry time on cached content. This means that if you don’t emerge soon enough Portage has to get yet another duplicate copy from the Gentoo mirrors. If you have access to your proxy server and can set the expiry times for cached content this is a quick way to set up a pseudo Portage gateway. This setup is best for small networks.

-- Setup #2: NFS or Samba network shares --
Using a network file share for the Portage tree and distfiles is a great solution for small workgroups and college dorms. All it requires is editing the make.conf file and directing each machine to use the share for its Portage tree and Gentoo packages. This offers good flexibility because everyone can update the Portage tree and/or add to the package repository as needed. However, cascading to multiple machines is difficult using shares. Originally I had stated in error that the Gentoo bootable CD-ROM did not support NFS and Samba shares. Revised: [contributed by GTVincent] When started from the x86 1.4_RC4 CDRom, it is possible to start nfsmount and mount /mnt/gentoo/usr/portage/distfiles from another computer after untar-ing a stage-file and creating the /usr/portage/distfiles directory, but before chroot-ing to /mnt/gentoo.

-- Setup #3: Rsync for both the Portage tree and Gentoo packages (distfiles) --
This setup aims to accomplish everything laid out in the discussion above. As you will see it provides flexibility and control in larger environments.

Portage uses two methods to keep an updated Portage tree and retrieve current Gentoo packages (distfiles). Rsync (rsync) is used for the tree and wget is used for the packages. I’m not aware of the motivations behind these protocol choices but they work very well.

We will continue to use rsync for updating the Portage tree and set your client machines to draw from your designated Portage gateway. This will be accomplished using Gentoo’s rsync daemon on the gateway. We will drop wget for Gentoo package retrieval (distfiles) and instead instruct the client’s emerge to use rsync with the Portage gateway. The bonus is that rsync is already in place, it’s fast and it’s configurable.

Procedures
OK, let’s get into the nuts and bolts of the setup procedure. I’m assuming that you have two or more Gentoo machines and that one of them is built and connected to the Internet (this will be the Portage gateway). Remember that we want to accomplish two things: 1) have your clients (internal machines) update their Portage tree from your Portage gateway, 2) have all your clients (internal machines) download the necessary Gentoo packages (distfiles) from the Portage gateway when they do an emerge.

1.0 Portage gateway setup

To serve the Portage tree and distfiles you need to be running the Gentoo rsync daemon. Gentoo conveniently provides this software in a Portage package (naturally). Do an emerge of the following package and wait for it to compile.

Code listing 1.1
Code:
#emerge app-admin/gentoo-rsync-mirror


Now let’s configure the rsync daemon (it’s not running yet). The rsyncd.conf file should have been created on compile, but if it wasn’t, create one yourself.

Code listing 1.2
Code:
#nano /etc/rsync/rsyncd.conf


Regardless of whether the file exists or not you will want it to look like this:

File Listing 1.1
Code:
#uid = nobody
#gid = nobody
use chroot = no
max connections = 10
pid file = /var/run/rsyncd.pid
motd file = /etc/rsync/rsyncd.motd
transfer logging = yes
log format = %t %a %m %f %b
syslog facility = local3
timeout = 300

#hosts allow = <your list>

[gentoo-portage]
#For replicating the Portage tree to internal clients
path = /usr/portage
comment = Gentoo Linux Portage tree mirror
exclude = distfiles

[gentoo-packages]
#For distributing Portage packages (distfiles) to internal clients
path = /usr/portage/distfiles
comment = Gentoo Linux Packages mirror

#[gentoo-x86-portage]
#This entry is for backward compatibility and is generally no longer required.
#path = /usr/portage
#comment = Old Gentoo Linux Portage tree


We’re only going to discuss the important parts but there are many more configuration options. Check the man pages for a great discussion of the rsync daemon (#man rsyncd.conf) or general rsync uses (#man rsync).

The first line of interest is motd file = /etc/rsync/rsyncd.motd. It points to the “message of the day” that is displayed whenever a client connects to the rsync daemon. The rsyncd.motd file is just text so put anything you want in it (server name, IP, admin contact, etc). As always you can just edit it with nano.

I put the #hosts allow = <your list> line in to illustrate the various security settings you can tweak. This option allows you to specify a range of addresses that are allowed to rsync with this machine. If a requesting machine isn’t in this range the request is denied. Check the man pages (#man rsyncd.conf) for more discussion about rsync security options.

If a default rsyncd.conf file was created when you emerged then you will have noticed two blocks of options at the bottom of the file. These are rsync modules. They specify which directories to share and where they are located on the local machine. The sample file above has commented out one default module and added one new module.

The [gentoo-portage] module is responsible for sharing the Portage tree. It is important that the path is properly configured to reference the location of the local Portage tree. Just as important is the exclude property. If the distfiles directory is not excluded then every time an internal machine syncs with the gateway the Gentoo packages will go along for the ride. This is not desirable in most circumstances.

The [gentoo-packages] module is responsible for sharing the Gentoo packages (distfiles). This module is not specified in the default rsyncd.conf files so you will have to create it. It is important that the path is properly configured to reference the location of the local Gentoo packages (distfiles) and not the Portage tree.

The [gentoo-x86-portage] module is there for backward compatibility. Your need for this will depend on how current your install base is; for this setup I’ve left it out.
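
As a quick sanity check before starting the daemon, the active modules can be listed straight from the config file. This is only a sketch: list_modules is a made-up helper name, and it assumes each module header sits on a line of its own, as in File Listing 1.1.

```shell
#!/bin/sh
# list_modules (hypothetical helper): print the name of every active
# [module] header in an rsyncd.conf-style file. Headers commented out
# with a leading '#' are skipped because the line does not start with '['.
list_modules() {
    sed -n 's/^[[:space:]]*\[\([^]]*\)][[:space:]]*$/\1/p' "$1"
}

# Example (path taken from the HowTo):
#   list_modules /etc/rsync/rsyncd.conf
```

Run against File Listing 1.1 this should print gentoo-portage and gentoo-packages, and leave out the commented gentoo-x86-portage entry.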

Now that the rsync daemon is configured we can set it up to start when we boot the machine. You may want to adjust the runlevel to suit your needs.

Code listing 1.3:
Code:
#rc-update add rsyncd default


Finally let’s get the rsync daemon actually running.

Code listing 1.4:
Code:
#/etc/init.d/rsyncd start


IMPORTANT NOTE: Gentoo has changed the way the rsync daemon is started. You must edit the init script for rsync to work with Gentoo packages. For now this is detailed in a post below but I will update this HowTo ASAP.

Your Portage gateway is ready to go!

2.0 Internal Gentoo Machine Setup (client setup)

Note: I’m assuming that you can “see” your Portage gateway (you should be able to ping it). Ideally you should have DNS setup properly in your /etc/resolv.conf file and a DNS server on your network. The Gentoo install guide details this.

Ok, so you’ve just booted your client machine from your Gentoo CD-ROM, created your partitions, extracted stage 1, chrooted, etc, etc, and you need to “emerge sync” for the first time. If you built your Portage gateway from Stage 1 then all (or most) of the Gentoo packages you need should be on that machine. Let’s go get them.

All you have to do is add two lines to your /etc/make.conf file but before we do that we must prepare a couple variables. Find and uncomment the following lines in your /etc/make.conf file:

File listing 2.1
Code:
PORTDIR=/usr/portage
DISTDIR=${PORTDIR}/distfiles


These variables pass important information to the settings listed below. Make sure their values are correct for your system. Each path must point to the corresponding location of your Portage tree and distfiles directories. Unless you’ve changed the default behavior of your Gentoo install the given values are valid.

Ok, now let’s tell your machine where to get the Portage tree. You can put the following line anywhere in the /etc/make.conf file but grouping it with the other rsync options is ideal.

File listing 2.2
Code:
SYNC="rsync://<your Portage gateway's IP or DNS here>/gentoo-portage"

-- The SYNC variable overrides the default location Portage looks for the Portage tree.
-- rsync:// instructs your machine to use the rsync protocol.
-- <your gateway address> If you have DNS working then put in the name of your server, otherwise use its IP. Do NOT include the greater or less than signs (<>).
-- /gentoo-portage Recall that this is the name of the module you specified in the rsyncd.conf file on your Portage gateway. The module contains a path that points to the gateway’s local Portage tree.

Exiting the file and doing an “emerge sync” right now would result in a successfully updated Portage tree but let’s finish the rest of the configuration.

Now we will tell portage how to download files for an emerge process. It is VERY important that you get the syntax right here. You can put the following line anywhere in the /etc/make.conf file but grouping it with the other fetch commands is ideal. Note: Regardless of what your browser is displaying it should all be on one line.

File listing 2.3
Code:
FETCHCOMMAND="rsync rsync://<your Portage gateway's IP or DNS>/gentoo-packages/\${FILE} ${DISTDIR}"


Missing even one character in the line above would result in a failed emerge process so let’s review:

-- The FETCHCOMMAND feature of Portage allows you to specify a wide variety of methods to retrieve Gentoo packages from your Portage gateway. Kudos to the Gentoo folks, the flexibility is great!
-- rsync tells Portage to use the rsync program to get the file. You can add options here, such as “-v” for verbose output. Check the man pages (#man rsync) for more choices.
-- rsync:// This tells rsync that you need to reach across the network using the rsync protocol.
-- <your gateway address> If you have DNS working then put in the name of your server, otherwise use its IP. Do NOT include the greater or less than signs (<>).
-- /gentoo-packages Recall that this is the name of the module you specified in the rsyncd.conf file on your Portage gateway. The module contains a path that points to the gateway’s local Gentoo package directory.
-- /\${FILE} This variable contains the file name emerge is trying to obtain. Note the forward slash and backslash combination; this is important.
-- ${DISTDIR} This variable tells rsync where to put the files on the local (client) machine. There should be a space between it and the ${FILE} variable.
-- Note the quotes around everything after the equals sign.
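
To see why the backslash matters, here is a rough shell analogy of the two expansion steps. As far as I understand it, ${DISTDIR} is filled in when make.conf is read, while the escaped \${FILE} survives as a placeholder that Portage fills in for each file at fetch time. The names gateway.example and expand_fetch are made up for the illustration, and the sed substitution only stands in for what Portage does internally.

```shell
#!/bin/sh
# Hypothetical values, for illustration only.
DISTDIR=/usr/portage/distfiles
GATEWAY=gateway.example

# As in make.conf: ${DISTDIR} is expanded right here, while \${FILE}
# stays in the string as the literal text ${FILE}.
FETCHCOMMAND="rsync rsync://${GATEWAY}/gentoo-packages/\${FILE} ${DISTDIR}"

# Stand-in for the fetch-time step: substitute the file name for ${FILE}.
# Assumes the file name contains no '|', '&' or backslash characters.
expand_fetch() {
    printf '%s\n' "$FETCHCOMMAND" | sed 's|[$]{FILE}|'"$1"'|'
}

expand_fetch zip23.tar.gz
# prints: rsync rsync://gateway.example/gentoo-packages/zip23.tar.gz /usr/portage/distfiles
```

Without the backslash, ${FILE} would be expanded too early, most likely to an empty string, and rsync would be asked for the module directory instead of a single file.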

Save your file and exit.

At this point emerging any packages on your client machine will retrieve them from your Portage gateway.
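
With many clients, the same two lines go into every make.conf, so they can be scripted. The following is a minimal sketch, not a tested deployment tool: configure_client is a hypothetical helper, it simply appends to the file, and it assumes PORTDIR and DISTDIR are already uncommented earlier in the same file as described above.

```shell
#!/bin/sh
# configure_client (hypothetical helper): append the SYNC and FETCHCOMMAND
# settings for a given gateway to a make.conf file.
# $1 = path to make.conf, $2 = gateway host name or IP.
configure_client() {
    conf="$1"
    gateway="$2"
    # Do nothing if the SYNC line for this gateway is already present
    if grep -qF "SYNC=\"rsync://${gateway}/gentoo-portage\"" "$conf" 2>/dev/null; then
        return 0
    fi
    {
        printf 'SYNC="rsync://%s/gentoo-portage"\n' "$gateway"
        # \\ in the printf format emits the single backslash before ${FILE}
        printf 'FETCHCOMMAND="rsync rsync://%s/gentoo-packages/\\${FILE} ${DISTDIR}"\n' "$gateway"
    } >> "$conf"
}

# Example (gateway address is a placeholder):
#   configure_client /etc/make.conf 192.168.0.10
```

Running it twice for the same gateway leaves the file unchanged the second time.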

Final Thoughts
What if my package is not on the Portage gateway? If one of your client machines requests a package that is not available on the Portage gateway, the emerge operation will fail. No problem: make a note of what package you want and log on to the Portage gateway. Perform “emerge -f <needed package name>”. This will retrieve the package onto the gateway without compiling it. Now perform the emerge on the client machine again and all is good.

It may be helpful to have all of the USE flags added to the Portage gateway machine to ensure every dependency package is retrieved (can someone verify this). Emerge ufed, it’s a very handy tool for editing your USE flags. Tip: [contributed by Me] Putting "cvs" in your server make.conf "features" should enable all useflags, even when new ones get created.

On some occasions you may find that some files required by a client’s emerge do not download when you perform the same emerge on the Portage gateway. I don’t understand yet why this happens (maybe someone could enlighten me). Regardless, all you have to do is manually grab that file from the Internet using wget on the Portage gateway or download and copy the file to the Portage gateway’s distfiles directory. The file name should be listed in the failed emerge’s output.
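
Before chasing files by hand it helps to know exactly which ones the gateway lacks. A small sketch, assuming you have copied the file names from the failed emerge's output; missing_distfiles is a made-up helper name.

```shell
#!/bin/sh
# missing_distfiles (hypothetical helper): print every named file that is
# NOT present in the given distfiles directory.
# $1 = distfiles directory, remaining arguments = file names to check.
missing_distfiles() {
    distdir="$1"
    shift
    for f in "$@"; do
        [ -e "$distdir/$f" ] || printf '%s\n' "$f"
    done
}

# Example, run on the gateway (file name is a placeholder):
#   missing_distfiles /usr/portage/distfiles zip23.tar.gz
```

Anything it prints still has to be fetched onto the gateway, either with emerge -f or manually with wget.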

Of course over time you will accumulate a selection of packages that is comprehensive and customized to your needs. If you are supporting fifty client machines you only have to download a needed package once onto the gateway and all of those clients can emerge it without going to the Internet.

If you have cascaded your Portage gateway to multiple servers you have very good redundancy. If your Portage gateway dies just upgrade one of the secondary servers to gateway status. Keep in mind though that an infrastructure built around segregating packages would not be suitable for this.

If the network share method is working well in your environment then just add the Gentoo rsync daemon to support stage 1 installs. This would give you flexibility and complete support of stages 1-3.

That’s it for this HowTo. I hope it relieves a few headaches and eases some bandwidth woes. A big thanks to the Gentoo people and the forum community!

-- Thank you to GTVincent, Me for their corrections and contributions.


Last edited by Grimthorn on Thu Feb 19, 2004 3:47 pm; edited 4 times in total

jimlynch11
Guru
Joined: 21 Feb 2003
Posts: 590
Location: massachusetts

Posted: Sat Jun 07, 2003 11:40 pm

im nominating this for the best 'first post' ever. nice work man, and welcome to the forums.

ill probably be using this in a couple of weeks when i finally convince my parents to get rid of 98 on their laptop, so thank you.

GTVincent
Tux's lil' helper
Joined: 26 Oct 2002
Posts: 91
Location: Las Vegas, NV

Posted: Sun Jun 08, 2003 11:09 am    Post subject: Re: HOWTO: Central Gentoo Mirror for your Internal Network

Grimthorn wrote:

-- Setup #2: NFS or Samba network shares --
[..]
Network file systems such as NFS and Samba are not, by default, supported by the Gentoo bootable CD-ROM. The bootstrap and initial system emerge processes would require going to the Gentoo mirrors on the internet, hacking the boot ISO or copying the necessary Gentoo packages over on a CD_ROM.

[..]


While the entire HowTo is really very comprehensive, this point is not true. When started from the x86 1.4_RC4 CDRom, it is possible to start nfsmount and mount /mnt/gentoo/usr/portage/distfiles from another computer after untar-ing a stage-file and creating the /usr/portage/distfiles directory, but before chroot-ing to /mnt/gentoo.

Edit: removed the huge letters for your reading pleasure... Glad to be of assistance :wink:


Last edited by GTVincent on Tue Jun 10, 2003 12:47 am; edited 1 time in total

Me
n00b
Joined: 12 Apr 2003
Posts: 71
Location: Earth

Posted: Sun Jun 08, 2003 4:38 pm

Putting "cvs" in your server make.conf "features" should enable all useflags, even when new ones get created.

Grimthorn
n00b
Joined: 04 Jun 2003
Posts: 10

Posted: Mon Jun 09, 2003 1:20 pm

jimlynch11: Thank you! I hope it helps!

GTVincent: Very much appreciated I did not know this. I have corrected the HowTo.

Me: Great tip! I just applied it to our gateway. I've added it to the HowTo.

Thanks folks!

Take care,
Grim

Koon
Retired Dev
Joined: 10 Dec 2002
Posts: 518

Posted: Wed Jun 11, 2003 6:58 am

Nice work !
I want to discuss some of the drawbacks of the different solutions (correct me if I'm wrong).

Setup 0 : complete rsync mirror
* 90% of the things you download are useless because you won't ever use them.
* Load on the Gentoo servers

Setup 1 : proxy cache
* work only for packages, not for the tree (or is there a way to proxy/cache rsync ?)
* problems with expiry times of the packages

Setup 2 : network share of the /usr/portage tree
* it was not meant to do this so there is a potential simultaneous access conflict (two workstations doing emerge sync or downloading the same package at the same time)
* vulnerable setup (centralized)

Setup 3 : partial rsync gateway (your setup)
* maintenance problem : you can't easily feed the tree with new packages from the workstation, that is there is no way for the workstation to contribute to the tree. For new packages AND for every update of every package you have to do emerge -f on the gateway.
* emerge -p shows available packages which are in fact not available on the gateway. The emerge will fail later when the package will be fetched...
* you have to download every package that could be needed for specific USE flags (using something like the magic 'cvs' feature). That means for me Gnome user I have to get and update KDE packages to support the kde USE flag while I won't ever need them

If there are easy ways of avoiding the drawbacks in your solution, please let me know. I am still looking for the perfect solution for a local (enterprise) portage tree. When I will get your opinion on this, I may post another thread to discuss the specifications for a perfect enterprise-oriented portage gateway.

Question on the implementation chosen : advantages/drawbacks of using rsync instead of wget for the packages ? I think the problems you sometimes get in fetching the packages might come from rsync compatibility problems. You can easily set up an HTTP server to serve the packages using wget ? Or do I miss a point ?

Thank you for your patience !
-K

hackertype
n00b
Joined: 03 Jun 2003
Posts: 32

Posted: Thu Jun 12, 2003 8:16 pm

I just want to point out that there is trouble right here in River City.

I followed the instructions to set up the mirror on my local network. I can emerge the entire portage tree just fine, but when it comes to downloading tarballs from the distfiles folder via rsync the client hangs and then has a timeout.

A possible workaround (thanks spyderous) for this is to symlink your distfiles folder to htdocs. Then run some webserver (I'm emerging apache right now) to allow your clients to wget the tarballs.

On the client machines change GENTOO_MIRRORS to point to your local half-rsync-half-webserver-gentoo-mirror.
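
A rough sketch of the symlink step. publish_distfiles is a made-up helper and both paths are assumptions; the link is named distfiles because, as far as I know, Portage looks for files under distfiles/ on each GENTOO_MIRRORS entry. The web server also has to be allowed to follow symlinks.

```shell
#!/bin/sh
# publish_distfiles (hypothetical helper): expose the distfiles directory
# under a web server's document root via a symlink named "distfiles".
# $1 = distfiles directory, $2 = document root (htdocs). Both paths are
# assumptions; adjust them to your web server's layout.
publish_distfiles() {
    distdir="$1"
    htdocs="$2"
    # Leave an existing entry alone instead of overwriting it
    [ -e "$htdocs/distfiles" ] || ln -s "$distdir" "$htdocs/distfiles"
}

# Example (the htdocs path is a placeholder):
#   publish_distfiles /usr/portage/distfiles /path/to/htdocs
```

With that in place a client would use something like GENTOO_MIRRORS="http://<your gateway>" in make.conf.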

Grimthorn
n00b
Joined: 04 Jun 2003
Posts: 10

Posted: Fri Jun 13, 2003 12:46 am

Koon,

Why rsync? Mostly for simplicity's sake. It allows for the construction of a Gentoo mirror with the least amount of knowledge, build time and configuration. However, as you point out this is at the expense of automation.

Just to clarify: One doesn't have to use the "cvs" feature. It's only recommended (maybe not even needed...someone confirm?) so that the gateway provides the highest level of availability. If no one on the internal network uses kde then modify the gateway's USE flags to reflect that. Unfortunately this increases administration yet again.

If you want a higher level of automation we'll have to throw out rsync and go back to httpd. You could direct your clients to a php script that automatically retrieves the package from the web if it's not available locally. I think there have been posts on this... I'll track down the thread, post it here, and add it to the HowTo as time permits.

hackertype,

Hmm, something's amiss. What package is causing the problem? Could you post the error? We're currently using this method and have only come across a few problems that were correctable. However, we're mostly servers so perhaps you've uncovered something new. Post some more info and I'll try to figure it out.

Grim

Grimthorn
n00b
Joined: 04 Jun 2003
Posts: 10

Posted: Fri Jun 13, 2003 1:22 am

Koon,

The link I was referring to is here.

This would be nice to implement for some of our users. As we continue to roll out Gentoo I will probably put something like this together. I'll post the complete solution here as an add on to the HowTo but if you beat me to it that would be great! ;-)

Take care,
Grim

hackertype
n00b
Joined: 03 Jun 2003
Posts: 32

Posted: Fri Jun 13, 2003 5:02 pm

Grimthorn wrote:
Post some more info and I'll try to figure it out.

Grim


Okay then. My rsyncd.conf file:
Code:

#uid = nobody
#gid = nobody
use chroot = no
max connections = 20
pid file = /var/run/rsyncd.pid
motd file = /etc/rsync/rsyncd.motd
transfer logging = yes
log format = %t %a %m %f %b
syslog facility = local3
timeout = 300

[gentoo-x86-portage]
#this entry is for compatibility
path = /opt/gentoo-rsync/portage
comment = Gentoo Linux Portage tree

[gentoo-portage]
#For replicating the Portage tree to internal clients
path = /usr/portage
comment = Gentoo Linux Portage tree mirror
exclude = distfiles

[gentoo-packages]
#For distributing Portage packages (distfiles) to internal clients
path = /usr/portage/distfiles
comment = Gentoo Linux Packages mirror


Rsyncing from gentoo-packages simply won't work, while rsyncing from gentoo-portage works fine.

Quote:

[root@www smwiki]# rsync -v rsync://10.0.0.30/gentoo-packages/zip23.tar.gz .
This is rsync[number].[country].gentoo.org.

zip23.tar.gz
write failed on zip23.tar.gz : Bad address
rsync error: error in file IO (code 11) at receiver.c(271)
rsync: connection unexpectedly closed (104 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(150)
[root@www smwiki]# rsync -v rsync://10.0.0.30/gentoo-portage/dev-tex/eurosym/eurosym-1.2.ebuild
This is rsync[number].[country].gentoo.org.

-rw-r--r-- 1150 2003/06/09 07:11:26 eurosym-1.2.ebuild
wrote 122 bytes read 118 bytes 480.00 bytes/sec
total size is 1150 speedup is 4.79


This happens with any package I choose. I can rsync ebuilds out of portage, yet I can't rsync tarballs out of distfiles.

The rsync server is version 2.5.6 and it is running on an imac.

heijs
Apprentice
Joined: 12 Jun 2002
Posts: 174
Location: Groningen

Posted: Sat Jun 14, 2003 10:08 am

Same error here on an AMD Athlon with the same configuration...

I really don't understand the error and I followed the guide perfectly!

zen_guerrilla
Guru
Joined: 18 Apr 2002
Posts: 343
Location: Greece

Posted: Sat Jun 14, 2003 7:41 pm    Post subject: Slighty OT :)

Koon wrote:
I am still looking for the perfect solution for a local (enterprise) portage tree. When I will get your opinion on this, I may post another thread to discuss the specifications for a perfect enterprise-oriented portage gateway.

Just some tips for implementing gentoo at lans from my experience...
Let's say that you have a local lan of ~20 boxes and u want to install gentoo with the same setup on all the boxes.
1. Share /usr/portage via nfs from 1 box, use autofs to mount that share when needed from the other boxes.
2. Have "emerge sync" invoked from crontab on the 'server' box once a day & that way have all boxes synchronized at once.
3. Use packages (emerge -b/-k, man make.conf/emerge) to reduce compile times. If your boxes have different arch consider compiling stuff with -march=i686. If you want to use packages for different arch's, set PKGDIR accordingly to eg. /usr/portage/packages-athlon or packages-pentium3.
4. Use distcc to reduce compile times even more.

Thorbjorn
n00b
Joined: 16 Jan 2003
Posts: 23

Posted: Mon Jun 16, 2003 2:32 am    Post subject: Re: Slighty OT :)

zen_guerrilla wrote:
1. Share /usr/portage via nfs from 1 box, use autofs to mount that share when needed from the other boxes.


nfs assuming you can hit nfs from client to server .. rsync over ssh for a more secure and distributed local system..


On another note. I was actually doing this very thing, but I am working on a transparent proxy/cache for the package tarballs, which will then solve the issue of use flags and having to manually emerge -f on the server. Under this scheme your clients will use http://yourserver/gentoo-cache/ for the FETCHCOMMAND.. This will in turn proxy the connection to whatever server has the package and then cache the downloaded tarballs.

under this situation if you have a lot of machines doing a global update you should hit with 90-100% cache accuracy (depending on the package and USE flags) and obtain maybe a better reduction of bandwidth on the external link because you're only fetching what is needed.. not everything you _might_ need (under features=cvs). Anyhow I'm doing the proxy cache setup right now, and I will post here with my configs and all that.

Great howto By the way!.
_________________
An intellectual is someone whose mind watches itself.
- Albert Camus

Thorbjorn
n00b
Joined: 16 Jan 2003
Posts: 23

Posted: Mon Jun 16, 2003 3:49 am

here is my mod_proxy with caching support.

this is the first time i setup mod_proxy in apache, but it is working fine for me. I need to tweak the cache settings a bit maybe, but I would expect you all would need to. One more thing i want to do is get the ftp proxy working. My initial attempts at getting that going were not successful. I get a segfault on child processes when trying to proxy ftp.. I am not sure if this is due to the way the mod_proxy was compiled or not, and there is not a lot of info on google about it. Anyone know what's going on there ?

My approach was to set up an internal caching proxy on a virtual host so as not to interrupt anything else running on this box. I assume you know how to uncomment the LoadModule directives in your apache configs (mod_proxy is a std module) and I assume you are using apache 1.3.x (for 2.0 your config would be a bit different). I also assume you know how to set up a virt host, but you can slap this in a location directive if you want to.

Here is my apache virtual host:
Code:

<VirtualHost  yourwhatever >
    # We don't want a general forward proxy, only ProxyPass. This is a security measure:
    ProxyRequests Off

    # where the cache files live (apache needs to be able to write here)
    CacheRoot "/somedir/cache/httpd"
    CacheSize 102400
    CacheGcInterval 8
    CacheMaxExpire 168
    CacheLastModifiedFactor 0.1
    CacheDefaultExpire 72

    # set up the site we want to proxy
    ProxyPass /gentoo/ http://csociety-ftp.ecn.purdue.edu/pub/gentoo/

    # set up some restrictions on who can connect
    # (multiple Allow entries are separated by spaces, not commas)
    <Directory proxy:*>
        Order deny,allow
        Deny from all
        Allow from <your domain> <yourip>
    </Directory>
</VirtualHost>


and voilà, any request to yourvirt/gentoo/ is transparently proxied, and all downloads from said URL are cached on your local server...

Each of the Cache directives controls how the server handles caching. Setting the CacheRoot enables caching on the server. This directory must be writable by the user running the server (usually "nobody"). The CacheSize sets the desired space usage in kilobytes. You will probably want to set this higher than the default of 5, based on your available disk space, to allow the greatest number of documents to be stored locally, thus allowing local cache access by the clients. Garbage collection, which enforces the cache size, is set in hours by the CacheGcInterval. If unspecified, the cache size will grow until disk space runs out. CacheMaxExpire specifies the maximum number of hours for which cached documents will be retained without checking the host server. If the origin server for a document did not send an expiry date, then the CacheLastModifiedFactor will be used to estimate one by multiplying the factor by the time since the document was last modified. If the protocol used to retrieve a document does not support expiry times (FTP, for example), the CacheDefaultExpire directive specifies the number of hours until it expires.


Now just edit your make.conf to point to your server like so:

Code:

  GENTOO_MIRRORS="http://myproxy/gentoo"


And all is good. Sit back and enjoy not sucking up all your bandwidth building a new gcc on emerge -u world on the 12 cluster nodes in your garage.. er.. or whatever you may have ;)

Edit: Added the info about the cache vars.
_________________
An intellectual is someone whose mind watches itself.
- Albert Camus

zen_guerrilla
Guru

Joined: 18 Apr 2002
Posts: 343
Location: Greece

Posted: Mon Jun 16, 2003 9:30 am    Post subject: Re: Slighty OT :)

Thorbjorn wrote:
zen_guerrilla wrote:
1. Share /usr/portage via nfs from 1 box, use autofs to mount that share when needed from the other boxes.

Which will then solve the issue of use flags and having to manually emerge -f on the server.

Actually you don't need to invoke 'emerge -f' or set flags on the server; you just emerge something on, e.g., box1, which downloads the sources into the /usr/portage of the 'server', so when box{2,3,4,5...} need the same sources they use them the same way. Sources are fetched and emerge sync is run only once (on the 'server') for the whole LAN. Plus, if you need binary packages, just create them on one box and then 'emerge -k' on the others. Plus you don't need a dedicated box for this solution; the 'server' can also work as a workstation at the same time.
The proxy/rsync approach is quite sophisticated and could do the job on a large (enterprise) LAN, but for 10-20 boxes I think my way is easier to set up and admin, and scales better.
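
For anyone who wants to try the NFS route, a minimal sketch of the files involved might look like the following. The host name portage-server, the 192.168.0.0 network and the map file name /etc/auto.portage are placeholders made up for this example, and the mount options are only a starting point, not something taken from the setup described above.

```
# --- on the 'server': /etc/exports ---
# rw + no_root_squash because emerge runs as root on the clients and writes distfiles
/usr/portage    192.168.0.0/255.255.255.0(rw,sync,no_root_squash)

# --- on each client: /etc/auto.master ---
# "/-" declares a direct map; unmount after 60 idle seconds
/-    /etc/auto.portage    --timeout=60

# --- on each client: /etc/auto.portage ---
/usr/portage    -fstype=nfs,rw,hard,intr    portage-server:/usr/portage
```

Direct maps depend on the autofs version installed; if yours does not support them, a plain NFS entry in /etc/fstab gives the same result without the on-demand mounting.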

Koon
Retired Dev

Joined: 10 Dec 2002
Posts: 518

Posted: Mon Jun 16, 2003 11:21 am    Post subject:

OK, after a bit of research, I think there are only three good solutions to the problem, the best one depending on exactly what you need.

1/ NFS Share the tree and distfiles
(+) no need to emerge sync on the workstations
(-) risk (tiny) of problems if a distfile is accessed while it is still being downloaded
This solution is best on small networks and when people administer their own machines.
Question: is NFS the best way to share?

2/ Rsync the tree, cache the distfiles
(The solution described here, with or without the proxy/cache setup)
This solution is best on medium networks and when administration is done only by a small team of admins.
Question: IMHO proxy/cache is better than doing emerge -f... Any pros to emerge -f?

3/ PortageSQL (see breakmygentoo)
(+) Accounting of what's installed, where...
(-) not available yet!
This solution is best on large networks, since accounting quickly becomes necessary.

I think I will finally go for solution 1, since we only have half a dozen boxen and 3 of them are directly administered by their primary users (they use portage directly).

Maybe we should describe all solutions in the HOWTO so that it becomes a definitive guide describing all the options you have for using portage in a LAN, with their relative pros and cons.

-K

GurliGebis
Retired Dev

Joined: 08 Aug 2002
Posts: 509

Posted: Mon Jun 16, 2003 2:06 pm    Post subject:

Well, my installation works like this:

I have the server serving /usr/portage/distfiles over NFS, so the clients mount it on their /usr/portage/distfiles. The clients fetch the distfiles from the web servers as normal, but since they all have /usr/portage/distfiles mounted from the server, each file only needs to be downloaded once.
The server also runs the rsync daemon, so the clients can rsync against it, and thereby save bandwidth.
The server rsyncs once a day.
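
On the client side, a setup like this might come down to two small config entries. This is only a sketch: the host name portage-server and the rsync module name gentoo-portage are assumptions, so substitute whatever your server's rsyncd.conf actually exports.

```
# --- on each client: /etc/fstab ---
portage-server:/usr/portage/distfiles  /usr/portage/distfiles  nfs  rw,hard,intr  0 0

# --- on each client: /etc/make.conf ---
# sync the tree from the internal rsync daemon instead of an official mirror
SYNC="rsync://portage-server/gentoo-portage"
```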
_________________
Queen Rocks.

narksunamun
n00b

Joined: 11 Nov 2002
Posts: 3

Posted: Mon Jun 16, 2003 4:52 pm    Post subject:

Hi,

Here is my solution for mirroring the Portage tree in a centralized network of Gentoo Linux computers. I have a server and several clients. On the server, I maintain the Portage tree for all the computers. The first Portage tree is located in /usr/portage_client. An rsync daemon runs on the server and shares /usr/portage_client. Each client has only the server as its Gentoo Portage tree mirror in /etc/make.conf. The second Portage tree is located in /usr/portage on the server. This directory is used when I want to get the latest Gentoo Portage tree from an official Gentoo mirror with an emerge rsync.

When I want to update the Portage tree in /usr/portage_client with the most recent Portage tree in /usr/portage, I do it in three steps:

1. I stop the rsyncd daemon which runs on the server
2. I copy the contents of /usr/portage into /usr/portage_client
3. I start the rsyncd daemon

Each client runs an emerge rsync each day at 2 a.m.
The server has the most recent Portage tree.

That's all!!!
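
The three steps above could be wrapped in a small script along these lines. It is a sketch, not the exact procedure used here: the two paths come from the post, while the function names and the assumption that the daemon is controlled through /etc/init.d/rsyncd are made up for the example.

```shell
#!/bin/sh
# Sketch of the three-step update described above. The two paths come from
# the post; the function names are made up for this example.

# Thin wrapper around the init script so it is easy to swap out
# (assumes the daemon is controlled by /etc/init.d/rsyncd).
rsyncd_ctl() {
    /etc/init.d/rsyncd "$1"
}

publish_tree() {
    src="$1"   # freshly synced tree, e.g. /usr/portage
    dst="$2"   # tree served to the clients, e.g. /usr/portage_client

    rsyncd_ctl stop || return 1          # 1. stop the rsync daemon
    mkdir -p "$dst"
    cp -a "$src"/. "$dst"/               # 2. copy the contents across
    status=$?
    rsyncd_ctl start || return 1         # 3. start the daemon again
    return "$status"
}

# Usage on the server, as root:
#   publish_tree /usr/portage /usr/portage_client
```

Note that cp -a never deletes anything, so ebuilds removed upstream linger in the client tree; running rsync -a --delete between the two directories is one alternative if that matters. You may also want to leave distfiles out of the copy.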
_________________
I'm getting interest in JAVA, Linux & Gentoo
Master's Degree Student in Network, Multimedia and Internet at University Of Reunion Island

Grimthorn
n00b

Joined: 04 Jun 2003
Posts: 10

Posted: Mon Jun 16, 2003 8:54 pm    Post subject:

hackertype, heijs

I've finally been able to recreate your error messages. Here's the scoop:

Shortly after I posted the HowTo, Gentoo updated the init.d script that handles the rsync daemon. If you had been diligently doing emerge sync and emerge -u system then you would have picked up this new script. Essentially they've added a parameter that tells rsync to compress the files before transferring them. Unfortunately when rsync tries to compress an already compressed file (Gentoo packages) it craps out.

To fix the problem:

Edit your rsyncd init.d script as follows:
Code:
nano -w /etc/init.d/rsyncd


Find this line:
Code:
RSYNC_OPTS="--safe-links --compress --bwlimit=700 --timeout=1800"


Make it look like this:

Code:
RSYNC_OPTS="--safe-links --timeout=1800"


Explanation:
--compress: Tells rsync to compress the file before sending it. This is what's causing the problem.
--bwlimit=700: This throttles the bandwidth rsync uses. This shouldn't matter on an internal network so I've removed it. If you have bandwidth problems on your internal network then leave this parameter as is.
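
If you have several servers to fix, the same edit can be scripted. The helper below is only a sketch under the assumption that the options sit on a single RSYNC_OPTS= line as shown above; the function name strip_rsync_opts is made up, and it keeps a .bak copy of the script next to the original.

```shell
#!/bin/sh
# Hypothetical helper: strips --compress and --bwlimit=N from the RSYNC_OPTS
# line of an init script, leaving the other options alone.
strip_rsync_opts() {
    # $1 = path to the script, e.g. /etc/init.d/rsyncd
    cp "$1" "$1.bak" || return 1    # keep a backup of the original
    sed -e '/^[[:space:]]*RSYNC_OPTS=/ s/ *--compress//' \
        -e '/^[[:space:]]*RSYNC_OPTS=/ s/ *--bwlimit=[0-9]*//' \
        "$1.bak" > "$1"
}

# Usage, as root:
#   strip_rsync_opts /etc/init.d/rsyncd
```

Drop the second sed expression if you want to keep the bandwidth limit.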

That's it. Let me know if it works.

Take care,
Grim

Grimthorn
n00b

Joined: 04 Jun 2003
Posts: 10

Posted: Mon Jun 16, 2003 9:43 pm    Post subject:

As Koon suggests it would be great to add all of these configs to the HowTo so that it will become the definitive guide to creating a Gentoo gateway. Time constraints would keep me from implementing and testing every config so any help would be greatly appreciated! To start I've asked a few questions below.


Thorbjorn: Nice setup and great idea. I would like to try this. I think it would work well in our environment.

Koon: Thanks for all your "tire kicking". Your observations have helped focus our attention on the real issues.

zen_guerrilla: Thanks for the tips! Good idea about maintaining different packages for different arches. I had not thought of this and it could be useful in our environment.

narksunamun: I'm curious about your setup. Do you maintain two portage trees to avoid a collision between the gateway's update of the portage tree and the client's requests for the portage tree?

GurliGebis: How many clients do you support? Have you had any problems with updating the packages while a client is requesting them?

Thorbjorn
n00b

Joined: 16 Jan 2003
Posts: 23

Posted: Mon Jun 16, 2003 9:52 pm    Post subject:

Koon wrote:

Question : IMHO proxy/cache is better than doing emerge -f... Any pros to the emerge -f ?



Because on all my clients I just emerge away and everything is cached transparently (not the builds, but the distfiles). I run many different machines (workstations, servers) and multiple architectures: sun/ppc/i386. Caching builds makes little sense, since just about no two machines on my network build the same thing the same way. The purpose of the proxy/rsync setup is strictly to limit my bandwidth usage. Case in point: this morning I had 10 machines update gzip. Only the proxy grabbed that file from outside, and the rest grabbed it from the cache. I didn't have to change anything in how I do my emerges from my workstations, i.e. it's transparent. Whereas emerge -f has two weaknesses: #1, you have to do it manually on the "server"; #2, you have to have cvs in your FEATURES so you can fetch all the possible dependent files before you emerge on your "clients". The transparent proxy/cache just works; no need to muck with it.

Phew, long rant.


EDIT: OMG I thought you were arguing that proxy/cache was NOT better than emerge -f .. lol rack this rant up to knee-jerk :P
_________________
An intellectual is someone whose mind watches itself.
- Albert Camus

Thorbjorn
n00b

Joined: 16 Jan 2003
Posts: 23

Posted: Mon Jun 16, 2003 9:58 pm    Post subject:

GurliGebis wrote:
Well, my installation works like this:

I have the server serving /usr/portage/distfiles over NFS, so the clients mount it on their /usr/portage/distfiles. The clients fetch the distfiles from the web servers as normal, but since they all have /usr/portage/distfiles mounted from the server, each file only needs to be downloaded once.
The server also runs the rsync daemon, so the clients can rsync against it, and thereby save bandwidth.
The server rsyncs once a day.


This is a good way to go, and I have a server at home that does this, but I don't have the ability to have a single NFS mount across all the different internal nets. So I use the proxy/cache to achieve basically the same thing. (Also, there are always the NFS and security issues that come along with running portmap and NFS, if you're concerned about that.)
_________________
An intellectual is someone whose mind watches itself.
- Albert Camus

hackertype
n00b

Joined: 03 Jun 2003
Posts: 32

Posted: Mon Jun 16, 2003 10:39 pm    Post subject:

Grimthorn

Yes that was the problem. The fix worked for me. Thanks.

BTW emerge -f is fine with me. I would rather rsync for both distfiles and ebuild sources.

Koon
Retired Dev

Joined: 10 Dec 2002
Posts: 518

Posted: Tue Jun 17, 2003 7:15 am    Post subject:

Grimthorn wrote:
As Koon suggests it would be great to add all of these configs to the HowTo so that it will become the definitive guide to creating a Gentoo gateway. Time constraints would keep me from implementing and testing every config so any help would be greatly appreciated!

We should first try to determine the list of setups we recommend. For example, is there any point in talking about "emerge -f" updates if everyone agrees the proxy/cache is better for distfiles?

For my own setup I am changing my mind every two days: now I would rather go the Thorbjorn/Narksunamun way, which works even when the gateway is down, rather than the true NFS sharing way...

Grimthorn wrote:
narksunamun: I'm curious about your setup. Do you maintain two portage trees to avoid a collision between the gateway's update of the portage tree and the client's requests for the portage tree?

Maybe it's a way to control which trees are made available to the final workstations? I was considering something like this (a way to certify a validated tree and then publish it to the other workstations...)

Thorbjorn wrote:
OMG I thought you were arguing that proxy/cache was NOT better than emerge -f .. lol rack this rant up to knee-jerk

hehehe... I was on your side, in fact. Waiting for Grimthorn to defend the emerge -f option :)

-K

cwng
n00b

Joined: 21 Nov 2002
Posts: 68
Location: Singapore

Posted: Tue Jun 17, 2003 10:21 am    Post subject:

Hello, this is a very good guide. I used it and it works ... but I decided against using rsync to fetch distfiles. I preferred an HTTP method in GENTOO_MIRRORS so that if a source tarball is not on the gateway, emerge will fail over and use an alternative mirror.

To that effect, I emerged 'mini_httpd' (Apache is overkill, unless you actually already have Apache set up).
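
A client's make.conf for this kind of failover might look roughly like the sketch below. The host name gateway is a placeholder; the second URL is simply the public mirror used earlier in this thread, and any official mirror would do.

```shell
# /etc/make.conf on a client (sketch; "gateway" is a placeholder host name).
# Mirrors are tried in the order listed, so the internal gateway comes first
# and a public mirror is only used when the gateway lacks the file.
GENTOO_MIRRORS="http://gateway/gentoo http://csociety-ftp.ecn.purdue.edu/pub/gentoo"
```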

All times are GMT