guy Apprentice
Joined: 31 Mar 2003 Posts: 286 Location: USA
|
Posted: Tue May 06, 2003 6:12 pm Post subject: |
|
|
it'd be cool if we had a "word-of-mouth" system in which the mirrors tell a few people they have a certain update, they tell a few more, etc etc until all or at least most people involved get the updated rsync. |
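A rough way to get a feel for how fast such a "word-of-mouth" scheme could spread an update is a toy simulation. This is only a sketch under made-up assumptions: `gossip_rounds`, the node count and the fan-out are hypothetical, and real mirrors would not pick peers uniformly at random.

```python
import random

def gossip_rounds(num_nodes, fanout, seed=0):
    """Count the rounds until every node has heard about the update."""
    rng = random.Random(seed)
    informed = {0}  # node 0 plays the mirror that has the update first
    rounds = 0
    while len(informed) < num_nodes:
        newly_told = set()
        for _ in informed:
            # each informed node tells `fanout` peers picked at random
            picks = rng.sample(range(num_nodes), min(fanout, num_nodes))
            newly_told.update(picks)
        informed |= newly_told
        rounds += 1
    return rounds

if __name__ == "__main__":
    print(gossip_rounds(num_nodes=1000, fanout=3))
```

Because the informed set can at most multiply by (fanout + 1) each round, the number of rounds grows roughly with the logarithm of the number of machines.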
|
|
|
fishhead Apprentice
Joined: 07 Mar 2003 Posts: 162 Location: Pasadena, CA
|
Posted: Tue May 06, 2003 7:07 pm Post subject: |
|
|
PowerFactor's post got me thinking.
Perhaps if we combined my idea above with rsync. Each system can find what files need to be updated and then use rsync to update those files. I think as things stand right now the ENTIRE set of files on the server is checked, not just the needed ones. |
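One way to picture "find what needs updating first, then fetch only that" is to compare a small manifest of checksums before any transfer happens. This is a sketch only: the manifest format (path mapped to a checksum string) and the name `files_to_fetch` are assumptions, not something Portage or the mirrors provide.

```python
from typing import Dict, List

def files_to_fetch(server_manifest: Dict[str, str],
                   local_manifest: Dict[str, str]) -> List[str]:
    """Return the paths whose checksum is missing or different locally."""
    return sorted(
        path
        for path, digest in server_manifest.items()
        if local_manifest.get(path) != digest
    )

if __name__ == "__main__":
    server = {"sys-apps/less/less-378-r2.ebuild": "aa11",
              "net-misc/rsync/rsync-2.5.6.ebuild": "bb22"}
    local = {"sys-apps/less/less-378-r2.ebuild": "aa11"}
    print(files_to_fetch(server, local))
```

The resulting list could then be handed to rsync, for example through its `--files-from` option in versions that have it, so only those paths are examined.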
|
|
|
trooper82 n00b
Joined: 15 Mar 2003 Posts: 57
|
Posted: Wed May 07, 2003 1:03 am Post subject: |
|
|
Guilty!.... as charged. I read that newsletter and knew they were talking to me. I apologize to the entire Gentoo community.
pjp Site Admin wrote:
Quote: | You could designate a local mirror. Let the mirror sync, then have your other machines sync to it.
|
I am doing just that, serving 9 machines at the moment.
According to the documentation for setting up an rsync mirror....
http://www.gentoo.org/doc/en/rsync.xml
Quote: |
Update Frequency
Updates must occur at :00 and :30 of each hour, 24 hours a day. It is very important that this schedule is followed strictly, as we use a round robin style DNS to select the users' rsync server.
|
I should have used common sense and realized that I am not hosting an "Official" rsync server, so have no need to be updating as frequently as I was. I have knocked it down to once a day, may take it to once a week.
___________________________________________
Trooper82 |
|
|
|
djco Retired Dev
Joined: 29 Mar 2003 Posts: 67 Location: 52.36, 4.89
|
Posted: Wed May 07, 2003 8:42 am Post subject: Another way to handle sync |
|
|
Well, I was thinking about this yesterday, and even though I have not had too much experience with Gentoo and don't know shit about Python, I think I have something to contribute.
It seems to me Portage works with two layers right now:
1. You get the whole Portage tree, with every available ebuild, which is updated on emerge sync. From this tree, emerge can find the software you want to emerge.
2. When emerging a package, the actual sources for that package are retrieved from the internet and compiled to binary.
Is that about right? I would propose an extra layer, then:
1. You get a list (this could be implemented in XML very well) of all the ebuilds currently in Portage. This would include versions as well as the stability for different architectures and dependencies. If the file gets too big, it could be split into several files, one per category. This list will be your primary tree.
2. Ebuilds are only saved on your computer for packages that you actually emerged. This means rsync does not have to be used at all for ebuilds. Some info can be cut from the ebuild because it is already in the XML file.
3. The actual files are retrieved from the internet as stated in the ebuild, and the package is built and compiled.
The advantages:
- Only use rsync (possibly) for the XML files
- Portage tree is much smaller (XML also compresses very well)
- Easy to get info from the tree for emerge as well as external tools
- Less bandwidth used as ebuilds are only transferred as necessary
Little example for one of these XML files:
Code: |
<category name="sys-apps">
  <package name="less">
    <version name="378-r2">
      <keywords>x86 ppc sparc alpha mips hppa arm</keywords>
      <slot>0</slot>
      <license>GPL-2</license>
      <depend>virtual/glibc >=sys-libs/ncurses-5.2</depend>
    </version>
  </package>
</category>
|
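To show how a tool might read such an index, here is a small sketch that parses the example above with Python's standard `xml.etree.ElementTree` module and lists the versions keyworded for one architecture. The function name `versions_for_arch` is made up for illustration, and the XML layout is just the one proposed in this post, not an existing Portage format.

```python
import xml.etree.ElementTree as ET

INDEX_XML = """
<category name="sys-apps">
  <package name="less">
    <version name="378-r2">
      <keywords>x86 ppc sparc alpha mips hppa arm</keywords>
      <slot>0</slot>
      <license>GPL-2</license>
      <depend>virtual/glibc >=sys-libs/ncurses-5.2</depend>
    </version>
  </package>
</category>
"""

def versions_for_arch(xml_text, arch):
    """Return category/package-version strings keyworded for `arch`."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for package in root.findall("package"):
        for version in package.findall("version"):
            keywords = version.findtext("keywords", default="").split()
            if arch in keywords:
                found.append("{}/{}-{}".format(
                    root.get("name"), package.get("name"), version.get("name")))
    return found

if __name__ == "__main__":
    print(versions_for_arch(INDEX_XML, "x86"))
```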
|
|
|
|
roderickvd n00b
Joined: 25 Aug 2002 Posts: 46 Location: University of Twente
|
Posted: Wed May 07, 2003 10:15 am Post subject: Addition to the guidelines |
|
|
For all those that track Portage daily like I do, I recommend that you read through the daily CVS ChangeLog on the Gentoo web site first. If there's anything of interest go ahead and update, otherwise check back tomorrow.
How's that for an addition to the guidelines? |
|
|
|
leahcim n00b
Joined: 17 Mar 2003 Posts: 29
|
Posted: Thu May 08, 2003 3:29 am Post subject: |
|
|
Is it just a case of a previous kludge to get around a DNS issue, where folk were told to use rsync.<country code>.gentoo.org, having bitten the UK server owner on the butt because he has the only UK rsync server? I'm using uk on one machine and europe on another (which includes the uk one), and I note that the only message wrt rsync abuse I've seen to date is from the UK server. Coincidence I bet
Does rsync transfer a lot of data if nothing has changed?
The other suggestions wrt patches over complete tarballs make sense, but this was talking about rsync. If there's not enough bandwidth resource to cope with rsync, it hardly matters saving bandwidth for the source, you're dead before you've got the .ebuild, let alone the tarball.
Not that I'd advocate syncing all the time, nor wasting bandwidth for the sake of it, I just think that if there are 50 MB of changes to something in a time period, syncing ten times or once over that period should transfer as close as possible to the same amount of data.
Or as someone just said, we should all check a web site to see what's changed - why isn't that web site being blocked to once a day too? |
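The "ten syncs versus one" question can be put as simple arithmetic. As far as I know, rsync exchanges a file list for the whole tree on every run even when nothing has changed, so each sync carries some fixed cost on top of the changed data. The sketch below assumes that model; the 2 MB overhead figure is invented for illustration and is not a measurement.

```python
def total_transfer_mb(changed_mb, syncs, overhead_mb_per_sync):
    """Changed data is sent once overall; the fixed overhead is paid per sync."""
    return changed_mb + syncs * overhead_mb_per_sync

if __name__ == "__main__":
    # 50 MB of real changes over the period, hypothetical 2 MB overhead per sync
    print(total_transfer_mb(50, 1, 2))
    print(total_transfer_mb(50, 10, 2))
```

Under this model the totals only come out "as close as possible to the same figure" when the per-sync overhead is small compared with the changes themselves.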
|
|
|
guero61 l33t
Joined: 14 Oct 2002 Posts: 811 Location: Behind you
|
Posted: Thu May 08, 2003 5:01 am Post subject: Re: Addition to the guidelines |
|
|
roderickvd wrote: | For all those that track Portage daily like I do, I recommend that you read through the daily CVS ChangeLog on the Gentoo web site first. If there's anything of interest go ahead and update, otherwise check back tomorrow.
How's that for an addition to the guidelines? |
Heck, if someone was enterprising enough, they'd write a screen scraper with wget and maybe a little Perl glue to watch the daily cvslog for interesting updates... could mail 'em right to you...
If I really wanted to keep up to date and not overload the rsync servers, this is what I'd do -- one extra hit every 30 minutes or so (or daily if that's how often the changelog changes) wouldn't put as much extra load on http servers that are designed for much heavier traffic than rsync. |
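The filtering half of such a watcher is small. The sketch below assumes the changelog has already been downloaded as plain text (the fetching with wget and the mailing are left out), and the one-line-per-change format and the name `interesting_entries` are assumptions for illustration only.

```python
def interesting_entries(changelog_text, watched):
    """Return the changelog lines that mention any watched package name."""
    hits = []
    for line in changelog_text.splitlines():
        if any(name in line for name in watched):
            hits.append(line.strip())
    return hits

if __name__ == "__main__":
    sample = (
        "sys-apps/less: version bump to 378-r2\n"
        "net-www/mozilla: new patch set\n"
        "app-editors/vim: fix typo in ebuild\n"
    )
    for entry in interesting_entries(sample, ["mozilla", "vim"]):
        print(entry)
```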
|
|
|
djco Retired Dev
Joined: 29 Mar 2003 Posts: 67 Location: 52.36, 4.89
|
Posted: Thu May 08, 2003 6:21 am Post subject: Re: Addition to the guidelines |
|
|
guero61 wrote: | roderickvd wrote: | For all those that track Portage daily like I do, I recommend that you read through the daily CVS ChangeLog on the Gentoo web site first. If there's anything of interest go ahead and update, otherwise check back tomorrow.
How's that for an addition to the guidelines? |
Heck, if someone was enterprising enough, they'd write a screen scraper with wget and maybe a little Perl glue to watch the daily cvslog for interesting updates... could mail 'em right to you...
If I really wanted to keep up to date and not overload the rsync servers, this is what I'd do -- one extra hit every 30 minutes or so (or daily if that's how often the changelog changes) wouldn't put as much extra load on http servers that are designed for much heavier traffic than rsync. |
Something like this? |
|
|
|
guero61 l33t
Joined: 14 Oct 2002 Posts: 811 Location: Behind you
|
Posted: Thu May 08, 2003 12:15 pm Post subject: |
|
|
There ya go! Look at the man!
*aside*
Crikey, what an industrious chap! D'ya think he'd share that script with the world of over-rsyncers so they'll rest easy in knowing they have the most up-to-the-minute packages??? |
|
|
|
cies n00b
Joined: 10 Apr 2002 Posts: 9
|
|
|
|
djco Retired Dev
Joined: 29 Mar 2003 Posts: 67 Location: 52.36, 4.89
|
Posted: Thu May 08, 2003 2:22 pm Post subject: |
|
|
Well, there's a few problems, still.
- It scrapes the online package database, which apparently only includes stable packages.
- It's a bunch of PHP scripts interacting with wget, and it's not yet automated, so I'll update it once a day for now.
- It's grabbing 69 pages from the gentoo.org server, which means you wouldn't want to update it every 30 minutes (I don't think administrators would appreciate that very much).
Anyway, I'll first make a version that checks out all of the information by itself and caches it for about 12 hours, maybe 6. Meanwhile, I think it would be nice to have a last-updated date for every package, and I'm still wondering whether it would be nice to include older ebuilds, too. |
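The caching part can be sketched independently of the scraper itself. The example is in Python rather than the PHP mentioned above, and `TimedCache` is a hypothetical helper: it calls a fetch function at most once per maximum age, so the upstream server is not hit on every request.

```python
import time

class TimedCache:
    """Keep one fetched value and refresh it only after max_age_seconds."""

    def __init__(self, fetch, max_age_seconds, clock=time.time):
        self._fetch = fetch            # function that does the expensive scrape
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so the cache can be tested
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        expired = (self._fetched_at is None
                   or now - self._fetched_at >= self._max_age)
        if expired:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value

if __name__ == "__main__":
    cache = TimedCache(lambda: "scraped package list", 12 * 3600)
    print(cache.get())  # first call fetches
    print(cache.get())  # second call is served from the cache
```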
|
|
|
gilesc n00b
Joined: 01 Dec 2002 Posts: 40
|
Posted: Thu May 08, 2003 3:18 pm Post subject: Instructions for setting up a private rsync server |
|
|
Some users may have 50 gentoo boxes masquerading or NAT'ing behind a single IP. Even if these users set their boxes to only attempt an rsync once a week or so there would still appear to be about 7 attempts a day on the rsync server from the same IP.
Are there any instructions out there to setup a private rsync server which can rsync once a day for all 50 machines on the LAN? |
|
|
|
gilesc n00b
Joined: 01 Dec 2002 Posts: 40
|
Posted: Thu May 08, 2003 3:33 pm Post subject: Re: RSYNC once a week... |
|
|
Mystilleef wrote: |
Remember if Microsoft was offering this same service, you'd probably be paying $50.00 a month for it. Public and network responsibility can only benefit all of us.
Mystilleef |
err... Windows Update?? |
|
|
|
carambola5 Apprentice
Joined: 10 Jul 2002 Posts: 214
|
Posted: Thu May 08, 2003 5:18 pm Post subject: Re: RSYNC once a week... |
|
|
gilesc wrote: | Mystilleef wrote: |
Remember if Microsoft was offering this same service, you'd probably be paying $50.00 a month for it. Public and network responsibility can only benefit all of us.
Mystilleef |
err... Windows Update?? |
I went to a Microsoft Security Seminar (no flames please... it was for work) and some of the most knowledgeable attendees mentioned that they were having significant difficulties setting up Windows Update.
No, I'm not talking about the generic, everyday uses of Windows Update. The heart of the program is actually quite complex. It is suited for commercial deployment. Essentially, the IT department of a largish company can set up a Windows Update server that automatically downloads all of the updates (customizable to suit only the operating systems used in the company). Then, the sysadmins can test the updates on spares, and finally deploy accepted updates. All of the workstations are preconfigured to listen only to the local Windows Update server and will automatically install the patches approved by the sysadmins.
It's a very neat concept that technically works. Unfortunately for the IT guys that were at the seminar, it's an absolute pain in the arse to set up.
If gentoo were thinking about appealing to large-scale companies, I think emerge sync (in using rsync) is very outdated. What I would like to see is a server daemon that follows update etiquette and distributes patches to selective computers within its "domain." That way, we can still have the local rsync server idea... only tremendously souped up. |
|
|
|
Princess Firefly Tux's lil' helper
Joined: 21 Apr 2002 Posts: 80
|
Posted: Thu May 08, 2003 8:26 pm Post subject: What's new instead of what's different |
|
|
Maybe the problem is that we're worrying about checking everything in portage against everything on our system. All we really care about is what has changed since the last time we -checked-.
Something like what Manuzhai and co are talking about seems the way to go. Maybe something like this would be even better though:
Would it be possible for there to be a file generated on all the rsync mirrors (or something) of the form:
Code: |
newest_added_package date_added
2nd_newest_added_package date_added
...
|
It could contain all the packages added/updated in the last 4 weeks or so (fairly small). If the users' systems kept track of the last time they rsynced (note: not -downloaded- any specific packages, just rsynced) it would be really simple to display a message that listed all the new packages since the last rsync. If they haven't rsynced for 4 weeks (or whatever time is set) it would indicate that it's probably time to rsync. Then it could prompt folks if they wanted to continue with the rsync or not bother.
Here's an example of me using something like this:
Code: |
#emerge rsync
...
The last time you rsync'ed was Monday, May 5th, 2003. The following new packages have been updated/added to portage since your last rsync:
mozilla-1.3b
openssh-7.777-r4
gnome-3.0
Do you wish to continue with the rsync? [y/n] N
|
So I know exactly what's new since my last rsync. If I don't feel like installing mozilla or gnome anyway I'd rather not bother with a huge rsync, then a time consuming emerge -up world just to figure that out. Not only is this way less bandwidth, it's way way more efficient and convenient for us users. (I'd love it... then again, it is my idea).
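The check described here boils down to filtering a short "package date" listing by the date of the last sync. In the sketch below the dates are assumed to be in ISO form (YYYY-MM-DD) and the name `changes_since` is made up; the post does not fix a date format, so treat both as illustration.

```python
from datetime import date

def changes_since(listing, last_sync):
    """Return package names whose date_added is after the last sync date."""
    newer = []
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, added = parts[0], date.fromisoformat(parts[1])
        if added > last_sync:
            newer.append(name)
    return newer

if __name__ == "__main__":
    listing = (
        "gnome-3.0 2003-05-08\n"
        "openssh-7.777-r4 2003-05-07\n"
        "mozilla-1.3b 2003-05-06\n"
        "less-378-r2 2003-04-30\n"
    )
    for name in changes_since(listing, date(2003, 5, 5)):
        print(name)
```

If the returned list is empty, the client could skip the full rsync altogether.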
The only real problem is that the next time I rsync, it will do the check and it won't list mozilla, openssh, and gnome because they haven't been updated -since the last rsync- but I really think that's okay. I guess it'd be possible to keep a list on the local machine and then update that once they did rsync... I'm not sure it's necessary though.
Sure, I might forget in 5 days that gnome-3.0 is in portage (probably not) but it really doesn't matter. People are rsyncing too often, not too little, so this really isn't an issue. Also there may be something weird like if you do a full rsync, upgrade package X which was just added to that mirror moments before and then immediately try to rsync again. It might say that package X has been added to portage since the last rsync because you have now connected to a different mirror that hadn't had the file propagated through yet (30-minute rsync delay for official mirrors). But it doesn't matter at all. First of all, the situation is unlikely; secondly, people that would experience it are rsync'ing twice an hour; and thirdly, we're not changing regular rsync so it wouldn't mess anything up, it'd just be one bogus error message that captain super duper bleeding edge would have to learn to get used to.
Also, I'd just like to say that rsync works pretty good for me (and I've used it on a # of different computers in a # of different locations). I very occasionally stop an rsync because a mirror is going too slow but I've never been kicked off halfway through a sync or anything like that. Maybe lovechild has a broken rsync binary (check those CFLAGS) Seriously though, the problem is not the protocol/program, it's what we're serving that needs to be addressed. Maybe there's a better program we can use but that's a different issue altogether. |
|
|
|
dmmgentoo n00b
Joined: 16 Jun 2002 Posts: 38
|
Posted: Fri May 09, 2003 5:52 am Post subject: |
|
|
sibbe wrote: | Maybe this is a good time to bring out the discussion about different sync methods. The choice in Gentoo is rsync, but eg. FreeBSD primarily uses cvsup, NetBSD sup (IIRC) and OpenBSD supports various methods too.
If Gentoo (officially) supported other methods for syncing the portage tree it would (of course) lighten the load on rsync servers.
All methods have their flaws (even rsync), cvsup is written in m3 and therefore isn't very portable etc.
I know, this is a little off topic, since it's not going to solve any bandwidth usage problems. I just think there should be alternatives. |
I think cvsup is very frugal WRT bw. Maybe it should be tested. Is there a cvsupd for Linux? AFAIK, the version of cvsup for Linux is a statically-linked binary. Doing a port of cvsup on Gentoo would be interesting. I don't know what version of Modula-3 cvsup uses, but I've heard there were some problems getting the m3 libs to compile on Linux. |
|
|
|
roderickvd n00b
Joined: 25 Aug 2002 Posts: 46 Location: University of Twente
|
Posted: Fri May 09, 2003 11:32 am Post subject: Modula-3 |
|
|
I've been a long-time FreeBSD user and I adore cvsup. Simple and fast.
There have been problems getting Modula-3 to compile on platforms other than x86 because of some heavy architecture dependencies, but I know for a fact that it has been ported to UltraSPARC. |
|
|
|
klieber Bodhisattva
Joined: 17 Apr 2002 Posts: 3657 Location: San Francisco, CA
|
Posted: Fri May 09, 2003 12:10 pm Post subject: Re: Modula-3 |
|
|
roderickvd wrote: | There have been problems getting Modula-3 to compile on platforms other than x86 because of some heavy architecture dependencies, but I know for a fact that it has been ported to UltraSPARC. |
This is the primary reason against moving to cvsup, btw.
--kurt _________________ The problem with political jokes is that they get elected |
|
|
|
dreamer3 Guru
Joined: 24 Sep 2002 Posts: 553
|
Posted: Sat May 10, 2003 12:09 am Post subject: |
|
|
Lovechild wrote: | aethyr wrote: | PowerFactor:
Again, like I've been saying to anyone who will listen, I think the key is to keep these distfiles on the harddrive and use small, incremental patches to patch them to newer versions.
| I've been talking about this for ages, and the conclusion has always been that it's just too hard to implement. It would be a mighty cool feature, I agree. |
It's extra headache because _new_ downloads should probably get the full, un-patched source of the latest version while people upgrading should get the patches... just makes sense... but I guess it wouldn't be a huge deal for everyone to get base + small patches... making diffs of source can be done with a simple batch script...
Gentoo has already done this in the past on certain occasions (think X, Mozilla) and I appreciate it greatly (modem user). |
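To illustrate why incremental patches are attractive for modem users, here is a minimal sketch that builds a unified diff between two versions of a text file with Python's standard `difflib` module. The helper name `make_patch` and the sample contents are invented; real source tarballs would need to be unpacked and diffed file by file.

```python
import difflib

def make_patch(old_text, new_text, old_name="old", new_name="new"):
    """Return a unified diff that turns old_text into new_text."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    ))

if __name__ == "__main__":
    old = "".join("line {}\n".format(i) for i in range(100))
    new = old.replace("line 50\n", "line fifty\n")
    patch = make_patch(old, new, "foo-1.0/file.c", "foo-1.1/file.c")
    print(patch)
    print(len(patch), "bytes of patch versus", len(new), "bytes of full file")
```

For a one-line change in a 100-line file, the patch carries only the changed line plus a few lines of context, which is the saving the posts above are after.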
|
|
|
dreamer3 Guru
Joined: 24 Sep 2002 Posts: 553
|
Posted: Sat May 10, 2003 12:37 am Post subject: |
|
|
Quote: | I don't really agree with that. From my understanding cpu cycles are generally cheaper than bandwidth. Yes the current problem with excessive rsyncing seems to be cpu hogging. But does that mean we should waste bandwidth to save cpu cycles? If we could come up with something non cpu intensive that used as little bandwidth as rsync then I would be all for it. But I'm not ready to toss rsync to the wolves just yet. |
I don't think having gzipped ebuilds would use very much bandwidth... not familiar with how "smart" rsync is with how much of changed files it sends but I can't imagine gzip would be so bad for ebuilds... If we did that and then came up with a brilliant way to diff the tarballs (source) I'd say overall we'd be saving a lot.
Maybe the proposed system could only gzip a diff from one version of an ebuild to another and only send that... though that gets messy.
Last edited by dreamer3 on Sat May 10, 2003 10:53 pm; edited 1 time in total |
|
|
|
Lovechild Advocate
Joined: 17 May 2002 Posts: 2858 Location: Århus, Denmark
|
Posted: Sat May 10, 2003 7:27 am Post subject: |
|
|
-edited-
Last edited by Lovechild on Mon May 12, 2003 9:30 am; edited 1 time in total |
|
|
|
Genone Retired Dev
Joined: 14 Mar 2003 Posts: 9532 Location: beyond the rim
|
Posted: Mon May 12, 2003 2:22 am Post subject: Re: Instructions for setting up a private rsync server |
|
|
gilesc wrote: | Are there any instructions out there to setup a private rsync server which can rsync once a day for all 50 machines on the LAN? |
Even easier, set up an NFS server that exports /usr/portage to the other boxes, so you only need to emerge sync one box and all other boxes have the current portage tree.
@klieber: if you're interested I can write a short article about this for GWN |
|
|
|
dufeu l33t
Joined: 30 Aug 2002 Posts: 924 Location: US-FL-EST
|
Posted: Mon May 12, 2003 3:33 am Post subject: Re: Instructions for setting up a private rsync server |
|
|
Genone wrote: | gilesc wrote: | Are there any instructions out there to setup a private rsync server which can rsync once a day for all 50 machines on the LAN? |
Even easier, set up an NFS server that exports /usr/portage to the other boxes, so you only need to emerge sync one box and all other boxes have the current portage tree.
@klieber: if you're interested I can write a short article about this for GWN |
I'm _very_ interested in instructions on how to do this.
Currently, in an effort to _not_ hammer the servers, I do this:
1) I only 'emerge sync' manually.
2) I do 'emerge -pu world' manually. I then either do 'emerge -u world' or 'emerge pkg1 pkg2 ... pkgn'. I find there are times when there is a package I _don't want to_ or _can't yet_ upgrade. The important point is that this happens often enough that I feel it important enough to retain manual control.
3) I build new systems by popping in the hard drive of the new system into my primary machine. In other words, my primary workstation becomes a super Gentoo LiveCD ISO. The important difference is that I have direct machine access to the latest packages I use. To compile everything, I simply set the new disk's make.conf for what ever the target CPU is. And instead of rebooting and building the desktop later, I simply stay in the chrooted environment and build all the packages I want for the target machine.
4) I'm learning how to use 'rsync' to remotely keep all my different machines '/usr/portage/distfiles/' directories in sync. Since this is local across my network, I don't bother the servers.
HOWEVER, I consider number 4 to be a bit brute force.
So yes, I'm very interested in learning how to set up NFS so that I have only one 'distfiles' directory.
_________________ People whom think M$ is mediocre, don't know the half of it. |
|
|
|
Genone Retired Dev
Joined: 14 Mar 2003 Posts: 9532 Location: beyond the rim
|
Posted: Mon May 12, 2003 9:24 am Post subject: |
|
|
Hehe, just saw that klieber already picked up my NFS instructions from the mailinglist in GWN. If you have further questions feel free to ask. |
|
|
|
dufeu l33t
Joined: 30 Aug 2002 Posts: 924 Location: US-FL-EST
|
Posted: Mon May 12, 2003 2:57 pm Post subject: Some questions |
|
|
Genone wrote: | Hehe, just saw that klieber already picked up my NFS instructions from the mailinglist in GWN. If you have further questions feel free to ask. |
I'm going to have to take you up on that. I've read through the entire aforementioned thread. From my level of newbieness, I'm afraid all the suggestions require more knowledge than I currently possess. So I'll have to ask some questions a little further back.
Let's say I want to set up a gentoo distfile server using NFS called: distserve at LAN IP address 192.168.0.11. Let's call the client machines distclient01 ... distclientNN.
For each client machine, distclientNN, assuming I've compiled in kernel support for NFS (I have the defaults), I need to do the following:
1) Add the following to /etc/fstab:
Code: |
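# Server name matches the example above (distserve, 192.168.0.11).
distserve:/usr/portage /usr/portage nfs rsize=8192,wsize=8192,timeo=14,intr,ro
distserve:/usr/portage/distfiles /usr/portage/distfiles nfs rsize=8192,wsize=8192,timeo=14,intr,rw
# fstab does not expand $HOST; write each client's own hostname here instead.
distserve:/usr/portage/packages/$HOST /usr/portage/packages nfs rsize=8192,wsize=8192,timeo=14,intr,rw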
|
2) Move /usr/portage to somewhere safe and create a new mountpoint.
Code: |
# mkdir /safe
# mv /usr/portage /safe/
# mkdir /usr/portage
|
3) Add portmap to the init process
Code: |
# rc-update add portmap default
|
4) Add netmount to the init process
Code: |
# rc-update add netmount default
|
And I think that's it for the clients. I'm assuming, because I don't know better, that the defaults for portmap and netmount are fine.
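Since /etc/fstab does not expand shell variables such as $HOST, each client needs its own hostname written into the packages line. A small generator can produce the three lines per client. This is only a sketch: `fstab_lines` is a hypothetical helper, and the mount options are copied from the example above rather than tuned.

```python
NFS_OPTIONS = "rsize=8192,wsize=8192,timeo=14,intr"

def fstab_lines(server, client_hostname):
    """Build the three NFS fstab entries for one client machine."""
    mounts = [
        # (path exported by the server, local mount point, access mode)
        ("/usr/portage", "/usr/portage", "ro"),
        ("/usr/portage/distfiles", "/usr/portage/distfiles", "rw"),
        ("/usr/portage/packages/" + client_hostname, "/usr/portage/packages", "rw"),
    ]
    return [
        "{}:{} {} nfs {},{}".format(server, remote, local, NFS_OPTIONS, mode)
        for remote, local, mode in mounts
    ]

if __name__ == "__main__":
    for line in fstab_lines("distserve", "distclient01"):
        print(line)
```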
I'm off to do a little reading on setting up an NFS server, something else I haven't learned to do yet. Hopefully, ibiblio is back up so that I can get to the tldp NFS HOW-TO. _________________ People whom think M$ is mediocre, don't know the half of it. |
|
|
|
|