Gentoo Forums
rsync etiquette guideline

guy
Apprentice
Joined: 31 Mar 2003
Posts: 286
Location: USA

Posted: Tue May 06, 2003 6:12 pm

It'd be cool if we had a "word-of-mouth" system in which the mirrors tell a few people they have a certain update, they tell a few more, and so on, until all or at least most people involved get the updated tree.

fishhead
Apprentice
Joined: 07 Mar 2003
Posts: 162
Location: Pasadena, CA

Posted: Tue May 06, 2003 7:07 pm

PowerFactor's post got me thinking.
Perhaps we could combine my idea above with rsync. Each system could find out which files need to be updated and then use rsync to update only those files. I think as things stand right now the ENTIRE set of files on the server is checked, not just the needed ones.

trooper82
n00b
Joined: 15 Mar 2003
Posts: 57

Posted: Wed May 07, 2003 1:03 am

Guilty!.... as charged. I read that newsletter and knew they were talking to me. I apologize to the entire Gentoo community.


pjp Site Admin wrote:

Quote:
You could designate a local mirror. Let the mirror sync, then have your other machines sync to it.


I am doing just that, serving 9 machines at the moment.


According to the documentation for setting up an rsync mirror....

http://www.gentoo.org/doc/en/rsync.xml

Quote:

Update Frequency

Updates must occur at :00 and :30 of each hour, 24 hours a day. It is very important that this schedule is followed strictly, as we use a round robin style DNS to select the users' rsync server.



I should have used common sense and realized that I am not hosting an "Official" rsync server, so I have no need to update as frequently as I was. I have knocked it down to once a day, and may take it to once a week.

___________________________________________

Trooper82

djco
Retired Dev
Joined: 29 Mar 2003
Posts: 67
Location: 52.36, 4.89

Posted: Wed May 07, 2003 8:42 am    Post subject: Another way to handle sync

Well, I was thinking about this yesterday, and even though I have not had too much experience with Gentoo and don't know shit about Python, I think I have something to contribute.

It seems to me Portage works with two layers right now:

1. You get the whole Portage tree: all available ebuilds, which are updated on emerge sync. emerge can find out from this tree what software you want to emerge.

2. When emerging a package, the actual sources for that package are retrieved from the internet and compiled to binary.

Is that about right? I would propose an extra layer, then:

1. You get a list (this could be implemented in XML very well) of all the ebuilds currently in Portage. This would include versions as well as the stability for different architectures and dependencies. If the file gets too big, it could be split up into several files, one for every category. This list will be your primary tree.

2. Ebuilds are only saved on your computer for packages that you actually emerged. This means rsync does not have to be used at all for ebuilds. Some info can be cut from the ebuild because it is already in the XML file.

3. The actual files are retrieved from the internet as stated in the ebuild, and the package is built and compiled.

The advantages:

- Only use rsync (possibly) for the XML files
- Portage tree is much smaller (XML also compresses very well)
- Easy to get info from the tree for emerge as well as external tools
- Less bandwidth used as ebuilds are only transferred as necessary

Little example for one of these XML files:

Code:

<category name="sys-apps">
    <package name="less">
        <version name="378-r2">
            <keywords>x86 ppc sparc alpha mips hppa arm</keywords>
            <slot>0</slot>
            <license>GPL-2</license>
            <depend>virtual/glibc >=sys-libs/ncurses-5.2</depend>
        </version>
    </package>
</category>
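
To make the idea a bit more concrete, here is a rough Python sketch of how a tool might read one of these per-category files. It only assumes the element names from the example above; `load_index` and `available_for` are hypothetical helper names, not anything Portage provides, and the `>=` in the dependency is written as `&gt;=` in the embedded sample.

```python
import xml.etree.ElementTree as ET

# Sample index in the format proposed above (element names come from the post).
SAMPLE = """\
<category name="sys-apps">
    <package name="less">
        <version name="378-r2">
            <keywords>x86 ppc sparc alpha mips hppa arm</keywords>
            <slot>0</slot>
            <license>GPL-2</license>
            <depend>virtual/glibc &gt;=sys-libs/ncurses-5.2</depend>
        </version>
    </package>
</category>
"""


def load_index(xml_text):
    """Return a list of dicts, one per <version> element in a category file."""
    category = ET.fromstring(xml_text)
    entries = []
    for package in category.findall("package"):
        for version in package.findall("version"):
            entries.append({
                "atom": "%s/%s-%s" % (category.get("name"),
                                      package.get("name"),
                                      version.get("name")),
                "keywords": version.findtext("keywords", "").split(),
                "slot": version.findtext("slot", ""),
                "license": version.findtext("license", ""),
                "depend": version.findtext("depend", "").split(),
            })
    return entries


def available_for(entries, arch):
    """Hypothetical helper: atoms whose keywords include the given architecture."""
    return [entry["atom"] for entry in entries if arch in entry["keywords"]]


if __name__ == "__main__":
    for entry in load_index(SAMPLE):
        print(entry["atom"], entry["depend"])
```

External tools could work from the same list of dicts without ever touching the ebuilds themselves, which is the point of keeping the metadata in the index.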

roderickvd
n00b
Joined: 25 Aug 2002
Posts: 46
Location: University of Twente

Posted: Wed May 07, 2003 10:15 am    Post subject: Addition to the guidelines

For all those that track Portage daily like I do, I recommend that you read through the daily CVS ChangeLog on the Gentoo web site first. If there's anything of interest go ahead and update, otherwise check back tomorrow.

How's that for an addition to the guidelines?

leahcim
n00b
Joined: 17 Mar 2003
Posts: 29

Posted: Thu May 08, 2003 3:29 am

Is this just a case of an earlier kludge, where folk were told to use rsync.<country code>.gentoo.org to get around a DNS issue, biting the UK server owner on the butt because he has the only UK rsync server? I'm using uk on one machine and europe on another (which includes the UK one), and I note that the only message wrt rsync abuse I've seen to date is from the UK server. Coincidence, I bet :)

Does rsync transfer a lot of data if nothing has changed?

The other suggestions wrt patches over complete tarballs make sense, but this was talking about rsync. If there's not enough bandwidth to cope with rsync, saving bandwidth on the source hardly matters: you're dead before you've got the .ebuild, let alone the tarball.

Not that I'd advocate syncing all the time, nor wasting bandwidth for the sake of it. I just think that if there are 50 MB of changes to something in a time period, syncing 10 times or once over that period should transfer as close as possible to the same amount of data.
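
A back-of-the-envelope way to look at that, as a Python sketch. It assumes a simplified model in which the changed data is sent once however often you sync, and every sync adds a fixed cost for comparing the whole tree. The 1 MB per-sync overhead is a made-up illustration, not a measured rsync figure.

```python
MB = 1024 * 1024


def total_transfer(changed_bytes, syncs, overhead_per_sync):
    """Bytes moved over a period: the changes once, plus a fixed cost per sync."""
    return changed_bytes + syncs * overhead_per_sync


if __name__ == "__main__":
    changes = 50 * MB   # figure used in the post
    overhead = 1 * MB   # hypothetical per-sync cost, for illustration only
    for syncs in (1, 10):
        total = total_transfer(changes, syncs, overhead)
        print("%2d syncs: %.0f MB" % (syncs, total / MB))
```

Under this model the totals only stay close if the per-sync overhead is small next to the changes, and the model says nothing about the CPU time the server spends on each sync.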

Or as someone just said, we should all check a web site to see what's changed - why isn't that web site being blocked to once a day too? :)

guero61
l33t
Joined: 14 Oct 2002
Posts: 811
Location: Behind you

Posted: Thu May 08, 2003 5:01 am    Post subject: Re: Addition to the guidelines

roderickvd wrote:
For all those that track Portage daily like I do, I recommend that you read through the daily CVS ChangeLog on the Gentoo web site first. If there's anything of interest go ahead and update, otherwise check back tomorrow.

How's that for an addition to the guidelines?


Heck, if someone was enterprising enough, they'd write a screen scraper with wget and maybe a little Perl glue to watch the daily cvslog for interesting updates... could mail 'em right to you...

If I really wanted to keep up to date and not overload the rsync servers, this is what I'd do -- one extra hit every 30 minutes or so (or daily if that's how often the changelog changes) wouldn't put as much extra load on http servers that are designed for much heavier traffic than rsync.
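
A minimal sketch of such a watcher, using Python's standard library in place of the wget and Perl combination mentioned above. The changelog URL is left to the caller, and the sample lines are invented, since the real changelog format isn't shown in this thread; `fetch` and `interesting_lines` are hypothetical names.

```python
from urllib.request import urlopen

# Invented sample lines; the real changelog page will look different.
SAMPLE_LOG = """\
net-www/mozilla: version bump
sys-apps/less: fixed a typo in the ebuild
app-editors/vim: marked stable on ppc
"""


def fetch(url, timeout=30):
    """Download a page over HTTP and return it as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def interesting_lines(changelog_text, watched):
    """Return the changelog lines that mention any watched package name."""
    watched_lower = [name.lower() for name in watched]
    hits = []
    for line in changelog_text.splitlines():
        lowered = line.lower()
        if any(name in lowered for name in watched_lower):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    # With a real URL this would be: interesting_lines(fetch(url), [...])
    for line in interesting_lines(SAMPLE_LOG, ["mozilla", "vim"]):
        print(line)
```

Run from cron once a day, something like this would cost one HTTP request per run and only suggest a sync when a watched package shows up.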

djco
Retired Dev
Joined: 29 Mar 2003
Posts: 67
Location: 52.36, 4.89

Posted: Thu May 08, 2003 6:21 am    Post subject: Re: Addition to the guidelines

guero61 wrote:
roderickvd wrote:
For all those that track Portage daily like I do, I recommend that you read through the daily CVS ChangeLog on the Gentoo web site first. If there's anything of interest go ahead and update, otherwise check back tomorrow.

How's that for an addition to the guidelines?


Heck, if someone was enterprising enough, they'd write a screen scraper with wget and maybe a little Perl glue to watch the daily cvslog for interesting updates... could mail 'em right to you...

If I really wanted to keep up to date and not overload the rsync servers, this is what I'd do -- one extra hit every 30 minutes or so (or daily if that's how often the changelog changes) wouldn't put as much extra load on http servers that are designed for much heavier traffic than rsync.

Something like this? :)

guero61
l33t
Joined: 14 Oct 2002
Posts: 811
Location: Behind you

Posted: Thu May 08, 2003 12:15 pm

There ya go! Look at the man!

*aside*
Crikey, what an industrious chap! D'ya think he'd share that script with the world of over-rsyncers so they'll rest easy in knowing they have the most up-to-the-minute packages???

cies
n00b
Joined: 10 Apr 2002
Posts: 9

Posted: Thu May 08, 2003 2:20 pm    Post subject: Using P2P distfile sharing to reduce mirror loads

Opened thread on:

https://forums.gentoo.org/viewtopic.php?t=52676

Here you go...

djco
Retired Dev
Joined: 29 Mar 2003
Posts: 67
Location: 52.36, 4.89

Posted: Thu May 08, 2003 2:22 pm

Well, there's a few problems, still.

- It scrapes the online package database, which apparently only includes stable packages.

- It's a bunch of PHP scripts interacting with wget, and it's not yet automated, so I'll update it once a day for now.

- It's grabbing 69 pages from the gentoo.org server, which means you wouldn't want to update it every 30 minutes (I don't think administrators would appreciate that very much).

Anyway, I'll first make a version that checks out all of the information by itself and caches it for about 12 hours, maybe 6. Meanwhile, I think it would be nice to have a last-updated date for every package, and I'm still thinking about whether it would be nice to include older ebuilds, too.

gilesc
n00b
Joined: 01 Dec 2002
Posts: 40

Posted: Thu May 08, 2003 3:18 pm    Post subject: Instructions for setting up a private rsync server

Some users may have 50 Gentoo boxes masquerading or NAT'ing behind a single IP. Even if these users set their boxes to attempt an rsync only once a week or so, the rsync server would still see about 7 attempts a day from the same IP.

Are there any instructions out there to set up a private rsync server which can rsync once a day for all 50 machines on the LAN?

gilesc
n00b
Joined: 01 Dec 2002
Posts: 40

Posted: Thu May 08, 2003 3:33 pm    Post subject: Re: RSYNC once a week...

Mystilleef wrote:


Remember if Microsoft was offering this same service, you'd probably be paying $50.00 a month for it. Public and network responsibility can only benefit all of us.

Mystilleef


err... Windows Update??

carambola5
Apprentice
Joined: 10 Jul 2002
Posts: 214

Posted: Thu May 08, 2003 5:18 pm    Post subject: Re: RSYNC once a week...

gilesc wrote:
Mystilleef wrote:


Remember if Microsoft was offering this same service, you'd probably be paying $50.00 a month for it. Public and network responsibility can only benefit all of us.

Mystilleef


err... Windows Update??


I went to a Microsoft Security Seminar (no flames please... it was for work) and some of the most knowledgeable attendees mentioned that they were having significant difficulties setting up Windows Update.

No, I'm not talking about the generic, everyday uses of Windows Update. The heart of the program is actually quite complex. It is suited for commercial deployment. Essentially, the IT department of a largish company can set up a Windows Update server that automatically downloads all of the updates (customizable to suit only the operating systems used in the company). Then, the sysadmins can test the updates on spares, and finally deploy accepted updates. All of the workstations are preconfigured to listen only to the local Windows Update server and will automatically install the patches approved by the sysadmins.

It's a very neat concept that technically works. Unfortunately for the IT guys that were at the seminar, it's an absolute pain in the arse to set up.

If Gentoo were thinking about appealing to large-scale companies, I think emerge sync (in using rsync) is very outdated. What I would like to see is a server daemon that follows update etiquette and distributes patches to selected computers within its "domain." That way, we can still have the local rsync server idea... only tremendously souped up.

Princess Firefly
Tux's lil' helper
Joined: 21 Apr 2002
Posts: 80

Posted: Thu May 08, 2003 8:26 pm    Post subject: What's new instead of what's different

Maybe the problem is that we're worrying about checking everything in portage against everything on our system. All we really care about is what has changed since the last time we -checked-.

Something like what Manuzhai and co are talking about seems the way to go. Maybe something like this would be even better though:

Would it be possible for there to be a file generated on all the rsync mirrors (or something) of the form:

Code:

newest_added_package  date_added
2nd_newest_added_package date_added
...


It could contain all the added/updated packages in the last 4 weeks or so (fairly small). If the users' systems kept track of the last time they rsynced (note: not -downloaded- any specific packages, just rsynced) it would be really simple to display a message that listed all the new packages since the last rsync. If they haven't rsynced for 4 weeks (or whatever time is set) it would indicate that it's probably time to rsync. Then it could prompt folks if they wanted to continue with the rsync or not bother.

Here's an example of me using something like this:

Code:

#emerge rsync
...

The last time you rsync'ed was Monday, May 5th, 2003.  The following new packages have been updated/added to portage since your last rsync:

mozilla-1.3b
openssh-7.777-r4
gnome-3.0

Do you wish to continue with the rsync? [y/n]   N


So I know exactly what's new since my last rsync. If I don't feel like installing mozilla or gnome anyway, I'd rather not bother with a huge rsync and then a time-consuming emerge -up world just to figure that out. Not only does this way use less bandwidth, it's way, way more efficient and convenient for us users. (I'd love it... then again, it is my idea).

The only real problem is that the next time I rsync, it will do the check and it won't list mozilla, openssh, and gnome because they haven't been updated -since the last rsync-, but I really think that's okay. I guess it'd be possible to keep a list on the local machine and then update that once they did rsync... I'm not sure it's necessary though.

Sure, I might forget in 5 days that gnome-3.0 is in portage (probably not :) ) but it really doesn't matter. People are rsyncing too often, not too little, so this really isn't an issue. Also, there may be something weird like if you do a full rsync, upgrade package X, which was added to that mirror just moments before, and then immediately try to rsync again. It might say that package X has been added to portage since the last rsync because you have now connected to a different mirror that hadn't had the file propagated through yet (30-minute rsync delay for official mirrors). But it doesn't matter at all. First of all, the situation is unlikely; secondly, people who would experience it are rsync'ing twice an hour; and thirdly, we're not changing regular rsync so it wouldn't mess anything up. It'd just be one bogus message that captain super duper bleeding edge would have to learn to get used to.
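
The check itself would be tiny. Here is a rough Python sketch of it, assuming the mirror file has one `package date` pair per line with ISO dates (the date format isn't specified above, so that part is an assumption). The function names, the 28-day window and the sample dates are all made up for illustration.

```python
from datetime import date

# Hypothetical contents of the "recently changed" file on a mirror.
RECENT = """\
gnome-3.0 2003-05-08
openssh-7.777-r4 2003-05-07
mozilla-1.3b 2003-05-06
less-378-r2 2003-05-01
"""


def parse_recent(text):
    """Parse lines of the form '<package> <YYYY-MM-DD>' into (package, date) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        package, added = line.split()
        year, month, day = (int(part) for part in added.split("-"))
        entries.append((package, date(year, month, day)))
    return entries


def new_since(entries, last_sync):
    """Return packages added or updated strictly after the last sync date."""
    return [package for package, added in entries if added > last_sync]


def sync_advice(entries, last_sync, today, window_days=28):
    """Decide what to tell the user before a full sync is started."""
    if (today - last_sync).days > window_days:
        # The file only covers the window, so it can no longer be trusted.
        return "list too old, do a full sync"
    fresh = new_since(entries, last_sync)
    if not fresh:
        return "nothing new, skip the sync"
    return "new since last sync: " + ", ".join(fresh)


if __name__ == "__main__":
    print(sync_advice(parse_recent(RECENT), date(2003, 5, 5), date(2003, 5, 9)))
```

Only the small list is downloaded before the user decides, so a "no" costs the mirror next to nothing.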

Also, I'd just like to say that rsync works pretty good for me (and I've used it on a # of different computers in a # of different locations). I very occasionally stop an rsync because a mirror is going too slow but I've never been kicked off halfway through a sync or anything like that. Maybe lovechild has a broken rsync binary (check those CFLAGS) :) Seriously though, the problem is not the protocol/program, it's what we're serving that needs to be addressed. Maybe there's a better program we can use but that's a different issue altogether.

dmmgentoo
n00b
Joined: 16 Jun 2002
Posts: 38

Posted: Fri May 09, 2003 5:52 am

sibbe wrote:
Maybe this is a good time to bring out the discussion about different sync methods. The choice in Gentoo is rsync, but e.g. FreeBSD primarily uses cvsup, NetBSD sup (IIRC) and OpenBSD supports various methods too.

If Gentoo (officially) supported other methods for syncing the portage tree it would (of course) lighten the load on rsync servers.
All methods have their flaws (even rsync), cvsup is written in m3 and therefore isn't very portable etc.

I know, this is a little off topic, since it's not going to solve any bandwidth usage problems. I just think there should be alternatives.


I think cvsup is very frugal WRT bw. Maybe it should be tested. Is there a cvsupd for Linux? AFAIK, the version of cvsup for Linux is a statically-linked binary. Doing a port of cvsup on Gentoo would be interesting. I don't know what version of Modula-3 cvsup uses, but I've heard there were some problems getting the m3 libs to compile on Linux.

roderickvd
n00b
Joined: 25 Aug 2002
Posts: 46
Location: University of Twente

Posted: Fri May 09, 2003 11:32 am    Post subject: Modula-3

I've been a long-time FreeBSD user and I adore cvsup. Simple and fast.

There have been problems getting Modula-3 to compile on platforms other than x86 because of some heavy architecture dependencies, but I know for a fact that it has been ported to UltraSPARC.

klieber
Bodhisattva
Joined: 17 Apr 2002
Posts: 3657
Location: San Francisco, CA

Posted: Fri May 09, 2003 12:10 pm    Post subject: Re: Modula-3

roderickvd wrote:
There have been problems getting Modula-3 to compile on platforms other than x86 because of some heavy architecture dependencies, but I know for a fact that it has been ported to UltraSPARC.

This is the primary reason against moving to cvsup, btw.

--kurt
_________________
The problem with political jokes is that they get elected

dreamer3
Guru
Joined: 24 Sep 2002
Posts: 553

Posted: Sat May 10, 2003 12:09 am

Lovechild wrote:
aethyr wrote:
PowerFactor:
Again, like I've been saying to anyone who will listen, I think the key is to keep these distfiles on the harddrive and use small, incremental patches to patch them to newer versions.
I've been talking about this for ages, and the conclusion has always been that it's just too hard to implement. It would be a mighty cool feature, I agree.

It's extra headache because _new_ downloads should probably get the full, un-patched source of the latest version while people upgrading should get the patches... just makes sense... but I guess it wouldn't be a huge deal for everyone to get base + small patches... making diffs of source can be done with a simple batch script...

Gentoo has already done this in the past on certain occasions (think X, Mozilla) and I appreciate it greatly (modem user).

dreamer3
Guru
Joined: 24 Sep 2002
Posts: 553

Posted: Sat May 10, 2003 12:37 am

Quote:
I don't really agree with that. From my understanding CPU cycles are generally cheaper than bandwidth. Yes, the current problem with excessive rsyncing seems to be CPU hogging. But does that mean we should waste bandwidth to save CPU cycles? If we could come up with something non-CPU-intensive that used as little bandwidth as rsync then I would be all for it. But I'm not ready to toss rsync to the wolves just yet.

I don't think having gzipped ebuilds would use very much bandwidth... not familiar with how "smart" rsync is about how much of changed files it sends, but I can't imagine gzip would be so bad for ebuilds... If we did that and then came up with a brilliant way to diff the tarballs (source) I'd say overall we'd be saving a lot. ;-)

Maybe the proposed system could only gzip a diff from one version of an ebuild to another and only send that... though that gets messy.


Last edited by dreamer3 on Sat May 10, 2003 10:53 pm; edited 1 time in total

Lovechild
Advocate
Joined: 17 May 2002
Posts: 2858
Location: Århus, Denmark

Posted: Sat May 10, 2003 7:27 am

-edited-

Last edited by Lovechild on Mon May 12, 2003 9:30 am; edited 1 time in total

Genone
Retired Dev
Joined: 14 Mar 2003
Posts: 9507
Location: beyond the rim

Posted: Mon May 12, 2003 2:22 am    Post subject: Re: Instructions for setting up a private rsync server

gilesc wrote:
Are there any instructions out there to set up a private rsync server which can rsync once a day for all 50 machines on the LAN?


Even easier, set up an NFS server that exports /usr/portage to the other boxes, so you only need to emerge sync one box and all other boxes have the current portage tree.

@klieber: if you're interested I can write a short article about this for GWN

dufeu
l33t
Joined: 30 Aug 2002
Posts: 924
Location: US-FL-EST

Posted: Mon May 12, 2003 3:33 am    Post subject: Re: Instructions for setting up a private rsync server

Genone wrote:
gilesc wrote:
Are there any instructions out there to set up a private rsync server which can rsync once a day for all 50 machines on the LAN?


Even easier, set up an NFS server that exports /usr/portage to the other boxes, so you only need to emerge sync one box and all other boxes have the current portage tree.

@klieber: if you're interested I can write a short article about this for GWN


I'm _very_ interested in instructions on how to do this.

Currently, in an effort to _not_ hammer the servers, I do this:

1) I only 'emerge sync' manually.

2) I do 'emerge -pu world' manually. I then either do 'emerge -u world' or 'emerge pkg1 pkg2 ... pkgn'. I find there are times when there is a package I _don't want to_ or _can't yet_ upgrade. The important point is that this happens often enough that I feel it important enough to retain manual control.

3) I build new systems by popping the hard drive of the new system into my primary machine. In other words, my primary workstation becomes a super Gentoo LiveCD ISO. The important difference is that I have direct machine access to the latest packages I use. To compile everything, I simply set the new disk's make.conf for whatever the target CPU is. And instead of rebooting and building the desktop later, I simply stay in the chrooted environment and build all the packages I want for the target machine.

4) I'm learning how to use 'rsync' to remotely keep the '/usr/portage/distfiles/' directories on all my different machines in sync. Since this is local across my network, I don't bother the servers.

HOWEVER, I consider number 4 to be a bit brute force.

So yes, I'm very interested in learning how to set up NFS so that I have only one 'distfiles' directory.

:-D
_________________
People whom think M$ is mediocre, don't know the half of it.

Genone
Retired Dev
Joined: 14 Mar 2003
Posts: 9507
Location: beyond the rim

Posted: Mon May 12, 2003 9:24 am

Hehe, just saw that klieber already picked up my NFS instructions from the mailing list in GWN. If you have further questions feel free to ask.

dufeu
l33t
Joined: 30 Aug 2002
Posts: 924
Location: US-FL-EST

Posted: Mon May 12, 2003 2:57 pm    Post subject: Some questions

Genone wrote:
Hehe, just saw that klieber already picked up my NFS instructions from the mailing list in GWN. If you have further questions feel free to ask.


I'm going to have to take you up on that. I've read through the entire aforementioned thread. From my level of newbieness, I'm afraid all the suggestions require more knowledge than I currently possess. So I'll have to ask some questions a little further back.

Let's say I want to set up a Gentoo distfile server using NFS called: distserve at LAN IP address 192.168.0.11. Let's call the client machines distclient01 ... distclientNN.

For each client machine, distclientNN, assuming I've compiled in kernel support for NFS (I have the defaults), I need to do the following:

1) Add the following to /etc/fstab:
Code:

# Note: fstab does not expand variables such as $HOST, so use each client's own hostname (distclient01 shown here).
distserve:/usr/portage                          /usr/portage              nfs     rsize=8192,wsize=8192,timeo=14,intr,ro
distserve:/usr/portage/distfiles                /usr/portage/distfiles    nfs     rsize=8192,wsize=8192,timeo=14,intr,rw
distserve:/usr/portage/packages/distclient01    /usr/portage/packages     nfs     rsize=8192,wsize=8192,timeo=14,intr,rw


2) Move /usr/portage to somewhere safe and create a new mountpoint.
Code:

# mkdir /safe
# mv /usr/portage /safe/
# mkdir /usr/portage


3) Add portmap to the init process
Code:

# rc-update add portmap default


4) Add netmount to the init process
Code:

# rc-update add netmount default


And I think that's it for the clients. I'm assuming, because I don't know better, that the defaults for portmap and netmount are fine.
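
Since /etc/fstab is read literally and does not expand shell variables, the per-client packages path has to be written out for each machine. A small Python sketch that prints the three entries for a given client, assuming the server name distserve and the mount options from step 1; `fstab_lines` is a hypothetical helper, not part of any Gentoo tool.

```python
# Mount options copied from the fstab entries in step 1.
NFS_OPTIONS = "rsize=8192,wsize=8192,timeo=14,intr"


def fstab_lines(server, client):
    """Return the three NFS fstab entries for one client machine.

    'client' is written into the packages path where a shell variable
    such as $HOST would otherwise have to go.
    """
    mounts = [
        ("/usr/portage", "/usr/portage", "ro"),
        ("/usr/portage/distfiles", "/usr/portage/distfiles", "rw"),
        ("/usr/portage/packages/" + client, "/usr/portage/packages", "rw"),
    ]
    lines = []
    for remote, local, mode in mounts:
        lines.append("%s:%s\t%s\tnfs\t%s,%s"
                     % (server, remote, local, NFS_OPTIONS, mode))
    return lines


if __name__ == "__main__":
    # Print the entries for the first two clients; review before appending to /etc/fstab.
    for number in (1, 2):
        client = "distclient%02d" % number
        print("# entries for " + client)
        print("\n".join(fstab_lines("distserve", client)))
```

The output is meant to be reviewed and pasted by hand; the script deliberately does not touch /etc/fstab itself.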

I'm off to do a little reading on setting up an NFS server, something else I haven't learned to do yet. Hopefully, ibiblio is back up so that I can get to the tldp NFS HOW-TO.
_________________
People whom think M$ is mediocre, don't know the half of it.

All times are GMT
Page 3 of 4