Gentoo Forums
rsync etiquette guideline
sibbe
n00b

Joined: 31 Jan 2003
Posts: 35
Location: Helsinki, Finland

Posted: Mon May 05, 2003 2:13 pm

Maybe this is a good time to bring up the discussion about different sync methods. The choice in Gentoo is rsync, but e.g. FreeBSD primarily uses cvsup, NetBSD uses sup (IIRC), and OpenBSD supports various methods too.

If Gentoo (officially) supported other methods for syncing the portage tree, it would (of course) lighten the load on the rsync servers.
All methods have their flaws (even rsync); cvsup, for example, is written in Modula-3 and therefore isn't very portable.

I know, this is a little off topic, since it's not going to solve any bandwidth usage problems. I just think there should be alternatives.
_________________
jyrki muukkonen

d3c3it
l33t

Joined: 01 Mar 2003
Posts: 765
Location: Manchester, UK

Posted: Mon May 05, 2003 2:15 pm

Well, I was one of the offenders until about a week ago, when a message popped up on the rsync mirror I use (*uk*) saying that syncing more than twice a day will be considered abusive and can get you banned. Since then it's been only once a day, because I always find some package has an update. But I think implementing this is totally fair; after all, it's free :)
_________________
Some people go to counselling,
others use linux

djco
Developer

Joined: 29 Mar 2003
Posts: 67
Location: 52.36, 4.89

Posted: Mon May 05, 2003 2:23 pm

I think changing the SYNC var in make.conf is not in the installation manual. It might help saying something about it there, so that newbies will more frequently change their SYNC var (thus conserving bandwidth for the default server).
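
For illustration, a minimal sketch of what such a make.conf entry might look like. The hostname below is a placeholder, not a real mirror; only the SYNC variable name comes from the post, and the module path is assumed to follow the usual gentoo-portage layout.

```shell
# /etc/make.conf (fragment)
# Point Portage at a nearby rsync mirror instead of the default server.
# rsync.example.org is a placeholder; substitute a mirror close to you.
SYNC="rsync://rsync.example.org/gentoo-portage"
```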

Caffeine
Guru

Joined: 17 Jul 2002
Posts: 401
Location: Melbourne, Australia

Posted: Mon May 05, 2003 3:31 pm

How about a package similar to app-admin/mirrorselect called app-admin/rsyncselect ? Or package these two apps together?

Then add a step to the install instructions "Run mirrorselect -a 3 and rsyncselect"

shadow255
Guru

Joined: 04 Apr 2003
Posts: 406

Posted: Mon May 05, 2003 3:37 pm

Manuzhai wrote:
I think changing the SYNC var in make.conf is not in the installation manual. It might help saying something about it there, so that newbies will more frequently change their SYNC var (thus conserving bandwidth for the default server).

This should probably be filed as a bug, but I'll add that it's not necessarily a matter of changing, but possibly a matter of adding the SYNC variable. I couldn't find it in my make.conf, nor any comments about it. I hope there are plans to add mention of it in future revisions to portage.
_________________
Vogon poetry is of course the third worst in the Universe. -- Douglas Adams, The Hitchhiker's Guide to the Galaxy

carambola5
Apprentice

Joined: 10 Jul 2002
Posts: 214
Location: Madtown, WI

Posted: Mon May 05, 2003 5:50 pm

csnyder wrote:
I have a local rsync mirror that resyncs hourly. Is this considered against rsync etiquette? It's serving 8 Gentoo boxen at the moment.


Since we have generally established that this is a "Bad Idea," allow me to propose the following solution (This will only apply to people with a dedicated server/computer with ~100% uptime):

Install a (yet-to-be-written) program on the server. I will call this ghost program: smart-sync.
Smart-sync will have a few rules built in, such as:

  1. Never sync more than once per 4 hours unless (forced && !cron).
  2. Never sync more than 4 times per day (ever).
  3. Execute a cron-like sync once per day subject to rules #1 & #2.

(All of these rules are customizable)

The client machines will have the server set as the sync mirror.
When a client requests an emerge sync, the following steps occur:

  • Client: requests rsync with server.
  • Server: if a log check shows that syncing with official gentoo sync mirrors would not violate a rule, the server will perform an rsync.
  • Server: whether the official rsync was executed or not, will provide sync service to client
  • Client: receives rsync service from server.


This way, you only ever have one computer sync with the official mirrors (which it looks like csnyder has already taken care of), but it also ensures that etiquette rules are not broken. The above values that I have in the rules are just examples, and I think those should be the approximate defaults.

One last thing.... the clients' sync timeout variables might have to be significantly extended for this to work.

And there it is: a transparent, etiquette-obeying, and clever solution.
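
The rule check at the heart of such a program could be sketched roughly as below. smart-sync does not exist, so every name here (may_sync, MIN_INTERVAL, MAX_PER_DAY) is hypothetical, and the values are just the example defaults from the rules above.

```shell
#!/bin/sh
# Hypothetical smart-sync rule check; names and defaults are illustrative.
MIN_INTERVAL=$((4 * 3600))   # rule 1: at least 4 hours between upstream syncs
MAX_PER_DAY=4                # rule 2: never more than 4 upstream syncs per day

# may_sync NOW LAST COUNT FORCED
#   NOW, LAST: Unix timestamps of the current time and the last upstream sync
#   COUNT:     upstream syncs already performed today
#   FORCED:    1 if a user forced the sync (not cron), 0 otherwise
# Returns 0 if an upstream sync is allowed, 1 otherwise.
may_sync() {
    now=$1
    last=$2
    count=$3
    forced=$4
    # Rule 2 is absolute: it applies even to forced syncs.
    [ "$count" -lt "$MAX_PER_DAY" ] || return 1
    # Rule 1 can be overridden only by a forced, non-cron request.
    if [ "$forced" -ne 1 ] && [ $((now - last)) -lt "$MIN_INTERVAL" ]; then
        return 1
    fi
    return 0
}
```

The server would call something like `may_sync "$(date +%s)" "$last" "$count" 0` before contacting an official mirror, and serve its local copy to the client either way.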
_________________
Get Firefox!

Proper Web Development

I'm done at 999.

henke
Apprentice

Joined: 30 Sep 2002
Posts: 165
Location: Stockholm, Sweden

Posted: Mon May 05, 2003 6:08 pm

klieber wrote:
Have you considered waiting 4 or even (gasp) 6 hours to let the changes propagate naturally? All the rsync mirrors sync every 30 minutes against one master rsync mirror (rsync1.us.gentoo.org) so all mirrors should have new updates within 30 minutes of them hitting the tree.


Right now I'm waiting for xfree-4.3.0-r3 (Castle Rock driver baby :) ). How do I find out if this package has been released without rsyncing?

gigel
Guru

Joined: 14 Jan 2003
Posts: 344
Location: .RO

Posted: Mon May 05, 2003 6:16 pm

Hm, I'm behind a NAT server, but (thank God) I am the only Gentoo user on our 80-computer network... so I sync once a day.
For me that's just fine.
_________________
$emerge sux
:D

klieber
Administrator

Joined: 17 Apr 2002
Posts: 3657
Location: San Francisco, CA

Posted: Mon May 05, 2003 6:21 pm

henke wrote:
Right now I'm waiting for xfree-4.3.0-r3 (Castle Rock driver baby :) ). How do I find out if this package has been released without rsyncing?

Nobody said you shouldn't rsync. Schedule a nightly cron job to rsync while you're sleeping. Then, check in the morning to see if it was released.
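
As a rough sketch, a root crontab entry for that could look like the following. The 03:30 time is arbitrary, and `emerge rsync` is simply the command used elsewhere in this thread; picking an odd minute rather than the top of the hour is just a guess at what is kinder to the mirrors.

```shell
# crontab fragment (edit with `crontab -e` as root)
# Sync the Portage tree once a night at 03:30; the time is arbitrary.
30 3 * * * emerge rsync >/dev/null 2>&1
```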

--kurt
_________________
The problem with political jokes is that they get elected

henke
Apprentice

Joined: 30 Sep 2002
Posts: 165
Location: Stockholm, Sweden

Posted: Mon May 05, 2003 7:13 pm

Hmm, usually I only rsync once or twice a week or so. Is this less stressful for the rsync servers compared to rsyncing once a day?

ebrostig
Bodhisattva

Joined: 20 Jul 2002
Posts: 3152
Location: Orlando, Fl

Posted: Mon May 05, 2003 7:26 pm

Interesting reading.

What if there existed an application that:
- Did not put a lot of stress on the servers
- Only returned a list of changes to the Portage tree since your last check
- Could be used with an argument, e.g. chkportage xfree would return yes or no depending on whether there is an update in the Portage tree.
- Did not update the Portage tree

A lot of users say they rsync to see if an update to a package has been added to the Portage tree, hence they really don't need to update Portage in order to check.

By having such an application as outlined above, it should be easy to check for an update to Xfree and it would not put a huge stress on the rsync servers nor use a lot of bandwidth.

Comments?

Erik
_________________
'Yes, Firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.'

klieber
Administrator

Joined: 17 Apr 2002
Posts: 3657
Location: San Francisco, CA

Posted: Mon May 05, 2003 7:46 pm

henke wrote:
Hmm, usually I only rsync once or twice a week or so. Is this less stressful for the rsync servers compared to rsyncing once a day?

yes.

--kurt
_________________
The problem with political jokes is that they get elected

ErnstlAT
n00b

Joined: 22 Nov 2002
Posts: 15
Location: Vienna, Austria

Posted: Mon May 05, 2003 8:37 pm    Post subject: Rsync

I currently have a setup of 3 Gentoo boxes, one of which is the gateway to the internet, and I plan to add a few more machines for high-performance computing via openMosix.

Of course I have to sync them from time to time ('bout once a week each), so my question: Is there an easy way to do cached proxying of both source tarball downloads (http/ftp) and rsync? What's your advice/experience?

I also guess that some of the above-average rsync counts are due to NAT'ed workstations, which look like one computer doing a lot of syncs ...

Yours, Ernstl.at

henke
Apprentice

Joined: 30 Sep 2002
Posts: 165
Location: Stockholm, Sweden

Posted: Mon May 05, 2003 8:55 pm

ebrostig wrote:
What if there existed an application that:
- Did not put a lot of stress on the servers
- Only returned a list of changes to the Portage tree since your last check
- Could be used with an argument, e.g. chkportage xfree would return yes or no depending on whether there is an update in the Portage tree.
- Did not update the Portage tree


If I had this app I wouldn't have rsynced this week.

I guess you have to define what "list of changes" means though. Returning information about everything that had changed would be equivalent to an rsync...

pjp
Administrator

Joined: 16 Apr 2002
Posts: 16113
Location: Colorado

Posted: Mon May 05, 2003 9:37 pm    Post subject: Re: Rsync

ErnstlAT wrote:
Of course I have to sync them from time to time ('bout once a week each), so my question: Is there an easy way to do cached proxying of both source tarball downloads (http/ftp) and rsync? What's your advice/experience?
You could designate a local mirror. Let the mirror sync, then have your other machines sync to it.
_________________
lolgov. 'cause where we're going, you don't have civil liberties.

In Loving Memory
1787 - 2008

Caffeine
Guru

Joined: 17 Jul 2002
Posts: 401
Location: Melbourne, Australia

Posted: Mon May 05, 2003 9:39 pm    Post subject: Re: Rsync

I had a similar situation.
ErnstlAT wrote:
Of course I have to sync them from time to time ('bout once a week each), so my question: Is there an easy way to do cached proxying of both source tarball downloads (http/ftp) and rsync? What's your advice/experience?

I use one machine as the master (say host1), which is updated via rsync daily. Then on the other Gentoo boxes, I periodically run:
Code:
rsync -av --progress --stats --delete --delete-after --exclude='distfiles/*' --exclude='packages/*' -e ssh user@host1:/usr/portage/ /usr/portage/

( host1:/usr/portage/ needs the trailing slash; without it rsync copies the directory itself and you end up with /usr/portage/portage. I'm writing this from memory - the script is at work, and I'm at home. )
Using ssh means you don't need to bother with an rsync server, just sshd.
Of course, if you leave out the --exclude='distfiles/*' part, you'll also get the distfiles. It might be possible to use the portage user also - I haven't tried.

Alternatively, you could export host1:/usr/portage via nfs/smb/whatever to the other gentoo boxes.

ebrostig
Bodhisattva

Joined: 20 Jul 2002
Posts: 3152
Location: Orlando, Fl

Posted: Mon May 05, 2003 10:49 pm

henke wrote:
ebrostig wrote:
What if there existed an application that:
- Did not put a lot of stress on the servers
- Only returned a list of changes to the Portage tree since your last check
- Could be used with an argument, e.g. chkportage xfree would return yes or no depending on whether there is an update in the Portage tree.
- Did not update the Portage tree


If I had this app I wouldn't have rsynced this week.

I guess you have to define what "list of changes" means though. Returning information about everything that had changed would be equivalent to an rsync...


No, it really shouldn't return all the information that rsync does. Remember, rsync also downloads all the ebuilds and other files that have changed in the Portage tree.

My main point about such an app is that it would only ask for package changes since your last check, i.e. it would return a list like:
Code:

xfree-4.3.0-r2
mm-sources-2.5.69


Nothing else than this. Doing so would allow people who are waiting on updates to find their way into Portage to check and see if they are indeed updated rather than having to update the whole Portage tree. It should also be faster as it doesn't change anything, it's just a query tool.
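
A rough sketch of the client side of such a query tool, assuming the list of changes has already been fetched into a plain-text file with one package-version per line. How that list gets published is left open here, and the chkportage function below is hypothetical.

```shell
#!/bin/sh
# Hypothetical chkportage: report whether a package appears in a list of
# recent changes. CHANGES is assumed to be a plain-text file with one
# package-version per line, fetched from some server by other means.

# chkportage CHANGES NAME
# Prints any matching entries, then "yes" or "no"; returns 1 on "no".
chkportage() {
    changes=$1
    name=$2
    # Match NAME followed by a hyphen and a digit, e.g. xfree-4.3.0-r2.
    # NAME is used as a regular expression, so unusual names need escaping.
    if grep -E "^${name}-[0-9]" "$changes"; then
        echo yes
    else
        echo no
        return 1
    fi
}
```

Because it only reads a small text file, a check like `chkportage changes.txt xfree` never touches the local Portage tree.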

Any other ideas/suggestions/comments?

Erik
_________________
'Yes, Firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.'

puddpunk
l33t

Joined: 20 Jul 2002
Posts: 681
Location: New Zealand

Posted: Tue May 06, 2003 4:25 am

Perhaps it would be best for Gentoo to design its own system for syncing? I'm currently learning sockets in Python, and I know dev time/effort is stretched enough as it is, but perhaps it's an idea for Gentoo 2.0? (By then, the rsync traffic would be unbearable.)

I've decided that every bonus I get from work automatically goes to Gentoo (yay for PayPal), but unfortunately my last bonus was $3.75 :(

When I feel more comfortable with Python, I will look at designing a new system for syncing (and perhaps even downloading distfiles? Maybe that's the only way to get distfile diffs implemented reliably).

Anyway, just chucking things around here. Any comments? I know there is going to be the whole "Don't re-invent the wheel" thing going on, but the wheel we need needs to be quite specialist.

Yak
Tux's lil' helper

Joined: 01 Sep 2002
Posts: 101

Posted: Tue May 06, 2003 6:14 am

Obviously the problem here is that the Gentoo folks have done too good a job and just made it too easy to update all your software with a few keystrokes. Now these top few percent are hopelessly addicted to 'emerge rsync'. Better send them to Redmond for rehab. :lol:

fishhead
Apprentice

Joined: 07 Mar 2003
Posts: 162
Location: Pasadena, CA

Posted: Tue May 06, 2003 10:39 am

a) From what I can tell, `emerge rsync` gets a list of files from the server and then checks your system to see what it needs to update. This itself is already very efficient as it puts most of the load on the user's machine. I think it also might make the developers' lives a bit easier.

b) The problem with returning only changes for a single package can be summed up in one word: dependencies.

I think the real effort here should be on making it easy for people to set up on-site mirrors (if they have multiple boxes) and making them aware of how much this helps by easing the rsync server load. The current rsync distribution scheme is like a two-level tree - there is some authoritative list and then there are mirrors of that list. The problem with this is that as the community grows linearly (and the Gentoo community may be growing faster than this), so does the load on the servers. By using a multi-level tree, server load grows with approx. log(number of users) instead.

Remember, the early internet had this problem too. One machine had a "hosts" file that the rest of the network used. Every other machine got its hosts file from that machine. The problem was that this had to be distributed to each of N machines on the network. The amount of data that needed to be sent grew proportional to N^2. After a while, this became prohibitively expensive as far as update time. The solution is the domain name tree structure that is used today.

Reducing the amount of data sent to / from the servers can only reduce the load linearly, so this is far less critical than further distributing the load.

betatim
n00b

Joined: 18 Apr 2002
Posts: 28

Posted: Tue May 06, 2003 11:32 am

I know we discussed using BitTorrent for rsync/distfile distribution before and sort of decided that it has some flaws. I'm sorry I can't recall exactly what the problems were, but I think it had to do with having a way to find out whether an ebuild was the real one and not a bogus one from a "hacker". The same problem applies to the distfiles.

What I know about p2p networks is that they all use hashes of some sort to make sure that a file, even under different filenames, always has the same content.
My question is: how difficult is it, or is it almost impossible, to get fake ebuilds/sources into the portage tree if we use BitTorrent for rsyncing and/or distfiles?

Let's assume the ebuilds are signed (that feature is coming up and will be implemented very soon, I hope) and you have an md4 hash from a trusted server (as you trust the rsync mirrors at the moment) and a file name.

As BitTorrent relies on users sharing the files they downloaded, would it be possible to make an ebuild that has evil code inside without anyone noticing, even though it is signed AND would almost certainly have a different hash value?
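
To make the hash part of the question concrete, here is a small sketch of checking a downloaded file against a digest obtained from a trusted server. It uses `sha1sum` from coreutils rather than the md4 hash mentioned above, purely because that tool is commonly available; the verify_digest function name is made up for this example, and signature checking is not covered.

```shell
#!/bin/sh
# verify_digest FILE EXPECTED
# Returns 0 only if FILE's SHA-1 digest equals EXPECTED. EXPECTED is assumed
# to come from a server you already trust, not from the peer that sent FILE.
verify_digest() {
    file=$1
    expected=$2
    # sha1sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha1sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}
```

A file altered by a peer would produce a different digest and fail this check, as long as the expected digest itself was fetched over a trusted channel.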
_________________
Never underestimate the power of stupid people in large groups.

Mystilleef
Guru

Joined: 27 Apr 2003
Posts: 561
Location: Earth

Posted: Tue May 06, 2003 12:41 pm    Post subject: RSYNC once a week...

Hello Gents,

Jesus Christ! Rsyncing 1 - 2 times a day?!? Why on earth should anyone do that? The only time I rsync is at the end of the week, usually early on a Saturday morning, before updating my system.

I think, for the sake of being considerate, it would only be fair if we rsynced and/or updated our Gentoo systems at most twice a week, and ideally once a week.

Remember if Microsoft was offering this same service, you'd probably be paying $50.00 a month for it. Public and network responsibility can only benefit all of us.

Mystilleef
_________________
simple, sleek and sexy text editor for gnome

"My logic is undeniable."

Lovechild
Advocate

Joined: 17 May 2002
Posts: 2858
Location: Århus, Denmark

Posted: Tue May 06, 2003 1:09 pm

The problem as I understand it is not the bandwidth usage (portage after all doesn't push too much data) but the fact that rsync sucks up every available CPU cycle on the server end - rsync is simply not suited for this kind of task.

If anything we should spare the servers' CPU power; that would limit the problems we have with connection bottlenecks.

Rsync needs to die a quick, horrible, painful death. Even though it has some nice features like proxy usage, it's just too demanding.

This is also why the "check for updates program" won't do much, even though it's a nice idea - it doesn't solve the problem that is rsync's resource waste, it only delays it... which might be nice now, but our userbase is constantly expanding and we will hit the ceiling, so better to change the system now than later - fewer people will be affected. (Yes, I do know that as the userbase expands the number of mirrors might increase - but probably not proportionally.)

And while we are being considerate, let's just kill all the CVS ebuilds in portage - if you want to run CVS software you should need to get the "CVS powerpack" tarball containing all the CVS ebuilds, instead of sucking up server CPU power and bandwidth - tarballs can be distributed easily over several servers, CVS can't. Another thing speaking against CVS ebuilds is the fact that they are impossible to maintain: an error can seem to be in the ebuild but in fact be in the current CVS state, or the other way around (just look at Dan Armak's KDE CVS changelog).

fishhead
Apprentice

Joined: 07 Mar 2003
Posts: 162
Location: Pasadena, CA

Posted: Tue May 06, 2003 2:13 pm

In that case it might be better to do something like the following ...

1) Have a server-based, gzip'd file list. Checksum the list. Name the checksum file something like Portage-20030505-r0.md5
2) Client downloads the checksum and compares it with its local checksum.
3) If the checksums are different, the client makes sure the date and revision of the checksum are for a later version of the portage tree.
4) If the checksum is for a later version, download the file list.
5) Do a `diff` of the latest list against your list.
6) Since the diff will (more or less) produce a set of changes consistent with changes to the portage tree (i.e. get all ebuilds with a +xxxxxx-1.23.4-r5 line) we'll know what to get.
7) Grab the necessary ebuild files.

Of course, steps 5 and 6 could be done any number of ways besides using diff. Also, you could do this all via http or ftp - in fact, if the program it uses to get the information is choosable by the user, you could do this any number of ways, like via NFS or something. This should prevent people from sucking up all the server CPU time even if they do something for which they should be shot, like:
Code:
#!/bin/sh
while true
do
        emerge rsync
done


edit: <+3 hours>

The basic idea with the above plan (which I failed to mention) is to transfer nearly all the CPU load to the user's machine. All the necessary comparisons are done there and all the sync mirror does is function as a repository for the ebuild scripts. With rsync, the mirror is busy matching checksums sent from the original machine. It is very bandwidth efficient, but it is CPU intensive. If bandwidth really is a concern, then rsync is a little better; however, when CPU usage is a concern, it's better to have the server just handing out gzip'd files (and even a modest machine can do this).
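
Steps 5 and 6 of the plan could be sketched on the client roughly like this, using `comm` on sorted lists instead of `diff`. The new_entries function is hypothetical, and both files are assumed to be already-uncompressed lists with one ebuild path per line.

```shell
#!/bin/sh
# new_entries OLD_LIST NEW_LIST
# Prints the lines present in NEW_LIST but not in OLD_LIST, i.e. the ebuilds
# the client would still have to fetch. All comparison work is done locally.
new_entries() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || return 1
    # comm requires sorted input.
    sort "$1" > "$old_sorted"
    sort "$2" > "$new_sorted"
    # -13 suppresses lines unique to the old list and lines common to both.
    comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}
```

Files that were removed from the tree would show up the other way around (`comm -23`) and would need deleting locally, which this sketch leaves out.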


Last edited by fishhead on Tue May 06, 2003 6:34 pm; edited 1 time in total

PowerFactor
Veteran

Joined: 30 Jan 2003
Posts: 1692
Location: out of it

Posted: Tue May 06, 2003 3:27 pm

Lovechild wrote:
The problem as I understand it is not the bandwidth usage (portage after all doesn't push too much data) but the fact that rsync sucks up every available CPU cycle on the server end - rsync is simply not suited for this kind of task.

If anything we should spare the servers' CPU power; that would limit the problems we have with connection bottlenecks.

Rsync needs to die a quick, horrible, painful death. Even though it has some nice features like proxy usage, it's just too demanding.
I don't really agree with that. From my understanding, CPU cycles are generally cheaper than bandwidth. Yes, the current problem with excessive rsyncing seems to be CPU hogging. But does that mean we should waste bandwidth to save CPU cycles? If we could come up with something non-CPU-intensive that used as little bandwidth as rsync, then I would be all for it. But I'm not ready to toss rsync to the wolves just yet.

All times are GMT