sibbe n00b
Joined: 31 Jan 2003 Posts: 35 Location: Helsinki, Finland
|
Posted: Mon May 05, 2003 2:13 pm Post subject: |
|
|
Maybe this is a good time to bring up the discussion about different sync methods. The choice in Gentoo is rsync, but e.g. FreeBSD primarily uses cvsup, NetBSD uses sup (IIRC), and OpenBSD supports various methods too.
If Gentoo (officially) supported other methods for syncing the portage tree, it would (of course) lighten the load on the rsync servers.
All methods have their flaws (even rsync); cvsup, for instance, is written in Modula-3 and therefore isn't very portable.
I know, this is a little off topic, since it's not going to solve any bandwidth usage problems. I just think there should be alternatives. _________________ jyrki muukkonen |
|
d3c3it l33t
Joined: 01 Mar 2003 Posts: 765 Location: Manchester, UK
|
Posted: Mon May 05, 2003 2:15 pm Post subject: |
|
|
Well, I was one of the offenders until about a week ago, when a message popped up on the rsync mirror I use (*uk*) saying that syncing more than twice a day will get you banned or be considered abusive. Since then it's been only once a day, because I always find some package has an update. But I think it's totally fair to implement this; after all, it's free :) _________________ Some people go to counselling,
others use linux |
|
djco Retired Dev
Joined: 29 Mar 2003 Posts: 67 Location: 52.36, 4.89
|
Posted: Mon May 05, 2003 2:23 pm Post subject: |
|
|
I think changing the SYNC var in make.conf is not in the installation manual. It might help saying something about it there, so that newbies will more frequently change their SYNC var (thus conserving bandwidth for the default server). |
|
Caffeine Guru
Joined: 17 Jul 2002 Posts: 401 Location: Melbourne, Australia
|
Posted: Mon May 05, 2003 3:31 pm Post subject: |
|
|
How about a package similar to app-admin/mirrorselect called app-admin/rsyncselect ? Or package these two apps together?
Then add a step to the install instructions "Run mirrorselect -a 3 and rsyncselect" |
|
shadow255 Guru
Joined: 04 Apr 2003 Posts: 412
|
Posted: Mon May 05, 2003 3:37 pm Post subject: |
|
|
Manuzhai wrote: | I think changing the SYNC var in make.conf is not in the installation manual. It might help saying something about it there, so that newbies will more frequently change their SYNC var (thus conserving bandwidth for the default server). |
This should probably be filed as a bug, but I'll add that it's not necessarily a matter of changing, but possibly a matter of adding the SYNC variable. I couldn't find it in my make.conf, nor any comments about it. I hope there are plans to add mention of it in future revisions to portage. _________________ Vogon poetry is of course the third worst in the Universe. -- Douglas Adams, The Hitchhiker's Guide to the Galaxy |
|
carambola5 Apprentice
Joined: 10 Jul 2002 Posts: 214
|
Posted: Mon May 05, 2003 5:50 pm Post subject: |
|
|
csnyder wrote: | I have a local rsync mirror that resyncs hourly. Is this considered against rsync etiquette? It's serving 8 Gentoo boxen at the moment. |
Since we have generally established that this is a "Bad Idea," allow me to propose the following solution (This will only apply to people with a dedicated server/computer with ~100% uptime):
Install a (yet-to-be-written) program on the server. I will call this ghost program: smart-sync.
Smart-sync will have a few rules built in, such as:
1. Never sync more than once per 4 hours unless (forced && !cron).
2. Never sync more than 4 times per day (ever).
3. Execute a cron-like sync once per day subject to rules #1 & #2.
(All of these rules are customizable)
The client machines will have the server set as the sync mirror.
When a client requests an emerge sync, the following steps occur:
- Client: requests rsync with server.
- Server: if a log check shows that syncing with official gentoo sync mirrors would not violate a rule, the server will perform an rsync.
- Server: whether the official rsync was executed or not, will provide sync service to client
- Client: receives rsync service from server.
This way, you only ever have one computer sync with the official mirrors (which it looks like csnyder has already taken care of), but it also ensures that etiquette rules are not broken. The above values that I have in the rules are just examples, and I think those should be the approximate defaults.
One last thing.... the clients' sync timeout variables might have to be significantly extended for this to work.
And there it is: a transparent, etiquette-obeying, and clever solution. |
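The rules above could be sketched roughly as follows. This is only an illustration in Python, not an existing program: smart-sync is still hypothetical, and the function names, the rolling 24-hour reading of "per day", and the `sync_upstream` callback are all assumptions made for the example.

```python
from datetime import datetime, timedelta

MIN_INTERVAL = timedelta(hours=4)  # rule 1 default (customizable)
MAX_PER_DAY = 4                    # rule 2 default (customizable)


def may_sync_upstream(history, now, forced=False, from_cron=False):
    """Decide whether the local server may sync with the official mirrors.

    history is a list of datetimes of earlier upstream syncs.
    """
    # Rule 2: never more than MAX_PER_DAY syncs, here read as "in the
    # last 24 hours" (an assumption). Nothing overrides this rule.
    recent = [t for t in history if now - t < timedelta(days=1)]
    if len(recent) >= MAX_PER_DAY:
        return False
    # Rule 1 exception: a forced, non-cron request skips the interval check.
    if forced and not from_cron:
        return True
    # Rule 1: at least MIN_INTERVAL since every earlier sync.
    return all(now - t >= MIN_INTERVAL for t in history)


def handle_client_request(history, now, sync_upstream, forced=False):
    """Sync upstream only if the rules allow it; return True if it happened.

    The caller serves the client from the local tree afterwards either way.
    sync_upstream is a hypothetical callback that runs the real rsync.
    """
    synced = may_sync_upstream(history, now, forced=forced)
    if synced:
        sync_upstream()
        history.append(now)
    return synced
```

The daily cron job from rule 3 would call `may_sync_upstream(..., from_cron=True)`, so it stays subject to both limits.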
|
henke Apprentice
Joined: 30 Sep 2002 Posts: 165 Location: Stockholm, Sweden
|
Posted: Mon May 05, 2003 6:08 pm Post subject: |
|
|
klieber wrote: | Have you considered waiting 4 or even (gasp) 6 hours to let the changes propagate naturally? All the rsync mirrors sync every 30 minutes against one master rsync mirror (rsync1.us.gentoo.org) so all mirrors should have new updates within 30 minutes of them hitting the tree. |
Right now I'm waiting for xfree-4.3.0-r3 (Castle Rock driver baby). How do I find out whether this package has been released without rsyncing? |
|
gigel Guru
Joined: 14 Jan 2003 Posts: 369 Location: .se/.ro
|
Posted: Mon May 05, 2003 6:16 pm Post subject: |
|
|
Hm, I'm behind a NAT server, but (thank God) I am the only Gentoo user in our 80-computer network, so I sync once a day.
For me that's just fine _________________ $emerge sux
|
|
klieber Bodhisattva
Joined: 17 Apr 2002 Posts: 3657 Location: San Francisco, CA
|
Posted: Mon May 05, 2003 6:21 pm Post subject: |
|
|
henke wrote: | Right now I'm waiting for xfree-4.3.0-r3 (Castle Rock driver baby). How do I find out whether this package has been released without rsyncing? |
Nobody said you shouldn't rsync. Schedule a nightly cron job to rsync while you're sleeping. Then, check in the morning to see if it was released.
--kurt _________________ The problem with political jokes is that they get elected |
|
henke Apprentice
Joined: 30 Sep 2002 Posts: 165 Location: Stockholm, Sweden
|
Posted: Mon May 05, 2003 7:13 pm Post subject: |
|
|
Hmm, usually I only rsync once or twice a week or so. Is this less stressful for the rsync servers compared to rsyncing once a day? |
|
ebrostig Bodhisattva
Joined: 20 Jul 2002 Posts: 3152 Location: Orlando, Fl
|
Posted: Mon May 05, 2003 7:26 pm Post subject: |
|
|
Interesting reading.
What if there existed an application that:
- Did not put a lot of stress on the servers
- Only returned a list of changes to the Portage tree since your last check
- Could be used with an argument, e.g. chkportage xfree would return yes or no depending on whether there is an update in the Portage tree.
- Did not update the Portage tree
A lot of users say they rsync to see if an update to a package has been added to the Portage tree, hence they really don't need to update Portage in order to check.
By having such an application as outlined above, it should be easy to check for an update to Xfree and it would not put a huge stress on the rsync servers nor use a lot of bandwidth.
Comments?
Erik _________________ 'Yes, Firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.' |
|
klieber Bodhisattva
Joined: 17 Apr 2002 Posts: 3657 Location: San Francisco, CA
|
Posted: Mon May 05, 2003 7:46 pm Post subject: |
|
|
henke wrote: | Hmm, usually I only rsync once, twice a week or so. Is this less stressful for the rsync servers compared to rsyncing once a day? |
yes.
--kurt _________________ The problem with political jokes is that they get elected |
|
ErnstlAT n00b
Joined: 22 Nov 2002 Posts: 15 Location: Vienna, Austria
|
Posted: Mon May 05, 2003 8:37 pm Post subject: Rsync |
|
|
I currently have a setup of 3 gentoo boxes, of which one is the gateway to the internet and plan to add a few more machines for high-performance computing via openMosix.
Of course I have to sync them from time to time ('bout once a week each), so my question: Is there an easy way to do cached proxy-ing of both source tarball downloads (http/ftp) and rsync easily? What's your advice/experience?
I also guess that a number of the above-average rsync counts are due to NAT'ed workstations, which look like one computer doing a lot of syncs ...
Yours, Ernstl.at |
|
henke Apprentice
Joined: 30 Sep 2002 Posts: 165 Location: Stockholm, Sweden
|
Posted: Mon May 05, 2003 8:55 pm Post subject: |
|
|
ebrostig wrote: | What if there existed an application that:
- Did not put a lot of stress on the servers
- Only returned a list of changes to the Portage tree since your last check
- Could be used with an argument, e.g. chkportage xfree would return yes or no depending on whether there is an update in the Portage tree.
- Did not update the Portage tree |
If I had this app I wouldn't have rsynced this week.
I guess you have to define what "list of changes" means though. Returning information about everything that had changed would be equivalent to an rsync... |
|
pjp Administrator
Joined: 16 Apr 2002 Posts: 20067
|
Posted: Mon May 05, 2003 9:37 pm Post subject: Re: Rsync |
|
|
ErnstlAT wrote: | Of course I have to sync them from time to time ('bout once a week each), so my question: Is there an easy way to do cached proxy-ing of both source tarball downloads (http/ftp) and rsync easily? What's your advice/experience? | You could designate a local mirror. Let the mirror sync, then have your other machines sync to it. _________________ Quis separabit? Quo animo? |
|
Caffeine Guru
Joined: 17 Jul 2002 Posts: 401 Location: Melbourne, Australia
|
Posted: Mon May 05, 2003 9:39 pm Post subject: Re: Rsync |
|
|
I had a similar situation.
ErnstlAT wrote: | Of course I have to sync them from time to time ('bout once a week each), so my question: Is there an easy way to do cached proxy-ing of both source tarball downloads (http/ftp) and rsync easily? What's your advice/experience?
|
I use one machine as the master (say host1), which is updated via rsync daily. Then on the other Gentoo boxes, I periodically run:
Code: | rsync -av --progress --stats --delete --delete-after --exclude='distfiles/*' --exclude='packages/*' -e ssh user@host1:/usr/portage/ /usr/portage/ |
( I couldn't remember offhand whether host1:/usr/portage needs a trailing slash - the script is at work, and I'm at home. It does: without the trailing slash on the source, rsync copies the directory itself, and you end up with /usr/portage/portage. )
Using ssh means you don't need to bother with an rsync server, just sshd.
Of course, if you leave out the --exclude='distfiles/*' part, you'll also get the distfiles. It might also be possible to use the portage user - I haven't tried.
Alternatively, you could export host1:/usr/portage via nfs/smb/whatever to the other gentoo boxes. |
|
ebrostig Bodhisattva
Joined: 20 Jul 2002 Posts: 3152 Location: Orlando, Fl
|
Posted: Mon May 05, 2003 10:49 pm Post subject: |
|
|
henke wrote: | ebrostig wrote: | What if there existed an application that:
- Did not put a lot of stress on the servers
- Only returned a list of changes to the Portage tree since your last check
- Could be used with an argument, e.g. chkportage xfree would return yes or no depending on whether there is an update in the Portage tree.
- Did not update the Portage tree |
If I had this app I wouldn't have rsynced this week.
I guess you have to define what "list of changes" means though. Returning information about everything that had changed would be equivalent to an rsync... |
No, it really shouldn't return all the information that rsync does. Remember, rsync also downloads all the ebuilds and other files that have changed in the Portage tree.
My main point about such an app is that it would only ask for package changes since your last check, i.e. it would return a list like:
Code: |
xfree-4.3.0-r2
mm-sources-2.5.69
|
Nothing more than this. Doing so would allow people who are waiting for updates to find their way into Portage to check whether they have indeed arrived, rather than having to update the whole Portage tree. It should also be faster, as it doesn't change anything; it's just a query tool.
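To make the query idea concrete, here is a minimal Python sketch of the client side. It assumes the server has already returned a plain list of changed package versions like the one shown; `chkportage` is the hypothetical tool name from the post, and the name/version split (version starts at the first dash followed by a digit) is a simplification for illustration, not Portage's real version parsing.

```python
import re

# Assumption: the package name ends at the first "-" that is followed
# by a digit, e.g. "mm-sources-2.5.69" -> name "mm-sources".
_NAME_VERSION = re.compile(r"^(.+?)-(\d.*)$")


def package_name(entry):
    """Return the package name part of an entry such as 'xfree-4.3.0-r2'."""
    match = _NAME_VERSION.match(entry)
    if match is None:
        raise ValueError("not a name-version entry: %r" % entry)
    return match.group(1)


def chkportage(changes, package=None):
    """Filter the list of changes since the last check.

    Without an argument, return every change; with a package name,
    return only the entries for that package (empty list means "no").
    """
    if package is None:
        return list(changes)
    return [entry for entry in changes if package_name(entry) == package]
```

With the two entries from the listing, `chkportage(changes, "xfree")` would report the xfree entry only, without touching the local Portage tree.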
Any other ideas/suggestions/comments?
Erik _________________ 'Yes, Firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.' |
|
puddpunk l33t
Joined: 20 Jul 2002 Posts: 681 Location: New Zealand
|
Posted: Tue May 06, 2003 4:25 am Post subject: |
|
|
Perhaps it would be best for Gentoo to design its own system for syncing? I'm currently learning sockets in Python, and I know dev time/effort is stretched enough as it is, but perhaps it's an idea for Gentoo 2.0? (By then, the rsync traffic would be unbearable.)
I've decided, every bonus I get from work automatically goes to Gentoo (yay for paypal) but unfortunately, my last bonus was $3.75
When I feel more comfortable with Python, I will look at designing a new system for rsyncing (and perhaps even downloading distfiles? Maybe that's the only way to get distfile diffs implemented reliably).
Anyway, just chucking things around here. Any comments? I know there is going to be the whole "Don't re-invent the wheel" thing going on, but the wheel we need needs to be quite specialist. |
|
Yak Tux's lil' helper
Joined: 01 Sep 2002 Posts: 107
|
Posted: Tue May 06, 2003 6:14 am Post subject: |
|
|
Obviously the problem here is that the Gentoo folks have done too good a job and made it too easy to update every piece of software with a few keystrokes. Now these top few percent are hopelessly addicted to 'emerge rsync'. Better send them to Redmond for rehab. |
|
fishhead Apprentice
Joined: 07 Mar 2003 Posts: 162 Location: Pasadena, CA
|
Posted: Tue May 06, 2003 10:39 am Post subject: |
|
|
a) From what I can tell, `emerge rsync` gets a list of files from the server and then checks your system to see what it needs to update. This itself is already very efficient as it puts most of the load on the user machine. I think it also might make the developers' lives a bit easier.
b) The problem with returning only changes for a single package can be summed up in one word: dependencies.
I think the real effort here should be on making it easy for people to set up on-site mirrors (if they have multiple boxes) and making them aware of how much this helps by easing the rsync server load. The current rsync distribution scheme is like a two-level tree: there is one authoritative list, and then there are mirrors of that list. The problem with this is that as the community grows linearly (and the Gentoo community may be growing faster than this), so does the load on the servers. By using a multi-level tree, server load grows with approx. log(number of users) instead.
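One way to put rough numbers on the tree argument is the small Python sketch below. It assumes an idealised tree in which every server feeds at most `fan_out` downstream machines (a made-up parameter for illustration); under that assumption each server's load is capped by the fan-out, and what grows logarithmically with the number of clients is the number of levels.

```python
def tree_depth(clients, fan_out):
    """Levels needed below the master so fan_out**depth >= clients.

    Idealised model: every server feeds at most fan_out machines.
    """
    if clients < 1 or fan_out < 2:
        raise ValueError("need clients >= 1 and fan_out >= 2")
    depth = 0
    capacity = 1  # machines reachable at the current level
    while capacity < clients:
        capacity *= fan_out
        depth += 1
    return depth


def flat_load(clients):
    """In a flat scheme, the mirrors together serve every client directly."""
    return clients
```

For example, with a fan-out of 10, going from 1,000 to 1,000,000 clients raises the depth from 3 to 6, while the flat load grows a thousandfold.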
Remember, the early internet had this problem too. One machine had a "hosts" file that the rest of the network used. Every other machine got its hosts file from that machine. The problem was that this file, whose size grew with the number of machines N, had to be distributed to each of the N machines on the network, so the amount of data that needed to be sent grew in proportion to N^2. After a while, this became prohibitively expensive in terms of update time. The solution is the domain name tree structure that is used today.
Reducing the amount of data sent to / from the servers can only reduce the load linearly, so this is far less critical than further distributing the load. |
|
betatim n00b
Joined: 18 Apr 2002 Posts: 28
|
Posted: Tue May 06, 2003 11:32 am Post subject: |
|
|
I know we discussed using BitTorrent for rsync/distfile distribution before and sort of decided that it has some flaws. I'm sorry I can't recall exactly what the problems were, but I think it had to do with having a way to find out whether an ebuild was the real one and not a bogus one from a "hacker". Same problem with the distfiles.
What I know about p2p networks is that they all use hashes of some sort to make sure a file always has the same content, even under different filenames.
My question is: how difficult is it, or is it almost impossible, to get fake ebuilds/sources into the portage tree if we use BitTorrent for rsyncing and/or distfiles?
Let's assume the ebuilds are signed (that feature is coming up and will be implemented very soon, I hope) and you have an MD4 hash from a trusted server (as you trust the rsync mirrors at the moment) and a file name.
As BitTorrent relies on users sharing the files they downloaded, would it be possible to make an ebuild that has evil code inside and still have no one notice, even though it is signed AND would quite surely have a different hash value? _________________ Never underestimate the power of stupid people in large groups. |
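The check being asked about can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trusted hash is taken to arrive over a channel you already trust, the function name is made up, and SHA-256 is swapped in for the MD4 mentioned in the post because it is reliably available in Python's `hashlib`.

```python
import hashlib
import hmac


def matches_trusted_hash(data, trusted_hex):
    """Return True if data (bytes fetched from untrusted peers) hashes
    to the hex digest obtained from the trusted server."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking where the two strings differ
    return hmac.compare_digest(actual, trusted_hex.lower())
```

Under this scheme, a peer that serves an altered ebuild produces a different digest, so the client can reject the file as long as the trusted digest itself was not tampered with.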
|
Mystilleef Guru
Joined: 27 Apr 2003 Posts: 561 Location: Earth
|
Posted: Tue May 06, 2003 12:41 pm Post subject: RSYNC once a week... |
|
|
Hello Gents,
Jesus Christ! rsyncing 1 - 2 times a day?!? Why on earth should anyone do that? The only time I rsync is at the end of the week, usually early on a Saturday morning, before updating my system.
I think, for the sake of being considerate, it would only be fair if we rsynced and/or updated our Gentoo systems at most twice a week, and ideally once a week.
Remember if Microsoft was offering this same service, you'd probably be paying $50.00 a month for it. Public and network responsibility can only benefit all of us.
Mystilleef _________________ simple, sleek and sexy text editor for gnome
"My logic is undeniable." |
|
Lovechild Advocate
Joined: 17 May 2002 Posts: 2858 Location: Århus, Denmark
|
Posted: Tue May 06, 2003 1:09 pm Post subject: |
|
|
The problem as I understand it is not the bandwidth usage (portage, after all, doesn't push too much data) but the fact that rsync sucks up every available CPU cycle on the server end - rsync is simply not suited for this kind of task.
If anything we should spare the servers' CPU power; that would limit the problems we have with connection bottlenecks.
Rsync needs to die a quick, horrible, painful death. Even though it has some nice features like proxy usage, it's just too demanding.
This is also why the "check for updates program" won't do much, even though it's a nice idea - it doesn't solve the problem that is rsync's resource waste, it only delays it... which might be nice now, but our userbase is constantly expanding and we will hit the ceiling, so better to change the system now than later - fewer people will be affected. (Yes, I do know that as the userbase expands the number of mirrors might increase - but it's probably not proportional.)
And while we are being considerate, let's just kill all the CVS ebuilds in portage - if you want to run CVS software you should need to get the "CVS powerpack" tarball containing all the CVS ebuilds, instead of sucking up server CPU power and bandwidth - tarballs can be distributed easily over several servers, CVS can't. Another thing speaking against CVS ebuilds is the fact that they are impossible to maintain: an error can seem to be in the ebuild but in fact be in the current CVS state, or the other way around (just look at Dan Armak's KDE CVS changelog). |
|
fishhead Apprentice
Joined: 07 Mar 2003 Posts: 162 Location: Pasadena, CA
|
Posted: Tue May 06, 2003 2:13 pm Post subject: |
|
|
In that case it might be better to do something like the following ...
1) Have a server-based, gzip'd file list. Checksum the list. Name the checksum file something like Portage-20030505-r0.md5
2) Client downloads the checksum and compares it with its local checksum.
3) If the checksums are different, the client makes sure the date and revision of the checksum are for a later version of the portage tree.
4) If the checksum is for a later version, download the file list.
5) Do a `diff` of the latest list against your list.
6) Since the diff will (more or less) produce a set of changes consistent with changes to the portage tree (i.e. get all ebuilds with a +xxxxxx-1.23.4-r5 line) we'll know what to get.
7) Grab the necessary ebuild files.
Of course, steps 5 and 6 could be done any number of ways besides using diff. Also, you could do this all via http or ftp - in fact, if the program it uses to get the information is selectable by the user, you could do this any number of ways, like via NFS or something. This should prevent people from sucking up all the server CPU time even if they do something for which they should be shot, like:
Code: | #!/bin/sh
while true
do
emerge rsync
done |
edit: <+3 hours>
The basic idea with the above plan (which I failed to mention) is to transfer nearly all the CPU load to the user's machine. All the necessary comparisons are done there and all the sync mirror does is function as a repository for the ebuild scripts. With rsync, the mirror is busy matching checksums sent from the client machine. It is very bandwidth efficient, but it is CPU intensive. If bandwidth really is a concern, then rsync is a little better; however, when CPU usage is a concern, it's better to have the server just handing out gzip'd files (and even a modest machine can do this).
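The client side of steps 2 to 7 could look something like the Python sketch below. It is a rough illustration, not a finished tool: the function names are made up, the file list is assumed to be one path per entry, the download itself is left to a hypothetical `fetch_remote_list` callback (http, ftp, NFS, whatever), and a set difference stands in for `diff`.

```python
import hashlib
import re

# Checksum file naming from step 1, e.g. "Portage-20030505-r0.md5"
_SNAPSHOT = re.compile(r"Portage-(\d{8})-r(\d+)\.md5")


def snapshot_version(checksum_name):
    """Return (date, revision) parsed from the checksum file name."""
    match = _SNAPSHOT.fullmatch(checksum_name)
    if match is None:
        raise ValueError("unexpected checksum name: %r" % checksum_name)
    return int(match.group(1)), int(match.group(2))


def list_checksum(file_list):
    """MD5 over the newline-joined file list (step 1, done on both ends)."""
    return hashlib.md5("\n".join(file_list).encode("utf-8")).hexdigest()


def plan_update(local_name, local_list, remote_name, remote_checksum,
                fetch_remote_list):
    """Return (files_to_fetch, files_to_remove) for the local tree."""
    # Step 2: identical checksums mean there is nothing to do.
    if remote_checksum == list_checksum(local_list):
        return [], []
    # Step 3: ignore snapshots that are not newer than ours.
    if snapshot_version(remote_name) <= snapshot_version(local_name):
        return [], []
    # Step 4: only now download the (larger) file list.
    remote_list = fetch_remote_list()
    # Steps 5-6: compare the lists locally; the server does no work here.
    to_fetch = sorted(set(remote_list) - set(local_list))
    to_remove = sorted(set(local_list) - set(remote_list))
    return to_fetch, to_remove  # step 7 would download to_fetch
```

Note that comparing paths only catches added and removed files; detecting an ebuild that changed in place would need a per-file checksum in the list, which the post leaves open.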
Last edited by fishhead on Tue May 06, 2003 6:34 pm; edited 1 time in total |
|
PowerFactor Veteran
Joined: 30 Jan 2003 Posts: 1693 Location: out of it
|
Posted: Tue May 06, 2003 3:27 pm Post subject: |
|
|
Lovechild wrote: | The problem as I understand it is not the bandwidth usage (portage, after all, doesn't push too much data) but the fact that rsync sucks up every available CPU cycle on the server end - rsync is simply not suited for this kind of task.
If anything we should spare the servers' CPU power; that would limit the problems we have with connection bottlenecks.
Rsync needs to die a quick, horrible, painful death. Even though it has some nice features like proxy usage, it's just too demanding. | I don't really agree with that. From my understanding CPU cycles are generally cheaper than bandwidth. Yes, the current problem with excessive rsyncing seems to be CPU hogging. But does that mean we should waste bandwidth to save CPU cycles? If we could come up with something non-CPU-intensive that used as little bandwidth as rsync, then I would be all for it. But I'm not ready to toss rsync to the wolves just yet. |
|