Gentoo Forums
HOWTO:Local Rsync Mirror

KpR2000
n00b

Joined: 18 Aug 2003
Posts: 55

Posted: Thu Jun 10, 2004 10:09 am

flybynite wrote:

The easiest fix for the name lookup failures in your logs is to list the IPs and hostnames in /etc/hosts.

The IPs for the rsync server and all other PCs on my network are listed there.
I have also not used a hostname in /etc/make.conf for the SYNC variable:
SYNC="rsync://192.168.3.84/gentoo-portage"

Quote:

I noticed that you seem to be comparing your rsync server speed with someone's distfile cache speed in this thread. Two different things.

Now that you have your config file fixed, what speeds are you getting?

That's my current state:

Code:

receiving file list ...
1 file to consider
timestamp.chk
          32 100%   31.25kB/s    0:00:00  (1, 100.0% of 1)

Number of files: 1
Number of files transferred: 1
Total file size: 32 bytes
Total transferred file size: 32 bytes
Literal data: 32 bytes
Matched data: 0 bytes
File list size: 32
Total bytes written: 226
Total bytes read: 437

wrote 226 bytes  read 437 bytes  442.00 bytes/sec
total size is 32  speedup is 0.05


The collected packages counter is also really slow compared with the "internet sync", as in the previous answer by "CarpJA".

Greetings

dhurt
Apprentice

Joined: 14 May 2003
Posts: 278
Location: Davis, CA

Posted: Thu Jun 10, 2004 4:04 pm

KpR2000 wrote:

Code:

receiving file list ...
1 file to consider
timestamp.chk
          32 100%   31.25kB/s    0:00:00  (1, 100.0% of 1)

Number of files: 1
Number of files transferred: 1
Total file size: 32 bytes
Total transferred file size: 32 bytes
Literal data: 32 bytes
Matched data: 0 bytes
File list size: 32
Total bytes written: 226
Total bytes read: 437

wrote 226 bytes  read 437 bytes  442.00 bytes/sec
total size is 32  speedup is 0.05




This is way too little data to measure speed from. The timestamp file contains just this:
Code:

Thu Jun 10 15:06:57 UTC 2004


Downloading something this small takes the same time from the LAN or from the internet, even over a 56K modem. Bandwidth is not the bottleneck. The problem with trying to see a speed increase on emerge sync is that you will not see one. You are transferring lots of very small files. This is NOT a bandwidth-intensive process. Look at my results from the main portage tree sync. As you can see, the internet sync is faster, maybe by about 5 seconds out of 40. Nothing big. Also, the amount of file data transferred is about 340K, which is tiny.
Code:


<Internet>
------------------------------------------------------------
Number of files: 85256
Number of files transferred: 182
Total file size: 69887332 bytes
Total transferred file size: 341443 bytes
Literal data: 341443 bytes
Matched data: 0 bytes
File list size: 1944462
Total bytes written: 3825
Total bytes read: 2083098

wrote 3825 bytes  read 2083098 bytes  46897.15 bytes/sec
total size is 69887332  speedup is 33.49

<Local Mirror>
------------------------------------------------------------
Number of files: 86006
Number of files transferred: 180
Total file size: 69896735 bytes
Total transferred file size: 340442 bytes
Literal data: 340442 bytes
Matched data: 0 bytes
File list size: 2036955
Total bytes written: 3785
Total bytes read: 2177599

wrote 3785 bytes  read 2177599 bytes  37937.11 bytes/sec
total size is 69896735  speedup is 32.04
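
For reference, the "speedup" figure rsync prints is the total size of the files divided by the bytes that actually crossed the network (written plus read). A minimal Python sketch that recomputes it from the numbers above; the function name is only illustrative:

```python
def rsync_speedup(total_size, bytes_written, bytes_read):
    """Total file size divided by the bytes that crossed the network."""
    return total_size / (bytes_written + bytes_read)


# Figures copied from the runs quoted above.
internet = rsync_speedup(69887332, 3825, 2083098)
local = rsync_speedup(69896735, 3785, 2177599)
tiny = rsync_speedup(32, 226, 437)  # the timestamp.chk-only run

print(f"internet: {internet:.2f}")
print(f"local:    {local:.2f}")
print(f"tiny:     {tiny:.2f}")
```

Both full syncs moved about 2 MB in total, and nearly all of that was the file list (1944462 and 2036955 bytes), not file data. That is why raw bandwidth matters so little here.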


I found this process is HIGHLY dependent on server load. I tried syncing while the main server was caching the portage files and the process was slow as can be. I know that rsyncing puts major load on the server: not just disk activity, but also computation. So the bottleneck can be somewhere else, probably the speed of the computer. The Gentoo servers are usually dual-processor machines optimized to be rsync mirrors; your desktop or local server probably is not, so the process will be slower. But it will not take unbearably longer: the actual rsync time is very short, and caching the portage tree takes much longer than the rsync itself anyway. The reason for creating a local mirror is not speed (as you can see, mine was slower) but to reduce load on the Gentoo servers. You really do not need to download two copies of the same information. That is wasteful. If you want a speed increase, take a look here:

https://forums.gentoo.org/viewtopic.php?t=173226&highlight=

This is http-replicator. It lets you cache all the portage files and then serves them up at LAN speeds :D
_________________
"And isn't sanity really just a one-trick pony, anyway? I mean, all you get is one trick, rational thinking, but when you're good and crazy, ooh ooh ooh, the sky's the limit!" -- The Tick

KpR2000
n00b

Joined: 18 Aug 2003
Posts: 55

Posted: Thu Jun 10, 2004 4:29 pm

You are right with your arguments. I will give http-replicator a try.

Thx

mxc
Guru

Joined: 05 Mar 2003
Posts: 442
Location: South Africa

Posted: Sat Jun 12, 2004 6:31 am

Is it possible to set the client up to fall back to an external rsync server if it cannot find the file it needs on the local server? I have an ADSL connection with a usage cap. I often need to install machines overnight, and in this case I would rather have the machine finish compiling than save bandwidth.

thanks

dhurt
Apprentice

Joined: 14 May 2003
Posts: 278
Location: Davis, CA

Posted: Sat Jun 12, 2004 6:42 am

You are confusing an rsync mirror with a package mirror. The rsync mirror, which this thread is about, will not have any packages in it. It allows you to run:
Code:

# emerge sync

on just one machine and then replicate that effort to other machines on the LAN to reduce the load on the Gentoo mirrors. It just synchronizes /usr/portage, but excludes /usr/portage/distfiles and /usr/portage/packages. So no files are skipped unless you have a funky setup.

I think you are referring to setting up a local package mirror, for which you would want http-replicator, which is another part of the system. It basically caches locally all the files that you have downloaded for building purposes. If it cannot find a file in its cache, it downloads it from the internet.

A link to it is about 3 posts above.

mxc
Guru

Joined: 05 Mar 2003
Posts: 442
Location: South Africa

Posted: Sun Jun 13, 2004 7:02 am

Thanks KillBill,

In one post I found, the poster had set up an rsync 'link' to the portage/distfiles directory. Would just syncing this with another machine not mean that I have all the files the other has, so that I only need to download the ones I don't already have?

Would rsyncing the distfiles dir skip some important step that emerge needs?
I will look into setting up the http proxy as a longer term solution later.
thanks

dhurt
Apprentice

Joined: 14 May 2003
Posts: 278
Location: Davis, CA

Posted: Sun Jun 13, 2004 4:11 pm

The problem with the rsync solution is that there is no fallback if a file is not on the main server. It is a poor solution because if you are upgrading a lot of packages and it cannot download a file halfway through, the ebuild will fail. You then have to change your mirror and download the file manually, or go to your server, download the file manually, and change your mirror back so it points at your local server. Only then can you continue the ebuild.

I know; I used it for about 2 months. It was a pain to keep up in the long run and not transparent at all.

http-replicator, on the other hand, is seamless. It is a proxy between you and the internet just for fetching distfiles. It does not touch your traffic in any other way. All requests for files now go through the proxy. If it has the file locally, it serves it at LAN speed. If it does not, it downloads the file to the proxy and sends it to the requesting machine at the same time, so the file arrives at the speed of your internet connection. I have been using it for 2-3 weeks now and it is excellent.
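
The hit-or-fetch behaviour described above can be sketched in a few lines of Python. This is a toy model only, not http-replicator's actual code: the class, the `fake_upstream` function and the file name are all made up for illustration, and unlike the real proxy it does not stream the file to the client while it is still downloading.

```python
class DistfileCache:
    """Toy fetch-through cache (illustrative, not http-replicator's code)."""

    def __init__(self, fetch_upstream):
        self._fetch_upstream = fetch_upstream  # callable: file name -> bytes
        self._store = {}                       # stands in for the cache directory
        self.upstream_hits = 0

    def get(self, name):
        if name not in self._store:
            # Cache miss: fetch from the internet once and keep a copy.
            self._store[name] = self._fetch_upstream(name)
            self.upstream_hits += 1
        # Cache hit, or a file we just stored: serve the local copy.
        return self._store[name]


def fake_upstream(name):
    # Hypothetical stand-in for a real download from a Gentoo mirror.
    return f"contents of {name}".encode()


cache = DistfileCache(fake_upstream)
first = cache.get("foo-1.0.tar.gz")   # goes upstream
second = cache.get("foo-1.0.tar.gz")  # served from the cache
print(cache.upstream_hits)
```

The second request never reaches the upstream function, which is the whole point: only the first machine on the LAN pays for the download.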

seringen
Apprentice

Joined: 03 Aug 2003
Posts: 163
Location: berkeley, california

Posted: Tue Jun 15, 2004 6:29 am    Post subject: just adding my data

Well, other than a stupid carriage return error in a config file, everything worked immediately and beautifully.

To give an idea of what typical performance looks like, here's an example from my network:

First the rsync server over ssh, a VIA Nehemiah computer
Code:
# hdparm -tT /dev/hda

/dev/hda:
 Timing buffer-cache reads:   520 MB in  2.01 seconds = 258.10 MB/sec
 Timing buffered disk reads:  122 MB in  3.06 seconds =  39.89 MB/sec


Now the connecting computer, a PIII laptop with a slow, ordinary hard drive
Code:
# hdparm -tT /dev/hda

/dev/hda:
 Timing buffer-cache reads:   416 MB in  2.00 seconds = 207.51 MB/sec
 Timing buffered disk reads:   54 MB in  3.12 seconds =  17.33 MB/sec


Over fast ethernet it gets
Code:
39833.50 bytes/sec


All in all not bad, and without optimizations of any sort. It really is a good thing to take some of the weight off the main mirrors; it's easy to forget how heavy rsync is on servers.

Cetanu
n00b

Joined: 16 Jun 2004
Posts: 1

Posted: Wed Jun 16, 2004 7:53 pm    Post subject: Local mirror outside /usr/portage

Is there any reason to keep the portage tree for an unofficial mirror outside the server's /usr/portage directory?

I am asking because I installed the app-admin/gentoo-rsync-mirror package today and it keeps portage in a separate directory by default (/opt/gentoo-rsync/portage/). I have used a configuration with portage kept in /usr/portage for half a year and haven't experienced any problems yet...

dhurt
Apprentice

Joined: 14 May 2003
Posts: 278
Location: Davis, CA

Posted: Wed Jun 16, 2004 9:57 pm

It works great with the directory /usr/portage/. Maybe on the official servers they like to mount /usr read-only until update time, and storing the tree in /opt lets them do that and still have an up-to-date mirror.

flybynite
l33t

Joined: 06 Dec 2002
Posts: 620

Posted: Thu Jun 17, 2004 9:03 am

New HOWTO version 1.2 !


I added a note about the rsync daemon nicelevel that my /etc/init.d/rsyncd script sets on starting. This applies only if you use my script on your machine.

My script sets the nicelevel to a lower priority (15) than normal (0) because I spend time logged in on my rsync server box and use it as a normal desktop. If you do too, leave it as is. If you only use your rsync server as a server, go ahead and set the nicelevel to 0 so rsync runs at normal priority and normal speed.

JSharku
Apprentice

Joined: 09 Feb 2003
Posts: 189
Location: Belgium

Posted: Thu Jun 17, 2004 8:53 pm

Just a quick note on packages and distfiles; it's better to put the following in your rsyncd.conf:
Code:

# excluding packages is optional, if you don't use --buildpkg you don't need it
exclude = distfiles/ packages/

instead of
Code:

exclude = distfiles packages

NOTE THE TRAILING /'s
If you don't add the slashes, rsync will exclude any file or directory named distfiles or packages, not just directories with those names. Not that big a deal, you might say, were it not that every /usr/portage/profiles/<specific profile>/ directory has a file in it called packages, which portage uses to determine what to build when you bootstrap or emerge system. Those files get deleted by rsync on the client machines if you don't add the trailing slashes, resulting in rebuilds, rebootstraps, resyncs and tons of frustration... at least it did for me until I finally figured this out.
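
To make the difference concrete, here is a small Python model of the two rules involved: a pattern without a slash is compared against the last path component, and a trailing slash restricts the match to directories. It is a simplified sketch for illustration, not rsync's implementation, and the profile path is a made-up example.

```python
from fnmatch import fnmatch


def is_excluded(path, is_dir, pattern):
    """Simplified model of an rsync exclude pattern that has no inner slash."""
    dirs_only = pattern.endswith("/")
    name_pattern = pattern.rstrip("/")
    if dirs_only and not is_dir:
        # A trailing '/' means the pattern can only match directories.
        return False
    last_component = path.rstrip("/").split("/")[-1]
    return fnmatch(last_component, name_pattern)


# Hypothetical profile path; "packages" here is a plain file.
profile_file = "profiles/some-profile/packages"

print(is_excluded(profile_file, False, "packages"))   # no slash: file is hit
print(is_excluded(profile_file, False, "packages/"))  # slash: file is spared
print(is_excluded("packages", True, "packages/"))     # directory still excluded
```

With the slash-free pattern the profile's packages file is excluded along with the packages directory, which is exactly the failure described above.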

Sharku

flybynite
l33t

Joined: 06 Dec 2002
Posts: 620

Posted: Fri Jun 18, 2004 5:41 am

JSharku wrote:
Just a quick note on packages and distfiles; it's better to put the following in your rsyncd.conf:
Code:

# excluding packages is optional, if you don't use --buildpkg you don't need it
exclude = distfiles/ packages/



I see your point about what a trailing / does and you are correct, but I'd bet you're doing this for the wrong reasons and you don't need it either!!

First, you're correct about the trailing slash in the exclude pattern ensuring it only excludes directories and not files. I've updated my howto just to make it clear what is being excluded, but it probably doesn't matter if any user changes their config.

The reason it doesn't matter is that we're dealing with the SERVER. The exclude in the SERVER config makes it impossible for a client to TRY to get distfiles (or packages, in your config) by rsync.

But portage will NOT request those files!!!!!


Look at file:/usr/lib/portage/bin/emerge for rsync_flags and you'll find that the portage CLIENT sets rsync options that automatically skip distfiles, local, and packages.

Code:

         "--exclude='distfiles/*'",   # Exclude distfiles from consideration
         "--exclude='local/*'",       # Exclude local     from consideration
         "--exclude='packages/*'",    # Exclude packages  from consideration




So to wrap this up:

1. Gentoo's portage automatically skips distfiles, local, and packages when syncing so you don't have to exclude these in the SERVER config, and they won't ever appear on clients when you 'emerge sync'.

2. Excluding distfiles in the SERVER config only serves to prevent anyone from abusing the server using their own rsync command. It is possible to create your own rsync request that would try to suck down all of distfiles from the public rsync servers. Excluding distfiles on the SERVER prevents a user from doing this. I left this in as protection for those running my local rsync server in a college campus, for example.

However, if you're running a semi-public server on a Gentoo box with a lot of packages and you're afraid someone might try to craft an rsync command to get all your packages, exclude distfiles/ packages/ per JSharku's example above.

JSharku
Apprentice

Joined: 09 Feb 2003
Posts: 189
Location: Belgium

Posted: Fri Jun 18, 2004 6:40 pm

When I first set up my local rsync server, an emerge sync would try to pull in the distfiles and packages, so I added that line to my rsyncd.conf, which worked at the time (portage 2.0.4x, 1-1.5 years ago ) so I kept it in there. I didn't know it had been added to portage, so I kept the line thinking it was necessary. It's only very recently that I discovered it was messing with emerge system, but I still didn't know that the exclude line itself had become obsolete. :oops:

Sharku

flybynite
l33t

Joined: 06 Dec 2002
Posts: 620

Posted: Mon Jun 21, 2004 9:44 am

No problem, it's hard to keep up with all the changes. Thanks for helping make the syntax clear in the config.

_sparks_
n00b

Joined: 12 Jan 2004
Posts: 1

Posted: Thu Jun 24, 2004 9:31 am

try turning logging off

/etc/rsync/rsyncd.conf:

Code:

# Transfer logging records every file transferred: up to 85,000+ lines per user, per sync
transfer logging = no


speeds up things in my configuration by a factor of 100 or so


Last edited by _sparks_ on Fri Jun 25, 2004 12:05 pm; edited 1 time in total

fvant
Guru

Joined: 08 Jun 2003
Posts: 328
Location: Leiden, The Netherlands

Posted: Thu Jun 24, 2004 10:04 am

My local rsync server seems to sync in blocks of 200 files only. Whereas the file counter for an internet sync moves too fast to follow, rsync from my local server crawls along in steps of 200.

The CPU on the server I rsync from is not busy, the rsync process only uses 2.3%, and the hard drive uses DMA.

Marwin
n00b

Joined: 27 Oct 2002
Posts: 58

Posted: Thu Jun 24, 2004 11:01 am

Take your samba-server and make a directory that you call 'distfiles'.
Share it and make the clients mount it at /usr/portage/distfiles.
And voilà! You've got a shared distfiles directory :-)
_________________
[ Never trust an operationsystem you don't have sources for ]

quill18
n00b

Joined: 20 Jan 2004
Posts: 50

Posted: Thu Jun 24, 2004 7:38 pm

fvant wrote:
My local rsync server seems to sync in blocks of 200 files only. Whereas the file counter for an internet sync moves too fast to follow, rsync from my local server crawls along in steps of 200.

The CPU on the server I rsync from is not busy, the rsync process only uses 2.3%, and the hard drive uses DMA.


Ditto on this. Very similar performance. Lots of spare CPU, bandwidth, and hard drive speed, but terrible throughput.

I made sure that the hostnames are set up properly, and tried it with the original startup script as well as the one posted above.

flybynite
l33t

Joined: 06 Dec 2002
Posts: 620

Posted: Thu Jun 24, 2004 10:30 pm

It appears some users are using other rsyncd.conf files and startup scripts and are having problems. Someone even posted a bad rsyncd.conf in this thread!!

The reason I posted this HOWTO is to eliminate the junk floating around!!

Everyone check that you are using the exact config and startup scripts in the HOWTO!! That will eliminate many problems!!

Nekkrist
n00b

Joined: 09 Oct 2003
Posts: 33

Posted: Mon Jun 28, 2004 3:32 am

For everyone having speed related issues with your sync'ing, this is probably not a network problem, configuration problem, or anything of the sort. It is probably simply an aspect of computer hardware.

The reason the rsync servers appear to be so fast is that all they do all day is offer syncing services. Your local mirror, however, does not do this all day; in fact it is probably very rarely synced against.

Since the inner workings of the rsync algorithm are somewhat detailed, if you are interested, read http://samba.org/~tridge/phd_thesis.pdf (the rsync author's PhD thesis which includes a few chapters on rsync).

Otherwise, the basic result is that on the main rsync mirrors the data rsync needs stays cached in memory, so it doesn't have to be read from disk every single time. If you happened to be the very first person to sync against a main server after it was turned on, you would see results very similar to your own server's. Your own server does not have this data cached, since when you sync, that is likely the first time it has been synced against since its update.

If you have three or four computers, let one sync to the server, then after that one has completed, do another sync; chances are it will be a good deal faster than your previous results.

flybynite
l33t

Joined: 06 Dec 2002
Posts: 620

Posted: Thu Jul 01, 2004 8:20 am

Thanks for some more info Nekkrist!

There are many things to consider about your rsync speed:

1. CPU/Memory - your old Pentium II 233 MHz isn't going to be as fast as an official rsync server such as crane.gentoo.org with its dual 1.7 GHz Xeons and 2 GB of RAM.

2. Filesystem/Disk Speed - Rsync has to consider about 85,000 small files in many directories. Put your /usr/portage on a fast disk with a filesystem that has high small-file performance.

3. Disk Cache - The second rsync will be faster than the first.

4. Logging - My config has logging turned off because every client rsync will generate 85,000+ lines in the log file!

5. More....
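
The disk cache effect in point 3 can be observed without rsync at all. The Python sketch below walks a tree and stats every file, which is roughly the work needed to build a file list, and reports how long each pass takes. It is only an illustration, not how rsync itself works, and the demo tree is a tiny temporary directory with made-up names; point `stat_tree()` at your own /usr/portage after a reboot to see a cold first pass and a warm second one.

```python
import os
import tempfile
import time


def stat_tree(root):
    """Walk root and lstat every file; returns (file_count, seconds)."""
    start = time.perf_counter()
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count, time.perf_counter() - start


# Self-contained demo on a small temporary tree (hypothetical file names).
with tempfile.TemporaryDirectory() as root:
    for d in range(5):
        os.mkdir(os.path.join(root, f"cat{d}"))
        for f in range(20):
            path = os.path.join(root, f"cat{d}", f"pkg{f}.ebuild")
            with open(path, "w") as fh:
                fh.write("# placeholder\n")
    first_count, first_secs = stat_tree(root)
    second_count, second_secs = stat_tree(root)

print(first_count, f"{first_secs:.4f}s", f"{second_secs:.4f}s")
```

On the freshly written demo tree both passes are already warm, so the two timings will be close; the gap only shows up on a large tree whose metadata is not yet in memory.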

dmitrio
Tux's lil' helper

Joined: 10 Dec 2002
Posts: 115
Location: Pago Pago

Posted: Thu Jul 01, 2004 12:29 pm    Post subject: :. copied to gentoo-wiki.com

I have copied this HOWTO, with permission of flybynite, to gentoo-wiki.com
http://gentoo-wiki.com/HOWTO_Local_Rsync_Mirror
If you see anything that should be added or changed, feel free to do so.

Thank you for a great HOWTO.
_________________

... Leaving ground, destination is unknown,
into the darkness and far away from home,
Will your dream come true and what will you find,
when fate is your guide ...

flybynite
l33t

Joined: 06 Dec 2002
Posts: 620

Posted: Sun Jul 04, 2004 8:17 am

I appreciate that, dmitrio; the wiki should help get the word out!!

I also submitted the howtos for possible inclusion in the Gentoo Weekly Newsletter, as suggested by monkeywrench on the http-Replicator thread https://forums.gentoo.org/viewtopic.php?t=173226

dmitrio
Tux's lil' helper

Joined: 10 Dec 2002
Posts: 115
Location: Pago Pago

Posted: Sun Jul 04, 2004 12:20 pm    Post subject: :. copied to gentoo-wiki.com

flybynite wrote:
I also submitted the howto's for possible inclusion in Gentoo Weekly Newsletter as suggested by monkeywrench on the http-Replicator thread https://forums.gentoo.org/viewtopic.php?t=173226

Thank you for a good HOWTO.
Please look at
http://gentoo-wiki.com/HOWTO_Download_Cache_for_LAN-Http-Replicator

If you see anything that should be added or changed, feel free to do so.

All times are GMT
Page 2 of 6