Gentoo Forums
HOWTO:Download Cache for your LAN-Http-Replicator (ver 3.0)

flybynite
l33t
Joined: 06 Dec 2002
Posts: 620

Posted: Sun May 13, 2007 12:48 am    Post subject: repcacheman beta 4.0

I've waded through the depths of portage, and returned with a new version of repcacheman.


I'm continuing to develop and add new features, but now is a good time to test and get some feedback from users. Since I have new features still in development, I chose an older revision to test, but it should work much better than the previous version and have all the old features.


My tests show this beta 4.0 uses only 10% of the resident memory of the previous version and runs 4 times faster!!


It can be run from any directory as root, or dropped in place of the old /usr/bin/repcacheman.py (not /usr/bin/repcacheman).


The code can be downloaded from this temporary location:

http://home.earthlink.net/~poplawtm/rep4.py.tar.gz


Last edited by flybynite on Tue May 15, 2007 12:39 am; edited 2 times in total

mkzelda
n00b
Joined: 22 Aug 2004
Posts: 32

Posted: Mon May 14, 2007 5:06 pm

On the first run, it fails and the directory it creates is not where I specified in /etc/conf.d/http-replicator.

Begin Http-Replicator Setup....
created /var/cache/http-replicator/
Traceback (most recent call last):
File "/usr/bin/repcacheman", line 73, in ?
print "\tchange owner " + dir + " to " + user + " failed:"
NameError: name 'user' is not defined

When I run repcacheman again, it works properly, other than ignoring my desired directory in the conf file.

!!! Digest verification failed:
!!! /usr/portage/distfiles/gem_plugin-0.2.2.gem
!!! Reason: Failed on RMD160 verification
!!! Got: 9715b571202ebe33d72bfd6384305b555da6f2b6
!!! Expected: 4759f2ccb75081ebe46ffffc3ad5c7ba2e20c3bc

!!! Digest verification failed:
!!! /usr/portage/distfiles/file-4.20.tar.gz
!!! Reason: Filesize does not match recorded size
!!! Got: 548412
!!! Expected: 548393

!!! Digest verification failed:
!!! /usr/portage/distfiles/mongrel-1.0.1.gem
!!! Reason: Filesize does not match recorded size
!!! Got: 159232
!!! Expected: 160256

SUMMARY:
Found 0 duplicate file(s).
Deleted 0 dupe(s).
Found 1103 new file(s).
Added 1040 of those file(s) to the cache.
Rejected 60 File(s) not in Portage.

Oops, I did that on the wrong machine.

The second results are the same. The first attempt fails, rep4.py ignores my desired cache dir, and it picked up a few more bad files. Now, I'd be happy if it'd just use my ftp pub dir instead of /var/cache/http-replicator.

--update
I also have to set the conf back to /var/cache/http-replicator for the time being or http-replicator fails.

flybynite
l33t
Joined: 06 Dec 2002
Posts: 620

Posted: Tue May 15, 2007 1:00 am

mkzelda wrote:
On the first run, it fails and the directory it creates is not where I specified in /etc/conf.d/http-replicator.


Thanks for the bug report mkzelda, I've fixed the 'user' is not defined problem.

Now did you run repcacheman from a dir or did you copy over your old /usr/bin/repcacheman?

mkzelda
n00b
Joined: 22 Aug 2004
Posts: 32

Posted: Tue May 15, 2007 1:38 am

I replaced /usr/bin/repcacheman.

flybynite
l33t
Joined: 06 Dec 2002
Posts: 620

Posted: Tue May 15, 2007 2:16 am

mkzelda wrote:
I replaced /usr/bin/repcacheman.


Oops, I'm sorry. I've corrected my post above, but I meant to say replace /usr/bin/repcacheman.py.

/usr/bin/repcacheman is just a wrapper that calls /usr/bin/repcacheman.py with the correct options, which is why you had the other problems.

Either re-emerge http-replicator, which won't disturb your config, or edit /usr/bin/repcacheman to look like this:

Code:

#! /bin/bash
# Read GENERAL_OPTS from the http-replicator config
source /etc/conf.d/http-replicator
# Pass those options on to the real script
/usr/bin/repcacheman.py $GENERAL_OPTS


and replace /usr/bin/repcacheman.py with the beta script.

Again, sorry for the inconvenience.

I've uploaded beta 4.1 with two typos fixed. There is still something going on with the core code; right now I think it is a filename collision in portage itself. I didn't change the download link, but you will see rep41.py inside.

mkzelda
n00b
Joined: 22 Aug 2004
Posts: 32

Posted: Tue May 15, 2007 5:00 pm

Okay, that worked, with verbose output of the portage tree. Is there a trigger to avoid the verbosity? My server performs slower when outputting scrolling text.

I'm wondering how files in the cache are treated. Are they assumed to be good, and thus unchecked? For example, if I have an overlay on another machine that my server does not have, are the files it grabs stored in the cache as they are fetched, and do they remain there indefinitely? So, can I put any files in the cache that my client machines might fetch, such as livecd .iso files, so that a client using wget with http_proxy set fetches them locally?

flybynite
l33t
Joined: 06 Dec 2002
Posts: 620

Posted: Tue May 15, 2007 6:41 pm

mkzelda wrote:
Okay, that worked, with verbose output of the portage tree. Is there a trigger to avoid the verbosity? My server performs slower when outputting scrolling text.


The verbose output is just my debugging output; it won't be in the final version. I've uploaded beta revision 4.3, which removes the scrolling and fixes the problem in the core code I mentioned earlier.

mkzelda wrote:

I'm wondering how files in the cache are treated. Are they assumed to be good, and thus unchecked? For example, if I have an overlay on another machine that my server does not have, are the files it grabs stored in the cache as they are fetched, and do they remain there indefinitely? So, can I put any files in the cache that my client machines might fetch, such as livecd .iso files, so that a client using wget with http_proxy set fetches them locally?


Thanks for asking! I've been trying to decide which options to add and who might need them. I also want to give users as many options as possible.

replicator is a general purpose proxy at heart. It will serve and cache anything that goes through it, even web browsing. There is an "alias" option to serve files from a dir of your choice in addition to the cache. It defaults to serving binary packages from gentoo's default location, but you can add to or replace that default.

/etc/conf.d/http-replicator
Code:

## Local dir to serve clients.  Great for serving binary packages
## See PKGDIR and PORTAGE_BINHOST settings in 'man make.conf'
## --alias /path/to/serve:location will make /path/to/serve
## browsable at http://http-replicator.com:port/location
DAEMON_OPTS="$DAEMON_OPTS --alias /var/tmp/packages/All:All"


So if you want to serve random files you can keep them in a separate dir for easy management by fetching them with the alias url or keep them in the cache and fetch them with the http_proxy setting. Multiple alias options are allowed. Http-replicator was designed to be a secure, high performance web server with a cache.
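
As a rough illustration of the second route (fetching through the proxy instead of an alias URL), a client-side fetch might look like the Python 3 sketch below. The proxy host name, port 8080, and both function names are assumptions made up for this example; substitute whatever your replicator box actually listens on.

```python
import shutil
import urllib.request


def make_proxy_opener(proxy_url):
    """Build a urllib opener that sends plain http requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_via_replicator(url, dest,
                         proxy_url="http://replicator.example.lan:8080"):
    """Download url to dest through the caching proxy.

    proxy_url is a placeholder (assumption), not a replicator default.
    """
    opener = make_proxy_opener(proxy_url)
    with opener.open(url) as response, open(dest, "wb") as out:
        # Stream to disk in chunks so large files (e.g. an .iso) do not
        # have to fit in memory.
        shutil.copyfileobj(response, out)
```

This does the same job as exporting http_proxy before calling wget: the request goes to the proxy, which serves the file from its cache if it already has it.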

replicator doesn't check its own cache for this reason. It won't touch anything in its cache because it may contain user files.

The question is, should replicator check its cache?

I say no right now, because it can be done better by other means. But would adding that feature be convenient for many users?


1. If replicator is a gentoo only cache, there are other distfile checking scripts that will delete files based on many tests such as not in portage, not the most current version, older than a certain date, exceed a maximum cache size, not accessed in the last 3 months, etc etc.


2. If replicator is used for other files I can't even guess how to prune the cache.


What I do is this. It could be a cron script but I do it manually by choice.

Code:

mv /var/cache/http-replicator/* /var/tmp/distfiles/
repcacheman
rm -rf /var/tmp/distfiles/*


This moves the cache files to the distfile dir. This is fast because it only renames the files; it doesn't move anything on disk (as long as both directories are on the same filesystem).
repcacheman then runs, which moves all good files back to the cache.
Then I delete all the remaining files, which are either not in portage or corrupt/incomplete.
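
For anyone who would rather script it, the same three steps can be sketched in Python 3 roughly as follows. This is only an illustration: the function names are invented here, the verifier command is passed in as a parameter instead of being hard-coded, and hidden files are skipped to mirror what the shell `*` glob does.

```python
import os
import shutil
import subprocess


def visible_names(directory):
    """List entries the way a shell '*' glob would (no dotfiles)."""
    return sorted(n for n in os.listdir(directory) if not n.startswith("."))


def recheck_cache(cache_dir, dist_dir, verify_cmd=("repcacheman",)):
    """Move the cache into dist_dir, re-import good files, delete the rest.

    Returns the names that were still in dist_dir after the verifier ran,
    i.e. the files that were deleted.
    """
    # Step 1: mv cache/* distfiles/
    for name in visible_names(cache_dir):
        shutil.move(os.path.join(cache_dir, name),
                    os.path.join(dist_dir, name))

    # Step 2: run the verifier; it is expected to move good files back.
    subprocess.check_call(list(verify_cmd))

    # Step 3: rm -rf distfiles/*
    leftovers = visible_names(dist_dir)
    for name in leftovers:
        path = os.path.join(dist_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    return leftovers
```

Calling `recheck_cache("/var/cache/http-replicator", "/var/tmp/distfiles")` as root would correspond to the commands above. Step 3 is destructive, so try it on a scratch directory first.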


You could also move the files, run the distfile cleaning script to prune based on your desires, then run repcacheman!

There was a time when distfile cleaning scripts were hard to find; now eclean is part of gentoolkit.

I know that was probably more than you wanted to know but I hope it helped you and some lurkers :-)

dahoste
Tux's lil' helper
Joined: 01 Dec 2005
Posts: 138
Location: Maryland, USA

Posted: Tue Jun 19, 2007 11:27 pm

flybynite: does your new beta version address the MD5 problem when computing checksums?

flybynite
l33t
Joined: 06 Dec 2002
Posts: 620

Posted: Wed Jun 20, 2007 5:52 am

dahoste wrote:
flybynite: does your new beta version address the MD5 problem when computing checksums?



Yes, the new version is fully portage manifest2 compliant and is much faster than the previous version.

golding
Apprentice
Joined: 07 Jun 2005
Posts: 230
Location: Adelaide / South Australia

Posted: Sat Jul 07, 2007 4:53 am

flybynite

Some time ago (early '06 I think) I posted here that http-replicator would be started in the rc init scripts, but when I went to emerge anything I had to restart it. This behaviour has remained until yesterday.

Before then I was using a login manager of varying types from gdm to xdm and even the Enlightenment greeter, but yesterday I decided I had had enough and wanted to properly secure my lan by using proper console login procedures.

Surprise, surprise! Suddenly http-replicator did not have to be re-started after login, now it works without that annoying restart before I emerge anything.

I do not know if this is a bug, however, I thought you might like to know.
_________________
Regards, Robert

..... Some people can tell what time it is by looking at the sun, but I have never been able to make out the numbers.

flybynite
l33t
Joined: 06 Dec 2002
Posts: 620

Posted: Mon Jul 09, 2007 3:47 am

golding wrote:
flybynite
I do not know if this is a bug, however, I thought you might like to know.


Yes, I remember :-)

A bug similar to yours was filed (I don't think you filed it), but I could never reproduce it. Please check if you can help maurice here:
https://bugs.gentoo.org/show_bug.cgi?id=177428

BernieKe
Tux's lil' helper
Joined: 02 Jul 2002
Posts: 130
Location: California/Bangalore/Belgium

Posted: Mon Jul 16, 2007 8:50 am

Quick question: is it ok for me to set the http-replicator cache to /usr/portage/distfiles?

flybynite
l33t
Joined: 06 Dec 2002
Posts: 620

Posted: Wed Jul 18, 2007 6:49 am

no.

BernieKe
Tux's lil' helper
Joined: 02 Jul 2002
Posts: 130
Location: California/Bangalore/Belgium

Posted: Wed Jul 18, 2007 7:06 am

That's what I thought, but I had this working for the past few days like this, and everything seemed ok. (It happened by accident by the way, I discovered what I'd done after already having emerged a number of packages.)

But I now see why it was working: apparently I had commented out the http_proxy setting on the machine that's running http-replicator.

Considering that the replicator host doesn't use the replicator, and only serves the distfiles directory out, would there still be an issue with a setup like this?

It seems to make life easier, removing the need for double writes and for repcacheman (or manual alternatives).

If you could explain why the above would be a bad thing to do (if it actually is), I'd appreciate it.

Thanks,
Bernie

flybynite
l33t
Joined: 06 Dec 2002
Posts: 620

Posted: Wed Jul 18, 2007 9:28 am

Scroll up a couple of posts and see the link for the new, improved version of repcacheman that is much faster and much less of a memory hog and actually works with the latest portage.

BernieKe wrote:
Considering that the replicator host doesn't use the replicator, and only serves the distfiles directory out, would there still be an issue with a setup like this?


All boxes should have http_proxy set to point to replicator's cache, even the box hosting replicator. Anything less means the cache isn't being used to its full potential.


BernieKe wrote:

If you could explain why the above would be a bad thing to do (if it actually is), I'd appreciate it.


The short answer is that portage is a bad neighbor. It leaves half-downloaded and corrupt files lying around the distfile dir, and as a bonus leaves those junk files owned by root. Http-replicator will serve those corrupt, incomplete files to other clients because it doesn't checksum the files. replicator doesn't do the checksums because it isn't gentoo specific, and because it streams files to clients as they are received. It couldn't even try to do checksums until the whole file was received, which means it couldn't simultaneously stream files as they are downloaded. repcacheman is gentoo specific and does the checksums, but only when requested, not continuously.


repcacheman deletes dups and imports files to the cache that pass the checksum test, if any. Test your system and see how often it actually has to do checksums in real use. I haven't had to do checksums in many months. This isn't true for all parts of the world; some areas have better ftp mirrors closer to them. Just make sure you have a full, complete set of http mirrors defined, and no ftp mirrors if you can. Portage will still download by ftp even if no ftp mirrors are listed in GENTOO_MIRRORS.

Checksums are kinda expensive to do, but rarely happen for most users. If you don't mind losing the mostly rare ftp downloads, you could just rm -rf /usr/portage/distfiles on the server. But some users are on dialup and take days to download openoffice etc. rm -rf will lose the partial download, repcacheman won't.
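
To make the "needs the whole file" point concrete, here is a minimal Python 3 sketch of a size-plus-digest check of the kind portage reports when verification fails. It is not repcacheman's actual code; the function name, the layout of `expected_digests`, and the chunk size are assumptions for illustration only.

```python
import hashlib
import os


def verify_distfile(path, expected_size, expected_digests, chunk_size=1 << 16):
    """Check a file against a recorded size and recorded hex digests.

    expected_digests maps a hashlib algorithm name (e.g. "sha256") to the
    recorded hex digest. Returns (ok, reason).
    """
    # Cheap test first: a truncated download fails here without any hashing.
    if os.path.getsize(path) != expected_size:
        return False, "Filesize does not match recorded size"

    # The digests are only final once the last byte has been read, which is
    # why a proxy that streams a file while downloading it cannot verify it
    # before serving it.
    hashers = {name: hashlib.new(name) for name in expected_digests}
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)

    for name, hasher in hashers.items():
        if hasher.hexdigest() != expected_digests[name].lower():
            return False, "Failed on %s verification" % name.upper()
    return True, "ok"
```

The size test is nearly free, while the digest pass has to read every byte of every file, which is the expensive part mentioned above.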

BernieKe
Tux's lil' helper
Joined: 02 Jul 2002
Posts: 130
Location: California/Bangalore/Belgium

Posted: Thu Jul 19, 2007 2:50 am

Thanks a lot for the clear reply!

I've installed the updated repcacheman, and everything works fine once again.

I do however have one small patch for you, in order to also ignore git and mercurial sources.

Code:

93c93
< dc=filecmp.dircmp (distdir,dir,['cvs-src','git-src','hg-src','.locks'])
---
> dc=filecmp.dircmp (distdir,dir,['cvs-src','.locks'])
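
For readers who have not used it, `filecmp.dircmp` from the Python standard library compares two directories by name and takes a list of names to ignore, which is what the patched line extends. The sketch below (Python 3) shows how such a comparison could separate duplicates from new files; how repcacheman actually consumes the result is an assumption here, and the function name is made up.

```python
import filecmp

# Names skipped in both directories: VCS checkouts and portage's lock dir.
IGNORED = ['cvs-src', 'git-src', 'hg-src', '.locks']


def classify_distfiles(distdir, cachedir, ignore=IGNORED):
    """Return (duplicates, new_files) for distdir compared to cachedir."""
    dc = filecmp.dircmp(distdir, cachedir, ignore)
    # Present in both directories: already cached, so duplicates.
    duplicates = sorted(dc.common)
    # Only in distdir: candidates to verify and import into the cache.
    new_files = sorted(dc.left_only)
    return duplicates, new_files
```

Note that passing an explicit ignore list replaces dircmp's default one, so every name to be skipped has to appear in the list.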

flybynite
l33t
Joined: 06 Dec 2002
Posts: 620

Posted: Fri Aug 03, 2007 7:27 am

BernieKe wrote:

I do however have one small patch for you, in order to also ignore git and mercurial sources.


Ignoring git sounds good. I'm not familiar with mercurial, but since you asked I added it as well. Thanks!

neosimago
n00b
Joined: 07 Mar 2006
Posts: 9

Posted: Thu Nov 22, 2007 9:10 am    Post subject: repcacheman fails

I'm having trouble with the repcacheman python script. Re-installing python doesn't fix the problem, and other python scripts seem to work fine. Here's the output:

Code:
Checking authenticity and integrity of new files...
Searching for ebuilds...
Done!

Found 25230 ebuilds.

Extracting the checksums....
Done!

Verifying checksum's....
/usr/portage/distfiles/GDM-FlyAway.tar.gz
Traceback (most recent call last):
  File "/usr/bin/repcacheman.py", line 203, in ?
    if t["MD5"]:
KeyError: 'MD5'


More of the output can be found at: http://rafb.net/p/FmewYB48.html

=> I have tried removing the problem files to no avail, and supposedly the script is designed to handle them anyway. I'll check in on the forums to see if this gets fixed later. --kudos!


Last edited by neosimago on Thu Nov 22, 2007 6:46 pm; edited 1 time in total

flybynite
l33t
Joined: 06 Dec 2002
Posts: 620

Posted: Thu Nov 22, 2007 9:28 am    Post subject: Re: repcacheman fails

neosimago wrote:
I'm having trouble with the repcacheman python script. Re-installing python doesn't fix the problem, and other python scripts seem to work fine. Here's the output:


fixed a long time ago but not updated by the gentoo maintainer.

See this post

https://forums.gentoo.org/viewtopic-t-173226-postdays-0-postorder-asc-start-539.html

neosimago
n00b
Joined: 07 Mar 2006
Posts: 9

Posted: Thu Nov 22, 2007 7:39 pm    Post subject: Re: repcacheman fails

Thanks a bunch! I'm using your 4.3 beta of repcacheman and it's working out fine.

flybynite wrote:
neosimago wrote:
I'm having trouble with the repcacheman python script. Re-installing python doesn't fix the problem, and other python scripts seem to work fine. Here's the output:


fixed a long time ago but not updated by the gentoo maintainer.

See this post

https://forums.gentoo.org/viewtopic-t-173226-postdays-0-postorder-asc-start-539.html

neosimago
n00b
Joined: 07 Mar 2006
Posts: 9

Posted: Thu Nov 22, 2007 7:47 pm

flybynite:

I'm using a gentoo local server with squid running. Will this be conflicting or redundant with http-replicator also installed? I'm not running into any problems now, and I hope not in the future, so I'll keep you posted on this issue if anything shows up.

I would also like to add Apt-get ubuntu package and source mirrors to my gentoo box, because I run a mixed environment. How would I do that? I haven't been able to find a good source online for that. Apparently the http_proxy changes in /etc/make.conf don't work the same way as they do in ubuntu distros. More insight into how http-replicator works would definitely help in this situation. I don't suppose repcacheman would be of any help in an Apt-get http source mirror, because it's gentoo specific right? Any thoughts about writing a py script for Apt-get mirrors?

And yes, the rep4.py script does run faster with fewer resources. Keep up the good work!

flybynite
l33t
Joined: 06 Dec 2002
Posts: 620

Posted: Fri Nov 23, 2007 5:20 am

neosimago wrote:

I'm using a gentoo local server with squid running. Will this be conflicting or redundant with http-replicator also installed?


Squid doesn't know anything about portage so it is actually very inefficient with portage. When I was looking for a cache I first tried squid and was totally unsatisfied so that helped spur me on to seeing replicator developed. Don't get me wrong, squid is good at some things, just not with portage.

neosimago wrote:

I don't suppose repcacheman would be of any help in an Apt-get http source mirror, because it's gentoo specific right?


Actually it's the opposite. replicator was developed first as a debian apt-get style cache with some general purpose http caching as well. I worked with the developer gertjan to add gentoo specific features later when I was looking to develop a better cache and not wanting to start from scratch.


I've not run debian style in a long time so forgive me if I miss something, but removing the -s and -f options from /etc/init.d/http-replicator should return replicator to a debian style cache. There may be other options that are helpful, check the man page, but I know these two specific options are only usable with gentoo.


The problem is, without those two options, replicator doesn't work well with portage and suffers some of the same problems as squid, namely the miss rate skyrockets.

So I'd bet you would have to run two instances of http-replicator on different ports to serve both gentoo and debian style clients. That setup should work well and be very efficient :-)

neosimago
n00b
Joined: 07 Mar 2006
Posts: 9

Posted: Fri Nov 23, 2007 7:31 am

flybynite

Thanks for the reply. I'm looking through the man pages for http-replicator, and I tried the home page, http://gertjan.freezope.org/replicator, but I get a "bad gateway" message when trying to access that particular page. I would very much like to find more information about the best way to run two instances of replicator: one for gentoo, which I have working well now, and one for ubuntu's apt-get. I suppose I could re-create another /etc/init.d/http-replicator script to start under another name, but that would be like having a rogue package loose on my system. What would be the best way to port replicator available on the ubuntu platform so that it provides the same functions on a gentoo system for apt-caching?

flybynite
l33t
Joined: 06 Dec 2002
Posts: 620

Posted: Fri Nov 23, 2007 8:12 am

neosimago wrote:
I suppose i could re-create another /etc/init.d/http-replicator script to start under another name, but that would be like having a rogue package loose on my system. What would be the best way to port replicator available on the ubuntu platform so that it provides the same functions on a gentoo system for apt-caching?



What kind of box is it, ubuntu or gentoo? If gentoo, use /usr/local/portage to create a duplicate package, changing the ebuild so it does not conflict. You could make the new version 99 and use slotting, for example.

This way the package isn't rogue....

There is some info about ebuilds here http://www.gentoo.org/proj/en/devrel/handbook/handbook.xml?part=2&chap=1#doc_chap2

If ubuntu, sorry, I can't help you.

neosimago
n00b
Joined: 07 Mar 2006
Posts: 9

Posted: Fri Nov 23, 2007 4:54 pm

flybynite wrote:
neosimago wrote:
I suppose i could re-create another /etc/init.d/http-replicator script to start under another name, but that would be like having a rogue package loose on my system. What would be the best way to port replicator available on the ubuntu platform so that it provides the same functions on a gentoo system for apt-caching?


What kind of box is it ubuntu or gentoo? If gentoo use /usr/local/portage to create a duplicate package changing the ebuild to not conflict. You could make the new version 99 and use slotting for example.

This way the package isn't rogue....

There is some info about ebuilds here http://www.gentoo.org/proj/en/devrel/handbook/handbook.xml?part=2&chap=1#doc_chap2

If ubuntu, sorry, I can't help you.


It's a gentoo box. I can see how to create a duplicate ebuild that does not conflict. Where can the new version 99 be found? The idea would be to slot another install to handle caching on a local gentoo machine that would be able to serve ubuntu Apt clients on the local network. How would replicator work on an ubuntu machine? Has replicator been packaged for Apt? If so, I could use that model and adapt it to the gentoo machine to serve ubuntu machines.

--thanks

All times are GMT
Page 22 of 24