Gentoo Forums
Selective emerge rsync to cut down on bandwidth usage
OdinsDream
Veteran
Veteran


Joined: 01 Jun 2002
Posts: 1057

PostPosted: Thu Nov 21, 2002 10:06 pm    Post subject: Selective emerge rsync to cut down on bandwidth usage Reply with quote

When you execute emerge rsync, files are downloaded for every single ebuild in portage.

I myself, however, don't use nearly a quarter of these files, and some of them I'll never use (like the KDE stuff...), but I still have to download these files.

Is it necessary for portage to operate this way? As portage keeps growing, the time required for an emerge rsync will increase significantly.

For instance, I emerg'ed samba, but halfway through the compile, noticed the GLSA announcement. I cancelled the compile, and then had to wait another 10 minutes for rsync to update everything, just to grab the latest Samba.

Why not have the ability to say emerge --latestVersion samba, then emerge checks with an rsync server, downloads the latest samba ebuild, and continues from there?

Dependencies for the selected package would also be checked against the server, and if outdated, their ebuilds would be downloaded as well.
masseya
Bodhisattva
Bodhisattva


Joined: 17 Apr 2002
Posts: 2602
Location: Baltimore, MD

PostPosted: Fri Nov 22, 2002 3:08 am    Post subject: Re: Selective emerge rsync to cut down on bandwidth usage Reply with quote

OdinsDream wrote:
When you execute emerge rsync, files are downloaded for every single ebuild in portage.

I myself, however, don't use nearly a quarter of these files, and some of them I'll never use (like the KDE stuff...), but I still have to download these files.

Is it necessary for portage to operate this way? As portage keeps growing, the time required for an emerge rsync will increase significantly.

This sounds like a really good idea. Perhaps you should check into filing a bug report. https://bugs.gentoo.org I would certainly use this. :)
Quote:
For instance, I emerg'ed samba, but halfway through the compile, noticed the GLSA announcement. I cancelled the compile, and then had to wait another 10 minutes for rsync to update everything, just to grab the latest Samba.

Why not have the ability to say emerge --latestVersion samba, then emerge checks with an rsync server, downloads the latest samba ebuild, and continues from there?

Dependencies for the selected package would also be checked against the server, and if outdated, their ebuilds would be downloaded as well.

I think the reason this isn't done is that it would require a lot more calls to the rsync server and mirrors. There's already enough traffic that they decided to add an RSYNC_RETRIES variable to make.conf. :|
_________________
if i never try anything, i never learn anything..
if i never take a risk, i stay where i am..
mooman
Apprentice
Apprentice


Joined: 06 Nov 2002
Posts: 175
Location: Vancouver, WA

PostPosted: Fri Nov 22, 2002 5:59 am    Post subject: Reply with quote

Or at least let us rsync one "branch" at a time. I'm working on getting my spam proxy box up and running, so for me something like emerge rsync net-mail would be nice, since there are several packages in there that I would like the latest of. Then, since I never run gnome or kde, I could skip ever rsyncing their directories again....
_________________
Linux user off and on since circa 1995
masseya
Bodhisattva
Bodhisattva


Joined: 17 Apr 2002
Posts: 2602
Location: Baltimore, MD

PostPosted: Fri Nov 22, 2002 6:01 am    Post subject: Reply with quote

mooman wrote:
Then, since I never run gnome or kde, I could skip ever rsyncing their directories again....

That sounds like an rsync.mask file in the making. ;)
mooman
Apprentice
Apprentice


Joined: 06 Nov 2002
Posts: 175
Location: Vancouver, WA

PostPosted: Fri Nov 22, 2002 7:17 am    Post subject: Reply with quote

Tristam,
Are you saying that 'rsync.mask' is something that works now, or are we just coming up with some compelling reasons for it to be added?
masseya
Bodhisattva
Bodhisattva


Joined: 17 Apr 2002
Posts: 2602
Location: Baltimore, MD

PostPosted: Fri Nov 22, 2002 7:22 am    Post subject: Reply with quote

mooman wrote:
Tristam,
Are you saying that 'rsync.mask' is something that works now, or are we just coming up with some compelling reasons for it to be added?

No, I'm not saying that it exists. I would have given a full path name and a description of how to use it or possibly a link or something. I really think this is a good idea. I don't know a whole lot about gentoo's rsync mirrors, but I think it wouldn't be all that hard to pull part of an rsync or set up a mask file that would never pull unnecessary packages. Perhaps this is something OdinsDream would like to post as a bug on https://bugs.gentoo.org
jondkent
Apprentice
Apprentice


Joined: 26 Jul 2002
Posts: 289
Location: London

PostPosted: Fri Nov 22, 2002 1:23 pm    Post subject: Reply with quote

Hiya,

This all sounds like a good idea. Another possibility is not to keep any of these files locally at all, and instead sync a single db or flat file that lists the apps, versions, etc. that are available. Then if you wanted to install something, you would sync down the ebuild script and go on from there. That would make life easier for people on 56k lines and remove massive amounts of traffic from the rsync servers (thereby lowering Gentoo's costs). After all, as has been said, most people don't use anywhere near half of the builds that are available.

The reason I suggest this is that people can be lazy (I know I can be) and won't bother filtering down their rsync, especially if they have DSL. This method ensures you only download what is applicable to you :)

Comments?

Jon
OdinsDream
Veteran
Veteran


Joined: 01 Jun 2002
Posts: 1057

PostPosted: Fri Nov 22, 2002 6:24 pm    Post subject: Reply with quote

jondkent wrote:
Hiya,

This all sounds like a good idea. Another possibility is not to keep any of these files locally at all, and instead sync a single db or flat file that lists the apps, versions, etc. that are available. Then if you wanted to install something, you would sync down the ebuild script and go on from there. That would make life easier for people on 56k lines and remove massive amounts of traffic from the rsync servers (thereby lowering Gentoo's costs). After all, as has been said, most people don't use anywhere near half of the builds that are available.

The reason I suggest this is that people can be lazy (I know I can be) and won't bother filtering down their rsync, especially if they have DSL. This method ensures you only download what is applicable to you :)

Comments?

Jon


After more thought on the matter, I agree with this suggestion absolutely. The traditional idea of rsync can't last for long; the bandwidth stress is too great.

As you say, there should only be a single file on the client machine, and this file should list all available package names and descriptions. This way you can get logical responses from emerge SomethingNonExistent and emerge samba.

I'm not sure how dependencies will be managed, but I assume it would be on the client machine. So, the client calculates what samba depends on, and then rsync's those ebuilds directly from a server (preferably the way sourceforge does it, with several mirrors to choose from, and one to set as the default, going by geographical location)

The ebuilds are then saved and executed.

Submitted to bugs: https://bugs.gentoo.org/show_bug.cgi?id=11093
masseya
Bodhisattva
Bodhisattva


Joined: 17 Apr 2002
Posts: 2602
Location: Baltimore, MD

PostPosted: Fri Nov 22, 2002 6:54 pm    Post subject: Reply with quote

OdinsDream wrote:
I'm not sure how dependencies will be managed, but I assume it would be on the client machine. So, the client calculates what samba depends on, and then rsync's those ebuilds directly from a server (preferably the way sourceforge does it, with several mirrors to choose from, and one to set as the default, going by geographical location)

The ebuilds are then saved and executed.

The problem with this is that the ebuilds themselves are used to calculate dependencies. You can't use an ebuild to calculate dependencies to see if you need to download an ebuild. It's a chicken and egg problem. This wouldn't work on a per ebuild basis. However, if someone wants to make a system that doesn't use a whole class of ebuilds, then you can save space on each rsync. If someone never wants to use Gnome then there is no point in always downloading updated Gnome stuff.

I think the concept that you are thinking of is basically what they have already implemented. They calculate dependencies and then download the source needed to install something when that something is called upon to be installed. The ebuilds are what is used to do that. Correct me if I'm not understanding you properly. :)

The closest system that I can see to what you are talking about would require a lot of network thrashing. You would emerge KDE and the system would download the main KDE ebuild. Then it would see what that depends on and download that ebuild. This would continue until all the ebuilds are downloaded and then it would start again with the source. I'm not sure that this is all that efficient. It would probably cause a lot of problems on the server side.
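
The round-trip pattern described above can be sketched as follows. This is a toy simulation, not Portage code: REMOTE_TREE, fetch_ebuild and resolve_on_demand are hypothetical names, and the package names and dependency edges are invented for illustration. It only shows that each ebuild has to arrive before its own dependencies are known, so the requests cannot be batched up front.

```python
# Hypothetical stand-in for an rsync server: package name -> names it depends on.
# The package names and dependency edges are made up for illustration.
REMOTE_TREE = {
    "kde": ["kdelibs", "kdebase"],
    "kdebase": ["kdelibs", "qt"],
    "kdelibs": ["qt", "arts"],
    "arts": ["qt"],
    "qt": [],
}


def fetch_ebuild(name, log):
    """Simulate one network round-trip that downloads a single ebuild."""
    log.append(name)
    return REMOTE_TREE[name]


def resolve_on_demand(target):
    """Walk dependencies, fetching each ebuild only when it is first needed."""
    fetched = {}    # ebuilds already downloaded, so nothing is fetched twice
    requests = []   # every simulated round-trip, in order
    pending = [target]
    while pending:
        name = pending.pop()
        if name in fetched:
            continue
        # The dependencies of `name` are unknown until its ebuild has arrived.
        fetched[name] = fetch_ebuild(name, requests)
        pending.extend(fetched[name])
    return requests


if __name__ == "__main__":
    order = resolve_on_demand("kde")
    print(len(order), "round-trips:", order)
```

With this made-up tree, resolving kde costs one request per ebuild before any source is downloaded, whereas a full local tree answers the same questions with no network traffic at all.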
OdinsDream
Veteran
Veteran


Joined: 01 Jun 2002
Posts: 1057

PostPosted: Fri Nov 22, 2002 7:15 pm    Post subject: Reply with quote

I admit I don't know much about how dependencies are calculated. The suggestion I had would be...in this flat file, you'd have, say,


kde-base
--kde-arts
--kde-libs

gnome-base
--gnome-libs


Or something similar to this, where dependencies are listed as part of the package entry. This is, of course, absolutely necessary. I always pass --pretend to emerge before I actually install a program. To lose that functionality just wouldn't be acceptable. That's what would happen, it seems, if the ebuild were first downloaded, and then dependencies were calculated.
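
A minimal sketch of how such a flat file could drive a --pretend style listing without any ebuilds on disk. The file layout is taken from the example in this post; parse_index and pretend are hypothetical helper names, and nothing here reflects how Portage actually stores or resolves dependencies.

```python
INDEX_TEXT = """\
kde-base
--kde-arts
--kde-libs

gnome-base
--gnome-libs
"""


def parse_index(text):
    """Parse the flat index: a bare line names a package, '--' lines list its deps."""
    index = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("--"):
            if current is None:
                raise ValueError("dependency line before any package: " + line)
            index[current].append(line[2:])
        else:
            current = line
            index[current] = []
    return index


def pretend(index, package):
    """Return what would be installed, dependencies first."""
    if package not in index:
        raise KeyError("no such package in index: " + package)
    plan, seen = [], set()

    def visit(name):
        if name in seen:          # also guards against dependency cycles
            return
        seen.add(name)
        for dep in index.get(name, []):
            visit(dep)
        plan.append(name)

    visit(package)
    return plan


if __name__ == "__main__":
    print(pretend(parse_index(INDEX_TEXT), "kde-base"))
```

The lookup for a nonexistent package fails immediately from the index alone, which is the "logical response" asked for earlier in the thread.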
jondkent
Apprentice
Apprentice


Joined: 26 Jul 2002
Posts: 289
Location: London

PostPosted: Fri Nov 22, 2002 8:46 pm    Post subject: Reply with quote

Quote:
The suggestion I had would be...in this flat file, you'd have, say,


kde-base
--kde-arts
--kde-libs

gnome-base
--gnome-libs


Or something similar to this, where dependencies are listed as part of the package entry.


Yeah, that's exactly my thinking on how to handle dependencies in this situation. It's simple, elegant and logical: everything Gentoo stands for :).

Sure, for today rsync may still be OK (I'm not sure I agree, mind), but in a year? 5 years? I remember when Debian only had 2500 packages; now it has over 8000 (not that I ever want Gentoo to get to that extreme). But when the portage tree grows bigger in the future, rsync will not be the way to manage this. The solution I've outlined (and OdinsDream has enhanced) scales very well and seems to fit the Gentoo way of doing things. I've no doubt that it would need improvement, but I feel it's a good starting point.

Jon
KiTaSuMbA
Guru
Guru


Joined: 28 Jun 2002
Posts: 430
Location: Naples Italy

PostPosted: Fri Nov 22, 2002 9:12 pm    Post subject: Reply with quote

My 2 cents on the matter:
I'm totally against the db thing for a series of reasons:
- adding more complexity to the system: double entries for deps (both in the db and in the ebuild, or perhaps taking the deps from the ebuild and putting them in the db, which would then make rapid or "DIY" ebuild write-ups a pain)
- losing transparency: right now I can just cd to /usr/portage and get a feeling of what is built, what could be built and what is fresh out of the oven... Even more importantly, I sometimes modify ebuilds before emerging to change some configure switches that are not covered by the USE flags. Now, if I had to do that with a db... I really don't know how. You would definitely need some front end, and you can't be sure it will cover everybody's needs (I'd bet it won't, actually).
- a flat file is the way to sluggishness... so a real db will be necessary (Berkeley DB terrors??? Don't even say SQL just to use portage...)

What I'm all for is the "branch rsync" idea, either with a .mask file or as a command line option (like emerge rsync net-misc). Unfortunately, some updated packages will require updated packages in another branch, and since that branch was not rsynced, emerge will fail to find and satisfy them. A solution would be that if emerge finds no adequate version in the tree for a dependency, it forces an rsync of the dep's branch as well before moving on. So it's a bit more work behind the curtains, but it keeps things pretty much as they are while adding a nice feature.

So, what do you think?
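
The fallback described in this post could look roughly like the sketch below. Everything in it is hypothetical: ensure_available and category_of are illustrative names, and sync_category stands in for whatever would actually rsync a single category directory. It is not an existing emerge feature.

```python
def category_of(atom):
    """Return the category part of a 'category/package' atom."""
    return atom.split("/", 1)[0]


def ensure_available(atom, local_tree, synced, sync_category):
    """Make sure `atom` is in the local tree, syncing its category only if needed.

    local_tree    -- set of atoms present locally (updated in place)
    synced        -- set of categories already synced in this run (updated in place)
    sync_category -- callable(category) -> set of atoms in that category;
                     hypothetical, a real one would rsync one directory
    """
    if atom in local_tree:
        return True
    category = category_of(atom)
    if category not in synced:
        # Dependency lives in a branch we skipped: force a sync of that branch.
        local_tree.update(sync_category(category))
        synced.add(category)
    return atom in local_tree
```

Tracking the already-synced categories keeps a missing dependency from triggering the same branch sync twice in one run, which matters if the point of the exercise is to spare the rsync servers.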
_________________
Need to flame people LIVE on IRC? Join #gentoo-otw on freenode!
puddpunk
l33t
l33t


Joined: 20 Jul 2002
Posts: 681
Location: New Zealand

PostPosted: Fri Nov 22, 2002 11:06 pm    Post subject: Reply with quote

I thought rsync worked by:
  1. Downloading a file list of the dir to be sync'd from the remote server
  2. comparing that file list to the local dir
  3. deleting/downloading the differences


If it doesn't work like that...why not? :D
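
The three steps listed above can be sketched as a comparison of two file listings. This is a simplification under stated assumptions: plan_sync is a hypothetical name, and each listing is assumed to map a path to a (size, mtime) pair, which mirrors rsync's default quick check. Real rsync additionally sends only the changed parts of a file that differs, and deletes local extras only when asked to with --delete.

```python
def plan_sync(remote, local):
    """Compare two file listings and decide what to transfer and what to delete.

    Each listing maps a relative path to a (size, mtime) tuple.
    """
    # Fetch anything that is missing locally or whose size/mtime differs.
    to_fetch = sorted(
        path for path, meta in remote.items() if local.get(path) != meta
    )
    # Anything present locally but gone from the server would be removed.
    to_delete = sorted(path for path in local if path not in remote)
    return to_fetch, to_delete


if __name__ == "__main__":
    remote = {"a.ebuild": (1, 10), "b.ebuild": (2, 20), "c.ebuild": (3, 30)}
    local = {"a.ebuild": (1, 10), "b.ebuild": (2, 15), "d.ebuild": (4, 40)}
    print(plan_sync(remote, local))
```

Even when nothing has changed, the listing for the whole tree still has to be exchanged and compared, which is the cost the thread is trying to avoid.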
psharp
Tux's lil' helper
Tux's lil' helper


Joined: 16 Sep 2002
Posts: 76
Location: London, UK

PostPosted: Sat Nov 23, 2002 9:14 am    Post subject: Reply with quote

OdinsDream wrote:
The suggestion I had would be...in this flat file, you'd have, say,


kde-base
--kde-arts
--kde-libs

gnome-base
--gnome-libs


But dependencies are more complex, depending on USE flags etc. In the end you would still have one big file :(
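
To illustrate why a plain list is not enough, here is a sketch that evaluates a small subset of the ebuild dependency syntax, namely the flag? ( ... ) and !flag? ( ... ) conditionals. evaluate_depend is a hypothetical helper, not Portage's parser; it ignores version operators and any-of groups, and the dependency string in the usage example is invented.

```python
def evaluate_depend(depend, use_flags):
    """Return the atoms that apply for the given set of enabled USE flags."""
    tokens = depend.split()
    pos = 0

    def parse_group(active):
        nonlocal pos
        atoms = []
        while pos < len(tokens):
            token = tokens[pos]
            pos += 1
            if token == ")":
                return atoms
            if token.endswith("?"):
                flag = token[:-1]
                wanted = not flag.startswith("!")   # "!flag?" means "flag is off"
                enabled = (flag.lstrip("!") in use_flags) == wanted
                if pos >= len(tokens) or tokens[pos] != "(":
                    raise ValueError("expected '(' after " + token)
                pos += 1
                atoms.extend(parse_group(active and enabled))
            elif active:
                atoms.append(token)
        return atoms

    return parse_group(True)


if __name__ == "__main__":
    # Invented dependency string, for illustration only.
    DEPEND = ("sys-libs/zlib ssl? ( dev-libs/openssl ) "
              "kde? ( kde-base/kdelibs !arts? ( media-sound/esound ) )")
    print(evaluate_depend(DEPEND, {"kde"}))
```

The same package yields a different dependency list for every combination of flags, so an index file would have to carry the conditional expressions themselves rather than a fixed list per package.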
dreamer3
Guru
Guru


Joined: 24 Sep 2002
Posts: 553

PostPosted: Tue Jan 07, 2003 12:13 pm    Post subject: Reply with quote

I have a 56k winmodem (yuck) and doing an emerge rsync takes less than 2 minutes with a few updates out there, maybe 3-5 with a LOT of updates (read: I haven't synced up in a while).

Where is the problem? You guys on high speed links should have it WAY better than this.

What gets to me is the apparent increase in portage's complexity with each new release. I like the new features and I know we're accomplishing good stuff, but every time I upgrade it seems that searching for an ebuild or calculating dependencies takes longer than before, and I know I'm not just imagining things.
scocou
Apprentice
Apprentice


Joined: 16 Aug 2002
Posts: 184
Location: Pacific NW, Canada

PostPosted: Sun Jan 12, 2003 6:30 am    Post subject: Reply with quote

I think it would be interesting if a user could decide which branches of the portage tree were sync'd with the USE variables. It's a system already in place that abstractly describes what 'type' of err... package support the user needs/wants, and it wouldn't require the user to make redundant selections in two configs (not to whine, it would just be more convenient). Sorry to be so non-technical, it's just a casual thought....
scocou
Apprentice
Apprentice


Joined: 16 Aug 2002
Posts: 184
Location: Pacific NW, Canada

PostPosted: Sun Jan 12, 2003 6:31 am    Post subject: Reply with quote

hmm.. said message failed to send but duplicated it 3 times instead! No delete button for some reason, aargh! Sorry all, not my fault...

Last edited by scocou on Sun Jan 12, 2003 6:35 am; edited 2 times in total
ahurst
Tux's lil' helper
Tux's lil' helper


Joined: 23 Oct 2006
Posts: 88
Location: Sheffield, UK

PostPosted: Tue Feb 13, 2007 3:55 pm    Post subject: Reply with quote

Hello,

I share the original poster's idea, and wrote about it elsewhere:
https://forums.gentoo.org/viewtopic-p-3904172.html#3904172

Portage can really be optimised by cutting out the stuff it doesn't need to consider.



If we only locally store the ebuilds for installed packages, but also locally store a full list of valid package atoms in the server's tree then:

- 99% of the tree needn't be stored locally; emerge --sync need only consider installed packages and re-read the tree's package atom list.

- calling 'emerge -p' on a new package reads its portage subtree from rsync server into memory (tiny bw), and can calculate all dependencies recursively thus

- doing an 'emerge <newpackage>' downloads the package's subtree into the local tree, and thus recursively the same for its dependencies

- 'emerge -C' takes away the package's local subtree


Much lower bandwidth, and storage requirements all round.
Can someone defeat this?

Andy
herka
n00b
n00b


Joined: 08 Aug 2003
Posts: 23

PostPosted: Tue Feb 13, 2007 5:52 pm    Post subject: Reply with quote

What about this http://gentoo-wiki.com/TIP_Exclude_categories_from_emerge_sync?
Herka
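
The linked tip is not reproduced in this thread, so the following is only a sketch of the general approach its title suggests: generate a pattern file that rsync can read with its --exclude-from option. build_exclude_lines and write_exclude_file are hypothetical names and the category list is just an example.

```python
def build_exclude_lines(unwanted_categories):
    """Return rsync exclude patterns that skip whole top-level category directories."""
    lines = []
    for category in sorted(set(unwanted_categories)):
        # Leading "/" anchors the pattern at the root of the transfer,
        # trailing "/" makes it match directories only.
        lines.append("/" + category + "/")
    return lines


def write_exclude_file(path, unwanted_categories):
    """Write one pattern per line, the layout rsync --exclude-from expects."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in build_exclude_lines(unwanted_categories):
            handle.write(line + "\n")


if __name__ == "__main__":
    for pattern in build_exclude_lines(["kde-base", "gnome-base", "kde-base"]):
        print(pattern)
```

How the resulting file is handed to emerge depends on the Portage version; as far as I know, make.conf has offered a setting for passing extra rsync options or an exclude file, but check make.conf(5) on your system rather than relying on that. As noted earlier in the thread, a package in a kept category that depends on an excluded one will no longer resolve.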
ahurst
Tux's lil' helper
Tux's lil' helper


Joined: 23 Oct 2006
Posts: 88
Location: Sheffield, UK

PostPosted: Tue Feb 13, 2007 6:56 pm    Post subject: Reply with quote

Yes, quite,

the method in question is hard work. Why not just grep -v your installed packages from the tree (complete package atom list)? It does cut bandwidth and processing requirements though.

the second (linked) tip, to store the portage tree in a squashfs is really overkill, because you're still storing 99% of the tree you don't need.

Combine the two and you've got it.

But my idea would kill two birds with one stone. The trick would be to only ever rsync a subset of the tree, as defined by your request (sync <installed package list>, or emerge <new package>).

I don't know whether rsync can do this. Perhaps I should find out!

Andy
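
On the open question of whether rsync can transfer only a subset: it can, through include/exclude filter rules. The sketch below builds such rules from a list of installed atoms. include_rules and rsync_command are hypothetical helpers, the server URL is a placeholder, and the package list is an example; only the --archive and --filter options are real rsync options.

```python
def include_rules(installed):
    """Build rsync filter rules that transfer only the given category/package dirs."""
    rules = []
    seen_categories = set()
    for atom in sorted(set(installed)):
        category, package = atom.split("/", 1)
        if category not in seen_categories:
            seen_categories.add(category)
            rules.append("+ /" + category + "/")          # let rsync enter the category
        rules.append("+ /" + category + "/" + package + "/")
        rules.append("+ /" + category + "/" + package + "/**")
    rules.append("- *")                                   # everything else is skipped
    return rules


def rsync_command(source, dest, rules):
    """Assemble an rsync argv; each rule is passed through --filter."""
    argv = ["rsync", "--archive"]
    argv += ["--filter=" + rule for rule in rules]
    argv += [source, dest]
    return argv


if __name__ == "__main__":
    installed = ["net-fs/samba", "net-mail/postfix"]      # example atoms
    # Placeholder URL, not a real mirror.
    print(rsync_command("rsync://rsync.example.org/gentoo-portage/",
                        "/usr/portage/", include_rules(installed)))
```

This covers only the transport side. A real setup would presumably also need the shared directories that ebuilds rely on, and it does not answer the earlier objection that dependencies of a new package cannot be computed until its ebuild has been fetched.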
All times are GMT