Benchmark: RSync VS. Git for Syncing Portage

Opinions, ideas and thoughts about Gentoo. Anything and everything about Gentoo except support questions.
41 posts
eccerr0r
Watchman
Posts: 10239
Joined: Thu Jul 01, 2004 6:51 pm
Location: almost Mile High in the USA

Post by eccerr0r » Sun Feb 01, 2015 5:12 am

After a bit of experience with git and svn, I'm thinking that the sync time is not the main benefit of using git...

Finally Gentoo can have "real" version numbers and branches! Being on trunk is bleeding edge!

Then again, I'm not quite sure about this... a maintenance nightmare for the devs, or is it...
Intel Core i7 2700K/Radeon Firepro W2100/24GB DDR3/800GB SSD
What am I supposed watching?
Ant P.
Watchman
Posts: 6920
Joined: Sat Apr 18, 2009 7:18 pm

Post by Ant P. » Tue Feb 03, 2015 6:23 pm

That's an interesting thought actually - with the ability to create an infinite number of branches, ~arch isn't needed. Debian already kind of does the same thing but by hand.

The main thing, for me, that the git move will allow, is a chance to break up the gigantic main portage tree into more appropriately-sized chunks. Right now it's mostly a pile of ebuilds that are already present in several git overlays; with git as the main distribution method those could stay where they are. Things like half of app-dicts could live in a separate overlay too.
steveL
Watchman
Posts: 5153
Joined: Wed Sep 13, 2006 1:18 pm
Location: The Peanut Gallery

Post by steveL » Tue Feb 03, 2015 11:04 pm

Ant P. wrote: That's an interesting thought actually - with the ability to create an infinite number of branches, ~arch isn't needed. Debian already kind of does the same thing but by hand.
That sounds like a nightmare to me.
Ant P. wrote: The main thing, for me, that the git move will allow, is a chance to break up the gigantic main portage tree into more appropriately-sized chunks. Right now it's mostly a pile of ebuilds that are already present in several git overlays; with git as the main distribution method those could stay where they are. Things like half of app-dicts could live in a separate overlay too.
That seems like an odd choice for a distribution to make.

Note none of this has to do with the VCS used; rather, it's about not using a new VCS as an excuse to throw away working methodologies that have nothing to do with version control, and everything to do with sensible maintenance.
Ant P.
Watchman
Posts: 6920
Joined: Sat Apr 18, 2009 7:18 pm

Post by Ant P. » Wed Feb 04, 2015 12:27 am

steveL wrote: That sounds like a nightmare to me.
You're reading the word "infinite" too literally.

But besides that, you're saying Git branching is a nightmare compared to the current workflow. And it is! You don't have to care about branches, changesets or atomicity in CVS because those concepts didn't exist in the 1980s.

People who've rsynced at the wrong time and ended up with a broken tree probably do care though. (Are the automated bans for rsyncing more than 4 times a day still there? What's rsync doing so horribly wrong that it needs that kind of rate limiting?)
steveL wrote: That seems like an odd choice for a distribution to make.

Note none of this has to do with the VCS used; rather, it's about not using a new VCS as an excuse to throw away working methodologies that have nothing to do with version control, and everything to do with sensible maintenance.
Yeah, none of what I said needs Git.

It could just as easily be done in CVS! Better yet, let's just fix Portage and Paludis and everything else to not run like molasses with one hundred thousand ebuilds in a single tree. Let's get GPG signatures for eclasses while we're at it. No need to abandon ship when you can just ration out scuba gear...
mgorny
Developer
Posts: 83
Joined: Fri Apr 27, 2007 11:20 am

Post by mgorny » Wed Feb 04, 2015 10:03 am

I don't think this can be benchmarked easily. Git and rsync are too different, and they react differently to bottlenecks :). Note that syncing is generally intensive on network and disk I/O.

Now, RSync does network activity throughout most of the syncing process. The point is, it can start working with partial results -- so if the network is slow, the disk is used efficiently. If the disk is slow, the network is used efficiently. If both are poor, you get some combination of the two.

Git is pretty much a two-part process. First it fetches, with the network being the bottleneck. Then it updates the working tree, with disk I/O being the bottleneck. If both are poor, you have two separate bottlenecks, one after the other.

Of course, in the longer term there's the fact that git also gives you intermediate commits that may not be relevant anymore :).
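The difference between overlapping and back-to-back bottlenecks described above can be sketched as a toy timing model. It assumes perfect overlap for rsync and no overlap at all for git, which real syncs only approximate; the function names and the numbers in the example are hypothetical, not measurements.

```python
def pipelined_time(network_s: float, disk_s: float) -> float:
    """Idealized rsync-style sync: network and disk work overlap,
    so wall-clock time is set by whichever resource is slower."""
    return max(network_s, disk_s)


def two_phase_time(fetch_s: float, checkout_s: float) -> float:
    """Idealized git-style sync: the network-bound fetch finishes before
    the disk-bound working-tree update starts, so the two costs add up."""
    return fetch_s + checkout_s


if __name__ == "__main__":
    # Hypothetical inputs: 30 s of network work, 20 s of disk work.
    print(pipelined_time(30.0, 20.0))  # 30.0
    print(two_phase_time(30.0, 20.0))  # 50.0
```

In practice the two tools do not do the same amount of network or disk work, so the model only shows how the costs combine, not which tool finishes first.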
Al-Caveman
n00b
Posts: 39
Joined: Sun Sep 21, 2014 2:28 pm

Post by Al-Caveman » Fri Feb 06, 2015 10:03 am

mgorny wrote: ...
Of course, in the longer term there's the fact that git also gives you intermediate commits that may not be relevant anymore :).
That's an asymptotic analysis of Git. To be fair, we should apply the same
kind of analysis to RSync too.

Yes, if the total number of intermediate commits (IC) grows very large, then
at some point Git can be slower than RSync.

However, RSync is not free of such asymptotic issues. Yes, RSync doesn't have
IC as a parameter, but it has a far worse set of parameters: the total number
of files (FN) and the size of each file (FS).

So, roughly, a Git "pull" probably has a time complexity that looks like
O(IC), while RSync has a time complexity that looks like O(FN * FS).

Which one is better? In reality, the negative effects of IC are far smaller
than the negative effects of a growing FN*FS. Portage is currently large
enough that a Git pull covering over a month's worth of intermediate commits
is easily going to be faster than running RSync.
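The two growth rates in the post can be written as a toy cost model in the post's own notation (IC, FN, FS). This only restates the post's simplified analysis; it is not a measured model of either tool, and the per-unit cost constants are hypothetical parameters that would have to be measured on a real machine.

```python
def git_cost(ic: int, cost_per_commit: float) -> float:
    """Post's model for git pull: O(IC), linear in the number of
    intermediate commits fetched since the last sync."""
    return ic * cost_per_commit


def rsync_cost(fn: int, fs: float, cost_per_byte: float) -> float:
    """Post's model for rsync: O(FN * FS), file count times
    (average) file size, regardless of how much actually changed."""
    return fn * fs * cost_per_byte


def break_even_commits(fn: int, fs: float, cost_per_byte: float,
                       cost_per_commit: float) -> float:
    """IC at which both models predict the same cost; past this point
    the model says git pull becomes the slower option."""
    return rsync_cost(fn, fs, cost_per_byte) / cost_per_commit
```

The point of the break-even function is that the answer depends on the constants as well as on the growth rates, so "a month of commits" beating rsync is a claim about the size of today's tree, not a law.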
ct85711
Veteran
Posts: 1791
Joined: Tue Sep 27, 2005 8:54 pm

Post by ct85711 » Fri Feb 06, 2015 12:51 pm

One difference worth pointing out is how each method handles local changes to the portage tree. With rsync, any extra files are simply removed, so it doesn't care whether a change was made locally to the tree or not; it'll just wipe the change and go on, while git locks up. For a lot of people this isn't a big issue, but for me it can become an annoyance. The reason is the times when the checksum on an ebuild is bad (it happens once in a while on the ~amd64 tree). With rsync, I can just rebuild the checksum, continue building my system, and syncing will have no issue later on. With git, if I did this, I'd later have to wipe my entire tree and resync; on the other hand, I could possibly push an updated ebuild/patch to the tree for review when the associated bug report isn't getting looked at.
Roman_Gruber
Advocate
Posts: 3854
Joined: Tue Oct 03, 2006 8:43 am
Location: Austro Bavaria

Post by Roman_Gruber » Fri Feb 06, 2015 1:02 pm

I really love benchmarks when they are over the net.

A benchmark is only valid when it is done 5-10 times and is reproducible.

So it needs a setup with local boxes with no load and no other influences.

AFAIK git provides more security than the approach we use now, as discussed here months ago regarding git / rsync for portage.
EmaRsk
Apprentice
Posts: 158
Joined: Tue Sep 07, 2004 9:14 am
Location: Italy

Post by EmaRsk » Fri Feb 06, 2015 1:05 pm

ct85711 wrote: […] the times when the checksum on an ebuild is bad (it happens once in a while on the ~amd64 tree). With rsync, I can just rebuild the checksum, continue building my system, and syncing will have no issue later on. With git, if I did this, I'd later have to wipe my entire tree and resync; on the other hand, I could possibly push an updated ebuild/patch to the tree for review when the associated bug report isn't getting looked at.
Either way, you can do it on a local overlay and it won't interfere. This has the advantage that if the checksum is still wrong after the new sync, you don't lose your fix.
Ant P.
Watchman
Posts: 6920
Joined: Sat Apr 18, 2009 7:18 pm

Post by Ant P. » Fri Feb 06, 2015 4:55 pm

Wait, are you talking about checksums for distfiles or checksums on the .ebuild itself? With git trees (or any repo with thin-manifests=true) the latter simply doesn't exist.

(Take that ~400 byte line of hashes for a single ebuild. Multiply that by every ebuild and patch in the tree. Then remove it. Think about what that'll do for your update speed...)
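The back-of-the-envelope estimate above can be written out as a one-line calculation. The thread gives no file counts, so the inputs are up to you; the ~400-byte default comes from the post and is itself approximate, and the function name is hypothetical.

```python
def thick_manifest_overhead(n_ebuilds: int, n_patches: int,
                            bytes_per_entry: int = 400) -> int:
    """Bytes spent on per-file hash lines in thick Manifests.

    bytes_per_entry defaults to the post's rough ~400-byte figure.
    With thin-manifests=true these lines are dropped entirely and only
    the distfile (DIST) entries remain.
    """
    return (n_ebuilds + n_patches) * bytes_per_entry
```

For example, thick_manifest_overhead(3, 2) gives 2000 bytes; the total grows linearly with the number of ebuilds and patches in the tree, and every one of those lines is data rsync has to compare on each sync.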
mgorny
Developer
Posts: 83
Joined: Fri Apr 27, 2007 11:20 am

Post by mgorny » Fri Feb 06, 2015 5:52 pm

ct85711 wrote: One difference worth pointing out is how each method handles local changes to the portage tree. With rsync, any extra files are simply removed, so it doesn't care whether a change was made locally to the tree or not; it'll just wipe the change and go on, while git locks up. For a lot of people this isn't a big issue, but for me it can become an annoyance. The reason is the times when the checksum on an ebuild is bad (it happens once in a while on the ~amd64 tree). With rsync, I can just rebuild the checksum, continue building my system, and syncing will have no issue later on. With git, if I did this, I'd later have to wipe my entire tree and resync; on the other hand, I could possibly push an updated ebuild/patch to the tree for review when the associated bug report isn't getting looked at.
It's just a matter of settings, you know. We can easily make Portage do a hard git reset before pulling :).
ct85711
Veteran
Posts: 1791
Joined: Tue Sep 27, 2005 8:54 pm

Post by ct85711 » Fri Feb 06, 2015 7:39 pm

mgorny: Just a question, but if you configure portage to do a hard reset before pulling every time, wouldn't that end up pulling the whole tree all over again and cause a lot more traffic on the server? I'm not experienced enough with git to know exactly how that would work. From my understanding, doing a hard reset wipes the tree and starts over, which would cause more traffic on the server. I'm not so worried about the traffic on my side, but I would think we don't want to put an excessive amount of load on the server side (I'm thinking of more people than just me doing this).

Ant: I would assume it's the distfile checksum, as that is the most common checksum issue. I only encountered an invalid checksum on an ebuild one time (it affected multiple ebuilds, but on the same day, so I counted it as one), and that was mostly from using an old rsync server (switching servers resolved it completely). The main reason why I don't put this ebuild in a local overlay is that I don't see a reason to maintain it myself when it's maintained in the regular portage tree, the patch is already committed in the upstream tree (I can find and give you their commit number/link if desired for that patch), and the bug affects multiple people on Gentoo.
mgorny
Developer
Posts: 83
Joined: Fri Apr 27, 2007 11:20 am

Post by mgorny » Fri Feb 06, 2015 7:43 pm

ct85711 wrote: mgorny: Just a question, but if you configure portage to do a hard reset before pulling every time, wouldn't that end up pulling the whole tree all over again and cause a lot more traffic on the server? I'm not experienced enough with git to know exactly how that would work. From my understanding, doing a hard reset wipes the tree and starts over, which would cause more traffic on the server. I'm not so worried about the traffic on my side, but I would think we don't want to put an excessive amount of load on the server side (I'm thinking of more people than just me doing this).
No, 'git reset --hard' just discards all changes in the working tree and the index (i.e. those added via 'git add'); it is purely local and doesn't re-download anything. It would likely need to be accompanied by 'git clean -dfx' to remove extraneous files.
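A minimal sketch of the sequence described above, wrapped around the git CLI in Python. This is not how Portage itself implements git syncing; the function names are hypothetical, and adding '--ff-only' to the pull is an assumption of this sketch (a read-only mirror of the tree should never need a local merge commit). Only the final pull talks to the server; the reset and clean steps work entirely on the local checkout.

```python
import subprocess
from typing import List


def sync_commands(repo: str) -> List[List[str]]:
    """Build the command sequence for a 'discard local edits, then update' sync."""
    return [
        # Revert tracked files and the index to the current commit (local only).
        ["git", "-C", repo, "reset", "--hard"],
        # Remove untracked directories (-d) and files, forced (-f),
        # including files that would otherwise be ignored (-x).
        ["git", "-C", repo, "clean", "-dfx"],
        # Fetch new commits and fast-forward; refuse to create a merge.
        ["git", "-C", repo, "pull", "--ff-only"],
    ]


def sync(repo: str) -> None:
    """Run the commands in order; raises CalledProcessError on the first failure."""
    for cmd in sync_commands(repo):
        subprocess.run(cmd, check=True)
```

For example, sync("/usr/portage"), assuming that directory is a git clone of the tree. Any local fix made in the tree is thrown away by the first two steps, which is exactly why a local overlay is the safer home for it.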
ct85711
Veteran
Posts: 1791
Joined: Tue Sep 27, 2005 8:54 pm

Post by ct85711 » Fri Feb 06, 2015 7:57 pm

Ok, thanks. I just didn't want to inadvertently put a heavier load on the server. I guess I'm going to need to learn more about how to use git before I switch from rsync to git for syncing, as I tend to make temporary changes in the portage tree on my computer (since I know those changes are undone on the next sync when using rsync).
Al-Caveman
n00b
Posts: 39
Joined: Sun Sep 21, 2014 2:28 pm

Post by Al-Caveman » Sat Feb 07, 2015 9:41 pm

tw04l124 wrote: I really love benchmarks when they are over the net.

A benchmark is only valid when it is done 5-10 times and is reproducible.

So it needs a setup with local boxes with no load and no other influences.

AFAIK git provides more security than the approach we use now, as discussed here months ago regarding git / rsync for portage.
I am not sure what you mean; however, here is some clarification on my side:
  • I did two types of tests: locally on my slow network, and over the
    Internet.
  • Results are reproducible on my test bed in all cases.
  • There is nothing special about 5-10 tests. What matters is statistical
    significance, which could require far fewer or far more tests. But
    almost no one does such tests in this community, and to be honest I have
    no time for this. If you are willing to trust me, I can tell you that my
    results were consistent every time I repeated them. If you have the time,
    I'd be interested in seeing your benchmarks with statistical significance
    measures.
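The point that significance, not a fixed run count, is what matters can be made concrete with a permutation test on repeated sync timings. This is a generic statistical sketch, not the method used for the benchmarks in this thread; the function name and its rounds/seed parameters are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        rounds: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in mean sync time.

    Both samples must be non-empty. The pooled timings are shuffled into
    two groups of the original sizes, and we count how often the shuffled
    groups differ in mean by at least as much as the real ones did.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same p-value
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n = len(a)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n]) - mean(pooled[n:])) >= observed:
            hits += 1
    # Counting the observed split itself (+1 on both sides) keeps the
    # estimate from ever reporting an impossible p-value of exactly 0.
    return (hits + 1) / (rounds + 1)
```

Feed it the wall-clock times of several rsync runs and several git runs taken under the same conditions. A small result (for example below 0.05) suggests the gap is larger than run-to-run noise, while a result near 1 means the two sets cannot be told apart; how many runs are needed depends on how noisy the measurements are.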
Ant P.
Watchman
Posts: 6920
Joined: Sat Apr 18, 2009 7:18 pm

Post by Ant P. » Sun Feb 08, 2015 12:29 am

mgorny wrote: It's just a matter of settings, you know. We can easily make Portage do a hard git reset before pulling :).
Will it also be straightforward to --verify-signatures on each sync? When I can replace webrsync with a single line of extra config, it'll be perfect.
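git's merge does have a '--verify-signatures' option, and 'git pull' passes it through. A minimal Python sketch of a sync step built on it, assuming the signing key is already imported into the local GPG keyring and considered valid there; the function names are hypothetical and this is not a Portage configuration option. Note that git checks only the tip commit being merged, not every intermediate commit.

```python
import subprocess
from typing import List


def verified_pull_command(repo: str) -> List[str]:
    """git pull that aborts unless the incoming tip commit has a valid signature."""
    # --verify-signatures: git checks the GPG signature on the tip commit
    # being merged and aborts if it is missing or not from a valid key.
    # --ff-only: never create a local merge commit.
    return ["git", "-C", repo, "pull", "--ff-only", "--verify-signatures"]


def verified_pull(repo: str) -> bool:
    """Return True if git accepted the update, False if it refused it."""
    result = subprocess.run(verified_pull_command(repo))
    return result.returncode == 0
```

Returning a boolean instead of raising lets a wrapper script fall back to something else (or simply stop) when the signature check fails.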
  • All times are UTC

© 2001–2026 Gentoo Foundation, Inc.
