ct85711 wrote:I do know they did try to go through the effort to port the history over, when this was originally thought up. I couldn't really say on what happen on that part, the most I can do is guess on the reasons. I suspect they more of had in issue where they can't get the commits from CVS to line up to git (most likely the case) or more of space usage.
A few years ago at work I wrote and ran a script to grab our most ancient CVS history and bring it into a Mercurial repository. One factor that made it more difficult is one to which you allude: lining up CVS commits in order to make Mercurial commits (Mercurial has been our main VCS since 2007). The problem, of course, is that CVS commits are simply per-file; there is no good way to find everything that happened for the commit of a large changeset. My solution was to look for all the CVS commits from a given author within a short time interval and stage them into a single Mercurial commit.
In general, this is a bit less of an issue with Portage than it was in our system: while we needed the changes to many files to appear coherently in changesets, in Portage the commits are often just for single ebuilds.
The process is certainly doable. With git it would be faster than with Mercurial and would take up less space. One thing that made the conversion easier for me was that I had direct access to the RCS files themselves.
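For illustration, the author-and-time grouping described above could be sketched roughly like this in Python. It is not the actual conversion script: the `FileRevision` fields, the five-minute window, and the rule that a changeset stays open as long as the same author keeps committing within the window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRevision:
    """One per-file CVS revision (hypothetical record layout)."""
    path: str
    author: str
    timestamp: datetime
    message: str


def group_into_changesets(revisions, window=timedelta(minutes=5)):
    """Group per-file revisions into changesets by author and time proximity.

    A revision joins its author's most recent changeset when it falls
    within `window` of the last revision in that changeset; otherwise it
    starts a new changeset. The window length is an assumed value.
    """
    changesets = []
    latest_by_author = {}  # author -> that author's most recent changeset
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        current = latest_by_author.get(rev.author)
        if current is not None and rev.timestamp - current[-1].timestamp <= window:
            current.append(rev)
        else:
            current = [rev]
            changesets.append(current)
            latest_by_author[rev.author] = current
    return changesets
```

Each returned list would then be staged as one commit in the target repository, using the first revision's author and timestamp. A real converter would probably also want to compare log messages and to split a group when the same file shows up twice.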
I did this because our main Mercurial repository encompassed only those changes since we switched from CVS to Subversion in 2005. That switch involved a packaging change from a set of a couple of dozen separate repositories into a single repository. My work also coalesced those separate CVS repositories into a directory structure like the one we adopted at the time of the switch to Subversion. (Funny, we were on Subversion only 2 years before switching to Mercurial.)
There are many times I've found my reconstructed repository to be useful--even if it is for commits more than 10 years old.
The whole Portage CVS repository manages to sit on a Gentoo server. It certainly is of finite size. As to how large and how long it would take to transmit this if converted to git, I have no good guess.
Our main system at work has maybe 10,000 files in version control; the average size is around 10 KB. It does not take unmanageably long to get a checkout with that 10 years of history behind it (my reconstruction of the CVS history is not in that main repository). Portage has about 15 times that number of files, but the average size is more like 5 KB. (No wonder syncing takes so long!)
The whole of Portage history might be more than most users would want to slurp up, but at least it ought to be available. Would it not have been good to convert the whole repository but be able to slice off part of it for distribution? At reasonable intervals (say once a year), grab a slice covering the last n years or months from the end of the big repository. For normal syncing, send that.
--Or-- (I think this would work, but I haven't used git enough to be sure), make users' initial clones shallow clones at whatever depth is convenient, so that subsequent pulls would pick up only newer revisions. That would involve no periodic splitting from the big repository; every normal sync would simply pull the newer changes.
There would still be a good basis for people who wanted to pull the whole authoritative history and use it (perhaps for routine uses like displaying good reconstructed-from-commit-history changelogs on sites like znurt.org).
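As a rough sketch of the shallow-clone idea, here are the git invocations it would involve, wrapped in a few Python helpers. The repository URL and the helper names are hypothetical, and the depth is just an example value; `git clone --depth`, `git pull --ff-only`, and `git fetch --unshallow` are standard git options.

```python
import subprocess

# Hypothetical URL, for illustration only; not a real Portage mirror.
REPO_URL = "https://example.org/portage.git"


def shallow_clone_cmd(url, dest, depth=1):
    """Command for an initial clone carrying only the last `depth` commits."""
    return ["git", "clone", "--depth", str(depth), url, dest]


def sync_cmd(dest):
    """Command for a routine sync: fetch and fast-forward to newer commits."""
    return ["git", "-C", dest, "pull", "--ff-only"]


def full_history_cmd(dest):
    """Command to turn a shallow clone into a full one, for anyone who
    wants the whole authoritative history after all."""
    return ["git", "-C", dest, "fetch", "--unshallow"]


def run(cmd):
    """Execute one of the commands above, raising if git reports failure."""
    subprocess.run(cmd, check=True)
```

A user would call `run(shallow_clone_cmd(REPO_URL, "portage", depth=50))` once and `run(sync_cmd("portage"))` at each sync; only someone who wants everything would ever need `run(full_history_cmd("portage"))`.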
After all, even the developers who complained that lots of really old Portage history would take too long to transmit and store, and who therefore took the expedient of making this unhappy August 2015 chop-point, are going to be up against a new reality three or four years hence: the Portage repo will have accumulated lots of old history, and those initial clones are going to take longer than they might like.
I think that either of my proposals for using a grand authoritative repository is much more sustainable in the long term. It would be more honest, at least.