$ du -h ~/projects/Git/ruby.git/
29M /Users/chris/projects/Git/ruby.git/
Amazing, but true: the directory above contains the whole history of the Ruby CVS, from January 1998 until today, in less than 30 megabytes. That’s 9325 commits and about 44332 different file versions.
How is this possible? I used Git, the version control system that was written to keep the Linux source, which is “designed to handle absolutely massive projects with speed and efficiency”. And most of the parts are actually pretty efficient and fast.
Not among them is importing from CVS. Not yet, at least. Git includes a Perl script, git-cvsimport, which essentially works like this: check out each revision from CVS, commit it to Git, check out the next revision, commit again, rinse and repeat.
Hopelessly slow, especially if the CVS is remote. So let’s fix that by making a local CVS mirror first. Luckily, the Ruby CVS supports cvsup, which is essentially a fast rsync for CVS repositories and can also be used to mirror complete CVSROOTs. Unluckily, this is not documented on the Ruby CVS page. However, with help from Shugo Maeda, I was able to mirror the Ruby CVS locally. You need a cvsup file like this:
*default base=/Users/chris/mess/current/cstest/sups
*default compress delete use-rel-suffix
*default release=cvs
*default host=cvs.ruby-lang.org
*default prefix=/Users/chris/mess/current/cstest/ruby
# Ruby and other modules
cvs-src
Adjust the paths to your local needs, of course. Then you need to get cvsup. If you are lucky, your distribution will have it packaged; otherwise you need to bootstrap a Modula-3 compiler(!) to compile it. Have fun. *sigh* (The compiler is pretty quick, though.) Anyway, at the end of the day, I had my local CVS mirror. Let the experiments start.
git-cvsimport depends on cvsps, a tool that analyzes CVSROOTs and figures out the actual commits. This is needed because CVS is a bunch of clunky shit that has no concept of its commits: it versions each file on its own. After that, an almost endless loop of checkout and commit will start. If you want to try it yourself, get a fast computer, a fast, big disk and an efficient file system. No, doing it on an iBook with only a few gigs free and HFS+ is not a good idea. Actually, it took four days, and I had to do it in steps.
There could be a better solution in the future: parsecvs by Keith Packard of X.org fame. It’s in a very early alpha stage and needs even more disk space as of now, but it ought to be a lot faster in the future. At least one can hope.
After this, you’ll have a Git-controlled tree full of the actual file revisions; it’s hard to estimate how big it will be. To make the handy file shown above, you need to pack the tree. For this, you run:
git repack -d
This will compute for a few minutes/hours/days and spit out a nice file of about 70 megabytes. If you want the handy file above, you either need to figure out how to patch git repack to pass the optimization options --window=50 --depth=50 to git-pack-objects, or call the latter low-level tool directly. This way, you’ll get the handy file. Higher argument values will slow down the process a lot and only result in packs that are maybe half a megabyte smaller. I tried.
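Later Git releases let git repack take --window and --depth itself, so no patching is needed there. The following is only a sketch of that invocation on a throwaway repository; the scratch directory, file name and commit messages are made up for illustration, and the resulting size says nothing about the Ruby import.

```shell
# Sketch: repack a small scratch repository with a wider delta window and depth.
repo=$(mktemp -d)                      # hypothetical scratch location
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# Create a few revisions of one file so there is something to delta-compress.
for i in 1 2 3; do
  echo "revision $i" >> file.txt
  git add file.txt
  git commit -q -m "revision $i"
done

# -a: put all objects into a single pack
# -d: remove redundant packs and loose objects afterwards
# -f: recompute deltas instead of reusing existing ones
git repack -a -d -f -q --window=50 --depth=50

pack_count=$(ls .git/objects/pack/*.pack | wc -l | tr -d ' ')
du -h .git/objects/pack/*.pack
```

On a real import, the same git repack line would be run inside the imported repository instead of the scratch one.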
The great thing about git-cvsimport is that it can work incrementally, so once we have the pack, we can update directly from Ruby CVS; the changes are small if you do that regularly. For this, I included a small script in the pack, update-ruby-git:
git-cvsimport -d :pserver:anonymous@cvs.ruby-lang.org:/src \
-k -u -v -m -p -Z,9 ruby
Run this script regularly to keep your tree recent. You don’t need the CVSROOT or cvsup anymore.
Now, how is all of this useful? Obviously, you enjoy all the benefits Git provides for your daily hacking: atomic actions, distributed development, zero cost (almost!) branches and good merges. Also, you have the nice gitk repository browser that allows you to keep track of recent development. Since you can fetch every file at every revision easily, it’s just a matter of time until someone starts datamining… “what percentage of Ruby is really written by matz?”
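As a rough idea of what such datamining could look like, here is a sketch that counts commits per author in a scratch repository. The authors alice and bob are invented, and commit counts are only a crude proxy for "how much was written by whom"; in the Ruby import one would use the real author names instead.

```shell
# Sketch: per-author commit share in a throwaway repository.
repo=$(mktemp -d)                      # hypothetical scratch location
cd "$repo"
git init -q .

# Three commits by one made-up author ...
for i in 1 2 3; do
  echo "a$i" >> f.txt
  git add f.txt
  git -c user.name="alice" -c user.email="alice@example.invalid" commit -q -m "a$i"
done
# ... and one by another.
echo "b1" >> f.txt
git add f.txt
git -c user.name="bob" -c user.email="bob@example.invalid" commit -q -m "b1"

total=$(git rev-list --count HEAD)
alice=$(git rev-list --count --author="alice" HEAD)
share=$((100 * alice / total))         # integer percentage

git shortlog -s -n HEAD                # commit counts per author, highest first
echo "alice: $share% of $total commits"
```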
You can use git bisect to find bugs in Ruby by marking some revision as good and some as bad, and letting Git figure out which revision to try next until you find the faulty patch.
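To make the bisect workflow concrete, here is a small sketch on a scratch repository where the sixth of ten commits introduces a "bug" (the word BROKEN in a file). The repository, the file name and the grep-based check are all assumptions for illustration; with Ruby you would put a real test in its place.

```shell
# Sketch: let git bisect find the commit that introduced a marker line.
repo=$(mktemp -d)                      # hypothetical scratch location
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# Ten commits; the sixth one adds the line BROKEN.
for i in 1 2 3 4 5 6 7 8 9 10; do
  if [ "$i" -eq 6 ]; then
    echo "BROKEN" >> code.txt
  else
    echo "line $i" >> code.txt
  fi
  git add code.txt
  git commit -q -m "commit $i"
done

first=$(git rev-list --max-parents=0 HEAD)   # the root commit, known good

# Mark HEAD as bad and the root commit as good.
git bisect start HEAD "$first" > /dev/null
# The check exits 0 (good) while BROKEN is absent, 1 (bad) once it appears.
git bisect run sh -c '! grep -q BROKEN code.txt' > /dev/null

# refs/bisect/bad now points at the first bad commit; read it before resetting.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "first bad commit: $culprit"
```

Without an automated check, the same search can be driven by hand by running git bisect good or git bisect bad after testing each revision Git checks out.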
And if you really want to use CVS, you can even emulate a CVS server (read and write!) with git-cvsserver. Isn’t that impressive?
I probably will make the pack available on the net, but I haven’t yet found a good way to allow others to efficiently (and incrementally) fetch it… hopefully more about that later.
NP: Meat Puppets—Up on the Sun