SVN-Git migration plan

Jamon Camisso jamonation at gmail.com
Sat Jan 29 02:59:11 UTC 2011


On 1/25/2011 4:07 PM, Jamon Camisso wrote:
> For anyone who is interested, here is a list of all relevant directories
> from SVN, including those that were deleted at some point in the past.
> The plan is to map what Colin has outlined below to directories in this
> file, and then to convert each to a tag, branch, or master branch
> depending on where it needs to live in Git.

Responding to myself here, and would like to hear from people about the 
following:

Justin and I have been working on importing SVN into Git this week, with
a fair amount of success. We managed to cut infusion down to about
22-24 MB by removing extraneous PSD files from the repository.

However, in shuffling repositories and branches around, we have
discovered that the tool being used, svn-all-fast-export[1][2], does not
carry SVN commits that only touch empty directories over into the Git
repository. This behaviour is by design: neither Git nor Mercurial
tracks directories on their own, only the files inside them.

This feature (or bug depending on which side of the fence is most 
attractive or comfortable) means that where historical changes to SVN 
like the move from /utoronto/fluid to /fluid occurred, the particular 
commit tracking that change is not present in Git.

One of the goals of this migration to Git is to preserve as much history
as possible in the various repositories that are being forked. This
attempt at maintaining the historical integrity of Fluid's source code
repositories will ensure that future members or external participants in
the Fluid community will have access to relevant information about the
historical development of various projects.

With all that in mind, Justin and I can think of a few options that are 
or will be more or less palatable to those who have read this far:

Option 1) Stick with SVN. Unlikely. This choice would not be in keeping 
with the distributed collaborative nature of Fluid. As such it would be 
a very unsavory outcome.

Option 2) Use svn-all-fast-export as it currently runs, with the proviso 
that any SVN commit of an empty directory or directories will be elided 
from the history of the repository. This option is semi-palatable in 
that the final repositories would look and behave exactly as if they 
were created in Git in the first place.

Option 3) Convert repositories using svn-all-fast-export and run "git
commit --amend" on each commit in question. Said commits can be found
using the output of the svn-all-fast-export tool with full rule
debugging output enabled and piped to a log file or extracted directly
using grep:

grep -E "Exporting revision [0-9]{4,5}.*nothing to do" import.log

That output (of 4286 commits) could then be matched to specific commits
that consisted solely of A/D (add/delete) changes to directories in SVN.
For example, r4124-4126 is one such series of commits.
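
To make the matching step a little more concrete, here is a rough Python
sketch that pulls the skipped revision numbers out of the log and groups
consecutive ones into runs. The log line shape is only an assumption
inferred from the grep pattern above, and the function names are made up
for illustration; the real svn-all-fast-export output may differ.

```python
import re

# Assumed log line shape, inferred from the grep pattern above; the real
# svn-all-fast-export debug output may be formatted differently.
PATTERN = re.compile(r"Exporting revision (\d+).*nothing to do")


def empty_revisions(lines):
    """Return the sorted revision numbers whose log line says 'nothing to do'."""
    revs = set()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            revs.add(int(match.group(1)))
    return sorted(revs)


def group_runs(revs):
    """Group sorted revision numbers into runs of consecutive values."""
    runs = []
    for rev in revs:
        if runs and rev == runs[-1][-1] + 1:
            runs[-1].append(rev)
        else:
            runs.append([rev])
    return runs
```

Each run could then be attached to the next revision that did produce a
Git commit, e.g. by reading import.log with open("import.log") and
passing the file object to empty_revisions().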

Whereas each Git commit would initially look like the following:

commit ec2571d0833cbd72fa42d471ba2acdbe9ece71dd
Author: Joseph Scheuhammer <jscheuhammer at ocad.ca>
Date:   Fri May 18 15:56:36 2007 +0000

     Initial Fluid branch of Berkeley's Gallery Tool

     svn path=/utoronto/fluid/gallery/; revision=4126

The affected commits can then be edited to look like this:

     svn path=/utoronto/fluid/gallery/; revision=4124,4125,4126
     Extra comment here pointing to Wiki, or SVN, or a file in Git
     outlining changes to the repository
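
As a sketch of how that message edit might be automated, the following
Python function folds a list of skipped revisions into the "svn path=...;
revision=..." line and appends a note. The function name and the note
text are hypothetical, and it only handles the metadata line format shown
above. It rewrites a message string and does not touch Git itself; it
would have to be wired into whatever history rewriting step is chosen
(for instance as a message filter for git filter-branch).

```python
import re

# Matches the metadata line shown above, e.g.
#   svn path=/utoronto/fluid/gallery/; revision=4126
REV_LINE = re.compile(
    r"^([ \t]*svn path=\S+; revision=)(\d+(?:,\d+)*)[ \t]*$",
    re.MULTILINE,
)


def merge_revisions(message, skipped, note=""):
    """Add skipped SVN revisions to the metadata line of a commit message.

    Returns the message unchanged if no metadata line is found.
    """
    def repl(match):
        existing = [int(r) for r in match.group(2).split(",")]
        merged = sorted(set(existing) | set(skipped))
        return match.group(1) + ",".join(str(r) for r in merged)

    new_message, count = REV_LINE.subn(repl, message, count=1)
    if count == 0:
        return message
    if note:
        new_message = new_message.rstrip("\n") + "\n" + note + "\n"
    return new_message
```

Called with the r4126 message above and skipped=[4124, 4125], this yields
a metadata line ending in revision=4124,4125,4126 followed by the note.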

Option 4) Hack on svn-all-fast-export to make it do something with 
directory modifications. This option would likely take a fair amount of 
time and work to get it working just right, and is not in keeping with 
the fundamental design of Git.

Option 5) Use a different tool altogether, like git-svn, or the original
svn2git tool. These tools are not nearly as sophisticated as
svn-all-fast-export in that they are a) incredibly slow and b) unable to
track changes to a file's location between directories, or to handle
historically deleted directories, the same way that svn-all-fast-export
does.

My first preference would be Option 3. However, successfully mapping 
commits of empty directories to preceding commits depends on how much 
information can be extracted and correlated programmatically. If there 
is too much manual work required then my other preference would be Option 2.

Option 2 is viable and would be the faster of the two. This option
takes into account the fact that SVN will still be online. I would
imagine that anyone who is interested enough in who created an empty
directory would probably be willing to do the work of quickly running an
svn log -r0001 on the repository and extracting the information that way.

The fact that not all information is being imported from SVN to Git 
(Photoshop psd files for example) makes option 2 that much more 
compelling in that it would take very little time to freeze SVN and just 
do the conversion.

In the end options 2 and 3 both preserve information about empty 
directories, albeit in two different locations. Whereas the former 
retains an intact record in SVN, the latter entails taking small 
liberties with the historical record in Git. However, in both cases, the 
fact that committer X created directory Y will still be easily gleaned 
from some easily found and well documented location for those who are 
interested in such information.

tl;dr: there is no easy way to import empty directories into Git. Option
2 is less disruptive and faster, while leaving information in multiple 
locations. Option 3 will require some small amount of historical 
revisionism, while retaining what history and files are deemed important 
in one repository format.

Feedback is welcome at this point. I imagine Colin and Antranig will be 
especially interested in sharing their thoughts.

Regards, Jamon

[1] http://packages.debian.org/testing/main/svn-all-fast-export
[2] svn-all-fast-export has been forked and named svn2git. The confusing
part is that there is a Ruby project with the same name that precedes
the fork.


