SVN-Git migration plan

Colin Clark colinbdclark at gmail.com
Sun Jan 30 18:11:39 UTC 2011


Hi Jamon,

Wow, quite the essay. Option #2 is just fine with me. I think there are a couple of reasons why this approach is okay:

a) We've committed to maintaining our SVN repository as a full archive of our commit history prior to the switch to Git. Everything we migrate to Git will be removed from the head of SVN, but all the history will remain accessible. We'll keep it around as part of the standard record keeping that open source communities have a responsibility to maintain.

b) Trying to tack on directory change tracking to a system that fundamentally doesn't support it is, as you point out, ill-advised.

Assuming this issue is entirely limited to directory restructuring where no changes occurred to actual source code files, then I don't feel a strong motivation to inject history changes in Git to cover this sort of thing, as you suggest in option #3.

Other thoughts from the community?

Colin

On 2011-01-28, at 9:59 PM, Jamon Camisso wrote:

> Justin and I have been working on importing SVN into Git this week, with a fair amount of success. We managed to cut infusion down to about 22-24 MB by removing extraneous psd files from the repository.
> 
> However, in shuffling repositories and branches around, we have discovered that the tool being used, svn-all-fast-export[1][2], does not incorporate SVN commits to empty directories into the Git repository. This behaviour is by design: both Git and Mercurial explicitly do not support tracking directories.
> 
> This feature (or bug depending on which side of the fence is most attractive or comfortable) means that where historical changes to SVN like the move from /utoronto/fluid to /fluid occurred, the particular commit tracking that change is not present in Git.
> 
> One of the goals during this migration to Git is to preserve as much history as possible in the various repositories that are being forked. This attempt at maintaining the historical integrity of Fluid's source code repositories will ensure that future members or external participants in the Fluid community will have access to relevant information about the historical development of various projects.
> 
> With all that in mind, Justin and I can think of a few options that are or will be more or less palatable to those who have read this far:
> 
> Option 1) Stick with SVN. Unlikely. This choice would not be in keeping with the distributed collaborative nature of Fluid. As such it would be a very unsavory outcome.
> 
> Option 2) Use svn-all-fast-export as it currently runs, with the proviso that any SVN commit of an empty directory or directories will be elided from the history of the repository. This option is semi-palatable in that the final repositories would look and behave exactly as if they were created in Git in the first place.
> 
> Option 3) Convert repositories using svn-all-fast-export and run "git commit --amend" on each commit in question. Said commits can be found using the output of the svn-all-fast-export tool with full rule debugging output enabled and piped to a log file, or extracted directly using grep:
> 
> grep -E "Exporting revision [0-9]{4,5}.*nothing to do" import.log
> 
> That output (of 4286 commits) could then be matched to specific commits that solely affected A/D changes to directories in SVN. For example, r4124-4126 is one such series of commits.
> 
> Whereas each Git commit would initially look like the following:
> 
> commit ec2571d0833cbd72fa42d471ba2acdbe9ece71dd
> Author: Joseph Scheuhammer <jscheuhammer at ocad.ca>
> Date:   Fri May 18 15:56:36 2007 +0000
> 
>    Initial Fluid branch of Berkeley's Gallery Tool
> 
>    svn path=/utoronto/fluid/gallery/; revision=4126
> 
> The affected commits can then be edited to look like this:
> 
>    svn path=/utoronto/fluid/gallery/; revision=4124,4125,4126
>    Extra comment here pointing to Wiki, or SVN, or a file in Git
>    outlining changes to the repository
> 
> Option 4) Hack on svn-all-fast-export to make it do something with directory modifications. This option would likely take a fair amount of time and work to get it working just right, and is not in keeping with the fundamental design of Git.
> 
> Option 5) Use a different tool altogether, like git-svn, or the original svn2git tool. These tools are not nearly as sophisticated as svn-all-fast-export in that they are a) incredibly slow and b) unable to track changes to a file's location between directories, or historically deleted directories, the same way that svn-all-fast-export does.
> 
> My first preference would be Option 3. However, successfully mapping commits of empty directories to preceding commits depends on how much information can be extracted and correlated programmatically. If there is too much manual work required then my other preference would be Option 2.
> 
> Option 2 is viable and would be the faster of the two. This option takes into account the fact that SVN will still be online. I would imagine that anyone who is interested enough in who created an empty directory would probably be willing to do the work of quickly running an svn log -r0001 on the repository and extracting the information that way.
> 
> The fact that not all information is being imported from SVN to Git (Photoshop psd files for example) makes option 2 that much more compelling in that it would take very little time to freeze SVN and just do the conversion.
> 
> In the end options 2 and 3 both preserve information about empty directories, albeit in two different locations. Whereas the former retains an intact record in SVN, the latter entails taking small liberties with the historical record in Git. However, in both cases, the fact that committer X created directory Y will still be easily gleaned from some easily found and well documented location for those who are interested in such information.
> 
> tl;dr there is no easy way to import empty directories into Git. Option 2 is less disruptive and faster, while leaving information in multiple locations. Option 3 will require some small amount of historical revisionism, while retaining what history and files are deemed important in one repository format.
> 
> Feedback is welcome at this point. I imagine Colin and Antranig will be especially interested in sharing their thoughts.
> 
> Regards, Jamon
> 
> [1] http://packages.debian.org/testing/main/svn-all-fast-export
> [2] svn-all-fast-export has been forked and named svn2git; the confusing part is that there is a Ruby project with the same name that precedes the fork.
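
For anyone weighing how much of Option 3 could be automated, here is a rough sketch of the matching step in Python. It is not something Jamon or Justin wrote. It assumes each skipped revision shows up in import.log as a single line containing "Exporting revision <N>" followed by "nothing to do", which is only what the grep pattern above implies. The function names and the sample log lines are hypothetical.

```python
import re
from typing import Iterable, List

# Assumed log line shape (inferred from the grep pattern in the thread, not
# confirmed): "Exporting revision <N> ... nothing to do" on a single line.
NOOP_LINE = re.compile(r"Exporting revision (\d+).*nothing to do")


def noop_revisions(lines: Iterable[str]) -> List[int]:
    """Return the sorted, de-duplicated revisions the exporter skipped."""
    found = set()
    for line in lines:
        match = NOOP_LINE.search(line)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


def consecutive_runs(revisions: List[int]) -> List[List[int]]:
    """Group sorted revision numbers into runs of consecutive values."""
    runs: List[List[int]] = []
    for rev in revisions:
        if runs and rev == runs[-1][-1] + 1:
            runs[-1].append(rev)
        else:
            runs.append([rev])
    return runs


def amended_trailer(svn_path: str, kept_revision: int, skipped: List[int]) -> str:
    """Build an 'svn path=...; revision=...' line that also lists skipped revisions."""
    revisions = sorted(set(skipped) | {kept_revision})
    joined = ",".join(str(rev) for rev in revisions)
    return "svn path={}; revision={}".format(svn_path, joined)


if __name__ == "__main__":
    # Made-up sample lines; the real debug output may look different.
    sample_log = [
        "Exporting revision 4123 done",
        "Exporting revision 4124 nothing to do",
        "Exporting revision 4125 nothing to do",
        "Exporting revision 4126 done",
    ]
    skipped = noop_revisions(sample_log)
    print(consecutive_runs(skipped))
    print(amended_trailer("/utoronto/fluid/gallery/", 4126, skipped))
```

Each run of skipped revisions would still need to be checked by hand against svn log before folding it into the message of the next exported commit, since a revision can be a no-op for reasons other than an empty-directory change.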

---
Colin Clark
Technical Lead, Fluid Project
http://fluidproject.org




More information about the fluid-work mailing list