SVN-Git migration plan

Antranig Basman antranig.basman at colorado.edu
Sun Jan 30 21:20:52 UTC 2011


Thanks for all this careful analysis and presentation of alternatives, Jamon. To me, option 2 sounds fine. An important part of our responsibility in keeping a source history is keeping track of IP - however, empty revisions could carry no IP. I think it is important to get our work into git as soon as is practicable, and option 2 seems the fastest approach which will yield an acceptable result.
One issue might be that we have some implementation which depended on the existence of a particular empty directory, and that a git checkout of this history would produce a non-working image. I believe there has only ever been one instance of this issue, in the mid-period of Engage... but as I understand it, none of the other options would help with this in any case, since empty directories cannot be represented in git.
That's my opinion - Colin, what are your thoughts?


Jamon Camisso <jamonation at gmail.com> wrote:

>On 1/25/2011 4:07 PM, Jamon Camisso wrote:
>> For anyone who is interested, here is a list of all relevant directories
>> from SVN, including those that were deleted at some point in the past.
>> The plan is to map what Colin has outlined below to directories in this
>> file, and then to convert each to a tag, branch, or master branch
>> depending on where it needs to live in Git.
>
>Responding to myself here, and would like to hear from people about the 
>following:
>
>Justin and I have been working on importing SVN into Git this week, with 
>a fair amount of success. We managed to cut infusion down to about 
>22-24 MB by removing extraneous PSD files from the repository.
>
>However, in shuffling repositories and branches around, we have 
>discovered that the tool being used, svn-all-fast-export[1][2], does not 
>incorporate SVN commits to empty directories into the git repository. 
>This behaviour is by design - both Git and Mercurial explicitly do not 
>support tracking directories.
>
>This feature (or bug depending on which side of the fence is most 
>attractive or comfortable) means that where historical changes to SVN 
>like the move from /utoronto/fluid to /fluid occurred, the particular 
>commit tracking that change is not present in Git.
>
>One of the goals during this migration to Git is to preserve as much history 
>in the various repositories that are being forked as possible. This 
>attempt at maintaining the historical integrity of Fluid's source code 
>repositories will ensure that future members or external participants in 
>the Fluid community will have access to relevant information about the 
>historical development of various projects.
>
>With all that in mind, Justin and I can think of a few options that are 
>or will be more or less palatable to those who have read this far:
>
>Option 1) Stick with SVN. Unlikely. This choice would not be in keeping 
>with the distributed collaborative nature of Fluid. As such it would be 
>a very unsavory outcome.
>
>Option 2) Use svn-all-fast-export as it currently runs, with the proviso 
>that any SVN commit of an empty directory or directories will be elided 
>from the history of the repository. This option is semi-palatable in 
>that the final repositories would look and behave exactly as if they 
>were created in Git in the first place.
>
>Option 3) Convert repositories using svn-all-fast-export and run "git 
>commit --amend" on each commit in question. Said commits can be found 
>using the output of the svn-all-fast-export tool with full rule 
>debugging output enabled and piped to a log file or extracted directly 
>using grep:
>
>grep -E "Exporting revision [0-9]+.*nothing to do" import.log
>
>That output (of 4286 commits) could then be matched to specific commits 
>that solely affected A/D changes to directories in SVN. For example, 
>r4124-4126 is one such series of commits.
>
>Whereas each Git commit would initially look like the following:
>
>commit ec2571d0833cbd72fa42d471ba2acdbe9ece71dd
>Author: Joseph Scheuhammer <jscheuhammer at ocad.ca>
>Date:   Fri May 18 15:56:36 2007 +0000
>
>     Initial Fluid branch of Berkeley's Gallery Tool
>
>     svn path=/utoronto/fluid/gallery/; revision=4126
>
>The affected commits can then be edited to look like this:
>
>     svn path=/utoronto/fluid/gallery/; revision=4124,4125,4126
>     Extra comment here pointing to Wiki, or SVN, or a file in Git
>     outlining changes to the repository
>
>Option 4) Hack on svn-all-fast-export to make it do something with 
>directory modifications. This option would likely take a fair amount of 
>time and work to get it working just right, and is not in keeping with 
>the fundamental design of Git.
>
>Option 5) Use a different tool altogether, like git-svn, or the original 
>svn2git tool. These tools are not nearly as sophisticated as 
>svn-all-fast-export in that they are a) incredibly slow and b) unable to 
>track changes to a file's location between historically deleted 
>directories the same way that svn-all-fast-export does.
>
>My first preference would be Option 3. However, successfully mapping 
>commits of empty directories to preceding commits depends on how much 
>information can be extracted and correlated programmatically. If there 
>is too much manual work required then my other preference would be Option 2.
>
>Option 2 is viable and would be the faster of the two. This option 
>takes into account the fact that SVN will still be online. I would 
>imagine that anyone who is interested enough in who created an empty 
>directory would probably be willing to do the work of quickly running 
>svn log -r0001 on the repository and extracting the information that way.
>
>The fact that not all information is being imported from SVN to Git 
>(Photoshop psd files for example) makes option 2 that much more 
>compelling in that it would take very little time to freeze SVN and just 
>do the conversion.
>
>In the end options 2 and 3 both preserve information about empty 
>directories, albeit in two different locations. Whereas the former 
>retains an intact record in SVN, the latter entails taking small 
>liberties with the historical record in Git. However, in both cases, the 
>fact that committer X created directory Y will still be easily gleaned 
>from some easily found and well documented location for those who are 
>interested in such information.
>
>tl;dr there is no easy way to import empty directories into Git. Option 
>2 is less disruptive and faster, while leaving information in multiple 
>locations. Option 3 will require some small amount of historical 
>revisionism, while retaining what history and files are deemed important 
>in one repository format.
>
>Feedback is welcome at this point. I imagine Colin and Antranig will be 
>especially interested in sharing their thoughts.
>
>Regards, Jamon
>
>[1] http://packages.debian.org/testing/main/svn-all-fast-export
>[2] svn-all-fast-export has been forked and named svn2git; the confusing 
>part is that there is a Ruby project that precedes the fork with the 
>same name.
>_______________________________________________________
>fluid-work mailing list - fluid-work at fluidproject.org
>To unsubscribe, change settings or access archives,
>see http://fluidproject.org/mailman/listinfo/fluid-work
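
The matching step described under Option 3 could be sketched roughly as follows. This is only an illustration, not the procedure Jamon and Justin actually used. It assumes the debug log contains lines of the form "Exporting revision N ... nothing to do" (inferred from the grep pattern in the quoted message) and that each converted commit message ends with a footer like "svn path=...; revision=N" as in the example shown. The function names are hypothetical.

```python
import re

# Assumed log line shape, inferred from the grep pattern in the quoted
# message; real svn-all-fast-export debug output may differ.
SKIPPED = re.compile(r"Exporting revision (\d+).*nothing to do")

# Assumed footer shape, copied from the example commit in the quoted message.
FOOTER = re.compile(r"^([ \t]*svn path=.*?; revision=)(\d+)[ \t]*$", re.MULTILINE)


def skipped_revisions(log_lines):
    """Return sorted revision numbers whose log line reports 'nothing to do'."""
    revs = set()
    for line in log_lines:
        match = SKIPPED.search(line)
        if match:
            revs.add(int(match.group(1)))
    return sorted(revs)


def group_runs(revs):
    """Group sorted revision numbers into (first, last) runs of consecutive values."""
    runs = []
    for rev in revs:
        if runs and rev == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], rev)
        else:
            runs.append((rev, rev))
    return runs


def fold_revisions(message, skipped, note=None):
    """Add skipped revision numbers to the footer, optionally appending a note."""
    match = FOOTER.search(message)
    if match is None:
        # No recognisable footer: leave the message untouched.
        return message
    revs = sorted(set(skipped) | {int(match.group(2))})
    footer = match.group(1) + ",".join(str(r) for r in revs)
    new = message[:match.start()] + footer + message[match.end():]
    if note:
        new = new.rstrip("\n") + "\n" + note + "\n"
    return new
```

A run of skipped revisions that ends at N-1 would then be folded into the message of the commit carrying revision N, as with r4124-4126 above. Deciding which runs really are directory-only changes, and actually rewriting the affected commits, would still need to be checked by hand against SVN.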

