Feb 01

Fixing SVN Merge History in Git Repositories

Assuming you use the fabulous svn2git to convert your SVN repository to a Git one, you might still run into a problem: merge history.

Why And How To Fix Merge History

Things Might Be Just Fine

If you have always used SVN 1.5 or newer and thus have svn:mergeinfo properties for all your merges, you can ignore this part. git-svn will correctly identify merges from, say, a branch back to trunk (or master once the conversion is done) and attach two parents to the merge commit: the previous commit in master, and the last commit from the branch that’s being merged.

In such a case, your history will look something like this:

$ git log --pretty=format:"%h %ad | %s%d [%an]" --graph --date=short
* 0d34dbe 2009-08-12 | allow passing of arbitrary arguments to ant (HEAD, master) [felix]
*   a199e71 2009-07-06 | merging the changes from [3853:4186/branches/felix-tutorial-app-rewrite] major rewrite of tutorial apps major rewrite of build scripts [felix]
|\  
* | 2121341 2009-07-06 | removing staging apps for backmerge, somehow svn mucks up if I don't [felix]
| *   145e1aa 2009-07-06 | merging the changes from [4165:4184/branches/felix-build-scripts-rewrite] fixes #1115 (felix-tutorial-app-rewrite) [felix]
| |\  
| | * 2a9b5cc 2009-07-06 | fix excludes for svn directories (felix-build-scripts-rewrite) [felix]
| | * ec374ec 2009-07-02 | reorganized the stylesheets a little, make a custom base that does not set doctypes on the output and a stylesheet for the website as well as for the xhtml documentation that inherit from it [felix]
| | * f119381 2009-07-01 | modified the toolkits build system to allow overriding the xsl for the toc-generation [felix]
| | * 576b71b 2009-06-29 | creating branch for build scripts rewrite, fixing build problems in dita-ot 1.4.2 and greater refs #1115 [felix]
| |/  
| * 6e8e682 2009-06-26 | added a note about Action::executeRead() to the chapter about module creation fixes #1091 [felix]
| * 976c384 2009-02-18 | creating a branch for the rewrite of the tutorial apps to incorporate latest best practices. [felix]
|/  
* fd87653 2008-10-31 | dump current versions of refguide [mikeseth]

But You’re Probably Screwed

If, however, you did not have merge tracking yet, the history will look like this:

$ git log --pretty=format:"%h %ad | %s%d [%an]" --graph --date=short
* 0d34dbe 2009-08-12 | allow passing of arbitrary arguments to ant (HEAD, master) [felix]
* a199e71 2009-07-06 | merging the changes from [3853:4186/branches/felix-tutorial-app-rewrite] major rewrite of tutorial apps major rewrite of build scripts [felix]
* 2121341 2009-07-06 | removing staging apps for backmerge, somehow svn mucks up if I don't [felix]
* fd87653 2008-10-31 | dump current versions of refguide [mikeseth]

The reason becomes apparent when we use git-rev-list:

$ git rev-list --all --graph --oneline --date-order
* 0d34dbe allow passing of arbitrary arguments to ant
* a199e71 merging the changes from [3853:4186/branches/felix-tutorial-app-rewrite] major rewrite of tutorial apps major rewrite of build scripts
* 2121341 removing staging apps for backmerge, somehow svn mucks up if I don't
| * 145e1aa merging the changes from [4165:4184/branches/felix-build-scripts-rewrite] fixes #1115
| | * 2a9b5cc fix excludes for svn directories
| | * ec374ec reorganized the stylesheets a little, make a custom base that does not set doctypes on the output and a stylesheet for the website as well as for the xhtml documentation that inherit from it
| | * f119381 modified the toolkits build system to allow overriding the xsl for the toc-generation
| | * 576b71b creating branch for build scripts rewrite, fixing build problems in dita-ot 1.4.2 and greater refs #1115
| |/  
| * 6e8e682 added a note about Action::executeRead() to the chapter about module creation fixes #1091
| * 976c384 creating a branch for the rewrite of the tutorial apps to incorporate latest best practices.
|/  
* fd87653 dump current versions of refguide

The merge commits (145e1aa and a199e71) each have only one parent and do not point to the commits they merged. Since git log only follows parent links, that ancestry is invisible to it. As a result, it will later be difficult or impossible to find the commits from those branches again, especially once the corresponding branches have been removed from the origin.
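
To tell which of the two cases a repository is in, it helps to count the parents of a suspected merge commit. The following is a minimal sketch: the function name parent_count is made up for illustration, and it relies only on git rev-list --parents, which prints a commit followed by its parents on one line.

```shell
# Print the number of parents of a commit:
# 0 = root commit, 1 = ordinary commit, 2 or more = merge commit.
# parent_count is a hypothetical helper name, not part of Git or svn2git.
parent_count() {
    # Output format: "<commit> <parent1> <parent2> ..."
    line=$(git rev-list --parents -n 1 "$1") || return 1
    # Split the line into words; the first word is the commit itself.
    set -- $line
    echo $(( $# - 1 ))
}
```

In the broken history above, parent_count a199e71 would print 1; in a history where the merge was detected, it would print 2.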

Grafts To The Rescue

You can override the parents of a commit in Git using a file called grafts, which needs to reside in .git/info/. It consists of lines of commit hashes separated by a space character, where the first hash on each line is the commit you want to define parents for, and the following hashes are its parents.

Note: you must use full hashes, not abbreviated ones.

To fix the history above, we must define the parents in .git/info/grafts:

a199e7194c5a4382c7d1057c84a55fcb535c8ed2 2121341005ef4fee0c1755d9ee083b6f0fbaf295 145e1aa687c7b1277c683b8116324a1552ba6454
145e1aa687c7b1277c683b8116324a1552ba6454 6e8e6828f7be352571faccef981568f4acc9ccb1 2a9b5ccf4b0870973508b37bef8d7759889a4f27
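
Because abbreviated hashes are easy to paste in by accident, a quick sanity check of the file can save some head-scratching. This is only a sketch under stated assumptions: check_grafts is a hypothetical helper name, and the check is deliberately strict, accepting nothing but empty lines and lines of space-separated 40-digit hex hashes.

```shell
# Verify that a grafts file contains only full 40-digit hashes.
# check_grafts is a hypothetical helper; the default path is the
# grafts location described above.
check_grafts() {
    file=${1:-.git/info/grafts}
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    # Collect every line (with its line number) that is neither empty
    # nor a space-separated list of 40-digit hex hashes.
    bad=$(grep -n -v -E '^([0-9a-fA-F]{40}( [0-9a-fA-F]{40})*)?$' "$file")
    if [ -n "$bad" ]; then
        printf 'malformed graft lines:\n%s\n' "$bad" >&2
        return 1
    fi
}
```

If a line is flagged, git rev-parse followed by the abbreviated hash (for example git rev-parse a199e71) prints the full hash to use instead.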

Note: for a merge from, say, master to a branch (to sync the branch with master/trunk), you would use $synccommit $lastmastercommit $lastbranchcommit.

If we now run git log --pretty=format:"%h %ad | %s%d [%an]" --graph --date=short, things will look just like we expected.

We can then run git filter-branch to make the grafts permanent (see step “Manual Conversion” below). Note that this will change commit hashes.

Manual Conversion

  1. Run svn2git http://svn.example.org/repos --no-minimize-url --metadata --authors authors.txt
  2. Create a file .git/info/grafts (you can use git svn find-rev r$rev $branch to find a Git commit hash for an SVN revision number)
  3. Run git filter-branch --tag-name-filter cat -- --all to make the grafts permanent
  4. Check if the history looks fine
  5. Run git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d to clean up the backups created by git-filter-branch
  6. Remove the now obsolete .git/info/grafts

Automatic Conversion

For the Agavi migration, we wrote a script to automate this process; it uses a grafts file that contains SVN revision numbers (and their branch names) and converts it to a grafts file fit for consumption by Git. We will put that script up on GitHub together with the rest of our migration toolkit in a few days.
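
Until that script is available, the idea can be sketched roughly as follows. This is not the Agavi script. The input format (one line per merge, each token written as revision:branch, commit first and parents after) and the function names are assumptions made up for illustration. The only real command involved is git svn find-rev, as used in the manual steps above.

```shell
# Hypothetical input format, one merge per line:
#   <rev>:<branch> <parent-rev>:<branch> [<parent-rev>:<branch> ...]

# Resolve one SVN revision on a branch to a Git commit hash.
resolve_rev() {
    git svn find-rev "r$1" "$2"
}

# Read SVN-style graft lines on stdin, write Git graft lines on stdout.
svn_grafts_to_git() {
    while read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        out=""
        for token in $line; do
            rev=${token%%:*}      # part before the first colon
            branch=${token#*:}    # part after the first colon
            hash=$(resolve_rev "$rev" "$branch")
            if [ -z "$hash" ]; then
                echo "cannot resolve r$rev on $branch" >&2
                return 1
            fi
            out="${out:+$out }$hash"
        done
        printf '%s\n' "$out"
    done
}
```

With a hypothetical input file svn-grafts.txt, one would run svn_grafts_to_git < svn-grafts.txt > .git/info/grafts inside the converted repository and then continue with the filter-branch step from the manual conversion.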

Broken Branches

The Agavi documentation repository contains a broken branch that was created, in revision 4285, from only a portion of trunk rather than from the whole of trunk. When running this through svn2git, the whole history would be terribly confused and a lot of commits would appear as duplicates, since git-svn misinterpreted the branch point.

The fix in this case was to simply ignore that branch and only use its merge back to trunk (in revision 4308), as the loss of history information in this case was negligible. We told svn2git to stop at the revision before the branch was created, and resume at the revision where the branch gets merged back to trunk (this required a patch to svn2git):

$ svn2git http://svn.agavi.org/documentation --revision 2721:4284 --no-minimize-url --metadata --authors authors.txt
$ git svn fetch -q --revision 4308:HEAD
$ git merge -q svn/trunk

We’re only merging svn/trunk in this case as all the commits from 4308 onwards were made to trunk. Also, between 4284 and 4308, all commits in this repository were in the branch we’re skipping. Had there been commits to trunk or other branches, things would have been a tiny bit more complicated, with several calls to git svn fetch and git merge to put the pieces together correctly.