Monday, June 13, 2011

Git - and some of it's more problematic interfaces

(Warning - this post contains some mildly technical content surrounding git)

As some of you may already know - KDE's Git hooks are not capable of detecting copying or moving of files at the moment. They have also been known to show some rather strange and confusing codes concerning file changes too (The "I" code). The origin of these deficiencies lies deep within git itself, and why git is like this is a mystery to me.

When the hooks first start to consider sending out emails, they have a number of bits of information available to them. One of the most vital pieces of information is the "name-status" of files that changed in the commit. This tells the hooks if the file was added, modified and so forth. Unfortunately it misses out one crucial detail - the number of lines added or removed from the file. This information is a must have for the message contained in the emails it sends - so it needs it.

In order to get this information, the hooks call out to git, and recieve a "numstat" in return. This is where the problems begin. Whilst the numstat may look at first sight to be normal, it is only when copies, renames or merges come along that the trouble begins. To see how things can be problematic, let's compare the output - when moving files.

First, from name-status:
R100 repo-configs/audit/kaudiocreator.git/skip-author repo-configs/audit/scratch/nalvarez/mobipocket-migration.git/skip-author
R100 repo-configs/audit/kaudiocreator.git/skip-eol repo-configs/audit/scratch/nalvarez/mobipocket-migration.git/skip-eol
R100 repo-configs/audit/kaudiocreator.git/skip-filename repo-configs/audit/scratch/nalvarez/mobipocket-migration.git/skip-filename

Looks normal enough, and my initial parser in the hooks handled this fine. Now, let's look at the numstat:
0 0 repo-configs/audit/{kaudiocreator.git => scratch/nalvarez/mobipocket-migration.git}/skip-author
0 0 repo-configs/audit/{kaudiocreator.git => scratch/nalvarez/mobipocket-migration.git}/skip-eol
0 0 repo-configs/audit/{kaudiocreator.git => scratch/nalvarez/mobipocket-migration.git}/skip-filename

As you can see, it is completely different. When I first tried to parse this, I came across a number of obstacles, one of which concerned files containing { or } characters. This made the parsing process dangerous and unreliable at best unfortunately.

The other major difference in the numstat is much more confusing however. With a merge, unless you had to intervene by hand to fix a conflict - the name-status will show nothing. Yet the numstat will show a number of changes. When I first tested the hook code, this caused it to bail out, which is where the "I" code comes in. This code is quite simply a hack which works around a weakness in git, as the hooks try to glue all this information together.

This unusual - and very different behaviour of two git commands is not documented anywhere in the man pages, and no option exists to give a behaviour more like the other. The only reason I can guess for this unusual behaviour is a hack in git itself - to give a nice output when you merge or fast-forward. That is just speculation however.

For the future, I can't see the current state of the hooks changing too much unless git itself changes. Until that happens the support for detecting file moves or copies, and giving completely consistent output will not happen unfortunately.


  1. I saw Linus Torvalds' talk about git at Google and he said there that git is not optimized for a workflow where you track files and moves across files since, according to him git tracks content. That is to say git sees your source code as a hole and doesn't care if a given function is in file A or B. I guess you will have to rethink your hooks a bit

  2. 1) don't diff a merge. It's not a regular diff.

    2) git show, git diff and git whatchanged all have --numstat. You say "very different behaviour of two git commands" but you never said which two commands they are, so I'm assuming it's two of those three.