Recently I had to fix a Git repository where something unfortunate happened: probably due to accessing an NTFS partition that was still mounted in a hibernated alternative operating system, several files became corrupted (and actually had their contents exchanged with different files on the disk!).
git fsck discovered corrupted blobs. We first tried to recover them, but their content turned out to make no sense at all. These blobs were irrevocably lost, but we still wanted to get the rest of the Git history out alive.
Usually, the following technique is not necessary, because you can either just clone again from your Git upstream or recover the repository from the last backup, but neither existed in this case.
The corrupted blobs all belonged to a single commit made a few months earlier. The solution was thus to remove this commit from the history, keeping all other trees intact (of course, the commit IDs change, but the content does not).
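To find out which commit a broken blob belongs to, something along these lines may help. It is only a sketch: commits_touching_blob is a made-up helper name, --find-object needs Git 2.16 or newer, and I have not tried it against objects that are actually corrupted on disk.

```shell
# Hypothetical helper: print the IDs of all commits (on any ref) whose
# diff adds or removes the blob with the given object ID.
# --no-renames keeps Git from comparing file contents for rename detection.
commits_touching_blob() {
    git log --all --no-renames --format=%H --find-object="$1"
}
```

Call it with a blob ID reported by git fsck, for example commits_touching_blob <blob-id>.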
I first tried to do this with git rebase, but it is of course the wrong tool, since it tries to undo the changes of the defective commit in all of the following history.
Finally, I had a use case for git filter-branch. To make it short, we can filter out the defective commit using:
git filter-branch --commit-filter '
    if [ "$GIT_COMMIT" = badbadbadbad ]
    then skip_commit "$@"
    else git commit-tree "$@"
    fi'
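This rewrites all commits after badbadbadbad, but does not touch their actual content: every later commit keeps its original tree, so the changes of the skipped commit simply show up in its successor.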
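A quick way to convince yourself that the content really is untouched is to compare trees before and after the rewrite. The sketch below relies on git filter-branch keeping a backup of each rewritten ref under refs/original/; same_tree is a made-up helper name, and the branch name is whatever you rewrote.

```shell
# Hypothetical check: succeed if the rewritten branch tip still points to
# the same tree as the backup ref that git filter-branch left behind.
same_tree() {
    branch=$1
    old=$(git rev-parse --verify -q "refs/original/refs/heads/$branch^{tree}") || return 2
    new=$(git rev-parse --verify -q "refs/heads/$branch^{tree}") || return 2
    [ "$old" = "$new" ]
}
```

For example, same_tree master && echo "content unchanged".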
git fsck was still not happy, presumably because the corrupted objects were still sitting in the object store, so we made a clean copy using
git clone --no-local --no-hardlinks mybrokengit myfixedgit
Now git fsck reported no errors and all other revisions were still OK. (Also, the blobs have been packed, so the next data corruption will be more fatal… ;))
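If you have to do this more than once, the copy-and-verify step can be wrapped in a small function. This is a sketch only; clone_and_check is a made-up name, and it assumes the source repository's branches no longer reference the broken objects.

```shell
# Hypothetical wrapper: clone through the regular Git transport, so that
# only objects reachable from the cloned refs are copied, then check the copy.
clone_and_check() {
    git clone -q --no-local --no-hardlinks "$1" "$2" &&
    git -C "$2" fsck --full
}
```

Used as clone_and_check mybrokengit myfixedgit, it fails if either the clone or the check fails.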
I cannot think of any other version control system where a repair like this would have been possible. Thanks, Git!
NP: Against Me!—Exhaustion & Disgust