Recently I had to fix a Git repository where something unfortunate happened: probably due to accessing an NTFS partition that was still mounted in a hibernated alternative operating system, several files became corrupted (and actually had their contents exchanged with different files on the disk!).
git fsck reported corrupted blobs. We first tried to recover them, until we noticed that their content did not make sense at all. These blobs were irrevocably lost, but we still wanted to get the rest of the Git history out alive.
Usually, the following technique is not necessary, because you can either just clone again from your Git upstream or restore the repository from the last backup, but neither existed in this case.
The corrupted blobs actually all belonged to a single commit made a few months earlier. The solution was thus to remove this commit from the history, keeping all other trees intact (of course, the commit ids would change, but the content would not).
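One way to track down such a commit, sketched here on a throwaway repository and not necessarily how it was done in this case: reasonably recent Git versions (2.16 and later) offer git log --find-object, which lists the commits that add or remove a given object. The file names, commit messages and the choice of the "bad" blob are made up for illustration, and the blob in this demo is perfectly healthy, so this does not show how the command behaves on a truly unreadable object.

```shell
#!/bin/sh
# Sketch: find the commit that introduced a given blob (demo repository only).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name demo
git config user.email demo@example.invalid

echo one > a.txt;   git add a.txt; git commit -q -m "first"
echo two > b.txt;   git add b.txt; git commit -q -m "second"
echo three > c.txt; git add c.txt; git commit -q -m "third"

# Pretend the blob behind b.txt is the one git fsck complained about.
blob=$(git rev-parse HEAD:b.txt)

# --find-object selects commits whose diff changes how often that blob occurs.
culprit=$(git log --all --format=%H --find-object="$blob")
git log -1 --format='%h %s' "$culprit"
```

In a real repository, the blob id would come from the git fsck output instead of git rev-parse.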
I first tried to do this with git rebase, but it is of course the wrong tool: it replays the later commits as patches, so it would try to remove the defective commit's changes from all of them.
Finally, I had a use case for git filter-branch. To make it short, we can filter out the defective commit using:
git filter-branch --commit-filter \
    '[ $GIT_COMMIT = badbadbadbad ] && skip_commit "$@" || git commit-tree "$@"'
This rewrites all commits after badbadbadbad (a placeholder for the full commit id, since $GIT_COMMIT always holds the full hash), but does not touch their actual content.
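To see that claim in action, here is a small sketch on a throwaway repository. It skips the middle one of three commits and compares the tree of the last commit before and after the rewrite. The repository, the file names and the BAD_COMMIT variable are hypothetical; the filter uses the if/else form, which is meant to behave like the one-liner above. FILTER_BRANCH_SQUELCH_WARNING only silences the warning that newer Git versions print before running filter-branch.

```shell
#!/bin/sh
# Sketch: skip one commit with filter-branch and check that later trees survive.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name demo
git config user.email demo@example.invalid

echo one > a.txt;   git add a.txt; git commit -q -m "first"
echo two > b.txt;   git add b.txt; git commit -q -m "bad"
BAD_COMMIT=$(git rev-parse HEAD)   # hypothetical stand-in for the defective commit
export BAD_COMMIT                  # the filter runs in a child shell
echo three > c.txt; git add c.txt; git commit -q -m "third"

tree_before=$(git rev-parse 'HEAD^{tree}')
head_before=$(git rev-parse HEAD)

# Skip the bad commit; every other commit is recreated with its original tree.
git filter-branch --commit-filter '
    if [ "$GIT_COMMIT" = "$BAD_COMMIT" ]; then
        skip_commit "$@"
    else
        git commit-tree "$@"
    fi' HEAD >/dev/null

tree_after=$(git rev-parse 'HEAD^{tree}')
head_after=$(git rev-parse HEAD)
count_after=$(git rev-list --count HEAD)

echo "tree before: $tree_before"
echo "tree after:  $tree_after"
echo "commits left: $count_after"
```

Note that b.txt is still present in the final tree: skipping a commit this way removes the commit object from the history, not the files it introduced in later snapshots.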
git fsck was still not happy, so we made a clean copy using
git clone --no-local --no-hardlinks mybrokengit myfixedgit
In the copy, git fsck reported no errors and all other revisions were still ok.
(Also, the blobs have been packed, so the next data corruption will be
more fatal… ;))
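Why the fresh clone helps can be sketched like this, as far as I understand it: with --no-local, Git goes through its regular transfer mechanism even for a local path, so only objects reachable from the cloned refs are copied, and leftovers of the old history stay behind. The demo below uses a made-up repository pair (src, dst) and an unreferenced blob as a stand-in for such leftovers; it is an illustration, not a reproduction of the broken repository.

```shell
#!/bin/sh
# Sketch: a --no-local clone leaves unreferenced objects behind.
set -e
work=$(mktemp -d)
cd "$work"
git init -q src
cd src
git config user.name demo
git config user.email demo@example.invalid
echo keep > a.txt; git add a.txt; git commit -q -m "keep"

# A blob that no commit references, standing in for remains of the old history.
orphan=$(echo leftover | git hash-object -w --stdin)
cd "$work"

git clone -q --no-local --no-hardlinks src dst

# Check whether the unreferenced blob made it into the copy.
if git -C dst cat-file -e "$orphan" 2>/dev/null; then
    orphan_in_clone=yes
else
    orphan_in_clone=no
fi

git -C dst fsck --full
echo "orphan blob present in clone: $orphan_in_clone"
```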
I cannot think of any other version control system where a repair like this would have been possible. Thanks, Git!
NP: Against Me!—Exhaustion & Disgust