Testing Git Repo Backup Strategies

Kyle Bowman

Overview

This writeup contains notes about testing what is and what isn’t backed up when you choose certain Git backup strategies.

For my use case, I found no practical difference between using git bundle and git clone --mirror. In both cases, the repository contents (commits, branches, and tags) are faithfully preserved. Neither preserves local-only files in the .git directory, such as hooks, so you must back up the relevant files from there separately. Allegedly, both backups are atomic operations and support incremental backup, but I didn’t know how to test those two claims.

Inventory of Needs

Per this Stack Overflow post, there are a couple of proposed ways to back up a repository, but there is no consensus. I want to make sure that everything I want backed up actually is backed up and that I can recover it.

I appreciate that Maddes qualifies what “Full backup” means. I think that’s wise and that it’s important to define what you need in a “full backup.” Here are the things that I want to back up:

The last one is basically anything that I might forget that I have modified, but affects the behavior of the project.

Most of my stuff is single user. Downtime is acceptable; I don’t need hot copy.

Furthermore, Maddes says

git is a developer tool and leaves this to the admin. Backup of the git configuration and OS configuration should be seen as separated from the backup of the content.

After my experiments, I reached the same conclusion.

Strategies Considered

Setup

First, I create a directory called git_backups to house all the cruft that these experiments will create.

Initialize Origin

# From git_backups/ directory.
mkdir origin
cd origin
git init -b main # Name the initial branch main (needs Git 2.28+); later steps assume it.

Populate with Content to Test

# From origin/ directory.
touch test
ln -s test link
# Write the hook with a shebang so it runs as a shell script everywhere.
printf '#!/bin/sh\necho Pre-commit works!\n' > .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
git add .
git commit -m "Add content."
# Output: Pre-commit works! (and other stuff)

Oops… I forgot to add a branch and a tag.

git branch newbranch
git switch newbranch
touch newfile
git add .
git commit -m "Added newfile to newbranch." # Prints Pre-commit works! (and other stuff)
git switch main
git branch # Shows main and newbranch
git tag latest
git tag --list # Shows latest
cd ..

The cd at the end takes me back to the git_backups/ directory.

Mirror Approach

Create the Mirror

# From git_backups/ directory.
git clone --mirror origin/ mirror

The result looks like a bare repository. I see a lot of stuff in the objects/ directory, but not in branches/, hooks/, refs/tags/, or refs/heads/. Maybe that stuff is mostly used by repos that have an active working directory. Let’s see what the “restored” repository looks like.
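The empty refs/ directories do not necessarily mean the branches and tags are missing. My understanding is that a fresh clone usually records its refs in a single packed-refs file instead of as loose files, but treat that as an assumption. A minimal sketch for checking which refs a mirror holds, using a throwaway repository in a temporary directory (the `work` path and the test identity are hypothetical):

```shell
set -e
work=$(mktemp -d)   # hypothetical scratch directory standing in for git_backups/
cd "$work"

# Build a small origin with one commit, an extra branch, and a tag.
git init -q origin
git -C origin symbolic-ref HEAD refs/heads/main
git -C origin -c user.name=Test -c user.email=test@example.com \
    commit -q --allow-empty -m "Add content."
git -C origin branch newbranch
git -C origin tag latest

git clone -q --mirror origin mirror

# for-each-ref reports refs no matter how they are stored on disk.
mirror_refs=$(git -C mirror for-each-ref --format='%(refname)')
echo "$mirror_refs"
```

If the branch and tag names show up in that listing, the mirror has them even though refs/heads/ and refs/tags/ look empty.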

Restore from a Mirror

According to the Stack Overflow post, a mirror is a new repository that is populated with the current Git template.

# From git_backups/ directory
git clone mirror/ restored_mirror
cd restored_mirror/
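One way to check what survives the round trip is to rebuild the whole chain in a scratch directory and inspect the restored clone. This is only a sketch: the `work` directory, the test identity, and the `hook_status` variable are hypothetical, and the hook is written after the commit so that it does not fire during setup.

```shell
set -e
work=$(mktemp -d)   # hypothetical scratch directory
cd "$work"

# Origin with a commit, a branch, a tag, and a local pre-commit hook.
git init -q origin
git -C origin symbolic-ref HEAD refs/heads/main
git -C origin -c user.name=Test -c user.email=test@example.com \
    commit -q --allow-empty -m "Add content."
git -C origin branch newbranch
git -C origin tag latest
mkdir -p origin/.git/hooks
printf '#!/bin/sh\necho Pre-commit works!\n' > origin/.git/hooks/pre-commit
chmod +x origin/.git/hooks/pre-commit

# Back up to a mirror, then restore from it.
git clone -q --mirror origin mirror
git clone -q mirror restored_mirror

# Branches and tags should be listed; the hook is checked separately.
git -C restored_mirror branch --all
git -C restored_mirror tag --list
if [ -e restored_mirror/.git/hooks/pre-commit ]; then
    hook_status=present
else
    hook_status=missing
fi
echo "pre-commit hook: $hook_status"
```

In the restored clone, newbranch appears as a remote-tracking branch (origin/newbranch) and has to be checked out before it becomes a local branch.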

Some notes:

Bundle Approach

Create the Bundle

# From restored_mirror/ directory.
cd ../origin
git bundle create origin.bundle --all
mv origin.bundle ..
cd ..
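Before relying on a bundle, it may be worth asking Git to validate it. The sketch below uses git bundle verify and git bundle list-heads against a throwaway repository. The `work` path and the test identity are hypothetical, and both commands are run from inside a repository because verify checks the bundle against the current one.

```shell
set -e
work=$(mktemp -d)   # hypothetical scratch directory
cd "$work"

git init -q origin
git -C origin symbolic-ref HEAD refs/heads/main
git -C origin -c user.name=Test -c user.email=test@example.com \
    commit -q --allow-empty -m "Add content."
git -C origin tag latest

# Bundle every ref, writing the file outside the working tree.
git -C origin bundle create "$work/origin.bundle" --all

# Confirm the bundle is well-formed and list the refs it carries.
git -C origin bundle verify "$work/origin.bundle"
bundle_heads=$(git -C origin bundle list-heads "$work/origin.bundle")
echo "$bundle_heads"
```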

Restoring from the Bundle

# From git_backups/ directory.
git clone origin.bundle bundle_clone
cd bundle_clone

Some notes:

Conclusion

I have a slight preference for mirroring instead of bundling if only because updating is as simple as git remote update.
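As a rough sketch of that update step, the following builds a throwaway origin and mirror in a temporary directory, adds a commit to the origin, and then refreshes the mirror. The `work` path, the test identity, and the `commit_empty` helper are hypothetical and exist only for illustration.

```shell
set -e
work=$(mktemp -d)   # hypothetical scratch directory
cd "$work"

# Helper: add an empty commit to origin with the given message.
commit_empty() {
    git -C origin -c user.name=Test -c user.email=test@example.com \
        commit -q --allow-empty -m "$1"
}

git init -q origin
git -C origin symbolic-ref HEAD refs/heads/main
commit_empty "Add content."

# Initial backup.
git clone -q --mirror origin mirror

# Work continues in origin after the backup was taken.
commit_empty "Later change."

# Refresh the mirror; only the new objects need to be fetched.
git -C mirror remote update

origin_head=$(git -C origin rev-parse main)
mirror_head=$(git -C mirror rev-parse main)
echo "origin: $origin_head"
echo "mirror: $mirror_head"
```

Adding --prune to git remote update should also drop refs from the mirror that were deleted in the origin, which may or may not be what you want from a backup.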

There doesn’t seem to be much difference between them. Neither seems more comprehensive. I’ll have to dive into the .git directory and explicitly copy those files if I need to preserve them. Allegedly, both bundling and mirroring are atomic.
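Copying those files can be as plain as a couple of cp calls. The sketch below assumes the files worth keeping are the repository config and the hooks directory; that selection, the `work` path, and the `git_local_backup` destination are all hypothetical, so adjust them to whatever you have actually customized.

```shell
set -e
work=$(mktemp -d)   # hypothetical scratch directory
cd "$work"

# Throwaway repository with a local hook to preserve.
git init -q origin
mkdir -p origin/.git/hooks
printf '#!/bin/sh\necho Pre-commit works!\n' > origin/.git/hooks/pre-commit
chmod +x origin/.git/hooks/pre-commit

# Copy local-only files next to the mirror or bundle.
# -p keeps modes and timestamps, so the hook stays executable.
mkdir -p git_local_backup
cp -p origin/.git/config git_local_backup/config
cp -Rp origin/.git/hooks git_local_backup/hooks

ls -l git_local_backup/hooks
```

To restore, copy the files back into the .git directory of the restored clone.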

The one advantage I see of git bundle over mirror is that you can specify very precise refs to include in the bundle in case, for example, you only need the most recent three commits. Maybe you can also do that with clone, but I know I saw it with bundle. For backups, I’m not sure when I would ever need that. Both mirror and bundle reportedly support incremental backups.
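For illustration, here is a sketch of both ideas with git bundle: limiting a bundle to the three most recent commits, and bundling only what was added since a previous backup. The `work` path, the test identity, and the `lastbackup` tag name are hypothetical. The tag simply marks where the previous backup ended.

```shell
set -e
work=$(mktemp -d)   # hypothetical scratch directory
cd "$work"

git init -q origin
git -C origin symbolic-ref HEAD refs/heads/main
for n in 1 2 3 4 5; do
    git -C origin -c user.name=Test -c user.email=test@example.com \
        commit -q --allow-empty -m "Commit $n"
done

# Bundle only the three most recent commits on main.
git -C origin bundle create "$work/recent.bundle" -3 main

# Incremental bundle: only the commits made after the last backup point.
git -C origin tag lastbackup main~2
git -C origin bundle create "$work/incremental.bundle" lastbackup..main

# A partial bundle only applies to a repository that already has its
# prerequisite commits; verify checks that against the current repository.
git -C origin bundle verify "$work/recent.bundle"
git -C origin bundle verify "$work/incremental.bundle"
```

Because partial bundles depend on earlier commits, they only make sense as a backup alongside an earlier full bundle or mirror.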