Using GitWeb to Publish Personal Git Repos

Kyle Bowman

Summary

In this post, I explore and set up a way to publicly host a selection of my privately hosted repositories. See the results at my Git site.

There are three parts to this setup:

Configuring the webserver to securely serve CGI content.
Configuring GitWeb (a CGI script) to serve the repositories.
Using Git to push to a remote mirror.

Introduction

Motivation

There’s something oddly self-empowering about managing your own services. Even though I don’t host any trendy services from my box, I use it extensively to host my Git repositories. Using Git, I can turn digital text, the ephemeral medium, into something more permanent. But the text becomes permanent in a curious way. It’s not etched in stone, but rather, it’s endemic. Any one instance of the text is fragile, but as an idea, the text is persistent.

Currently, I use bare Git repositories with SSH access. It works completely fine for version control. However, there are a few problems with my setup compared to, say, using a GitHub profile.

My setup is confined to my network. Sharing the repos is tricky.
My setup is less endemic than if I stashed multiple copies over the web.¹
There’s currently no way to browse code without cloning the repo.

I want to fix all of these problems in one swoop. Enter GitWeb.

Why GitWeb

There are plenty of other solutions to host your Git repos as a webpage. Gitea is particularly popular among the self-hosting community. GitLab can also be self-hosted. I’m sure there are others in this class. The problem with these is that they are bloated for my purposes. I don’t need CI/CD (aside from a few shell scripts and some git hooks). I don’t need issue tracking, fancy collaboration tools, or granular access control. These extra features become a liability for me because they add complexity to the project and make it less likely that I’ll complete it. I want something simpler.

GitWeb does exactly what I want. It’s a CGI script that, when coupled with a webserver like Apache, can read a directory full of Git repositories and serve them like a webpage. Using that interface, you can navigate code, and depending on the configuration, use Git blame, search or whatever. That’s it. No extra frills. It’s delightfully minimal.

I should mention that there are alternatives in the GitWeb class of tools. CGit is the most notable competitor at this level. CGit is a fine choice. I admit that I chose GitWeb for two trivial reasons. First, GitWeb has the aesthetic that I’m used to for these minimal webhosted repositories. Second, the git instaweb command made it seem really easy to get set up.

Why Use a Mirror Push Strategy

I have a private machine that I regularly interact with. I want my public host to mirror a selection of those repositories. There are two ways that I can think to accomplish that:

From the public host, contact the private host and fetch results.
From the private host, contact the public host with updates.

I don’t want my public repository to point to my private repository because, well, I want that to be private. Not to mention that it’s not necessarily on a static IP like the public server is. So pushing from private makes more sense than pulling from public. That also simplifies the setup because, for normal Git uses, I only really need my private git user to interact with my public server on my behalf.

Setting Up the Web Server

There are a few generic web hosting tasks that I won’t go into detail about, but I’ll list them here. They all vary slightly based on which tools you’re using and they are easily found with a web search. The steps are:

Set up a DNS record to point to your subdomain. For me, that ensures that git.rocketbowman.com points to my server’s static IP address.
Configure your web server. This varies greatly based on what you’re trying to do. For example, I host my normal website and the new git subdomain.
Use certbot to set up Let’s Encrypt certificates. This installs and configures SSL/TLS certificates so that your site uses HTTPS to encrypt traffic. The certbot utility, with the appropriate plugin, also edits the configuration that you set up in the previous step.

Your web server configuration varies greatly. It also has some security-critical features, so I’m not showing my actual configuration here. But these are some of the important directives that you need to consider and what they do.

This port 80 block routes requests intended for http://git.yourserver.com to https://git.yourserver.com. By rerouting requests, we ensure that everyone uses HTTPS.

<VirtualHost *:80>
  ServerName git.yourserver.com
  RewriteEngine on
  RewriteCond %{SERVER_NAME} =git.yourserver.com
  RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [NE,R=permanent]
</VirtualHost>
<VirtualHost *:443>
  ...
</VirtualHost>

This block uses Rewrite instead of Redirect because we are accounting for yourserver.com in addition to git.yourserver.com.
You can have a parallel set of VirtualHosts for your other domains.
You can put all your GitWeb-specific configurations in the HTTPS block.

There are important details to stuff into your HTTPS block as well. When you install the GitWeb package, it includes, among other things, a CSS file, a JavaScript file, and a CGI script that’s written in Perl. It’s important that your webserver has access to all those files. Here are the main steps to consider:

Ensure that you can run CGI scripts. (Ensure mod_cgi is enabled.)
Point to your gitweb.cgi script.
Ensure that your site can access the static resources that come with GitWeb, like the CSS and JavaScript.
Based on how you do the previous steps, you may need options like FollowSymLinks and ExecCGI.

You can find advice about setting up your webserver in the GitWeb docs.
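Before touching the Apache configuration, it can help to confirm that the files GitWeb ships are where you think they are. The following is a minimal sketch, not part of my actual setup. The function name check_gitweb_files is made up, and the relative paths assume the common layout of gitweb.cgi sitting next to a static/ directory. Pass whatever directory your package installs into.

```shell
#!/bin/sh
# Hypothetical helper: report which GitWeb files are missing or unreadable
# under the directory given as $1. The file layout is an assumption; adjust
# the list if your package differs.
check_gitweb_files() {
    gitweb_root="$1"
    status=0
    for f in gitweb.cgi static/gitweb.css static/gitweb.js; do
        if [ ! -r "$gitweb_root/$f" ]; then
            echo "missing or unreadable: $gitweb_root/$f"
            status=1
        fi
    done
    # A CGI script must also be executable.
    if [ -r "$gitweb_root/gitweb.cgi" ] && [ ! -x "$gitweb_root/gitweb.cgi" ]; then
        echo "not executable: $gitweb_root/gitweb.cgi"
        status=1
    fi
    return "$status"
}
```

The check only reflects the permissions of whoever runs it, so it is most meaningful when run as the account your webserver uses.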

Setting Up GitWeb

The GitWeb configuration is pretty straightforward. The default configuration comes commented. Uncomment configurations that you want to enable and change values that you want to change. Here’s a snippet (comments are indicated by #):

# path to git projects (<project>.git)
$projectroot = "/var/www/git";

# directory to use for temp files
$git_temp = "/tmp";

$site_name = "git.yourserver.com";

# target of the home link on top of all pages
$home_link = $my_uri || "/";

$projects_list_description_width = 80;

There are some additional features that you can enable, such as enabling git blame or syntax highlighting. These features consume more server resources, but they’re also cool.

Warning: Syntax highlighting caused problems for .bats files. With syntax highlighting enabled, you cannot view those files in the typical tree interface. Instead, you must read .bats files as raw. I suspect there are many other such cases.
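Features such as blame and syntax highlighting are toggled through the %feature hash in the GitWeb configuration file. As a hedged sketch, the helper below appends such a toggle to a config file. The function name enable_gitweb_feature is hypothetical, and the path in the usage comment assumes the system-wide config at /etc/gitweb.conf.

```shell
#!/bin/sh
# Hypothetical helper: append a feature toggle to the GitWeb config file $1
# for the feature named $2, unless that feature already has a default there.
enable_gitweb_feature() {
    conf="$1"
    feature="$2"
    line="\$feature{'$feature'}{'default'} = [1];"
    if grep -qF "\$feature{'$feature'}{'default'}" "$conf" 2>/dev/null; then
        echo "$feature is already configured in $conf"
        return 0
    fi
    printf '%s\n' "$line" >> "$conf"
}

# Example (the config path is an assumption):
#   enable_gitweb_feature /etc/gitweb.conf blame
#   enable_gitweb_feature /etc/gitweb.conf highlight
```

Note that the highlight feature relies on an external highlight program being installed on the server.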

Using Git to Mirror Push from a Private Repo to Public

If you have performed the steps above, here’s what you have achieved:

Someone can go to git.yourserver.com and make a request to your server via HTTPS.
Your server queries the gitweb.cgi script for information to put in its response.
The gitweb.cgi script looks in /var/www/git for content and finds… nothing.

This section describes how you can populate that directory and keep it up to date with your repositories.

One-time Setup

On the public machine, we need a git user. Technically, git@private is the only user that will be accessing git@public. But the only operations it needs are Git operations, so let’s lock it down by defining git@public’s shell to be git-shell. You can do this one of two ways:

Add --shell=/usr/bin/git-shell to your adduser command when you create the git user.
Use usermod --shell=/usr/bin/git-shell git if you’ve already created the git user.

After the git user is created, you can add git@private’s public key to git@public:/home/git/.ssh/authorized_keys. Now, git@private can run Git commands as git@public via the git-shell.
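The key installation step can be sketched as a small helper. This is an illustration under assumptions, not the exact commands I ran: the function name authorize_key is made up, and it expects to be run as the account that owns the home directory, since sshd by default ignores key files with loose permissions or the wrong owner.

```shell
#!/bin/sh
# Hypothetical helper: authorize the public key in file $2 for the account
# whose home directory is $1.
authorize_key() {
    home_dir="$1"
    pubkey_file="$2"
    ssh_dir="$home_dir/.ssh"
    auth_file="$ssh_dir/authorized_keys"

    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$auth_file" && chmod 600 "$auth_file" || return 1

    # Append the key only if that exact line is not already present.
    if ! grep -qxFf "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}

# Example (paths are placeholders):
#   authorize_key /home/git /tmp/git-private.pub
```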

Per-repo Setup

Public and private repositories need to be set up to communicate with one another.

The following snippet shows how you can initialize an empty repository on the public host and add a description that is used by GitWeb.

# Context: git@public:/var/www/git or /srv/git or whatever
git init --bare <project.git>
echo 'description of project' > <project.git>/description

The following snippet shows how you can configure your private repository so that running git push submits changes to the public facing repository.

# Context: git@private:/srv/git/project.git or whatever 
git remote add --mirror=push <remote-name> git@<public>:</var/www/git>/<project.git>

There are a few other git remote add options that I’m not totally sure about. Like tagging. Is --tags or --no-tags the default? I currently don’t use tags, but it seems like it would be useful for versioning. That is something that I might do in the future.
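To see what --mirror=push actually records, here is a throwaway demonstration that can be run anywhere Git is installed. It is only a sketch: the remote name public and the temp paths are placeholders, and a local path stands in for the git@<public>:<path> URL.

```shell
#!/bin/sh
# Throwaway demonstration in a temp directory. Nothing here touches real
# repositories; "public" is a placeholder remote name.
demo_dir=$(mktemp -d)
git init -q --bare "$demo_dir/private.git"
git init -q --bare "$demo_dir/public.git"

git -C "$demo_dir/private.git" remote add --mirror=push public "$demo_dir/public.git"

# Show what the remote looks like in the private repo's config.
git -C "$demo_dir/private.git" config --get-regexp '^remote\.public\.'
```

The output should list the remote's URL along with remote.public.mirror set to true. As far as I can tell from the documentation, that setting makes git push to this remote behave like git push --mirror, which pushes everything under refs/, tags included. The --tags and --no-tags options of git remote add govern what git fetch imports from the remote, so they should not matter for a push-only mirror.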

One enhancement to consider is that this could be scripted. Since you can run remote commands by using ssh, you should be able to do the whole thing from the private side. (See the appendix below that investigates this approach.)

Pushing changes from private repo to public mirror

Once it’s set up, you can run git push from the private repo to push changes to the public mirror. However, it can get annoying to push a set of changes from your workspace to your private repo, then switch to your private repo to push the same changes to your public repo. There are some enhancements you can make to automate that part.

Use a cron job to push changes. This is a great choice if you want to build in a delay.
Use a git hook. This is a great choice if you want it to be automatic as soon as your private push goes through.

Note: You could also include this setup in the repo setup script.

Using Cron

As git user, invoke crontab -e and add the following line to mirror the repo at midnight. Adjust the schedule however you want.

0 0 * * * /usr/bin/git -C <path/to/repo> push

The -C option changes to <path/to/repo> before invoking the subsequent git subcommand.
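With more than a couple of repositories, one crontab line per repo gets tedious. A possible alternative, sketched here under assumptions, is a single script that loops over every bare repo in a directory. The function name push_all_mirrors is hypothetical, and the directory you pass it is whatever holds your private repos.

```shell
#!/bin/sh
# Hypothetical helper: run `git push` in every *.git directory under $1 that
# has at least one remote configured, and keep going if one of them fails.
push_all_mirrors() {
    srv_dir="$1"
    failures=0
    for repo in "$srv_dir"/*.git; do
        [ -d "$repo" ] || continue
        # Skip repos that have no remote to push to.
        [ -n "$(git -C "$repo" remote)" ] || continue
        if ! git -C "$repo" push --quiet; then
            echo "push failed: $repo" >&2
            failures=$((failures + 1))
        fi
    done
    # Succeed only if every push succeeded.
    [ "$failures" -eq 0 ]
}

# Example (the directory is a placeholder):
#   push_all_mirrors /srv/git
```

Saved as a script that ends with a call like the one in the comment, this could replace the per-repo crontab entries with one line that runs the script.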

Using a Git Hook

There are several hooks that run at various points of the Git workflow. It looks like a post-receive hook on the private repository is the way to go. My understanding is that the following happens:

A set of changes lands on the private repository.
Some checks and hooks trigger, such as pre-receive.
The changes are accepted.
The post-receive hook runs.

The relevant part of the doc says this:

The [post-receive] hook executes on the remote repository once after all the proposed ref updates are processed and if at least one ref is updated as the result. (source)

Here’s another post recommending the post-receive hook for mirroring.

You should be able to set up the hook with the following applied to the private repo:

printf '#!/bin/sh\ngit push\n' > <project>.git/hooks/post-receive \
    && chmod +x <project>.git/hooks/post-receive

Notes:

I’m assuming that this is a bare repo named with the <project>.git convention.
git push doesn’t work from an empty bare repository. It will complain about “no refs in common” or something. You must have at least one branch or tag or something. That’s fine. Why mirror a totally empty repository anyways?
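A slightly more defensive variant of the hook names the remote explicitly, so it does not depend on how Git picks a default remote, and reports a failed mirror push instead of failing silently. This is a sketch under assumptions: the function name install_mirror_hook is made up, and the remote name is whatever you chose in git remote add.

```shell
#!/bin/sh
# Hypothetical helper: write a post-receive hook into the bare repo at $1
# that pushes to the remote named $2, then make the hook executable.
install_mirror_hook() {
    repo="$1"
    remote_name="$2"
    hook="$repo/hooks/post-receive"

    mkdir -p "$repo/hooks" || return 1
    # The heredoc is unquoted on purpose so $remote_name is filled in now.
    cat > "$hook" <<EOF
#!/bin/sh
# Mirror every accepted push to the remote named "$remote_name".
# post-receive runs after the refs are updated, so a failed mirror push
# is reported to the pusher but does not undo the private push.
git push --quiet "$remote_name" || echo "mirror push to $remote_name failed" >&2
EOF
    chmod +x "$hook"
}

# Example (path and remote name are placeholders):
#   install_mirror_hook /srv/git/project.git public
```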

Conclusion

I’m pleased with everything so far. It’s very likely that I’ll iterate on the process in the future. I expect that

I’ll want to collaborate, so I’ll need to change the access controls so that the public repos are canonical.
I’ll scale up to the point that it’s worthwhile to script some more of the setup.

However, after being burnt too many times by premature optimization in the past, I’m leaning into Adam Savage’s philosophy that it’s important not to start with a specialized tool kit, but to come to it as a consequence of meaningful experience. For now, I’m happy to get something working, use it, and iterate on it later.

Appendix: Scripting the Per-Repo Setup

This is the script that I wanted to define. I wanted to be able to switch to my private git server, and run git-init.sh <repo-name> <description> and have it set up the whole thing. But there’s a subtle, fundamental problem with the approach in the mirror_repo_setup step. It’s a pretty interesting problem. I challenge interested readers to find it for themselves before reading it below. Hint: Think about who is doing what.

#!/bin/bash

# Use repo.git convention for bare repositories. 
# All references herein are to bare repositories. 
REPO_NAME="$1".git
REPO_DESCRIPTION=${2:-"No Description provided."}

MIRROR_GITUSER=...
MIRROR_HOST=...
MIRROR_SRV_DIR=...

LOCAL_SRV_DIR=...
LOCAL_REMOTE_NAME=...

mirror_repo_setup () {
    local cmd="git -C $MIRROR_SRV_DIR init --bare $REPO_NAME"

    # Leave $cmd unquoted. $cmd should be unpacked for use with ssh. 
    ssh "$MIRROR_GITUSER"@"$MIRROR_HOST" $cmd
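    # Quote the remote command so the redirect runs on the mirror, not locally.
    # Assumes the description contains no single quotes.
    ssh "$MIRROR_GITUSER"@"$MIRROR_HOST" \
        "echo '$REPO_DESCRIPTION' > '$MIRROR_SRV_DIR/$REPO_NAME/description'"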
}

# The indentation clash from the heredoc looked really ugly, so I factored it
# out of the private_repo setup.
_write_hook () {
cat << EOF > "$1"
#!/bin/bash
git push
EOF
}

private_repo_setup () {
    local repo_path="$LOCAL_SRV_DIR/$REPO_NAME"
    local hook_file="$repo_path"/hooks/post-receive
    git -C "$LOCAL_SRV_DIR" init --bare "$REPO_NAME"
    git -C "$repo_path" remote add \
        --mirror=push "$LOCAL_REMOTE_NAME" \
        "$MIRROR_GITUSER"@"$MIRROR_HOST":"$MIRROR_SRV_DIR"/"$REPO_NAME"
    _write_hook "$hook_file"
    chmod +x "$hook_file"
}

# MAIN
mirror_repo_setup
private_repo_setup

The problem is with $MIRROR_GITUSER. If we are using SSH authentication, then any user with push access can assume the role of $MIRROR_GITUSER. For security purposes, that user had better have a restricted shell, namely git-shell, so that they can still use git push and whatnot. But git-shell refuses arbitrary commands, so the git init and echo calls that mirror_repo_setup sends over ssh are rejected.

I see a few options:

Do additional setup to allow the git user to use the commands needed. Then the script should work as written.
Run the script as another user. You’d need to ensure proper SSH access and all that, but you’d have access to a shell with broader capabilities.
Do mirror_repo_setup by hand.
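For the first option, git-shell supports custom commands: executables placed in a git-shell-commands directory in the user's home can be invoked over ssh by name. The following is a sketch of how that could look, not something I have deployed. The command name init-mirror and the function name install_init_mirror_command are made up, and an administrator would run the installer once on the public host.

```shell
#!/bin/sh
# Hypothetical helper: install a custom git-shell command named init-mirror
# for the account whose home directory is $1. The command creates bare
# repositories under the directory $2.
install_init_mirror_command() {
    home_dir="$1"
    srv_dir="$2"
    cmd_dir="$home_dir/git-shell-commands"

    mkdir -p "$cmd_dir" || return 1
    # Unquoted heredoc: $srv_dir is filled in now, \$-escaped names later.
    cat > "$cmd_dir/init-mirror" <<EOF
#!/bin/sh
# Usage: ssh git@<public> init-mirror <name>.git [description...]
name="\$1"
# Reject empty names, paths, and names starting with a dot or a dash.
case "\$name" in
    ""|*/*|.*|-*) echo "invalid repository name" >&2; exit 1 ;;
    *.git) ;;
    *) echo "repository name must end in .git" >&2; exit 1 ;;
esac
shift
git init --quiet --bare "$srv_dir/\$name" || exit 1
# Everything after the name is treated as the description.
[ \$# -gt 0 ] && printf '%s\n' "\$*" > "$srv_dir/\$name/description"
exit 0
EOF
    chmod +x "$cmd_dir/init-mirror"
}

# Example (paths are placeholders):
#   install_init_mirror_command /home/git /var/www/git
```

With something like this in place, mirror_repo_setup could shrink to a single ssh call that passes the repo name and description to init-mirror, while the git user keeps its restricted shell.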

Number three is calling my name for now. I only have a few repositories. Although I’d much rather have a script than a document telling me which commands to run, it’s not worth the setup cost.

On the bright side, the private_repo_setup part is still valid.

¹ From a 3-2-1 backup perspective, you should have three copies on two different media, with one offsite. Hosting my Git repositories outside my network satisfies the “one offsite” criterion.