Testing Shell Scripts in a Sandbox

Kyle Bowman

I’m crafting a spellbook full of shell scripts. They more or less codify my setup steps for my various machines. By their very nature, these scripts involve state change. State change causes two problems:

I don’t want to apply a bunch of side-effects to my working setup.
The tests themselves should be reproducible. Stacking side effects seems like a sure way to unreliable tests. Testing on a single machine, I’d have to be annoyingly meticulous about cleaning up after tests.

To avoid these problems, my initial instinct was to copy the scripts into a container, run them, and check the results. Spoiler: That turned out to be a pretty good idea.

This post describes my thought process and some of the setup. I eventually made a template repository with all the resources, so you can simply write your tests and then run them in a container with make test.

Considering Alternatives

Though I have significant reservations about using LLMs for a lot of tasks, they are really good for the brainstorming phase. They let you explore a lot of ideas in a very short time period where the stakes are relatively low. In this case, I prompted Qwen/QwQ-32B (as hosted on Hugging Face), which is one of their smaller reasoning models.

I’m interested in testing shell scripts in a sandboxed environment. The shell scripts are mostly for installing and configuring software. Are containers a good approach of doing that?¹ If not, what other ideas can you suggest?

The model responded that containers are great, but also provided some alternative approaches to achieve isolation:

Chroot + tmpfs
systemd-nspawn
Virtual Machines
User namespaces (without containers)
Shell-testing framework (Bats or shellspec)
tmpfs with sudo / root

This was a great list! I knew about most of them, but didn’t know about systemd-nspawn. I also hadn’t considered using Bats (Bash Automated Testing System), but now it seemed essential. I think systemd-nspawn could work, but Docker/Podman has the advantage: I’ve already got a huge chunk of that learning curve behind me, I know it has a healthy ecosystem, and I can easily find help.

Podman

These steps should work for any Debian-based distribution. For non-Debian distributions, change the package manager command. You might also need to look somewhere else for the podman configuration.

Get podman with apt install podman.

Pull and run a pre-built container to test that it works:

podman pull docker.io/library/httpd
podman run -dt -p 80:80 docker.io/library/httpd

If you open your web browser to localhost:80, you should see a stock “it works!” kind of page.

That test shows that we can pull and run a container from the docker registry. However, notice the docker.io/library/ prefix in the pull command. I want my podman Dockerfiles to match Docker’s Dockerfiles, and Docker’s Dockerfiles don’t include the docker.io/library/ prefix. Turns out, we can do the same by changing a configuration.

In /etc/containers/registries.conf, add the following line:

unqualified-search-registries = ["docker.io"]
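To confirm the setting is in place without opening the file, a small check along these lines can help. This is only a sketch: has_unqualified_registry is a hypothetical helper name (not something podman ships), and it assumes the setting is written on a single line, as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of podman): succeeds if the given
# registries.conf-style file lists REGISTRY in unqualified-search-registries.
# Assumption: the whole setting is written on one line.
has_unqualified_registry() {
  local conf_file="$1" registry="$2" line
  # Grab the line that defines the setting; fail if the file or setting is missing.
  line="$(grep -E '^[[:space:]]*unqualified-search-registries[[:space:]]*=' "$conf_file" 2>/dev/null)" || return 1
  # Look for the registry as a quoted entry, e.g. "docker.io".
  [[ "$line" == *"\"$registry\""* ]]
}

# Example usage:
#   has_unqualified_registry /etc/containers/registries.conf docker.io && echo "configured"
```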

Now, you can test a Dockerfile (or Containerfile as the podman community calls it²). This Dockerfile defines the same container that we ran before, but pins it to the 2.4 version whereas before the version was implied to be latest. (Those were two labels for the same version when I ran this.)

FROM httpd:2.4

Build and run the new container with the following commands:

# This command is run from the directory with the Dockerfile. Note the dot.
podman build -t my_image . 
podman run -dt -p 80:80 my_image

If that shows the same “it works” page, the test passes! You can now build your own images using base images from the Docker registry.

Regarding “Unqualified Search Registry”

It appears that “unqualified” is contrasted with fully qualified, or alternatively aliased. My understanding is that, when looking for an image name that isn’t fully qualified, podman searches the unqualified-search-registries list in order. That leaves a chance of someone editing the list to slip in a registry that contains a nefarious image that they want you to pull.

Red Hat recommends using fully qualified image names including registry, namespace, image name, and tag. When using short names, there is always an inherent risk of spoofing. Add registries that are trusted, that is, registries that do not allow unknown or anonymous users to create accounts with arbitrary names. For example, a user wants to pull the example container image from example.registry.com registry. If example.registry.com is not first in the search list, an attacker could place a different example image at a registry earlier in the search list. The user would accidentally pull and run the attacker image rather than the intended content. Red Hat Reference
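If you want to catch short names before they reach podman, a rough check like the one below is one option. It is a heuristic sketch, not podman’s own parser: is_fully_qualified is a hypothetical name, and it only tests whether the reference starts with something that looks like a registry host (a first path component containing a dot or a port, or the literal localhost). It does not check for a namespace or a tag.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if an image reference appears to name its
# registry explicitly. Heuristic only; it does not validate namespace or tag.
is_fully_qualified() {
  local ref="$1" first
  # No slash at all (e.g. "httpd:2.4") means there is no registry component.
  [[ "$ref" == */* ]] || return 1
  # Everything before the first slash is the candidate registry host.
  first="${ref%%/*}"
  # A dot (domain), a colon (port), or "localhost" marks a registry host.
  [[ "$first" == *.* || "$first" == *:* || "$first" == "localhost" ]]
}

# Example usage:
#   is_fully_qualified docker.io/library/httpd:2.4 && echo "qualified"
#   is_fully_qualified httpd:2.4 || echo "short name"
```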

Bats

Get Bats from your package manager: apt install bats.

Define tests with the @test decorator. A test passes or fails based on the exit codes of the commands in its body, but there are lots of more elegant things that you can do with the framework.

# test/mytest.bats
@test "description of test" {
  command_to_run
}

Run the test by using bats test/mytest.bats.
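One of those more elegant things is the run helper, which executes a command and stores its exit code in $status and its output in $output so that you can assert on both. The sketch below writes an example test file into a throwaway directory and runs it if bats is installed. The file name and the commands under test are made up for illustration.

```shell
#!/usr/bin/env bash
# Write a small example test file into a throwaway directory.
example_dir="$(mktemp -d)"
cat > "$example_dir/run_example.bats" <<'EOF'
#!/usr/bin/env bats

@test "echo prints its argument and exits cleanly" {
  # run captures the exit code in $status and the output in $output.
  run echo "hello from the sandbox"
  [ "$status" -eq 0 ]
  [ "$output" = "hello from the sandbox" ]
}

@test "a missing file makes ls fail" {
  run ls /definitely/not/a/real/path
  [ "$status" -ne 0 ]
}
EOF

# Run the example only if bats is available on this machine.
if command -v bats >/dev/null 2>&1; then
  bats "$example_dir/run_example.bats"
fi
```

Without run, the first failing command ends the test immediately, which is fine for simple checks but makes it awkward to assert that a command is supposed to fail.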

Another really handy fact is that you can define setup() and teardown() in your test file, which bats invokes before and after each @test. Similarly, setup_file() and teardown_file() run once per file: before the first test and after the last one.
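Here is a minimal sketch of the per-test hooks, assuming each test wants its own empty scratch directory. SCRATCH_DIR and the file name are hypothetical names chosen for the example. As before, the snippet writes the test file to a temporary location and only runs it if bats is installed.

```shell
#!/usr/bin/env bash
# Write an example test file that uses setup() and teardown().
hooks_dir="$(mktemp -d)"
cat > "$hooks_dir/hooks_example.bats" <<'EOF'
#!/usr/bin/env bats

setup() {
  # Runs before every @test: give each test a fresh scratch directory.
  SCRATCH_DIR="$(mktemp -d)"
}

teardown() {
  # Runs after every @test, whether it passed or failed.
  rm -rf "$SCRATCH_DIR"
}

@test "scratch directory starts empty" {
  run ls -A "$SCRATCH_DIR"
  [ "$status" -eq 0 ]
  [ -z "$output" ]
}

@test "a test can write into its own scratch directory" {
  touch "$SCRATCH_DIR/marker"
  [ -f "$SCRATCH_DIR/marker" ]
}
EOF

# Run the example only if bats is available on this machine.
if command -v bats >/dev/null 2>&1; then
  bats "$hooks_dir/hooks_example.bats"
fi
```

Because setup() creates a new directory every time, the marker file from the second test can never affect the first one, regardless of the order in which they run.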

More information can be found in the following resources:

Conclusion

I ended up making a repository that abstracts away almost everything. (When I get GitWeb running, I’ll post a link.)

In a nutshell, the repository has a lib directory, in which you define your shell scripts, and a test directory, in which you define your tests. There’s also a Dockerfile and a Makefile. When you run make tests, make builds the image (if needed), runs a container, mounts your repository, then runs bats against all files in your test directory. You can run make clean to clean up the container and image (though it won’t clean up the base image).
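The build, mount, and run steps can be sketched as a pair of podman commands. This is not the Makefile from the repository, just an approximation of the flow under a few assumptions: the image name script-sandbox and the /repo mount point are made up, and the Dockerfile in the repository root is assumed to install bats. Unlike the make clean flow, the sketch uses --rm so the container is discarded as soon as the tests finish.

```shell
#!/usr/bin/env bash
# Hypothetical defaults; override them from the environment if needed.
CONTAINER_TOOL="${CONTAINER_TOOL:-podman}"
IMAGE_NAME="${IMAGE_NAME:-script-sandbox}"

# Build the image, then run bats inside a throwaway container with the
# repository mounted at /repo (an arbitrary mount point for this sketch).
run_tests_in_sandbox() {
  local repo_dir="${1:-$PWD}"
  # Build from the Dockerfile in the repository root; cached layers are reused.
  "$CONTAINER_TOOL" build -t "$IMAGE_NAME" "$repo_dir" || return 1
  # --rm removes the container afterwards; bats runs every .bats file in test/.
  "$CONTAINER_TOOL" run --rm -v "$repo_dir:/repo" -w /repo "$IMAGE_NAME" bats test
}

# Example usage, from the repository root:
#   run_tests_in_sandbox "$PWD"
```

Setting CONTAINER_TOOL=docker should work the same way, since both tools accept these flags.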

This project came together a lot quicker than I expected. My takeaway is that these tools work really well together.

¹ In hindsight, I wonder if that was a leading question. Are LLMs susceptible to leading questions? I suspect yes.
² The podman software doesn’t seem to care that I name the file Dockerfile.