This is my blog now

Moving to Infrastructure as Code — A Work in Progress

As alluded to before, the number of servers I manage recently outgrew my ability, or rather my willingness, to maintain them all manually, so naturally the desire to introduce automation arose. I've worked with Ansible before and, as also mentioned before, I ported my setup into Ansible playbooks to manage everything.

The thing with Ansible is that it's horribly complex. There are at least a dozen ways to accomplish any given task, and there are always ways to simplify, abstract or improve the playbooks you have, so it hardly ever feels like you're actually done. But what I found much more annoying is how opaque playbook runs are by default.

Case in point: I wanted a simple playbook to run OS upgrades on all my systems, but I didn't want to run them blindly; I wanted to see the available upgrades first. However, Ansible doesn't print the output of apt update to stdout by default. You have to configure a different callback for displaying output, which is far from obvious.
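For illustration, here is a minimal sketch of one way to surface pending upgrades without touching the callback configuration: capture the command output in a variable and print it with a debug task. This is an alternative to the callback approach described above, not a description of it. It assumes Debian-based hosts, and the play name, host group and the variable name `upgradable` are made up for the example.

```yaml
# Sketch only: host group, task names and the registered variable are illustrative.
- name: Show pending package upgrades
  hosts: all
  become: true
  tasks:
    - name: Refresh the apt package cache
      ansible.builtin.apt:
        update_cache: true

    - name: List upgradable packages
      ansible.builtin.command: apt list --upgradable
      register: upgradable
      # Listing packages does not modify the system, so never report "changed".
      changed_when: false

    - name: Print the list, one package per line
      ansible.builtin.debug:
        var: upgradable.stdout_lines
```

The actual upgrade would then live in a separate play or playbook, so that the list can be reviewed before anything is installed.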

Another roadblock I hit was managing all the containers I run. Ansible has a module for Docker Compose files, but when you run a playbook containing one, you don't really get any information about what is changing. Being used to Terraform, I wanted something like a diff to show me what was actually happening behind the scenes, but Ansible doesn't really seem to be made for this. There is a --diff flag by now that does this, but it feels a bit like an afterthought. As a matter of fact, I think it was one.

So I decided to have a look at alternatives. I briefly considered Pulumi but then I realized that I could actually use Terraform for it.

Side note: in practice I am using OpenTofu, the FOSS fork of Terraform, but I will be using "Terraform" as shorthand.

Terraform needs so-called providers, which are the glue between your code and the actual API you want to talk to. In my case, I found an actually usable provider for Docker and one or two somewhat decent options for a few Linux-based tasks, so I decided to go ahead and use these to model my setup. The biggest draw for me is that I'm much more familiar with the tooling since I use it at work every day. I also like Terraform's workflow, where it first shows you a diff of the planned changes which you can then review before choosing whether to apply them. Given that I host a bunch of stuff that I and some of my family members have come to rely on, I wanted to guarantee a modicum of stability and availability.

And so I went and ported all the Ansible code I had for setting up the various services I run to Terraform. Very quickly two things became clear:

The first point did not turn out to be a big hindrance, but it's relevant nonetheless. Docker comes with various default settings for containers/services that are deployed via Docker Compose. The Terraform provider for Docker, on the other hand, sets different defaults. Porting existing compose files to Terraform is therefore not actually hard, but when you start planning them, surprising things can happen and even end up breaking existing setups.

The second issue turned out to be a much bigger hassle. In order to run Docker services, various files need to be in place: files with settings, environment variables and of course the compose file itself. These need to be copied over with the proper file permissions, a container directory has to be created if necessary, and some of the files contain secrets which need to be handled. This is the kind of thing that Terraform is simply bad at. There are ways to do it, but they're immature, brittle, unreliable and/or too restricted. Incidentally, this is something Ansible excels at, which is hardly surprising since it is why Ansible exists in the first place.
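To make the file-handling steps above concrete, here is a rough sketch of how they could look as Ansible tasks. It is not taken from my actual playbooks: the paths, the service name `example-service`, the owner and the modes are all hypothetical, and it assumes the secrets are available as variables (for example from an Ansible Vault file) that the template renders.

```yaml
# Sketch only: paths, file names, owner and modes are hypothetical.
- name: Deploy files for a compose service
  hosts: all
  become: true
  tasks:
    - name: Create the service directory if it does not exist
      ansible.builtin.file:
        path: /opt/example-service
        state: directory
        owner: root
        group: root
        mode: "0750"

    - name: Copy the compose file
      ansible.builtin.copy:
        src: files/example-service/compose.yaml
        dest: /opt/example-service/compose.yaml
        owner: root
        group: root
        mode: "0640"

    - name: Render the environment file that contains secrets
      ansible.builtin.template:
        src: templates/example-service/env.j2
        dest: /opt/example-service/.env
        owner: root
        group: root
        mode: "0600"
      # Keep secret values out of the task output and logs.
      no_log: true
```

Each task is idempotent, so running the playbook again only reports a change when a file or its permissions actually differ from the desired state.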

So after experimenting with it for a while, I came full circle and went back to Ansible. I spent more time tuning my playbooks, pruning unnecessary stuff, and generally trying to make things more ergonomic to run for me.

After a while I arrived at a setup that I am reasonably happy with. The features include:

I've been using this setup for a few months now and it works fairly well. The playbook managing my MagicMirror instance is a little wonky, but since this is a Node.js project with weird Node-specific process management, that's probably not surprising.

Good automation is difficult, but in my opinion it is worth it. And at some point, it becomes unavoidable.

Software, Self-hosting, Linux, Privacy
