NixOS in Production
Gabriella Gonzalez
This book is for sale at http://leanpub.com/nixos-in-production
1. Introduction
• What real-world use cases does NixOS address better than the alternatives?
• What does a mature NixOS enterprise look like?
• How do I smoothly migrate an organization to adopt NixOS?
• What potential pitfalls of NixOS should I be mindful to avoid?
• How can I effectively support and debug NixOS when things go wrong?
I’m writing this book because I cultivated years of professional experience doing all of the above,
back when no such resource existed. I learned NixOS the hard way and I’m writing this book so
that you don’t have to make the same mistakes I did.
Currently, most educational resources for NixOS (including the NixOS manual) are written with
desktop users in mind, whereas I view NixOS as far better suited as a production operating
system. This book attempts to fill that documentation gap by catering to professional NixOS
users instead of hobbyists.
Read on if you want to use NixOS “for real” and build a career around one of the
hottest emerging DevOps technologies. This book will improve your NixOS proficiency and
outline a path towards using NixOS to improve your organization’s operational maturity and
reliability.
2. What is NixOS for?
Some NixOS users might try to “convert” others to NixOS using a pitch that goes something like
this:
NixOS is a Linux distribution built on top of the Nix package manager. It uses
declarative configuration and allows reliable system upgrades.
Source: Wikipedia - NixOS¹
This sort of feature-oriented description explains what NixOS does, but does not quite explain
what NixOS is for. What sort of useful things can you do with NixOS? When is NixOS the best
solution? What types of projects, teams, or organizations should prefer using NixOS over the
alternatives?
Come to think of it, what are the alternatives? Is NixOS supposed to replace Debian? Or Docker?
Or Ansible? Or Vagrant? Where does NixOS fit in within the modern software landscape?
In this chapter I’ll help you better understand when you should recommend NixOS to others
and (just as important!) when you should gently nudge people away from NixOS. Hopefully this
chapter will improve your overall understanding of NixOS’s “niche”.
• NixOS expects users to be developers who are more hands-on with their system
NixOS does not come preinstalled on most computers and the installation guide assumes
quite a bit of technical proficiency. For example, NixOS is typically configured via text files
and upgrades are issued from the command line.
• The NixOS user experience differs from what most desktop users expect
Most desktop users (especially non-technical users) expect to install packages by either
downloading the package from the publisher’s web page or by visiting an “app store” of
some sort. They don’t expect to modify a text configuration file in order to install a package.
¹https://en.wikipedia.org/wiki/NixOS
However, the above limitations don’t apply when using NixOS as a server
operating system:
• End users can more easily self-serve if they stray from the beaten path
Server-oriented software is more likely to be open source than desktop-oriented software
and therefore easier to package.
NixOS is better suited for SaaS than on-prem deployments, because NixOS fares worse in
restricted environments where network access is limited or unavailable.
You can still deploy NixOS for on-prem deployments and I will cover that in a later chapter, but
you will have a much better time using NixOS for SaaS deployments.
Virtualization
You might be interested in how NixOS fares with respect to virtualization or containers, so I’ll
break things down into these four potential use cases:
• Application containers
Containers technically do not need to run an entire operating system and can instead run
a single process (e.g. one service). You can do this using Nixpkgs, which provides support
for building application containers.
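If you're curious what that looks like in practice, here is a minimal sketch of my own (not an example from this book) that uses Nixpkgs' dockerTools to build a single-process container image; the image name and command are illustrative:

# container.nix (build with: nix-build container.nix)
{ pkgs ? import <nixpkgs> { } }:

pkgs.dockerTools.buildLayeredImage {
  name = "hello";   # illustrative image name
  tag = "latest";

  # The container runs a single process rather than a full operating system
  config.Cmd = [ "${pkgs.hello}/bin/hello" ];
}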
So which use cases are NixOS/Nixpkgs well-suited for? If I had to rank these deployment models
then my preference (in descending order) would be:
If your deployment model matches that outline then NixOS is not only a safe choice, but likely
the best choice! You will be in great company if you use NixOS in this way.
You can still use NixOS in other capacities, but the further you depart from the above “killer app”
the more you will need to roll up your sleeves.
DevOps is more of a set of cultural practices than a team, but some organizations
explicitly create a DevOps team or hire engineers for their DevOps expertise in order to
support tools (like NixOS) that enable those cultural practices.
You can use NixOS in conjunction with Docker containers since NixOS supports declaratively
launching containers, but you probably want to avoid buying further into the broader Docker
ecosystem if you use NixOS. You don’t want to be in a situation where your engineering
organization fragments and does everything in two different ways: the NixOS way and the
Docker way.
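For reference, declaratively launching a container from a NixOS configuration looks roughly like this (a hedged sketch; the container name, image, and port are made up for illustration):

{ ... }:

{ virtualisation.oci-containers = {
    backend = "docker";

    # Declares a systemd-managed container named "registry"
    containers.registry = {
      image = "registry:2";
      ports = [ "127.0.0.1:5000:5000" ];
    };
  };
}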
For those familiar with the Gentoo Linux distribution, NixOS is like Gentoo, but for
Docker⁶. Similar to Gentoo, NixOS is an operating system that provides unparalleled
control over the machine while targeting use cases and workflows similar to the Docker
ecosystem.
Here I’ll do my best to answer those questions so that you can get a better idea of what you
would be signing up for.
I say “at most one command” because some activities (like continuous deployment) should ideally
require no human intervention at all. However, activities that do require human intervention
should in principle be compressible into a single Nix command.
I can explain this by providing an example of a development workflow that disregards this master
cue:
Suppose that you want to test your local project’s changes within the context of some larger
system at work (i.e. an integration test¹). Your organization’s process for testing your code might
hypothetically look like this:
Now what if I told you that the entire integration testing process from start to finish could be:
¹https://en.wikipedia.org/wiki/Integration_testing
In other words:
Some of these potential improvements are not specific to the Nix ecosystem. After all, you could
attempt to create a script that automates the more painstaking multi-step process. However, you
would likely need to reinvent large portions of the Nix ecosystem for this automation to be
sufficiently robust and efficient. For example:
• Do you generate unique labels for build products to isolate parallel workflows?
In the best case scenario, you label build products by a hash of their dependencies and
you’ve reinvented the Nix store’s hashing scheme. In the worst case scenario you’re doing
something less accurate (e.g. using timestamps in the labels instead of hashes).
• Do you have a custom script that updates references to these build products?
This would be reinventing Nix’s language support for automatically updating dependency
references.
You can save yourself a lot of headaches by taking time to learn and use the Nix ecosystem as
idiomatically as possible instead of learning these lessons the hard way.
GitOps
NixOS exemplifies the Infrastructure as Code (IaC)² paradigm, meaning that every aspect of your
organization (including hardware/systems/software) is stored in code or configuration files that
are the source of truth for how everything is built. In particular, you don’t make undocumented
changes to your infrastructure that cause it to diverge from what is recorded within those files.
This book will espouse a specific flavor of Infrastructure as Code known as GitOps³ where:
DevOps
NixOS also exemplifies the DevOps⁴ principle of breaking down boundaries between software
developers (“Dev”) and operations (“Ops”). Specifically, NixOS goes further in this regard than
most other tools by unifying both software configuration and system configuration underneath
the NixOS option system. These NixOS options fall into roughly three categories:
• Systems configuration
These are options that are mostly interesting to operations engineers, such as:
– log rotation policies
– kernel boot parameters
– disk encryption settings
• Software configuration
These are options that are mostly interesting to software engineers, such as:
– Patches
– Command-line arguments
– Environment variables
In extreme cases, you can even embed non-Nix code inside of Nix and do “pure software
development”. In other words, you can author inline code written within another language inside
of a NixOS configuration file. I’ll include one example of this later on in the “Our first web server”
chapter.
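To give a flavor of what that can look like, here is a minimal sketch of my own (not the example from that later chapter) which authors a small Python program inline, inside a NixOS module, via pkgs.writers; the service name and script contents are made up:

{ pkgs, ... }:

{ systemd.services.greet = {
    wantedBy = [ "multi-user.target" ];

    serviceConfig.ExecStart =
      # The Python source code lives inline, inside the Nix configuration
      "${pkgs.writers.writePython3Bin "greet" { } ''
        print("Hello from Python, embedded in a NixOS module")
      ''}/bin/greet";
  };
}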
Architecture
A NixOS-centric architecture tends to have the following key pieces of infrastructure:
• Version control
If you’re going to use GitOps then you had better use git! More specifically, you’ll likely use
a git hosting provider like GitHub⁵ or GitLab⁶ which supports pull requests and continuous
integration.
Most companies these days use version control, so this is not a surprising requirement.
• Product servers
These are the NixOS servers that actually host your product-related services.
• A cache
In simpler setups the “hub” can double as a cache, but as you grow you will likely want to
upload build products to a dedicated cache.
⁵https://github.com/
⁶https://about.gitlab.com/
Moreover, you will either need a cloud platform (e.g. AWS⁷) or data center for hosting these
machines. In this book we’ll primarily focus on hosting infrastructure on AWS.
These are not the only components you will need to build out your product, but these should be
the only components necessary to support DevOps workflows, including continuous integration
and continuous deployment.
Notably absent from the above list are:
• Container-specific infrastructure
A NixOS-centric architecture already mitigates some of the need for containerizing services,
but the architecture doesn’t change much even if you do use containers, because containers
can be built by Nixpkgs, distributed via the cache, and declaratively deployed to any NixOS
machine.
• Programming-language-specific infrastructure
If Nixpkgs supports a given language then we require no additional infrastructure to support
building and deploying that language. However, we might still host language-specific
amenities on our utility server, such as generated documentation.
• Continuous-deployment services
NixOS provides out-of-the-box services that we can use for continuous deployment, which
we will cover in a later chapter.
Scope
So far I’ve explained NixOS in high-level terms, but you might prefer a more down-to-earth
picture of the day-to-day requirements and responsibilities for a professional NixOS user.
To that end, here is a checklist that will summarize what you would need to understand in order
to effectively introduce and support NixOS within an organization:
• Infrastructure setup
– Continuous integration
– Builders
– Caching
• Development
– NixOS module system
– Project organization
– NixOS best practices
– Quality controls
• Testing
– Running virtual machines
– Automated testing
• Deployment
– Provisioning a new system
– Upgrading a system
– Dealing with restricted networks
• System administration
– Infrastructure as code
– Disk management
– Filesystem
– Networking
– Users and authentication
– Limits and quotas
• Security
– System hardening
– Patching dependencies
• Diagnostics and Debugging
– Nix failures
– Test failures
– Production failures
– Useful references
• Fielding inquiries
– System settings
– Licenses
– Vulnerabilities
• Non-NixOS Integrations
– Images
– Containers
This book will cover all of the above topics and more, although they will not necessarily be
grouped or organized in that exact order.
4. Setting up your development environment
I’d like you to be able to follow along with the examples in this book, so this chapter provides
a quick setup guide to bootstrap from nothing to deploying a blank NixOS system that you can
use for experimentation.
Installing Nix
In order to follow along with this book you will need the following requirements:
You’ve likely already installed Nix if you’re reading this book, but I’ll still cover how to do this
because I have a few tips to share that can help you author a more reliable installation script for
your colleagues.
Needless to say, if you or any of your colleagues are using NixOS as your development operating
system then you don’t need to install Nix and you can skip to the Running a NixOS Virtual
Machine section below.
Default installation
If you go to the download page for Nix² it will tell you to run something similar to this:
$ sh <(curl --location https://nixos.org/nix/install)
Throughout this book I’ll consistently use long option names instead of short names (e.g.
--location instead of -L), for two reasons:
For example, tar --extract --file is clearer and a better mnemonic than tar xf.
You may freely use shorter option names if you prefer, but I still highly recommend
using long option names, at least for non-interactive scripts.
¹https://nixos.org/manual/nix/stable/command-ref/conf-file.html
²https://nixos.org/download.html
Depending on your platform the download instructions might also tell you to pass the --daemon
or --no-daemon option to the installation script to specify a single-user or multi-user installation.
For simplicity, the instructions in this chapter will omit the --daemon / --no-daemon flag, but
keep in mind the following platform-specific advice:
$ VERSION='2.11.0'
$ URL="https://releases.nixos.org/nix/nix-${VERSION}/install"
$ sh <(curl --location "${URL}")
… and you can find the full set of available releases by visiting the release file server³.
Feel free to use a Nix version newer than 2.11.0 if you want. The above example
installation script only pins the version 2.11.0 because that’s what happened to be the
latest stable version at the time of this writing. That’s also the Nix version that the
examples from this book have been tested against.
The only really important thing is that everyone within your organization uses the same
version of Nix, if you want to minimize your support burden.
However, there are a few more options that the script accepts that we’re going to make good use
of, and we can list those options by supplying --help to the script:
$ VERSION='2.11.0'
$ URL="https://releases.nixos.org/nix/nix-${VERSION}/install"
$ sh <(curl --location "${URL}") --help
³https://releases.nixos.org/?prefix=nix/
--daemon: Installs and configures a background daemon that manages the store,
providing multi-user support and better isolation for local builds.
Both for security and reproducibility, this method is recommended if
supported on your platform.
See https://nixos.org/manual/nix/stable/installation/installing-binary.html#multi-user-installation
--no-daemon: Simple, single-user installation that does not require root and is
trivial to uninstall.
(default)
You might wonder if you can use the --tarball-url-prefix option for distributing a
custom build of Nix, but that’s not what this option is for. You can only use this option
to download Nix from a different location (e.g. an internal mirror), because the new
download still has to match the same integrity check as the old download.
Don’t worry, though; there still is a way to distribute a custom build of Nix, and we’ll
cover that in a later chapter.
• --nix-extra-conf-file
This lets you extend the installed nix.conf if you want to make sure that all users within
your organization share the same settings.
• --no-channel-add
You can (and should) enable this option within a professional organization to disable the
preinstallation of any channels.
These two options are crucial because we are going to use them to systematically replace Nix
channels with flakes.
Nix channels are a trap and I treat them as a legacy Nix feature poorly suited for
professional development, despite how ingrained they are in the Nix ecosystem.
The issue with channels is that they essentially introduce impurity into your builds by
depending on the NIX_PATH and there aren’t great solutions for enforcing that every Nix
user or every machine within your organization has the exact same NIX_PATH.
Moreover, Nix now supports flakes, which you can think of as a more modern alternative
to channels. Familiarity with flakes is not a precondition to reading this book, though:
I’ll teach you what you need to know.
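To make the difference concrete, compare these two snippets (a rough illustration of my own; the pinned version is just an example):

# Impure: which Nixpkgs this picks up depends on the NIX_PATH (i.e. the
# channels) configured on whichever machine happens to run the build.
let
  pkgs = import <nixpkgs> { };
in
  pkgs.hello

# Pure: a flake input pins Nixpkgs to one revision for every user and machine.
{ inputs.nixpkgs.url = "github:NixOS/nixpkgs/22.05";

  outputs = { nixpkgs, ... }: {
    packages.x86_64-linux.default = nixpkgs.legacyPackages.x86_64-linux.hello;
  };
}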
$ VERSION='2.11.0'
$ URL="https://releases.nixos.org/nix/nix-${VERSION}/install"
$ CONFIGURATION="
extra-experimental-features = nix-command flakes repl-flake
extra-trusted-users = ${USER}
"
$ sh <(curl --location "${URL}") \
--no-channel-add \
--nix-extra-conf-file <(echo "${CONFIGURATION}")
The prior script only works if your shell is Bash or Zsh and all shell commands
throughout this book assume the use of one of those two shells.
For example, the above command uses support for process substitution (which is not
available in a POSIX shell environment) because otherwise we’d have to create a
temporary file to store the CONFIGURATION and clean up the temporary file afterwards
(which is tricky to do 100% reliably). Process substitution is also more reliable than a
temporary file because it happens entirely in memory and the intermediate result can’t
be accidentally deleted.
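If process substitution is unfamiliar, here is a tiny standalone illustration of the feature (nothing Nix-specific; the setting is made up):

# <(…) exposes a command's output as a file-like path, so the installer can
# read our configuration without us managing a temporary file ourselves.
$ cat <(echo 'extra-trusted-users = alice')
extra-trusted-users = alice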
macOS-specific instructions
If you are using macOS, then follow the instructions in the Nixpkgs manual⁴ to set up a local
Linux builder. We’ll need this builder to create other NixOS machines, since they require Linux
build products.
In particular, you will need to leave that builder running in the background while following the
remaining examples in this chapter. In other words, in one terminal window you will need to
run:

$ nix run nixpkgs#darwin.builder

… and you will need that to be running whenever you need to build a NixOS system. However,
you can shut down the builder when you’re not using it by giving the builder the shutdown now
command.
The nix run nixpkgs#darwin.builder command is not enough to set up Linux builds
on macOS. Read and follow the full set of instructions from the Nixpkgs manual linked
above.
If you are using Linux (including NixOS or the Windows Subsystem for Linux) you can skip to
the next step.
Platform-independent instructions
Run the following command to generate your first project:
{ inputs = {
flake-utils.url = "github:numtide/flake-utils/v1.0.0";
nixpkgs.url = "github:NixOS/nixpkgs/f1a49e20e1b4a7eeb43d73d60bae5be84a1e7610";
};
# https://github.com/utmapp/UTM/issues/2353
networking.nameservers = lib.mkIf pkgs.stdenv.isDarwin [ "8.8.8.8" ];
⁴https://nixos.org/manual/nixpkgs/unstable/#sec-darwin-builder
virtualisation = {
graphics = false;
machine = nixpkgs.lib.nixosSystem {
system = builtins.replaceStrings [ "darwin" ] [ "linux" ] system;
${machine.config.system.build.vm}/bin/run-nixos-vm
'';
in
{ packages = { inherit machine; };
apps.default = {
type = "app";
program = "${program}";
};
}
);
}
# module.nix
{ services.getty.autologinUser = "root";
}
Then run this command within the same directory to run our test virtual machine:
$ nix run
…
[root@nixos:~]#
You can then shut down the virtual machine by entering shutdown now.
If you’re unable to shut down the machine gracefully for any reason you can shut down
the machine non-gracefully by typing Ctrl-a + c to open the qemu prompt and then
entering quit to exit.
If you were able to successfully launch and shut down the virtual machine then you’re ready to
follow along with the remaining examples throughout this book. If you see an example in this
book that begins with this line:
# module.nix
… then that means that I want you to save that example code to the module.nix file and then
restart the virtual machine by running nix run.
For example, let’s test that right now; save the following file to module.nix:
# module.nix
{ services.getty.autologinUser = "root";
services.postgresql.enable = true;
}
… then start the virtual machine and log into the machine. As the root user, run:
postgres=#
5. Our first web server
Hello, world!
We’ll begin from the template project from “Setting up your development environment”. You can
either begin from the previous chapter by running the following command (if you haven’t done
so already):
… or if you want to skip straight to the final result at the end of this chapter you can run:
Let’s modify module.nix to specify a machine that serves a simple static “Hello, world!” page on
http://localhost:
# module.nix
{ pkgs, ... }:

{ services = {
    getty.autologinUser = "root";

    nginx = {
      enable = true;

      virtualHosts.localhost.locations."/" = {
        index = "index.html";

        root = pkgs.writeTextDir "index.html" ''
          Hello, world!
        '';
      };
    };
  };

  networking.firewall.allowedTCPPorts = [ 80 ];

  virtualisation.forwardPorts = [
    { from = "host"; guest.port = 80; host.port = 8080; }
  ];

  system.stateVersion = "22.11";
}
You always want to specify a system state version that matches the starting version of
Nixpkgs for that machine and never change it afterwards. In other words, even if you
upgrade Nixpkgs later on you would keep the state version the same.
Nixpkgs uses the state version to migrate your NixOS system: each migration needs to know
where your system started from.
Two common mistakes NixOS users make are:
If you deploy that using nix run you can open the web page in your browser by visiting
http://localhost:8080¹ which should display the following contents:
Hello, world!
In general I don’t recommend testing things by hand like this. Remember the “master
cue”:
In a later chapter we’ll cover how to automate this sort of testing using NixOS’s support
for integration tests. These tests will also take care of starting up and tearing down the
virtual machine for you so that you don’t have to do that by hand either.
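As a rough preview of what that automation looks like, here is a minimal sketch under my own assumptions (not the example from the later chapter):

# test.nix (run with: nix-build test.nix)
{ pkgs ? import <nixpkgs> { } }:

pkgs.nixosTest {
  name = "hello-world";

  nodes.machine = { pkgs, ... }: {
    services.nginx = {
      enable = true;
      virtualHosts.localhost.locations."/".root =
        pkgs.writeTextDir "index.html" "Hello, world!";
    };
    networking.firewall.allowedTCPPorts = [ 80 ];
  };

  # The test driver boots the VM, runs these checks, then tears the VM down
  testScript = ''
    machine.wait_for_unit("nginx.service")
    machine.wait_for_open_port(80)
    machine.succeed("curl --fail http://localhost/ | grep 'Hello, world!'")
  '';
}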
DevOps
The previous example illustrates how NixOS promotes DevOps on a small scale. If the inline
web page represents the software development half of the project (the “Dev”) and the nginx
configuration represents the operational half of the project (the “Ops”) then we can in principle
store both the “Dev” and the “Ops” halves of our project within the same file. As an extreme
example, we can even template the web page with system configuration options!
# module.nix
{ config, lib, pkgs, ... }:

{ services = {
    getty.autologinUser = "root";

    nginx = {
      enable = true;

      virtualHosts.localhost.locations."/" = {
        index = "index.html";

        root = pkgs.writeTextDir "index.html" ''
          <html>
          <body>
          <ul>
          ${
            let
              renderPort = port: "<li>${toString port}</li>\n";
            in
              lib.concatMapStrings renderPort
                config.networking.firewall.allowedTCPPorts
          }
          </ul>
          </body>
          </html>
        '';
      };
    };
  };

  networking.firewall.allowedTCPPorts = [ 80 ];

  virtualisation.forwardPorts = [
    { from = "host"; guest.port = 80; host.port = 8080; }
  ];

  system.stateVersion = "22.11";
}
¹http://localhost:8080
If you restart the machine and refresh http://localhost:8080² the page should now display:
• 80
There are less roundabout ways to query our system’s configuration that don’t involve
serving a web page. For example, using the same flake.nix file we can more directly
query the open ports using:
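For example, assuming the flake still exposes the configuration under the machine package output from the earlier template, a query along these lines should work:

$ nix eval '.#machine.config.networking.firewall.allowedTCPPorts'
[ 80 ]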
TODO list
Now we’re going to create the first prototype of a toy web application: a TODO list implemented
entirely in client-side JavaScript (later on we’ll add a backend service).
Create a subdirectory named www within your current directory:
$ mkdir www
… and then save a file named index.html with the following contents underneath that
subdirectory:
²http://localhost:8080
<html>
<body>
<button id='add'>+</button>
</body>
<script>
let add = document.getElementById('add');
function newTask() {
let subtract = document.createElement('button');
subtract.textContent = "-";
let input = document.createElement('input');
input.setAttribute('type', 'text');
let div = document.createElement('div');
div.replaceChildren(subtract, input);
function remove() {
div.replaceChildren();
div.remove();
}
subtract.addEventListener('click', remove);
add.before(div);
}
add.addEventListener('click', newTask);
</script>
</html>
In other words, the above file should be located at www/index.html relative to the directory
containing your module.nix file.
Now save the following NixOS configuration to module.nix:
# module.nix
{ services = {
getty.autologinUser = "root";
nginx = {
enable = true;
virtualHosts.localhost.locations."/" = {
index = "index.html";
root = ./www;
};
};
};
networking.firewall.allowedTCPPorts = [ 80 ];
virtualisation.forwardPorts = [
{ from = "host"; guest.port = 80; host.port = 8080; }
];
system.stateVersion = "22.11";
}
If you restart the virtual machine and refresh the web page you’ll see a single + button:
Each time you click the + button it will add a TODO list item consisting of:
virtualisation.sharedDirectories.www = {
source = "$WWW";
target = "/var/www";
};
virtualHosts.localhost.locations."/" = {
index = "index.html";
root = "/var/www";
};
Finally, restart the machine, except with a slightly modified version of our original nix run
command:
Now, we only need to refresh the page to view any changes we make to index.html and we no
longer need to restart the virtual machine.
Exercise: Add a “TODO list” heading (i.e. <h1>TODO list</h1>) to the web page and refresh the
page to confirm that your changes took effect.
6. NixOS option definitions
By this point in the book you may have copied and pasted some NixOS code, but perhaps you
don’t fully understand what is going on, especially if you’re not an experienced NixOS user. This
chapter will slow down and help you solidify your understanding of the NixOS module system
so that you can improve your ability to read, author, and debug modules.
Throughout this book I’ll consistently use the following terminology to avoid ambiguity:
In this chapter and the next chapter we’ll focus mostly on option definitions and later
on we’ll cover option declarations in more detail.
# Module arguments which our system can use to refer to its own configuration
{ config, lib, pkgs, ... }:
In other words, in the fully general case a NixOS module is a function whose output is an attribute
set with three attributes named imports, options, and config.
Nix supports data structures known as “attribute sets”, which are analogous to “maps” or
“records” in other programming languages.
To be precise, Nix uses the following terminology:
I’m explaining all of this because I’ll use the terms “attribute set”, “attribute”, and
“attribute path” consistently throughout the text to match Nix’s official terminology
(even though no other language uses those terms).
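As a quick illustration of those terms (a toy example of my own):

let
  x = { a.b.c = 1; };  # x is an attribute set; a, b, and c are attributes
in
  x.a.b.c              # "a.b.c" is an attribute path; this evaluates to 1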
Syntactic sugar
All elements of a NixOS module are optional and NixOS supports “syntactic sugar” to simplify
several common cases. For example, you can omit the module arguments if you don’t use them:
{ imports = [
…
];
options = {
…
};
config = {
…
};
}
You can also omit any of the imports, options, or config attributes, like in this module,
which only imports other modules:
{ imports = [
./physical.nix
./logical.nix
];
}
{ config = {
services = {
apache-kafka.enable = true;
zookeeper.enable = true;
};
};
}
Additionally, the NixOS module system provides special support for modules which only define
options by letting you elide the config attribute and promote the options defined within to the
“top level”. As an example, we can simplify the previous NixOS module to this:
{ services = {
apache-kafka.enable = true;
zookeeper.enable = true;
};
}
You might wonder if there should be some sort of coding style which specifies whether
people should include or omit these elements of a NixOS module. For example, perhaps
you might require that all elements are present, for consistency, even if they are empty
or unused.
My coding style for NixOS modules is:
The NixOS module system is a domain-specific language implemented within the Nix pro-
gramming language. Specifically, the NixOS module system is (mostly) implemented within the
lib/modules.nix file included in Nixpkgs¹. If you ever receive a stack trace related to the NixOS
module system you will often see functions from modules.nix show up in the stack trace, because
they are ordinary functions and not language features.
In fact, a NixOS module in isolation is essentially “inert” from the Nix language’s point of view.
For example, if you save the following NixOS module to a file named example.nix:
{ config = {
services.openssh.enable = true;
};
}
… and you evaluate that, the result will be the same, just without the syntactic sugar:
The Nix programming language provides “syntactic sugar” for compressing nested
attributes by chaining them using a dot (.). In other words, this Nix expression:
{ config = {
services.openssh.enable = true;
};
}
{ config = {
services = {
openssh = {
enable = true;
};
};
};
}
… and they are both also the same thing as this Nix expression:
{ config.services.openssh.enable = true; }
Note that this syntactic sugar is a feature of the Nix programming language, not the
NixOS module system. In other words, this feature works even for Nix expressions that
are not destined for use as NixOS modules.
¹https://github.com/NixOS/nixpkgs/blob/22.05/lib/modules.nix
{ config, ... }:
{ config = {
services.apache-kafka.enable = config.services.zookeeper.enable;
};
}
… is just a function. If we save that to example.nix and then evaluate it, the interpreter will
simply say that the file evaluates to a “lambda” (an anonymous function):
… although we can get a more useful result within the nix repl by calling our function on a
sample argument:
$ nix repl
…
nix-repl> example = import ./example.nix

nix-repl> output = example { config.services.zookeeper.enable = true; }
nix-repl> :p output
{ config = { services = { apache-kafka = { enable = true; }; }; }; }
nix-repl> output.config.services.apache-kafka.enable
true
This illustrates that our NixOS module really is just a function whose input is an attribute set
and whose output is also an attribute set. There is nothing special about this function other than
it happens to be the same shape as what the NixOS module system accepts.
NixOS
So if NixOS modules are just pure functions or pure attribute sets, what turns those functions
or attribute sets into a useful operating system? In other words, what puts the “NixOS” in the
“NixOS module system”?
The answer is that this actually happens in two steps:
• All NixOS modules your system depends on are combined into a single, composite
attribute set
In other words all of the imports, options declarations, and config settings are fully
resolved, resulting in one giant attribute set. The code for combining these modules lives
in lib/modules.nix² in Nixpkgs.
²https://github.com/NixOS/nixpkgs/blob/22.05/lib/modules.nix
• The final composite attribute set contains a special attribute that builds
the system
Specifically, there will be a config.system.build.toplevel attribute path which contains
a derivation you can use to build a runnable NixOS system. The top-level code for
assembling an operating system lives in nixos/modules/system/activation/top-level.nix³ in
Nixpkgs.
This will probably make more sense if we use the NixOS module system ourselves to create a
fake placeholder value that will stand in for a real operating system.
First, we’ll create our
own top-level.nix module that will include a fake
config.system.build.toplevel attribute path that is a string instead of a derivation for
building an operating system:
# top-level.nix
{ config, lib, ... }:

{ imports = [ ./other.nix ];
options = {
system.build.toplevel = lib.mkOption {
description = "A fake NixOS, modeled as a string";
type = lib.types.str;
};
};
config = {
system.build.toplevel =
"Fake NixOS - version ${config.system.nixos.release}";
};
}
# other.nix
{ lib, ... }:
{ options = {
system.nixos.release = lib.mkOption {
description = "The NixOS version";
type = lib.types.str;
};
};
config = {
system.nixos.release = "22.05";
};
}
We can then materialize the final composite attribute set like this:
³https://github.com/NixOS/nixpkgs/blob/22.05/nixos/modules/system/activation/top-level.nix
nix-repl> :p result.config
{ system = { build = { toplevel = "Fake NixOS - version 22.05"; }; nixos = { release = "22.05"; }; }; }
nix-repl> result.config.system.build.toplevel
"Fake NixOS - version 22.05"
In other words, lib.evalModules is the magic function that combines all of our NixOS modules
into a composite attribute set.
NixOS essentially does the same thing as in the above example, except on a much larger scale.
Also, in a real NixOS system the final config.system.build.toplevel attribute path stores a
buildable derivation instead of a string.
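For example, on a real system you can build and inspect that derivation directly; a hedged sketch, assuming a flake with a nixosConfigurations.myhost output (the hostname is made up):

$ nix build '.#nixosConfigurations.myhost.config.system.build.toplevel'

$ readlink ./result
/nix/store/…-nixos-system-myhost-22.05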
Recursion
The NixOS module system lets modules refer to the final composite configuration using the
config function argument that is passed into every NixOS module. For example, this is how our
top-level.nix module was able to refer to the system.nixos.release option that was set in the
other.nix module:
{ …
config = {
system.build.toplevel =
"Fake NixOS - version ${config.system.nixos.release}";
# |
# … which we can use within our configuration
};
}
You’re not limited to referencing configuration values set in other NixOS modules; you can even
reference configuration values set within the same module. In other words, NixOS modules
support recursion⁴ where modules can refer to themselves.
As a concrete example of recursion, we can safely merge the other.nix module into the
top-level.nix module:
⁴https://en.wikipedia.org/wiki/Recursion
{ config, lib, ... }:

{ options = {
system.build.toplevel = lib.mkOption {
description = "A fake NixOS, modeled as a string";
type = lib.types.str;
};
system.nixos.release = lib.mkOption {
description = "The NixOS version";
type = lib.types.str;
};
};
config = {
system.build.toplevel =
"Fake NixOS - version ${config.system.nixos.release}";
system.nixos.release = "22.05";
};
}
… and this would still work, even though this module now refers to its own configuration values.
The Nix interpreter won’t go into an infinite loop because the recursion is still well-founded.
We can better understand why this recursion is well-founded by simulating how
lib.evalModules works by hand. Conceptually what lib.evalModules does is:
We’ll walk through this by performing the same steps as lib.evalModules. First, to simplify
things we’ll consolidate the prior example into a single flake that we can evaluate as we go:
{ inputs.nixpkgs.url = "github:NixOS/nixpkgs/22.05";

  outputs = { nixpkgs, ... }:
    let
      other =
        { lib, ... }:
        { options.system.nixos.release = lib.mkOption {
            description = "The NixOS version";
            type = lib.types.str;
          };
          config.system.nixos.release = "22.05";
        };

      topLevel =
        { config, lib, ... }:
        { imports = [ other ];
          options.system.build.toplevel = lib.mkOption {
            description = "A fake NixOS, modeled as a string";
            type = lib.types.str;
          };
          config.system.build.toplevel =
            "Fake NixOS - version ${config.system.nixos.release}";
        };
    in
      nixpkgs.lib.evalModules { modules = [ topLevel ]; };
}
Various nix commands (like nix eval) take a flake reference as an argument which has
the form:
${URI}#${ATTRIBUTE_PATH}
In the previous example, the URI was ./evalModules (a file path in this case) and the
ATTRIBUTE_PATH was config.system.build.toplevel.
However, if you use zsh as your shell with EXTENDED_GLOB support (i.e. setopt
extended_glob) then zsh interprets # as a special character. This is why all of the
examples from this book quote the flake reference as a precaution, but if you're not
using zsh or its extended globbing support then you can remove the quotes, like this:

The first thing that lib.evalModules does is to merge the other module into the topLevel
module, which we will simulate by hand by performing the same merge ourselves:
{ inputs.nixpkgs.url = "github:NixOS/nixpkgs/22.05";
in
nixpkgs.lib.evalModules { modules = [ topLevel ]; };
}
After that we compute the fixed point of our module by passing the module’s output as its own
input, the same way that evalModules would:
{ inputs.nixpkgs.url = "github:NixOS/nixpkgs/22.05";
result = topLevel {
inherit (result) config options;
inherit (nixpkgs) lib;
};
in
result;
}
This walkthrough grossly oversimplifies what evalModules does. For starters, we’ve
completely ignored how evalModules uses the options declarations to:
The last step is that when nix eval accesses the config.system.build.toplevel field of the
result, the Nix interpreter conceptually performs the following substitutions:
result.config.system.build.toplevel
So even though our NixOS module is defined recursively in terms of itself, that recursion is still
well-founded and produces an actual result.
7. Advanced option definitions
NixOS option definitions are actually much more sophisticated than the previous chapter let on
and in this chapter we’ll cover some common tricks and pitfalls.
Make sure that you followed the instructions from the “Setting up your development environ-
ment” chapter if you would like to test the examples in this chapter.
Imports
The NixOS module system lets you import other modules by their path, which merges their
option declarations and option definitions with the current module. But, did you know that the
elements of an imports list don’t have to be paths?
You can put inline NixOS configurations in the imports list, like these:
{ imports = [
{ services.openssh.enable = true; }
{ services.getty.autologinUser = "root"; }
];
}
… and they will behave as if you had imported files with the same contents as those inline
configurations.
In fact, anything that is a valid NixOS module can go in the import list, including NixOS modules
that are functions:
{ imports = [
    { services.openssh.enable = true; }
    ({ pkgs, ... }: { environment.systemPackages = [ pkgs.hello ]; })
  ];
}
I will make use of this trick in a few examples below, so that we can simulate modules importing
other modules within a single file.
lib utilities
Nixpkgs provides several utility functions for NixOS modules that are stored underneath the
“lib” hierarchy, and you can find the source code for those functions in lib/modules.nix¹.
¹https://github.com/NixOS/nixpkgs/blob/22.05/lib/modules.nix
If you want to become a NixOS module system expert, take the time to read and
understand all of the code in lib/modules.nix.
Remember that the NixOS module system is implemented as a domain-specific language
in Nix and lib/modules.nix contains the implementation of that domain-specific
language, so if you understand everything in that file then you understand essentially
all that there is to know about how the NixOS module system works under the hood.
That said, this chapter will still try to explain things enough so that you don’t have to
read through that code.
You do not need to use or understand all of the functions in lib/modules.nix, but you do need
to familiarize yourself with the following four primitive functions:
• lib.mkMerge
• lib.mkOverride
• lib.mkIf
• lib.mkOrder
By “primitive”, I mean that these functions cannot be implemented in terms of other functions.
They all hook into special behavior built into lib.evalModules.
mkMerge
The lib.mkMerge function merges a list of “configuration sets” into a single “configuration
set” (where “configuration set” means a potentially nested attribute set of configuration option
settings).
For example, the following NixOS module:
{ lib, ... }:
{ config = lib.mkMerge [
{ services.openssh.enable = true; }
{ services.getty.autologinUser = "root"; }
];
}
{ config = {
services.openssh.enable = true;
services.getty.autologinUser = "root";
};
}
You might wonder whether you should merge modules using lib.mkMerge or merge
them using the imports list. After all, we could have also written the previous mkMerge
example as:
{ imports = [
{ services.openssh.enable = true; }
{ services.getty.autologinUser = "root"; }
];
}
… and that would have produced the same result. So which is better?
The short answer is: lib.mkMerge is usually what you want.
The long answer is that the main trade-off between imports and lib.mkMerge is:
• The imports section can merge NixOS modules that are functions
lib.mkMerge can only merge configuration sets and not functions.
The latter point is why you should typically prefer using lib.mkMerge.
Merging options
You can merge configuration sets that define the same option multiple times, like this:
{ lib, ... }:
{ config = lib.mkMerge [
{ networking.firewall.allowedTCPPorts = [ 80 ]; }
{ networking.firewall.allowedTCPPorts = [ 443 ]; }
];
}
… and the outcome of merging two identical attribute paths depends on the option’s “type”.
For example, the networking.firewall.allowedTCPPorts option’s type is:
If you specify a list-valued option twice, the lists are combined, so the above example reduces to
this:
{ lib, ... }:
{ config = lib.mkMerge [
{ networking.firewall.allowedTCPPorts = [ 80 443 ]; }
];
}
… and we can even prove that by querying the final value of the option from the command line:
However, you might find the nix repl more convenient if you prefer to interactively browse the
available options. Run this command:
… which will load your NixOS system into the REPL and now you can use tab-completion to
explore what is available:
nix-repl> config.<TAB>
config.appstream config.nix
config.assertions config.nixops
…
nix-repl> config.networking.<TAB>
config.networking.bonds
config.networking.bridges
…
nix-repl> config.networking.firewall.<TAB>
config.networking.firewall.allowPing
config.networking.firewall.allowedTCPPortRanges
…
nix-repl> config.networking.firewall.allowedTCPPorts
[ 80 443 ]
Exercise: Try to save the following NixOS module to module.nix, which specifies the
same option twice without using lib.mkMerge:
{ lib, ... }:
{ config = {
networking.firewall.allowedTCPPorts = [ 80 ];
networking.firewall.allowedTCPPorts = [ 443 ];
};
}
This will fail to deploy. Do you understand why? Specifically, is the failure a limitation
of the NixOS module system or the Nix programming language?
You can also nest lib.mkMerge underneath an attribute. For example, this:
{ config = lib.mkMerge [
{ networking.firewall.allowedTCPPorts = [ 80 ]; }
{ networking.firewall.allowedTCPPorts = [ 443 ]; }
];
}
{ config.networking = lib.mkMerge [
{ firewall.allowedTCPPorts = [ 80 ]; }
{ firewall.allowedTCPPorts = [ 443 ]; }
];
}
{ config.networking.firewall = lib.mkMerge [
{ allowedTCPPorts = [ 80 ]; }
{ allowedTCPPorts = [ 443 ]; }
];
}
{ config.networking.firewall.allowedTCPPorts = [ 80 443 ]; }
Conflicts
Duplicate options cannot always be merged. For example, if you merge two
configuration sets that disagree on whether to enable a service:
{ lib, ... }:
{ config = {
services.openssh.enable = lib.mkMerge [ true false ];
};
}
This is because services.openssh.enable is declared to have a boolean type, and you can only
merge multiple boolean values if all occurrences agree. You can verify this yourself by changing
both occurrences to true, which will fix the error.
As a general rule of thumb:
• Most complex option types will successfully merge in the obvious way
e.g. lists will be concatenated and attribute sets will be combined.
The most common exception to this rule of thumb is the “lines” type (lib.types.lines), which
is a string option type that you can define multiple times. services.zookeeper.extraConf is an
example of one such option that has this type:
{ lib, ... }:
{ config = {
services.zookeeper = {
enable = true;
… and merging multiple occurrences of that option concatenates them as lines by inserting an
intervening newline character:
mkOverride
The lib.mkOverride function specifies the “priority” of an option definition, which comes in
handy if you want to override a configuration value that another NixOS module already defined.
This most commonly comes up when we need to override an option that was already defined by
one of our dependencies (typically a NixOS module provided by Nixpkgs). One example would
be overriding the restart frequency of nginx:
{ config = {
services.nginx.enable = true;
systemd.services.nginx.serviceConfig.RestartSec = "5s";
};
}
The problem is that when we enable nginx that automatically defines a whole bunch of other
NixOS options, including systemd.services.nginx.serviceConfig.RestartSec². This option is
a scalar string option that disallows multiple distinct values because the NixOS module system
by default has no way to know which one to pick to resolve the conflict.
However, we can use mkOverride to annotate our value with a higher priority so that it overrides
the other conflicting definition:
{ lib, ... }:
{ config = {
    services.nginx.enable = true;
    systemd.services.nginx.serviceConfig.RestartSec = lib.mkOverride 50 "5s";
  };
}
… and now that works, since we specified a new priority of 50 that takes priority over the default
priority of 100. There is also a pre-existing utility named lib.mkForce which sets the priority to
50, so we could have also used that instead:
{ lib, ... }:
{ config = {
    services.nginx.enable = true;
    systemd.services.nginx.serviceConfig.RestartSec = lib.mkForce "5s";
  };
}
²https://github.com/NixOS/nixpkgs/blob/nixos-22.05/nixos/modules/services/web-servers/nginx/default.nix#L890
Be careful, though: you might be tempted to instead apply lib.mkForce to the entire
serviceConfig attribute, like this:

{ lib, ... }:
{ config = {
    services.nginx.enable = true;
    systemd.services.nginx.serviceConfig = lib.mkForce { RestartSec = "5s"; };
  };
}
That is not equivalent, because it overrides not only the RestartSec attribute, but also all
other attributes underneath the serviceConfig attribute (like Restart, User, and Group,
all of which are now gone).
You always want to narrow your use of lib.mkForce as much as possible to protect
against this common mistake.
The default priority is 100 and lower numeric values actually represent higher priority. In other
words, an option definition with a priority of 50 takes precedence over an option definition with
a priority of 100.
Yes, the NixOS module system confusingly uses lower numbers to indicate higher priorities, but
in practice you will rarely see explicit numeric priorities. Instead, people tend to use derived
utilities like lib.mkForce or lib.mkDefault which select the appropriate numeric priority for
you.
In extreme cases you might still need to specify an explicit numeric priority. The most common
example is when one of your dependencies already defines an option using lib.mkForce and
you need to override that. In that scenario you could use lib.mkOverride 49, which would take
precedence over lib.mkForce:
{ lib, ... }:
{ config = {
services.nginx.enable = true;
systemd.services.nginx.serviceConfig.RestartSec = lib.mkMerge [
(lib.mkForce "5s")
(lib.mkOverride 49 "3s")
];
};
}
The default values for options also have a priority, which is priority 1500 and there’s a
lib.mkOptionDefault that sets a configuration value to that same priority.
That means that a NixOS module like this:
{ lib, ... }:
{ options.foo = lib.mkOption {
default = 1;
};
}
{ lib, ... }:
{ options.foo = lib.mkOption { };
config.foo = lib.mkOptionDefault 1;
}
However, you will more commonly use lib.mkDefault which defines a configuration option
with priority 1000. Typically you’ll use lib.mkDefault if you want to override the default value
of an option, while still allowing a downstream user to override the option yet again at the
normal priority (100).
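For example, a module you share within your organization might ship an opinionated default that downstream configurations can still override with a plain assignment (a small sketch of my own):

{ lib, ... }:

{ # Priority 1000: wins over the option's built-in default (priority 1500),
  # but loses to any ordinary definition made downstream (priority 100).
  config.services.nginx.recommendedGzipSettings = lib.mkDefault true;
}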
mkIf
mkIf is far-and-away the most widely used NixOS module primitive, because you can use mkIf
to selectively enable certain options based on the value of another option.
An extremely common idiom from Nixpkgs is to use mkIf in conjunction with an enable option,
like this:
# module.nix
let
  # Pretend that this came from another file
  cowsay =
    { config, lib, pkgs, ... }:

    { options.services.cowsay = {
        enable = lib.mkEnableOption "cowsay";

        greeting = lib.mkOption {
          description = "The phrase the cow will greet you with";
          type = lib.types.str;
          default = "Hello, world!";
        };
      };

      config = lib.mkIf config.services.cowsay.enable {
        systemd.services.cowsay = {
          wantedBy = [ "multi-user.target" ];

          script =
            "${pkgs.cowsay}/bin/cowsay ${config.services.cowsay.greeting}";
        };
      };
    };
in
{ imports = [ cowsay ];
config = {
services.cowsay.enable = true;
services.getty.autologinUser = "root";
};
}
If you launch the above NixOS configuration you should be able to verify that the cowsay service
is running like this:
You might wonder why we need a mkIf primitive at all. Couldn’t we use an if expression like
this instead?
{ …
The most important reason why this doesn’t work is because it triggers an infinite loop:
at /nix/store/vgicc88fhmlh7mwik7gqzzm2jyfva9l9-source/lib/modules.nix:259:21:
… and the reason why lib.mkIf doesn’t share the same problem is because evalModules pushes
mkIf conditions to the “leaves” of the configuration tree, as if we had instead written this:
{ …
config = {
systemd.services.cowsay = {
wantedBy = lib.mkIf config.services.cowsay.enable [ "multi-user.target" ];
script =
lib.mkIf config.services.cowsay.enable
"${pkgs.cowsay}/bin/cowsay ${config.services.cowsay.greeting}";
};
};
}
let
  kafkaSynonym =
    { config, lib, ... }:
    { options.services.kafka.enable = lib.mkEnableOption "kafka";

      config.services.apache-kafka.enable = config.services.kafka.enable;
    };
in
  { imports = [ kafkaSynonym ];

    config.services.apache-kafka.enable = true;
  }
The above example leads to a conflict because the kafkaSynonym module defines
services.kafka.enable to false (at priority 100), and the downstream module defines
services.apache-kafka.enable to true (also at priority 100).
let
  kafkaSynonym =
    { config, lib, ... }:
    { options.services.kafka.enable = lib.mkEnableOption "kafka";

      config.services.apache-kafka.enable =
        lib.mkIf config.services.kafka.enable true;
    };
in
  { imports = [ kafkaSynonym ];

    config.services.apache-kafka.enable = true;
  }
… then that would do the right thing because in the default case services.apache-kafka.enable
would remain undefined, which would be the same thing as being defined as false at priority
1500. That avoids defining the same option twice at the same priority.
mkOrder
The NixOS module system strives to make the behavior of our system depend as little as possible
on the order in which we import or mkMerge NixOS modules. In other words, if we import two
modules that we depend on:
… then ideally the behavior shouldn’t change if we import those same two modules in a different
order:
… and in most cases that is true. 99% of the time you can safely sort your import list and either
your NixOS system will be exactly the same as before (producing the exact same Nix store build
product) or essentially the same as before, meaning that the difference is irrelevant. However,
for those 1% of cases where order matters we need the lib.mkOrder function.
Here’s one example of where ordering matters:
let
  moduleA = { pkgs, ... }: {
    environment.defaultPackages = [ pkgs.gcc ];
  };

  moduleB = { pkgs, ... }: {
    environment.defaultPackages = [ pkgs.clang ];
  };
in
  { imports = [ moduleA moduleB ]; }
Both the gcc package and clang package add a cc executable to the PATH, so the order matters
here because the first cc on the PATH wins.
In the above example, clang’s cc is the first one on the PATH, because we imported moduleB
second:
This sort of order-sensitivity frequently arises for “list-like” option types, including actual lists
or string types like lines that concatenate multiple definitions.
Fortunately, we can fix situations like these with the lib.mkOrder function, which specifies a
numeric ordering that NixOS will respect when merging multiple definitions of the same option.
Every option’s numeric order is 1000 by default, so if we set the numeric order of clang to 1500:
let
  moduleA = { pkgs, ... }: {
    environment.defaultPackages = [ pkgs.gcc ];
  };

  moduleB = { lib, pkgs, ... }: {
    environment.defaultPackages = lib.mkOrder 1500 [ pkgs.clang ];
  };
in
  { imports = [ moduleA moduleB ]; }
… then gcc will always come first on the PATH, no matter which order we import the modules.
You can also use lib.mkBefore and lib.mkAfter, which provide convenient synonyms for
numeric order 500 and 1500, respectively:
let
  moduleA = { pkgs, ... }: {
    environment.defaultPackages = [ pkgs.gcc ];
  };

  moduleB = { lib, pkgs, ... }: {
    environment.defaultPackages = lib.mkAfter [ pkgs.clang ];
  };
in
  { imports = [ moduleA moduleB ]; }
8. Deploying to AWS using Terraform
Up until now we’ve been playing things safe and test-driving everything locally on our own
machine. We could even prolong this for quite a while because NixOS has advanced support
for building and testing clusters of NixOS machines locally using virtual machines. However, at
some point we need to dive in and deploy a server if we’re going to use NixOS for real.
In this chapter we’ll deploy our TODO app to our first “production” server in AWS meaning that
you will need to create an AWS account¹ to follow along.
AWS prices and offers will vary so this book can’t provide any strong guarantees about
what this would cost you. However, at the time of this writing the examples in this
chapter would fit well within the current AWS free tier, which is 750 hours of a t3.micro
instance.
Even if there were no free tier, the cost of a t3.micro instance is currently ≈1¢ / hour or
≈ $7.50 / month if you never shut it off (and you can shut it off when you’re not using
it). So at most this chapter should only cost you a few cents from start to finish.
Throughout this book I’ll take care to minimize your expenditures by showing you how
to develop and test locally as much as possible.
In the spirit of Infrastructure as Code, we’ll be using Terraform to declaratively provision AWS
resources, but before doing so we need to generate AWS access keys for programmatic access.
The above AWS documentation also recommends generating temporary access credentials
instead of long-term credentials. However, setting this up properly and ergonomically
requires setting up the IAM Identity Center, which is only permitted for AWS
accounts that have set up an AWS Organization. That is way outside of the scope of this
book so instead you should just generate long-term credentials for a non-root admin
account.
If you haven’t already, configure your development environment to use these tokens by running:

$ aws configure
If you’re not sure what region to use, pick the one closest to you based on
the list of AWS service endpoints³.
• module.nix + www/index.html
The NixOS configuration for our TODO list web application, except adapted to run on AWS
instead of inside of a qemu VM.
• flake.nix
A Nix flake that wraps our NixOS configuration so that we can refer to the configuration
using a flake URI.
• main.tf
The Terraform specification for deploying our NixOS configuration to AWS.
• backend/main.tf
This Terraform configuration provisions an S3 bucket for use with Terraform’s S3 backend⁴.
We won’t use this until the very end of this chapter, though, so we’ll ignore it for now.
³https://docs.aws.amazon.com/general/latest/gr/rande.html
⁴https://developer.hashicorp.com/terraform/language/settings/backends/s3
… and when prompted to enter the region, use the same AWS region you specified earlier when
running aws configure:
var.region
Enter a value: …
After that, terraform will display the execution plan and ask you to confirm the plan:
module.ami.data.external.ami: Reading...
module.ami.data.external.ami: Read complete after 1s [id=-]
Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
+ create
<= read (data resources)
… and if you confirm then terraform will deploy that execution plan:
Outputs:
public_dns = "ec2-….compute.amazonaws.com"
The final output will include the URL for your server. If you open that URL in your browser you
will see the exact same TODO server as before, except now running on AWS instead of inside of a
qemu virtual machine. If this is your first time deploying something to AWS then congratulations!
Cleaning up
Once you verify that everything works you can destroy all deployed resources by running:

$ terraform destroy
terraform will prompt you for the same information (i.e. the same region) and also prompt for
confirmation just like before:
var.region
Enter a value: …
Now you can read the rest of this chapter in peace knowing that you are no longer being billed
for this example.
Terraform walkthrough
The key file in our Terraform project is main.tf, which contains the Terraform logic for deploying
our TODO list application.
You can think of a Terraform module as being sort of like a function with side effects, meaning:
• input variables⁵ are like the function’s arguments
• output values⁶ are like the function’s result
• resources⁷ are like side effects that provision infrastructure
• calls to child modules⁸ are like calls to other functions
Our starting main.tf file provides examples of all of the above concepts.
⁵https://developer.hashicorp.com/terraform/language/values/variables
⁶https://developer.hashicorp.com/terraform/language/values/outputs
⁷https://developer.hashicorp.com/terraform/language/resources/syntax
⁸https://developer.hashicorp.com/terraform/language/modules/syntax#calling-a-child-module
Input variables
For example, the beginning of the module declares one input variable:
variable "region" {
type = string
nullable = false
}
… which is analogous to a Nix function like this one that takes the following attribute set as an
input:
{ region }:
…
When you run terraform apply you will be automatically prompted to supply all input variables:
$ terraform apply
var.region
Enter a value: …
… but you can also provide the same values on the command line if you don’t want to supply
them interactively:
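$ terraform apply -var 'region=…'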
… and if you really want to make the whole command non-interactive you can also add the
-auto-approve flag:
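$ terraform apply -var 'region=…' -auto-approve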
… so that you don’t have to manually confirm the deployment by entering “yes”.
Output variables
The end of the Terraform module declares one output value:
output "public_dns" {
value = aws_instance.todo.public_dns
}
… which would be like our function returning an attribute set with one attribute:
{ region }:
let
…
in
{ output = aws_instance.todo.public_dns; }
… and when the deploy completes Terraform will render all output values:
Outputs:
public_dns = "ec2-….compute.amazonaws.com"
Resources
In between the input variables and the output values the Terraform module declares several
resources. For now, we’ll highlight the resource that provisions the EC2 instance:
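resource "aws_instance" "todo" {
…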
root_block_device {
volume_size = 7
}
}
… and you can think of resources sort of like let bindings that provision infrastructure as a side
effect:
{ region }:
let
…;
aws_security_group.todo = aws_security_group { … };
tls_private_key.nixos-in-production = tls_private_key { … };
local_sensitive_file.ssh_key_file = ssh_key_file { … };
aws_key_pair.nixos-in-production = aws_key_pair { … };
aws_instance.todo = aws_instance {
ami = module.ami.ami;
instance_type = "t3.micro";
security_groups = [ aws_security_group.todo.name ];
key_name = aws_key_pair.nixos-in-production.key_name;
root_block_device.volume_size = 7;
};
null_resource.wait = null_resource { … };
in
{ output = aws_instance.todo.public_dns; }
Our Terraform deployment declares six resources, the first of which is a security group
(basically a firewall):
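resource "aws_security_group" "todo" {
…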
# We need to open port 80 so that we can view our TODO list web page.
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = [ "0.0.0.0/0" ]
}
}
The next three resources generate an SSH key pair that we’ll use to manage the machine:
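# Generate the SSH private key (an Ed25519 key, matching the id_ed25519
# filename below)
resource "tls_private_key" "nixos-in-production" {
algorithm = "ED25519"
}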
# Synchronize the SSH private key to a local file that the "nixos" module can
# use
resource "local_sensitive_file" "ssh_key_file" {
filename = "${path.module}/id_ed25519"
content = tls_private_key.nixos-in-production.private_key_openssh
}
# Mirror the SSH public key to EC2 so that we can later install the public key
# as an authorized key for our server
resource "aws_key_pair" "nixos-in-production" {
public_key = tls_private_key.nixos-in-production.public_key_openssh
}
The tls_private_key resource⁹ is currently not secure because the generated private key
ends up in the deployment state, which is stored locally unencrypted. We will fix this
later on in this chapter by storing the
deployment state using Terraform’s S3 backend¹⁰.
The next resource provisions the EC2 instance itself:
resource "aws_instance" "todo" {
…
# We could use a smaller instance size, but at the time of this writing the
# t3.micro instance type is available for 750 hours under the AWS free tier.
instance_type = "t3.micro"
…
}
Finally, we declare a resource whose sole purpose is to wait until the EC2 instance is reachable
via SSH, so that the “nixos” module doesn’t try to deploy the NixOS configuration before the
machine is ready to accept connections:
⁹https://registry.terraform.io/providers/hashicorp/tls/latest/docs/resources/private_key
¹⁰https://developer.hashicorp.com/terraform/language/settings/backends/s3
# This ensures that the instance is reachable via `ssh` before we deploy NixOS
resource "null_resource" "wait" {
provisioner "remote-exec" {
connection {
host = aws_instance.todo.public_dns
private_key = tls_private_key.nixos-in-production.private_key_openssh
}
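# All that matters is that the SSH connection succeeds, so the provisioner
# only needs to run a trivial command here
inline = [ "…" ]
}
}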
Modules
Our Terraform module also invokes two other Terraform modules (which I’ll refer to as “child
modules”). Here we’ll highlight the module that deploys the NixOS configuration:
module "ami" {
…;
}
module "nixos" {
source = "github.com/Gabriella439/terraform-nixos-ng//nixos?ref=d8563d06cc65bc699ffbf1ab8d692b1343ec\
d927"
host = "root@${aws_instance.todo.public_ip}"
flake = ".#default"
arguments = [ "--build-host", "root@${aws_instance.todo.public_ip}" ]
ssh_options = "-o StrictHostKeyChecking=accept-new"
depends_on = [ null_resource.wait ]
}
You can liken child modules to calls to imported Nix functions:
{ region }:
let
module.ami = …;
module.nixos =
let
source = fetchFromGitHub {
owner = "Gabriella439";
repo = "terraform-nixos-ng";
rev = "d8563d06cc65bc699ffbf1ab8d692b1343ecd927";
hash = …;
};
in
import source {
host = "root@${aws_instance.todo.public_ip}";
flake = ".#default";
arguments = [ "--build-host" "root@${aws_instance.todo.public_ip}" ];
ssh_options = "-o StrictHostKeyChecking=accept-new";
depends_on = [ null_resource.wait ];
};
aws_security_group.todo = aws_security_group { … };
tls_private_key.nixos-in-production = tls_private_key { … };
local_sensitive_file.ssh_key_file = ssh_key_file { … };
aws_key_pair.nixos-in-production = aws_key_pair { … };
aws_instance.todo = aws_instance { … };
null_resource.wait = null_resource { … };
in
{ output = aws_instance.todo.public_dns; }
The first child module selects the correct NixOS AMI to use:
module "ami" {
source = "github.com/Gabriella439/terraform-nixos-ng//ami?ref=d8563d06cc65bc699ffbf1ab8d692b1343ecd9\
27"
release = "22.11"
region = var.region
system = "x86_64-linux"
}
… and the second child module deploys our NixOS configuration to our EC2 instance:
module "nixos" {
source = "github.com/Gabriella439/terraform-nixos-ng//nixos?ref=d8563d06cc65bc699ffbf1ab8d692b1343ec\
d927"
host = "root@${aws_instance.todo.public_ip}"
# Build our NixOS configuration on the same machine that we're deploying to
arguments = [ "--build-host", "root@${aws_instance.todo.public_ip}" ]
depends_on = [ null_resource.wait ]
}
In this example we build our NixOS configuration on our web server so that this example
can be deployed without any supporting infrastructure. However, you typically will
want to build the NixOS configuration on a dedicated builder rather than building on
the target server for two reasons:
The next chapter will cover how to provision a dedicated builder for this purpose.
S3 Backend
The above Terraform deployment doesn’t properly protect the key pair used to ssh into and
manage the NixOS machine. By default, the private key of the key pair is stored in a
world-readable terraform.tfstate file. However, even if we were to restrict that file’s permissions we
wouldn’t be able to easily share our Terraform deployment with colleagues. In particular, we
wouldn’t want to add the terraform.tfstate file to version control in a shared repository since
it contains sensitive secrets.
The good news is that we can fix both of those problems by setting up an S3 backend¹¹ for
Terraform, which allows the secret to be securely stored in an S3 bucket that can be shared by
multiple people managing the same Terraform deployment.
The template for this chapter’s Terraform configuration already comes with a backend/ subdirectory
containing a Terraform specification that provisions a suitable S3 bucket and DynamoDB
table for an S3 backend. All you have to do is run:
$ cd ./backend
$ terraform apply
var.region
Enter a value: …
¹¹https://developer.hashicorp.com/terraform/language/settings/backends/s3
Just make sure to use the same region as our original Terraform deployment when prompted.
When the deployment succeeds it will output the name of the randomly-generated S3 bucket,
which will look something like this (with a timestamp in place of the Xs):
…
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
Outputs:
bucket = "nixos-in-productionXXXXXXXXXXXXXXXXXXXXXXXXXX"
Then switch back to the original Terraform deployment in the parent directory:
$ cd ..
… and modify that deployment’s main.tf to reference the newly-created bucket like this:
terraform {
required_version = ">= 1.3.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 4.56"
}
}
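backend "s3" {
# Terraform requires these values to be hard-coded, so fill in the bucket
# name output by the backend deployment, the same region you've been using,
# a state key of your choosing, and the DynamoDB table provisioned for
# locking
bucket = "…"
key = "…"
region = "…"
dynamodb_table = "…"
}
}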
These last few manual steps to update the S3 backend are a bit gross, but this is primarily
to work around limitations in Terraform. In particular, Terraform doesn’t provide a
way for our main deployment to automatically reference the S3 backend we created.
Terraform specifically prohibits backend stanzas from referencing variables, so all of the
backend options have to be hard-coded values.
Then you can upgrade your existing deployment to reference the S3 backend you just provisioned
by re-running terraform init with the -migrate-state option:
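$ terraform init -migrate-state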
… and once that’s done you can verify that nothing broke by running terraform apply again,
which should report that no new changes need to be deployed:
$ terraform apply
var.region
Enter a value: …
The difference is that the terraform state is now securely stored in an S3 bucket instead of on
your filesystem, so you can store your Terraform configuration in version control and let other
developers manage the same deployment. There’s just one last thing you need to do: remove the
terraform.tfstate.backup file, which contains the old (pre-S3-backend) Terraform state,
including the secrets:
$ rm terraform.tfstate.backup
You can remove the terraform.tfstate file, too, since it’s empty and no longer used:
$ rm terraform.tfstate
Future Terraform examples in this book won’t include the S3 backend code to keep them
shorter, but feel free to reuse the same S3 bucket created in this chapter to upgrade any
of those examples with an S3 backend. However, if you do, keep in mind that you need
to use a different key for storing Terraform’s state if you want to keep those examples
separate.
In other words, when adding the S3 backend to the terraform clause, specify a different
key for each separate deployment:
terraform {
…
backend "s3" {
…
key = "…" # This is what needs to be unique per deployment
…
}
}
This key is used by Terraform to record where to store the deployment’s state within the S3
bucket, so if you use the same key for two different deployments they will interfere with one
another.
Version control
Once you create the S3 backend you can safely store your Terraform configuration in version
control. Specifically, these are the files that you want to store in version control:
• flake.lock
It’s also worth keeping this in version control even though it’s not strictly necessary. The
lock file slightly improves the determinism of the deployment, although the flake included
in the template is already fairly deterministic even without the lockfile because it references
a specific tag from Nixpkgs.
• main.tf / backend/main.tf
We definitely want to keep the Terraform configuration for both our main deployment and
the S3 backend.
• terraform.tfstate
You don’t need to keep this in version control (it’s an empty file
Just as important, you do NOT want to keep the id_ed25519 file in version control (since this
contains the private key). In fact, the provided template includes a .gitignore file to prevent
you from accidentally adding the private key to version control.
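A minimal .gitignore along these lines would do the trick, although the template’s actual file
may differ slightly:
# Never commit the generated SSH private key
id_ed25519
# Terraform's local plugin/module cache
.terraform/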
Terraform will recreate this private key file locally for each developer that manages the
deployment. For example, if another developer were to apply the deployment for the first time,
they would see a diff along these lines:
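Terraform will perform the following actions:

  # local_sensitive_file.ssh_key_file will be created
  + resource "local_sensitive_file" "ssh_key_file" {
      + content  = (sensitive value)
      + filename = "./id_ed25519"
      …
    }

Plan: 1 to add, 0 to change, 0 to destroy.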
… indicating that Terraform will download the private key from the S3 backend and create a
secure local copy in order to ssh into the machine.