Terraform Compiler Pattern: A maintainable and scalable architecture for Terraform

Yoriyasu Yano
Published in Fensak Blog · 9 min read · Dec 21, 2022

Use different tools to translate higher level languages down to Terraform to integrate with the runtimes.

In this post, I will introduce a Terraform design pattern that I have been using successfully here at Fensak.

This pattern is not necessarily new or unique, but as far as I know, there is no official name for it so I am calling it the Terraform Compiler Pattern. This pattern has certain advantages over pure Terraform, especially in large scale deployments, and is adaptable to many tools in the ecosystem. Best of all, it is something that can be introduced gradually to existing workflows.

Background

Over the years, Terraform has emerged as one of the de facto standards for defining Infrastructure as Code. Organizations large and small rely on Terraform to manage their infrastructure across many different clouds and platforms, with its thousands of first-party and third-party providers and tens of thousands of modules for provisioning a wide range of infrastructure. As a stable, robust, and reliable tool that has reached 1.0, chances are you are already using it in your project.

Terraform code is defined in a dedicated dialect of the HashiCorp Configuration Language (HCL). HCL provides a structured syntax that enhances static configuration languages like JSON and YAML with programming constructs (e.g., loops and conditionals). By itself, HCL is a powerful enhancement over static configuration languages that is maintainable and adaptable to many different kinds of use cases.

However, there are certain limitations to pure Terraform that make it difficult to scale to large deployments. Terraform HCL is closely tied to the Terraform runtime, which leads to limitations imposed not by the language itself but by the runtime:

  • Code reusability is limited to a single module, and thus a single state file. That is, you can’t have a module that outputs resources for multiple state files.
  • Certain constructs cannot be interpolated dynamically (e.g., lifecycle and backend).
  • Certain blocks cannot be reused across modules (e.g., provider).

These limiting factors are not significant when you are scoped to managing a single module. Indeed, Terraform does a great job of optimizing the developer experience when you are working within the confines of a single module.

On the other hand, Terraform best practices dictate that you should isolate your state files across environments and components, and this is where things start to break down.

Let’s take a look at an example.

Motivating example

Consider a Terraform deployment with a single database in a VPC, across two environments (stage and prod). To simplify the example, we will also assume we have defined submodules for a canonical VPC and a MySQL database. This reduces each root module in our examples to a single module block.

In this example, we intend to isolate the state files for the VPC and the Database, resulting in four state files:

  • VPC in Stage
  • VPC in Prod
  • MySQL database in Stage
  • MySQL database in Prod

To achieve this, we need to define the components with four root modules, one for each state file above. We will have a project structure like below:

.
├── stage
│   ├── mysql
│   │   ├── backend.tf
│   │   ├── main.tf
│   │   └── provider.tf
│   └── vpc
│       ├── backend.tf
│       ├── main.tf
│       └── provider.tf
└── prod
    ├── mysql
    │   ├── backend.tf
    │   ├── main.tf
    │   └── provider.tf
    └── vpc
        ├── backend.tf
        ├── main.tf
        └── provider.tf

The main.tf file for each component contains a single module call to define the underlying component infrastructure with some hardcoded parameters as the inputs.

For example, the stage/vpc/main.tf might look like:

module "vpc" {
  source = "github.com/myorg/my-vpc?ref=v1.0.8"

  name       = "stage"
  cidr_block = "10.0.0.0/16"
}

output "vpc_id" {
  value = module.vpc.vpc_id
}

The provider.tf file will contain the provider configuration with the required_providers block to specify the version. Assuming an AWS deployment, this might look like:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = "us-west-2"
}

Finally, the backend.tf file will contain the backend configuration for storing the state file in S3. For example, for stage/vpc:

terraform {
  backend "s3" {
    bucket = "my-stage-bucket"
    key    = "vpc/terraform.tfstate"
    region = "us-west-2"
  }
}

This setup works well for a small scale deployment like above, or for a simple provider and backend configuration.

But what if you have hundreds of components distributed across tens of environments? And what if you need to adjust your provider configurations to use different AWS IAM Roles or you want to enhance it with allowed_account_ids?

The challenge is that you can’t modularize the provider and backend configurations due to the aforementioned limitations. This results in the provider.tf and backend.tf files being mostly duplicated across all the modules, and maintained manually, leading to a painful refactoring process if you ever need to change a pattern in one of these files.

Luckily, there is a better way to manage this in the Terraform Compiler Pattern.

Terraform Compiler Pattern

To understand the Terraform Compiler Pattern*, it is worth getting into a little bit of background on compilers. If you are not familiar with compilers, they are the programs that translate general purpose programming languages into executable machine code.

* NOTE: technically, this pattern is more of a Terraform transpiler as opposed to a compiler, but compiler sounds better so I’m sticking with it.

In particular, this pattern is heavily inspired by the architecture of the LLVM compiler. Traditionally, compilers were responsible for a point-to-point translation: each would take source code in one language and generate machine code for a single platform. This often led to compilers optimized for only a handful of languages and platforms (e.g., only supporting x86_64 or arm64), and the burden of introducing new platforms fell on language maintainers.

A major innovation in the LLVM compiler is the introduction of a general purpose Intermediate Representation (IR), which is the form used to represent code within the compiler. With the LLVM IR, language maintainers only need to target the IR and can rely on the LLVM compiler to translate it to the different machine platforms to produce the final artifact. This means language maintainers implement a single translation and gain compatibility with the great majority of machine platforms practically for free.

Retargetable compiler design: a common optimizer targeting LLVM IR integrates with different frontends and backends (from http://aosabook.org/en/llvm.html).

In the Terraform Compiler Pattern, we adapt this architecture to the Terraform world by treating Terraform HCL as a form of IR for the cloud. Instead of writing HCL directly, we rely on a higher level language that is not limited by the runtime-imposed limitations. This code is then compiled down into a collection of Terraform HCL code that is then passed on to the different runtimes of the Terraform ecosystem.

For example, you might use Jsonnet with tf.libsonnet to define your Terraform code in the Jsonnet language. Since Jsonnet is not tied to the Terraform runtime, you are not limited by the same restrictions as writing HCL, which gives you mostly free rein over the amount of code reuse and dynamism you introduce. You can then compile this Jsonnet code down to Terraform to execute it and deploy your infrastructure. To see what you can accomplish with this, check out Using Jsonnet to DRY multi-component multi-environment Terraform Projects for an example of DRY-ing up the motivating example.
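To make the idea concrete, here is a minimal sketch of such a "compiler" in Python (used purely for illustration; any frontend language works). It generates the four root modules of the motivating example as Terraform JSON (*.tf.json), a syntax the plain terraform CLI accepts natively, so the backend and provider boilerplate is written exactly once. The prod bucket name and the MySQL module ref below are hypothetical, invented to complete the example:

```python
import json
from pathlib import Path

REGION = "us-west-2"

# Single inventory describing every environment and component.
# The prod bucket name and the mysql module ref are hypothetical.
ENVIRONMENTS = {
    "stage": {"state_bucket": "my-stage-bucket"},
    "prod": {"state_bucket": "my-prod-bucket"},
}
COMPONENTS = {
    "vpc": {
        "source": "github.com/myorg/my-vpc?ref=v1.0.8",
        "inputs": {"stage": {"cidr_block": "10.0.0.0/16"},
                   "prod": {"cidr_block": "10.1.0.0/16"}},
    },
    "mysql": {
        "source": "github.com/myorg/my-mysql?ref=v2.1.0",
        "inputs": {"stage": {}, "prod": {}},
    },
}

def render_root_module(env: str, component: str) -> dict:
    """Build one root module (backend + provider + module call) as a
    Terraform JSON document, from the shared inventory above."""
    spec = COMPONENTS[component]
    return {
        "terraform": {
            "backend": {"s3": {
                "bucket": ENVIRONMENTS[env]["state_bucket"],
                "key": f"{component}/terraform.tfstate",
                "region": REGION,
            }},
            "required_providers": {"aws": {
                "source": "hashicorp/aws",
                "version": "~> 4.0",
            }},
        },
        "provider": {"aws": {"region": REGION}},
        "module": {component: {
            "source": spec["source"],
            "name": env,
            **spec["inputs"][env],
        }},
    }

def compile_all(out_dir: Path) -> None:
    """Compile the inventory into <env>/<component>/main.tf.json files
    that plain `terraform init && terraform apply` can consume."""
    for env in ENVIRONMENTS:
        for component in COMPONENTS:
            target = out_dir / env / component
            target.mkdir(parents=True, exist_ok=True)
            (target / "main.tf.json").write_text(
                json.dumps(render_root_module(env, component), indent=2)
            )
```

Changing the provider or backend pattern now means editing one function and recompiling, instead of hand-editing every root module.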

The beauty of this is that it makes refactoring and experimentation a breeze, not just the overall Terraform writing experience. When starting out, you can try out different languages and pick the best one that works for you. Later on, if a new frontend language emerges, you can experiment with it and drop it in, relying on diffs against the compiled code to be sure that you aren't changing any behavior. All with minimal updates to your existing Terraform pipelines!

Note that I used Jsonnet in the above example, but really the pattern is agnostic to the frontend code. CDKTF is a popular alternative that is conducive to this pattern with the synth command, and in fact, is the official way to integrate with TFC/TFE. In this way, the pattern is flexible in the implementation and does not favor any particular tool over others.

Comparison to other tools

At first glance, this may not feel necessary. After all, there is only one terraform CLI. As such, you could instead address the limitations by using a purpose-built replacement for terraform.

There are at least four popular tools that implement this approach: Terragrunt, Terramate, CDKTF, and Pulumi.

Each of these tools attempts to solve the problem with a similar templating abstraction (Terragrunt and Terramate use an HCL abstraction, while CDKTF and Pulumi use general purpose programming languages).

However, in addition to being a templating abstraction, these tools attempt to manage the lifecycle of the resources, and thus the Terraform runtime.

For example, instead of running terraform plan and terraform apply, you might run:

  • terragrunt plan and terragrunt apply
  • terramate run
  • pulumi up
  • cdktf deploy

This control gives each of these tools the extensibility to implement feature enhancements that are not provided by terraform natively (e.g., terragrunt run-all and dependency blocks).

With that said, there are two significant disadvantages that you trade for this power:

  • Leaky abstractions. The tools try to abstract away Terraform, but because of the law of leaky abstractions, it is impossible to completely hide it. Bugs and issues in Terraform frequently bubble up to the abstraction layer, which can lead to major confusion when debugging. To most users, it is not clear whether an issue is at the Terraform layer or the abstraction layer, and in some cases, dropping down to Terraform is necessary to debug it. Most of these tools do provide a way to debug such issues, but it may not feel natural (see Terragrunt’s debugging docs, for example).
  • Poor integration support. Runtime services need to natively support the runtime and outputs of these tools because these tools depend on being invoked directly. This adds a barrier to entry, as each runtime service must decide whether to expend time developing a native integration. Typically, services start with native Terraform support and add these tools later as an afterthought, or never support them at all. For example, Terraform Cloud / Terraform Enterprise does not natively support calling anything other than terraform. This may prevent your team from adopting these tools.

Using the Terraform Compiler Pattern mitigates these limitations by focusing just on the templating abstraction. This results in a clean separation between the higher level abstraction language and the terraform world.

Here is how that addresses the above disadvantages:

  • Clear separation of boundaries. Note that using this pattern does not preclude you from learning Terraform. You must still be aware of Terraform and the associated runtime. However, because the output is plain Terraform code, it is always clear whether an issue lives in the generated code or in the higher level frontend, and you can debug the compiled code directly with standard Terraform tooling.
  • Any runtime/tool that natively supports Terraform is supported. Because the Terraform Compiler Pattern is a pure code generation abstraction, you can easily drop it into the existing workflows and runtime. You can tack this on top as a preprocessing step. A simple way to achieve this is to introduce a new repository for your higher level code, and then “ship” the compiled Terraform code to your existing repository as an automated pull request. This should allow you to gradually migrate to the new workflow over time, or even just focus on the heavily duplicated code to introduce this pattern.
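The "ship as an automated pull request" step above can be sketched as a small diff check in CI: recompile everything, compare against the checked-in Terraform code, and open a pull request if anything changed. Here is a minimal illustration in Python (the function name and repository layout are assumptions for the sketch, not from any specific tool):

```python
import difflib
from pathlib import Path

def changed_files(compiled: dict[str, str], repo_root: Path) -> dict[str, str]:
    """Compare freshly compiled Terraform files against the checked-in
    copies. Returns a unified diff per path that changed; a CI job can
    open a pull request from any non-empty result, and an empty result
    proves a frontend refactor did not change behavior."""
    diffs = {}
    for rel_path, new_text in compiled.items():
        target = repo_root / rel_path
        old_text = target.read_text() if target.exists() else ""
        if old_text != new_text:
            diff = difflib.unified_diff(
                old_text.splitlines(keepends=True),
                new_text.splitlines(keepends=True),
                fromfile=f"a/{rel_path}",
                tofile=f"b/{rel_path}",
            )
            diffs[rel_path] = "".join(diff)
    return diffs
```

The same check doubles as the safety net mentioned earlier for swapping frontend languages: compile the old and new frontends and require the diff to be empty.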

Note that some of the aforementioned tools support a similar pattern by focusing on just the templating feature. For example, as mentioned above, with cdktf you can use cdktf synth to achieve a similar effect. The point is to use the tools for pure code generation rather than as a drop-in replacement for the terraform runtime.

Of course, the downside of the pattern is that you now lose access to some of the distinguishing features of the tool. If you are reliant on those features in your workflows, that may be a significant enough deterrent to avoid this pattern, though you may be able to find workarounds (e.g., using Terraform Cloud Run Triggers instead of terragrunt run-all).

If you can work around these runtime limitations, the compiler pattern can provide a powerful, clean alternative to some of these tools, or provide ways to integrate with tools and services that may otherwise be impossible.

Summary

  • The Terraform Compiler Pattern is a design pattern for writing Terraform code in a higher level abstraction that is translated into Terraform code, which is then passed to other runtimes.
  • This approach can address the limitations of the Terraform configuration language, especially around code reusability of some constructs.
  • By focusing just on code generation and not runtime control, you can integrate with all the tools and services in the ecosystem without sacrificing on maintainability and scalability.


Staff level Startup Engineer with 10+ years experience (formerly at Gruntwork)