The personal blog of Scott Beru
20 May 2021
This is going to be a quick one; I recently had a really tricky Packer problem, and I don’t think I’m the only one who has had it (or will have it). I’d like to explain how I solved it.
At the company I currently work at, we’re trying to Packerize our server bootstrapping process so that we can produce an AWS AMI as a kind of “baseline” to Terraform new servers from.
We have some servers in data centers, some OpenStack VMs, and some AWS EC2 servers, but common to all of them is an initial setup process that involves running an Ansible job against the server as soon as we bring it up, having it pull down some packages from a private repository, and then doing a bunch of installation and configuration of custom software. This lends itself really well to Packer, right? The process of building an AMI would just be, theoretically:
…right?
The main problem we ran into is specific to our deployment; our private package repos all have firewalls that allowlist the public IP of a new server as part of the pre-Ansible bringup process. But this is a huge problem for us, because the Packer AWS builder is designed to yank a new ephemeral public IP from AWS every time it spins up an instance; there’s no obvious way to maintain a single IP (even if you go so far as to reserve an Elastic IP, there’s no builder config to reach out for it).
tl;dr If you have a server that’s necessary for bringup and relies on a non-dynamic IP allowlist, you have a problem that Packer does not have an obvious solution for.
The first idea I had for this was to set up a VPC (dedicated to Packer builder instances) with a NAT Gateway, under the assumption that we’d just be able to use the Gateway IP (a static Elastic IP) for the instance to access resources outside the VPC, including our servers. Just a single, static source EIP for whatever instance we stick in the VPC. Sounds great, right? And it might be, for some applications, but for us this was a nonstarter, because Ansible needed to communicate with the instance, and the NAT Gateway is functionally unidirectional. You could have 4090 instances behind it, so SSH to that EIP is just going to bounce off without a Static NAT back to whatever instance we have, and I honestly just wasn’t in the mood to try to set up (and tear down) one of those dynamically as part of a Packer build.
Resigned to the idea of a “floating” EIP that would simply get attached to an instance at Packer run-time, I started to play around with ways to accomplish this.
The configuration basically looked like this before I started messing around with the EIP stuff:
source "amazon-ebs" "aws_builder" {
  access_key        = "${var.aws_access_key}"
  ami_description   = "Base Image - Ubuntu"
  ami_name          = "base_ubuntu_${local.timestamp}"
  instance_type     = "t2.large"
  region            = "us-east-1"
  availability_zone = "us-east-1b"
  secret_key        = "${var.aws_secret_key}"
  source_ami        = "${data.amazon-ami.plain_ubuntu_image.id}"
  ssh_username      = "ubuntu"
}

build {
  sources = ["source.amazon-ebs.aws_builder"]

  # Some initial setup needed to enable Ansible to run properly
  #
  provisioner "shell" {
    script = "setup_users.sh"
  }

  # Run ansible bootstrapping playbook
  #
  provisioner "ansible" {
    # yada yada yada
    # ....
  }
}
My first instinct was to run a shell-local provisioner, and simply run a command like this before we do anything else:
provisioner "shell-local" {
  environment_vars = ["AWS_ACCESS_KEY_ID=${var.aws_access_key}", "AWS_SECRET_ACCESS_KEY=${var.aws_secret_key}", "AWS_DEFAULT_REGION=us-east-1"]
  inline           = ["aws ec2 associate-address --instance-id ${build.ID} --allocation-id ${var.eip_id}"]
}
And this does, in fact, change the IP of the builder instance to the floating EIP! (That is, if you grant your Packer user the ec2:AssociateAddress permission.) The problem with this is that, if you change the IP of the instance, Packer doesn’t know about the new IP. It’ll keep trying to reconnect to the original ephemeral public IP that it grabbed when it first created the instance. So your job will time out shortly after you run this command. Boo.
I then sought to change the configured ssh_host (the parameter for the AWS builder instance’s IP address) in-flight to match the EIP. Basically, I thought, I would set up the instance initially using its ephemeral IP, then assign a new one from my local shell, and finally tell Packer that the instance was at a new ssh_host address, so that it could carry on starting an Ansible job to the new address.
source "amazon-ebs" "aws_builder" {
  access_key        = "${var.aws_access_key}"
  ami_description   = "Base Image - Ubuntu"
  ami_name          = "base_ubuntu_${local.timestamp}"
  instance_type     = "t2.large"
  region            = "us-east-1"
  availability_zone = "us-east-1b"
  secret_key        = "${var.aws_secret_key}"
  source_ami        = "${data.amazon-ami.plain_ubuntu_image.id}"
  ssh_username      = "ubuntu"
}

build {
  sources = ["source.amazon-ebs.aws_builder"]

  provisioner "shell-local" {
    environment_vars = ["AWS_ACCESS_KEY_ID=${var.aws_access_key}", "AWS_SECRET_ACCESS_KEY=${var.aws_secret_key}", "AWS_DEFAULT_REGION=us-east-1"]
    inline           = ["aws ec2 associate-address --instance-id ${build.ID} --allocation-id ${var.eip_id}"]
  }

  source "source.amazon-ebs.aws_builder" {
    ssh_host = "${var.eip_id}"
  }

  # Some initial setup needed to enable Ansible to run properly
  #
  provisioner "shell" {
    script = "setup_users.sh"
  }

  # Run ansible bootstrapping playbook
  #
  provisioner "ansible" {
    # yada yada yada
    # ....
  }
}
But this really, really didn’t have the intended effect. Packer took the additional ssh_host variable in the build block to indicate that it should spin up its usual instance (with an ephemeral IP), as well as a parallel instance, with the floating EIP assigned to it! This is where HashiCorp’s characteristically sparse documentation really hurt me; they mention (in this doc; cf. “The build-level source block”) that, within a build block, you can reference a source to assign parameters that are not explicitly defined in the initial configuration. This really sounds like it would allow you to dynamically alter parameters, but that seems to be pretty inconsistent across builders.
It was starting to feel like even the notion of attaching a floating EIP was hopeless with the tooling provided out of the box by Packer and the AWS EBS builder source. At wit’s end, I finally decided to go down a path suggested by an old Google Groups post. It felt like a hack. But it was also very clear to me that there was no other way.
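The basic structure of the solution (using cloud-init) goes something like this: give the Packer user’s IAM policy some enhanced permissions, create an instance profile for the builder instance, hand the instance a cloud-init user-script that associates the EIP with itself at boot, and point the builder at the EIP with ssh_host = "${var.your_eip_address}".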
If you do all of that: Bob’s your uncle, Fanny’s your aunt, bim, boom: you should be associating a floating EIP to your builder instances in no time. I’m going to walk you through what this config looks like from the AWS side, as well as from the cloud-init config side.
Let’s talk about those “enhanced permissions” for the Packer user’s IAM policy first.
Packer helpfully provides the JSON for an IAM policy sufficient for minimal Packer permissions. This is sufficient for regular Packer builds with an EBS builder, but we actually need to add a few extra permissions to make this floating EIP scheme work.
You’ll need to give the Packer user’s policy three extra permissions:
ec2:AssociateAddress (Write-level access)
iam:GetInstanceProfile
iam:PassRole
The first of these is literally to run the command associating the EIP to the instance, and the latter two are related to Instance Profile permissions (GetInstanceProfile allows your Packer user account to access the instance profile you’ll be creating, and PassRole is a weird AWS requirement for actually using an instance with that role).
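For anyone who manages IAM with Terraform, here is a rough sketch of what those pieces might look like. This is an assumption-heavy illustration, not the exact setup described in this post: every resource name is made up (only Packer_Instance_Profile matches a name used here), the wildcard resources are placeholders you would want to narrow, and granting ec2:AssociateAddress to the instance role is my reading of what the on-instance command needs, since that command runs with the instance profile's credentials.

```hcl
# Hypothetical Terraform sketch; every resource name below is illustrative.

# Extra permissions for the Packer user, on top of the minimal Packer policy.
data "aws_iam_policy_document" "packer_eip_extras" {
  statement {
    effect = "Allow"
    actions = [
      "ec2:AssociateAddress",
      "iam:GetInstanceProfile",
      "iam:PassRole",
    ]
    resources = ["*"] # Assumption: tighten to specific ARNs where possible
  }
}

resource "aws_iam_policy" "packer_eip_extras" {
  name   = "packer-eip-extras"
  policy = data.aws_iam_policy_document.packer_eip_extras.json
}

# Role that the builder instance assumes through its instance profile.
resource "aws_iam_role" "packer_instance" {
  name = "Packer_Instance_Role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# Lets the instance attach the EIP to itself from its user-script.
resource "aws_iam_role_policy" "associate_eip" {
  name = "associate-eip"
  role = aws_iam_role.packer_instance.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["ec2:AssociateAddress"]
      Resource = "*"
    }]
  })
}

# The profile name is what the Packer builder refers to.
resource "aws_iam_instance_profile" "packer" {
  name = "Packer_Instance_Profile"
  role = aws_iam_role.packer_instance.name
}
```

The policy still has to be attached to whichever user or group your Packer credentials belong to; that step is left out because it depends on how your accounts are organized.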
Your user-script might be configuring other stuff (ours sure is), but to associate a new IP address to the instance running the command, your user-script just needs the following config (with eipalloc-############# replaced by the actual ID of your EIP allocation):
#cloud-config
packages:
  - awscli
runcmd:
  - AWS_DEFAULT_REGION=us-east-1 aws ec2 associate-address --instance-id $(curl -s "http://169.254.169.254/latest/meta-data/instance-id") --allocation-id eipalloc-#############
Look trivial? Good, because it took me, like, an hour to figure out all of the nonsense related to YAML, self-reference of AWS metadata, and permissions.
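If hard-coding the allocation ID in the user-script bothers you, one possible variation is to render the script from a template at build time. The sketch below is an untested assumption on my part: the template filename user_data.pkrtpl.hcl is hypothetical, and it relies on Packer's templatefile function and the builder's user_data option in place of user_data_file.

```hcl
# Hypothetical variation; the template filename and local name are illustrative.

variable "eip_id" {
  type        = string
  description = "Allocation ID of the floating EIP (eipalloc-...)"
}

locals {
  # user_data.pkrtpl.hcl would hold the same cloud-config as the user-script,
  # with placeholders instead of literal values, e.g.:
  #   runcmd:
  #     - AWS_DEFAULT_REGION=${region} aws ec2 associate-address ... --allocation-id ${eip_allocation_id}
  # The shell's $(curl ...) is left alone, since only ${...} is interpolated.
  builder_user_data = templatefile("${path.root}/user_data.pkrtpl.hcl", {
    eip_allocation_id = var.eip_id
    region            = "us-east-1"
  })
}

# In the amazon-ebs source, this line would then stand in for user_data_file:
#   user_data = local.builder_user_data
```

This keeps the allocation ID in one variable, at the cost of one more file to keep track of.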
I’m just going to get right to it: here’s the AWS EBS builder definition that actually does the thing (as HCL2; JSON config in HashiCorp products is for gibbering heathen poltroons):
source "amazon-ebs" "aws_builder" {
  access_key           = "${var.aws_access_key}"
  ami_description      = "Base Image - Ubuntu"
  ami_name             = "base_ubuntu_${local.timestamp}"
  instance_type        = "t2.large"
  region               = "us-east-1"
  availability_zone    = "us-east-1b"
  secret_key           = "${var.aws_secret_key}"
  source_ami           = "${data.amazon-ami.plain_ubuntu_image.id}"
  ssh_username         = "ubuntu"
  user_data_file       = "user_data"               # The filename (and/or path to the filename) of the user-script configured above
  ssh_host             = "${var.eip_address}"      # The floating EIP, i.e., the static persistent IP address
  iam_instance_profile = "Packer_Instance_Profile" # The *name* (not ID) of the Instance Profile configured above
}
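That definition leans on a few variables, a local, and a data source that are declared elsewhere in the template. A minimal sketch of what they might look like follows. The names match the references in the builder definition, but the values are guesses for illustration; in particular, the Ubuntu image filter and the timestamp formatting are assumptions, not the real ones from our template.

```hcl
# Hypothetical supporting declarations; names match the builder's references,
# but the values and filters are assumptions.

variable "aws_access_key" {
  type      = string
  sensitive = true
}

variable "aws_secret_key" {
  type      = string
  sensitive = true
}

variable "eip_address" {
  type        = string
  description = "Public IPv4 address of the floating EIP"
}

locals {
  # Strip punctuation from the timestamp so it is safe to use in an AMI name.
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}

data "amazon-ami" "plain_ubuntu_image" {
  access_key  = var.aws_access_key
  secret_key  = var.aws_secret_key
  region      = "us-east-1"
  most_recent = true
  owners      = ["099720109477"] # Canonical's AWS account

  filters = {
    name                = "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"
    root-device-type    = "ebs"
    virtualization-type = "hvm"
  }
}
```

Note that eip_address is the EIP's public address, which is what ssh_host needs, while the user-script uses the allocation ID; they identify the same EIP in two different ways.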
Thanks for reading—I hope this helps save someone all the toil I had to spend to write it!