<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>diary of a lazy developer</title>
    <description>Technical posts from my projects</description>
    <link>https://alessandra.bilardi.net/diary/</link>
    <atom:link href="https://alessandra.bilardi.net/diary/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>2026-04-11</pubDate>
    <lastBuildDate>Sat, 11 Apr 2026 01:58:13 +0200</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    
    
      <item>
        <title>Docker on EC2 with Terraform</title>
        <description>&lt;p&gt;&lt;img src=&quot;https://raw.githubusercontent.com/bilardi/aws-docker-host/master/images/architecture.drawio.png&quot; alt=&quot;Architecture&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;why-this-project&quot;&gt;Why this project&lt;/h2&gt;

&lt;p&gt;I was preparing a &lt;a href=&quot;https://github.com/bilardi/n8n-workshop&quot;&gt;workshop&lt;/a&gt; and needed to expose a specific interface at a public URL, so that participants wouldn’t have to install Docker or anything else on their machines.&lt;/p&gt;

&lt;p&gt;I built the workshop locally with Docker Compose, one of the ways to develop and test locally: it works, it’s fast, it’s reproducible. And then?&lt;/p&gt;

&lt;p&gt;Then you need to move everything to the cloud. And as a lazy developer, why not reuse that same Docker Compose file?&lt;/p&gt;

&lt;p&gt;The point isn’t running Docker in the cloud; it’s everything around it: HTTPS, a custom domain, machine access, data backups, and the ability to rebuild or tear it all down with one command.&lt;/p&gt;

&lt;p&gt;With IaC (Infrastructure as Code) you can handle HTTPS, the custom domain, backups, access and cleanup smoothly: everything in one place, versioned, reproducible. Without IaC, you start from scratch every time.&lt;/p&gt;

&lt;p&gt;The usual options:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Manual EC2 setup&lt;/strong&gt;: SSH in, install Docker, configure nginx, certbot, and pray. Slow, fragile, and hard to reproduce.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;ECS/Fargate&lt;/strong&gt;: task definitions, service discovery, a cluster... for what? Using Fargate for a single container is like hiring a moving truck to carry your groceries home.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Docker on EC2 with Terraform&lt;/strong&gt;: one &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;terraform apply&lt;/code&gt; to spin up, one &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bash scripts/destroy.sh&lt;/code&gt; to tear down. Backups included.&lt;/li&gt;
&lt;/ul&gt;
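
&lt;p&gt;What “backups included” can look like, as a minimal sketch. This is not the repo’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scripts/destroy.sh&lt;/code&gt;: the Terraform output names (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;instance_id&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;backup_bucket&lt;/code&gt;), the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/opt/app&lt;/code&gt; project directory with its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt; folder, the AWS CLI on the instance and an instance role allowed to write to the bucket are all assumptions, and so is a bucket that outlives &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;terraform destroy&lt;/code&gt; (for example, one you passed in). It archives the data on the EC2 through SSM, waits for that to succeed, and only then destroys.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#!/usr/bin/env bash
# Hypothetical "backup, then destroy" helper, not the repo's scripts/destroy.sh.
# Assumed: Terraform outputs instance_id and backup_bucket, the compose project
# lives in /opt/app with its data in ./data, the instance can write to the bucket.
backup_then_destroy() {
  local instance_id bucket stamp params command_id
  instance_id=$(terraform output -raw instance_id) || return 1
  bucket=$(terraform output -raw backup_bucket) || return 1
  stamp=$(date +%Y%m%d-%H%M%S)

  # Commands run on the EC2 by SSM (as root): stop the stack, archive, upload.
  params=$(printf '{"commands":["cd /opt/app","docker compose down","tar czf /tmp/backup.tgz data","aws s3 cp /tmp/backup.tgz s3://%s/backup-%s.tgz"]}' "$bucket" "$stamp")
  command_id=$(aws ssm send-command \
    --instance-ids "$instance_id" \
    --document-name AWS-RunShellScript \
    --parameters "$params" \
    --query Command.CommandId --output text) || return 1

  # Block until the remote commands finish; the waiter fails if they fail,
  # so a broken backup never reaches the destroy step.
  aws ssm wait command-executed --command-id "$command_id" --instance-id "$instance_id" || return 1

  terraform destroy -auto-approve
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;backup_then_destroy&lt;/code&gt; from the Terraform directory. The CLI waiter gives up after a bounded number of polls, so for a large archive you may prefer polling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aws ssm get-command-invocation&lt;/code&gt; yourself.&lt;/p&gt;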

&lt;p&gt;I chose the third option because it has the simplest architecture... and its most complex part is whatever your user data does!&lt;/p&gt;

&lt;p&gt;The architecture in the image above is generated directly from the Terraform code (spoiler) in the &lt;a href=&quot;https://github.com/bilardi/aws-docker-host&quot;&gt;repo&lt;/a&gt;, where you’ll find the README.md with all the details on how to use it.&lt;/p&gt;

&lt;p&gt;But let’s take it step by step. The third option can be implemented in 1024 different ways: which IaC tool? How do you handle HTTPS? How do you access the machine? Where do you store backups? How do you manage DNS? Which AMI? It depends. The point is asking the right questions.&lt;/p&gt;

&lt;p&gt;As a lazy developer, I make every choice by one criterion: less effort, in terms of time, cost, or both. And when less effort isn’t enough to decide, the cleanest path is a minimal system: you know what’s there, you know what’s missing, no surprises.&lt;/p&gt;

&lt;h2 id=&quot;why-terraform-and-not-cdk&quot;&gt;Why Terraform and not CDK&lt;/h2&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Terraform&lt;/th&gt;
      &lt;th&gt;CDK&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Language&lt;/td&gt;
      &lt;td&gt;HCL: declarative, simple&lt;/td&gt;
      &lt;td&gt;TypeScript/Python: powerful but verbose for simple infra&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;State&lt;/td&gt;
      &lt;td&gt;Local file, zero dependencies&lt;/td&gt;
      &lt;td&gt;Requires a CloudFormation stack and an S3 bucket for assets&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Bootstrap&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;terraform init&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cdk bootstrap&lt;/code&gt;, which already creates resources in your AWS account&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Learning curve&lt;/td&gt;
      &lt;td&gt;Low for simple infra&lt;/td&gt;
      &lt;td&gt;You need to know both CDK and CloudFormation... and their quirks&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Destruction&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;terraform destroy&lt;/code&gt;: clean, predictable&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cdk destroy&lt;/code&gt;, which sometimes leaves orphaned resources&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;For an ephemeral workshop run by one person, Terraform with local state is the minimum effort. CDK makes sense when the infra grows, you need complex logic, or there’s a team involved.&lt;/p&gt;

&lt;h2 id=&quot;the-choices-and-why&quot;&gt;The choices and why&lt;/h2&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Choice&lt;/th&gt;
      &lt;th&gt;Why (less effort)&lt;/th&gt;
      &lt;th&gt;The discarded alternative (more effort)&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ALB + ACM&lt;/td&gt;
      &lt;td&gt;Free HTTPS certificate, auto-renewal, no certbot/nginx&lt;/td&gt;
      &lt;td&gt;Let’s Encrypt on EC2: port 80 open, cron for renewal, more moving parts&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SSM instead of SSH&lt;/td&gt;
      &lt;td&gt;No keys, no port 22, audit trail on CloudTrail&lt;/td&gt;
      &lt;td&gt;SSH key pair, SG rules, a bastion host if in a private subnet&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;S3 for backups&lt;/td&gt;
      &lt;td&gt;Costs next to nothing, outlives the EC2 instance, simple CLI&lt;/td&gt;
      &lt;td&gt;EBS volume/snapshot: the root volume dies with the instance, and restoring a snapshot means creating and attaching a new volume&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Route 53 hosted zone&lt;/td&gt;
      &lt;td&gt;DNS validation for ACM, alias record for ALB, all managed by Terraform&lt;/td&gt;
      &lt;td&gt;External DNS only: manual certificate validation or HTTP challenge&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Amazon Linux 2023 minimal&lt;/td&gt;
      &lt;td&gt;Clean AMI, you install only what you need&lt;/td&gt;
      &lt;td&gt;AL2023 standard: doesn’t have Docker anyway, but has hundreds of extra packages you don’t need&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker compose up --build&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Works whether services use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;build&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;image&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Separate logic for build vs pull: pointless complexity&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Local state&lt;/td&gt;
      &lt;td&gt;The workshop is ephemeral, one operator, no team&lt;/td&gt;
      &lt;td&gt;Remote state (S3 + DynamoDB): cost and setup for zero benefit&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Conditional VPC&lt;/td&gt;
      &lt;td&gt;Three modes: use an existing VPC, find the default, or create a new one&lt;/td&gt;
      &lt;td&gt;Always a new VPC: wasteful for a workshop that can run in the default VPC&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Conditional S3 bucket&lt;/td&gt;
      &lt;td&gt;Pass one and it uses it. Don’t, and it creates one named after the domain&lt;/td&gt;
      &lt;td&gt;Always a new bucket: wasteful for someone who runs many workshops and just wants to manage backups&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
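
&lt;p&gt;The two conditional rows boil down to “use what you’re given, otherwise fall back”. The repo does this in Terraform; the sketch below only mirrors the decision with the AWS CLI to make the three VPC modes explicit. The function names and the bucket naming rule (dots to dashes plus a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-backup&lt;/code&gt; suffix) are hypothetical.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#!/usr/bin/env bash
# Hypothetical mirror of the "conditional VPC" and "conditional S3 bucket" logic.

# VPC: explicit id, else the region's default VPC, else "create" (new VPC).
resolve_vpc() {
  local requested=${1:-} default_vpc
  if [ -n "$requested" ]; then
    echo "$requested"   # mode 1: use the VPC you passed in
    return 0
  fi
  # --output text prints "None" when the region has no default VPC.
  default_vpc=$(aws ec2 describe-vpcs --filters Name=is-default,Values=true \
    --query 'Vpcs[0].VpcId' --output text)
  case "$default_vpc" in
    "" | None) echo "create" ;;   # mode 3: no default VPC, build a new one
    *) echo "$default_vpc" ;;     # mode 2: reuse the default VPC
  esac
}

# Bucket: the one passed in, else a name derived from the domain (naming rule assumed).
resolve_bucket() {
  local requested=${1:-} domain=${2:-}
  if [ -n "$requested" ]; then
    echo "$requested"
  else
    echo "${domain//./-}-backup"
  fi
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In Terraform the same decision typically becomes a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;count&lt;/code&gt; or a conditional expression on an input variable, with a data source for the default VPC.&lt;/p&gt;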

&lt;h2 id=&quot;what-i-learned-the-hard-way&quot;&gt;What I learned (the hard way)&lt;/h2&gt;

&lt;h3 id=&quot;the-right-ami-and-how-much-disk&quot;&gt;The right AMI and how much disk&lt;/h3&gt;

&lt;p&gt;Being a lazy developer, instead of reading the documentation I ran one command to see what’s out there:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;aws ec2 describe-images &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--filters&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Name=name,Values=al2023-ami-*-x86_64&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--owners&lt;/span&gt; amazon &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--query&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;reverse(sort_by(Images, &amp;amp;CreationDate))[:10].[Name, BlockDeviceMappings[0].Ebs.VolumeSize]&apos;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--output&lt;/span&gt; table
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Three variants: &lt;strong&gt;minimal&lt;/strong&gt; (2 GB), &lt;strong&gt;standard&lt;/strong&gt; (8 GB), &lt;strong&gt;ECS-optimized&lt;/strong&gt; (30 GB). The ECS one comes with Docker but is meant to run in an ECS cluster, not on a standalone EC2. Standard and minimal don’t have Docker: you need to install it either way.&lt;/p&gt;

&lt;p&gt;At that point, what does the standard AMI have that minimal doesn’t? The SSM agent and a few hundred packages you don’t need. The &lt;a href=&quot;https://docs.aws.amazon.com/linux/al2023/ug/image-comparison.html&quot;&gt;package comparison page&lt;/a&gt; confirms it: no Docker, no buildx, nothing that changes the picture.&lt;/p&gt;

&lt;p&gt;Minimal is the cleanest choice: install Docker, the SSM agent and buildx in the user data, and you know exactly what’s on the machine. One thing to watch: the 2 GB root volume isn’t enough for Docker images, so set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;volume_size = 20&lt;/code&gt; (GB) and move on.&lt;/p&gt;
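
&lt;p&gt;A minimal user data sketch for the minimal AMI, not the repo’s own: it assumes x86_64, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amazon-ssm-agent&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;awscli-2&lt;/code&gt; packages from the AL2023 repositories, and the Compose and buildx CLI plugins downloaded from their GitHub releases. The pinned versions are assumptions: pick current ones. Fetching and starting your compose project would come after this.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#!/usr/bin/env bash
# Hypothetical user data for AL2023 minimal (x86_64); versions are assumptions.
set -euo pipefail

COMPOSE_VERSION=2.29.7
BUILDX_VERSION=0.17.1
PLUGINS=/usr/local/lib/docker/cli-plugins

# Docker engine, SSM agent (not on the minimal AMI) and the AWS CLI for S3 backups.
dnf install -y docker amazon-ssm-agent awscli-2
systemctl enable --now docker amazon-ssm-agent

# Compose and buildx are Docker CLI plugins: one binary each in cli-plugins.
mkdir -p "$PLUGINS"
curl -fsSL -o "$PLUGINS/docker-compose" \
  "https://github.com/docker/compose/releases/download/v${COMPOSE_VERSION}/docker-compose-linux-x86_64"
curl -fsSL -o "$PLUGINS/docker-buildx" \
  "https://github.com/docker/buildx/releases/download/v${BUILDX_VERSION}/buildx-v${BUILDX_VERSION}.linux-amd64"
chmod +x "$PLUGINS/docker-compose" "$PLUGINS/docker-buildx"

# Sanity check, visible in the cloud-init output log.
docker compose version
docker buildx version
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;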

&lt;h3 id=&quot;ssm-user-is-not-root&quot;&gt;ssm-user is not root&lt;/h3&gt;

&lt;p&gt;When you connect with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aws ssm start-session&lt;/code&gt;, you’re &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ssm-user&lt;/code&gt;, which has no access to the Docker socket, so every Docker command needs &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo&lt;/code&gt;. Commands sent with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aws ssm send-command&lt;/code&gt;, on the other hand, run as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;root&lt;/code&gt;, so no &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo&lt;/code&gt; is needed.&lt;/p&gt;
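
&lt;p&gt;The two access paths side by side, as a sketch: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSTANCE_ID&lt;/code&gt; is a placeholder and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run_as_root&lt;/code&gt; is a hypothetical helper built on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;send-command&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wait command-executed&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get-command-invocation&lt;/code&gt; subcommands with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AWS-RunShellScript&lt;/code&gt; document. It assumes the command contains no double quotes.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#!/usr/bin/env bash
# Interactive: you land as ssm-user, so Docker needs sudo.
#   aws ssm start-session --target "$INSTANCE_ID"
#   sudo docker ps

# Non-interactive: AWS-RunShellScript runs as root, no sudo needed.
# Hypothetical helper; the command must not contain double quotes.
run_as_root() {
  local instance_id=$1 cmd=$2 command_id
  command_id=$(aws ssm send-command \
    --instance-ids "$instance_id" \
    --document-name AWS-RunShellScript \
    --parameters "$(printf '{"commands":["%s"]}' "$cmd")" \
    --query Command.CommandId --output text) || return 1
  # Wait for completion, then print what the command wrote to stdout.
  aws ssm wait command-executed --command-id "$command_id" --instance-id "$instance_id" || return 1
  aws ssm get-command-invocation --command-id "$command_id" --instance-id "$instance_id" \
    --query StandardOutputContent --output text
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run_as_root &quot;$INSTANCE_ID&quot; whoami&lt;/code&gt; should print &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;root&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run_as_root &quot;$INSTANCE_ID&quot; &quot;docker ps&quot;&lt;/code&gt; works without &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo&lt;/code&gt;.&lt;/p&gt;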

&lt;h3 id=&quot;buildx-no-buildx-no-build&quot;&gt;buildx: no buildx, no build&lt;/h3&gt;

&lt;p&gt;Recent Docker Compose v2 releases require buildx &amp;gt;= 0.17.0 for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--build&lt;/code&gt; flag, and the minimal AMI doesn’t ship buildx at all. Without it, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker compose up --build&lt;/code&gt; fails even if no service uses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;build&lt;/code&gt;: install it in the user data and forget about it.&lt;/p&gt;
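
&lt;p&gt;A quick way to check the requirement on the instance, as a sketch: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;version_ge&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;buildx_version&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;require_buildx&lt;/code&gt; are hypothetical helpers, the comparison relies on GNU &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sort -V&lt;/code&gt;, and the parsing assumes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker buildx version&lt;/code&gt; prints the version as its second field (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;github.com/docker/buildx v0.17.1&lt;/code&gt; followed by a commit hash).&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#!/usr/bin/env bash
# True when version $1 is at least version $2 (sort -V orders versions).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Installed buildx version without the leading "v"; empty if buildx is missing.
buildx_version() {
  docker buildx version | awk '{print $2}' | sed 's/^v//'
}

# Fail loudly before "docker compose up --build" does it for you.
require_buildx() {
  local have
  have=$(buildx_version)
  if [ -z "$have" ] || ! version_ge "$have" 0.17.0; then
    echo "buildx 0.17.0 or newer required, found: ${have:-none}"
    return 1
  fi
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;require_buildx&lt;/code&gt; at the end of the user data, or by hand over SSM, turns a confusing Compose error into an explicit one.&lt;/p&gt;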

&lt;h3 id=&quot;that-damn-cache&quot;&gt;That damn cache&lt;/h3&gt;

&lt;p&gt;After a destroy and redeploy, the new Route 53 hosted zone gets different nameservers. You update the NS records at your DNS provider, and everything looks fine. But the browser says no.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dig @8.8.8.8&lt;/code&gt; tells you it’s all good. But your local resolver disagrees.&lt;/p&gt;

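&lt;p&gt;What’s happening: between the redeploy and the NS update, the domain was still delegated to the old zone’s nameservers, which no longer answer for it, so lookups failed with SERVFAIL. Your ISP’s resolver cached that SERVFAIL, and until the entry expires, the domain doesn’t exist as far as it’s concerned.&lt;/p&gt;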

&lt;p&gt;The fix: temporarily switch your local DNS to Google Public DNS (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;8.8.8.8&lt;/code&gt;) and wait for your ISP’s cache to expire: they say 5-10 minutes, but sometimes it takes (way) longer.&lt;/p&gt;
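
&lt;p&gt;To see which resolver is still holding the failure, a small sketch that compares the response status from Google’s resolver and from your default one: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dns_status&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;compare_resolvers&lt;/code&gt; are hypothetical helpers that parse the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;status:&lt;/code&gt; field of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dig&lt;/code&gt;’s header line, and the domain you pass is a placeholder.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#!/usr/bin/env bash
# Response status (NOERROR, SERVFAIL, NXDOMAIN, ...) for a domain, read from
# dig's header line. An optional second argument selects the resolver to ask.
dns_status() {
  local args=("$1")
  if [ -n "${2:-}" ]; then
    args+=("@$2")
  fi
  dig "${args[@]}" | awk '/status:/ {gsub(",", "", $6); print $6}'
}

# Side by side: the public resolver vs. the one your machine uses by default.
compare_resolvers() {
  echo "8.8.8.8: $(dns_status "$1" 8.8.8.8)"
  echo "default: $(dns_status "$1")"
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;compare_resolvers&lt;/code&gt; reports NOERROR for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;8.8.8.8&lt;/code&gt; and SERVFAIL for the default resolver, the delegation is fine and you’re only waiting for the cache.&lt;/p&gt;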

&lt;h2 id=&quot;anything-else-to-add-&quot;&gt;Anything else to add?&lt;/h2&gt;

&lt;p&gt;When it’s not a workshop of a few hours but something that lasts weeks or months, it’s worth investing extra effort to make the system hold up over time. But remember: it’s always a temporary solution!&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;More subdomains&lt;/strong&gt;: more applications on the same ALB, with routing rules, separate target groups, and potentially more containers on the same EC2 or, if needed, dedicated EC2s per service&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Tactical scheduling&lt;/strong&gt;: start/stop the EC2 off-hours to save money, and run periodic backups with EventBridge + SSM, not just at destroy time&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;CloudWatch alarms&lt;/strong&gt;: basic monitoring (CPU, disk, health check) with SNS notifications&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Auto-recovery&lt;/strong&gt;: ASG with min=max=1 to replace dying instances (user data restores everything from S3)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Spot instances&lt;/strong&gt;: for workshops that tolerate interruptions, ~70% cost reduction&lt;/li&gt;
&lt;/ul&gt;
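
&lt;p&gt;For the off-hours part of tactical scheduling, a minimal sketch with the AWS CLI: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;set_power&lt;/code&gt; is a hypothetical helper and the instance ID a placeholder. Triggered from cron, or replaced by an EventBridge schedule, it stops the instance in the evening and starts it in the morning; the EBS root volume, and therefore Docker’s data, is kept while the instance is stopped.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#!/usr/bin/env bash
# Hypothetical off-hours helper: stop or start the workshop instance and wait.
set_power() {
  local instance_id=$1 action=${2:-}
  case "$action" in
    stop)
      aws ec2 stop-instances --instance-ids "$instance_id" \
        --query 'StoppingInstances[0].CurrentState.Name' --output text || return 1
      aws ec2 wait instance-stopped --instance-ids "$instance_id"
      ;;
    start)
      aws ec2 start-instances --instance-ids "$instance_id" \
        --query 'StartingInstances[0].CurrentState.Name' --output text || return 1
      aws ec2 wait instance-running --instance-ids "$instance_id"
      ;;
    *)
      echo "usage: set_power INSTANCE_ID stop|start"
      return 2
      ;;
  esac
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;set_power i-0123456789abcdef0 stop&lt;/code&gt; returns only once the instance is actually stopped, so a backup or a notification can safely follow it.&lt;/p&gt;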
</description>
        <pubDate>2026-04-11</pubDate>
        <link>https://alessandra.bilardi.net/diary/articles/2026-04/docker-on-ec2-with-terraform.en</link>
        <guid isPermaLink="true">https://alessandra.bilardi.net/diary/articles/2026-04/docker-on-ec2-with-terraform.en</guid>
        
        <category>terraform</category>
        
        <category>docker</category>
        
        <category>aws</category>
        
        <category>ec2</category>
        
        
        <category>devops</category>
        
      </item>
    
  </channel>
</rss>
