Docker Multi-Stage Builds: Reduce Image Size by 90% and Improve Security (Beginner Guide)

Why Should You Care About Docker Image Size?

Let me paint a picture I’ve seen play out dozens of times in my career. A team pushes a 1.2 GB Docker image to production. Deployments take forever. Autoscaling is sluggish because new nodes spend minutes just pulling the image. The security team runs a vulnerability scanner and finds 400+ CVEs — most of them in packages that have nothing to do with the actual application.

Every unnecessary tool, library, and file in your Docker image is a liability. It’s a larger attack surface for security vulnerabilities. It’s wasted bandwidth every time you push or pull the image. It’s slower CI/CD pipelines, slower deployments, and slower recovery when things go wrong.

Docker multi-stage builds solve this elegantly. They let you use one stage to build your application (with all the compilers, build tools, and dependencies that requires) and a separate stage to run it (with only the bare minimum). The result? Images that are often 90% smaller, dramatically more secure, and faster to deploy everywhere.

This isn’t a niche optimization. This is a fundamental best practice that every DevOps engineer and developer should understand. Let’s break it down step by step.

Understanding the Problem: Single-Stage Builds

Before we appreciate multi-stage builds, let’s see what life looks like without them, using a typical single-stage build of a simple Go application.

First, create the application. Create a file called main.go:

package main

import (
    "fmt"
    "log"
    "net/http"
)

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprint(w, "Hello from a tiny container!")
    })
    fmt.Println("Server starting on :8080")
    // ListenAndServe only returns on failure, so surface the error
    log.Fatal(http.ListenAndServe(":8080", nil))
}

Initialize the Go module:

go mod init myapp

Now here’s the naive, single-stage Dockerfile. Create a file called Dockerfile.single:

# Single-stage build — DON'T do this in production
FROM golang:1.23

WORKDIR /app
COPY go.mod ./
COPY main.go ./

RUN go build -o myapp .

EXPOSE 8080
CMD ["./myapp"]

Build it and check the size:

docker build -f Dockerfile.single -t myapp:single-stage .
docker images myapp:single-stage

Expected output:

REPOSITORY   TAG            IMAGE ID       CREATED          SIZE
myapp        single-stage   a1b2c3d4e5f6   10 seconds ago   862MB

862 MB for an application that compiles down to roughly 6 MB. That image includes the entire Go toolchain, compiler, standard library sources, git, gcc, and a full Debian-based operating system. None of that is needed at runtime. A Go program can be compiled into a static binary (we’ll do that with CGO_ENABLED=0 in the multi-stage version) that runs on its own.

What Are Multi-Stage Builds?

Multi-stage builds, introduced in Docker 17.05, allow you to use multiple FROM statements in a single Dockerfile. Each FROM starts a new stage. You can selectively copy artifacts from one stage into another, leaving behind everything you don’t need.

Think of it like a construction site. You need heavy machinery, scaffolding, and raw materials to build a house. But once the house is finished, you remove all of that. The people living in the house don’t need a crane in the living room. Multi-stage builds work the same way.

Your First Multi-Stage Build

Let’s rewrite that Dockerfile using multi-stage builds. Create a file called Dockerfile:

# Stage 1: Build
FROM golang:1.23 AS builder

WORKDIR /app
COPY go.mod ./
COPY main.go ./

RUN CGO_ENABLED=0 GOOS=linux go build -o myapp .

# Stage 2: Run
FROM alpine:3.20

RUN addgroup -S appgroup && adduser -S appuser -G appgroup

WORKDIR /app
COPY --from=builder /app/myapp .

USER appuser
EXPOSE 8080
CMD ["./myapp"]

Let’s break down what’s happening:

  • Stage 1 (builder): Uses the full Go image to compile the application. The AS builder gives this stage a name we can reference later. We set CGO_ENABLED=0 to produce a fully static binary that doesn’t depend on C libraries.
  • Stage 2: Starts fresh from alpine:3.20, a minimal Linux distribution that’s only about 7 MB. It copies only the compiled binary from the builder stage using COPY --from=builder. The entire Go toolchain is left behind.
  • Security bonus: We create a non-root user and run the application as that user. This is a critical security practice.

Build and compare:

docker build -t myapp:multi-stage .
docker images myapp

Expected output:

REPOSITORY   TAG            IMAGE ID       CREATED          SIZE
myapp        multi-stage    f6e5d4c3b2a1   5 seconds ago    14.5MB
myapp        single-stage   a1b2c3d4e5f6   2 minutes ago    862MB

From 862 MB down to 14.5 MB. That’s a 98% reduction. And we can go even smaller.

Going Even Smaller with Scratch and Distroless

Using scratch (The Absolute Minimum)

The scratch image is an empty image — literally nothing in it. No shell, no utilities, no OS. For statically compiled languages like Go and Rust, this is perfect:

# Stage 1: Build
FROM golang:1.23 AS builder

WORKDIR /app
COPY go.mod ./
COPY main.go ./

RUN CGO_ENABLED=0 GOOS=linux go build -o myapp .

# Stage 2: Scratch — absolute minimum
FROM scratch

COPY --from=builder /app/myapp /myapp

EXPOSE 8080
ENTRYPOINT ["/myapp"]

Save this as Dockerfile.scratch, then build it and check the size:

docker build -t myapp:scratch -f Dockerfile.scratch .
docker images myapp:scratch

Expected output:

REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
myapp        scratch   b3c4d5e6f7a8   3 seconds ago    6.5MB

6.5 MB. That’s essentially just the binary. However, there are tradeoffs — you can’t docker exec into the container to debug because there’s no shell. There are no CA certificates for HTTPS calls unless you copy them in. For production services that make outbound HTTPS requests, you’d add:

COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
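
To show where that line fits, here is a sketch of the complete scratch Dockerfile with the certificate bundle copied in. It also adds a numeric non-root user, which is an assumption on my part rather than something the version above includes: 65534 is the conventional "nobody" ID, and a numeric ID works in scratch because there is no /etc/passwd to look a name up in.

```dockerfile
# Stage 1: Build
FROM golang:1.23 AS builder

WORKDIR /app
COPY go.mod ./
COPY main.go ./

RUN CGO_ENABLED=0 GOOS=linux go build -o myapp .

# Stage 2: Scratch, with CA certificates for outbound HTTPS
FROM scratch

# The certificate bundle comes from the Debian-based builder image
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app/myapp /myapp

# Assumption: run as the conventional "nobody" UID/GID instead of root
USER 65534:65534

EXPOSE 8080
ENTRYPOINT ["/myapp"]
```

Port 8080 is above 1024, so the unprivileged user can still bind to it.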

Using Distroless (The Practical Middle Ground)

Google’s distroless images give you a practical middle ground — they include essential runtime files like CA certificates and timezone data, but no shell, package manager, or other tools:

# Stage 1: Build
FROM golang:1.23 AS builder

WORKDIR /app
COPY go.mod ./
COPY main.go ./

RUN CGO_ENABLED=0 GOOS=linux go build -o myapp .

# Stage 2: Distroless
FROM gcr.io/distroless/static-debian12:nonroot

COPY --from=builder /app/myapp /myapp

EXPOSE 8080
ENTRYPOINT ["/myapp"]

The :nonroot tag runs as a non-root user by default — security baked right in.

Real-World Example: Node.js Application

Multi-stage builds aren’t just for compiled languages. They’re extremely valuable for Node.js, where node_modules and build tools can bloat your image dramatically.

Create a package.json:

{
  "name": "myapp",
  "version": "1.0.0",
  "scripts": {
    "start": "node server.js"
  },
  "dependencies": {
    "express": "^4.21.0"
  }
}

Create server.js:

const express = require('express');
const app = express();

app.get('/', (req, res) => {
  res.json({ message: 'Hello from a lean container!' });
});

app.listen(8080, () => {
  console.log('Server running on port 8080');
});

Here’s a multi-stage Dockerfile for Node.js:

# Stage 1: Install dependencies
FROM node:22-alpine AS deps

WORKDIR /app
COPY package.json package-lock.json* ./
# --omit=dev is the current form of the deprecated --only=production
RUN npm ci --omit=dev

# Stage 2: Run
FROM node:22-alpine

RUN addgroup -S appgroup && adduser -S appuser -G appgroup

WORKDIR /app

COPY --from=deps /app/node_modules ./node_modules
COPY package.json ./
COPY server.js ./

USER appuser
EXPOSE 8080
CMD ["node", "server.js"]

Key points for the Node.js example:

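  • npm ci is used instead of npm install because it’s faster, more reliable, and respects the lockfile exactly. This is the standard for CI/CD environments. It requires a package-lock.json, so run npm install once locally and commit the lockfile before building the image.
  • --omit=dev (the current form of the older, now deprecated --only=production flag) skips devDependencies (testing frameworks, linters, build tools) that aren’t needed at runtime.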
  • We start from node:22-alpine rather than node:22 — the Alpine variant is roughly 180 MB versus over 1 GB.

For a more complex project with a build step (like a TypeScript or React app), you’d add a build stage:

# Stage 1: Install ALL dependencies and build
FROM node:22-alpine AS builder

WORKDIR /app
COPY package.json package-lock.json* ./
RUN npm ci

COPY . .
RUN npm run build

# Stage 2: Production dependencies only
FROM node:22-alpine AS deps

WORKDIR /app
COPY package.json package-lock.json* ./
# --omit=dev is the current form of the deprecated --only=production
RUN npm ci --omit=dev

# Stage 3: Run
FROM node:22-alpine

RUN addgroup -S appgroup && adduser -S appuser -G appgroup

WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY package.json ./

USER appuser
EXPOSE 8080
CMD ["node", "dist/server.js"]

Three stages: one for building, one for gathering production dependencies, and one lean final image that contains only compiled output and runtime dependencies.

Real-World Example: Python Application

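Python is interpreted, so there is no compiled binary to copy out and multi-stage builds work a bit differently. The key strategy here is to install dependencies into an isolated directory in one stage and copy that directory into a clean stage:

# Stage 1: Install dependencies into /install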
FROM python:3.12-slim AS builder

WORKDIR /app
COPY requirements.txt .

RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: Run
FROM python:3.12-slim

RUN groupadd -r appgroup && useradd -r -g appgroup appuser

WORKDIR /app

COPY --from=builder /install /usr/local
COPY . .

USER appuser
EXPOSE 8080
CMD ["python", "app.py"]

The --prefix=/install flag tells pip to install packages into a specific directory, making it easy to copy just those packages into the final image.
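
If some of your dependencies need a compiler, a wheel-based variant keeps the build tools out of the final image entirely. The following is a sketch, not a drop-in replacement: it assumes BuildKit (the default builder in recent Docker releases) for the RUN --mount syntax, and the build-essential install is only needed when a package has no prebuilt wheel for your platform.

```dockerfile
# Stage 1: Build wheels (compilers live only in this stage)
FROM python:3.12-slim AS builder

# Assumption: some dependencies need a C compiler to build
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /wheels
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Stage 2: Run
FROM python:3.12-slim

RUN groupadd -r appgroup && useradd -r -g appgroup appuser

WORKDIR /app

# Bind-mount the wheels from the builder stage for this RUN only,
# so they never become a layer in the final image (requires BuildKit)
RUN --mount=type=bind,from=builder,source=/wheels,target=/wheels \
    pip install --no-cache-dir --no-index --find-links=/wheels \
    -r /wheels/requirements.txt

COPY . .

USER appuser
EXPOSE 8080
CMD ["python", "app.py"]
```

The --no-index flag makes pip install only from the prebuilt wheels, so the final stage never needs a compiler or network access to PyPI.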

The Security Argument: Smaller Means Safer

Image size reduction isn’t just about speed; it’s directly tied to security. You can see this for yourself by scanning images for known vulnerabilities with tools like Docker Scout or Trivy:

# Install Trivy (macOS)
brew install trivy

# Scan the single-stage image
trivy image myapp:single-stage

# Scan the multi-stage image
trivy image myapp:multi-stage

Typical results you’ll see:

Image                              Size      Critical CVEs   High CVEs   Total CVEs
myapp:single-stage (golang:1.23)   862 MB    5-15            20-50       200-400+
myapp:multi-stage (alpine:3.20)    14.5 MB   0-1             0-3         0-10
myapp:scratch                      6.5 MB    0               0           0

(Exact numbers vary over time as new CVEs are discovered and patched. The pattern is consistent.)

Fewer packages means fewer things that can have vulnerabilities. The scratch image has zero OS packages, so the only vulnerabilities possible are in your application code and the Go standard library.

Common Mistakes Beginners Make

1. Not leveraging the Docker build cache

Order matters in Dockerfiles. Copy files that change less frequently first:

# GOOD — dependency file copied first, changes less often
COPY go.mod go.sum ./
R
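
As a fuller illustration of the ordering principle, here is a sketch of a cache-friendly builder stage. It assumes your project has a go.sum file (the tiny example earlier in this guide has no dependencies, so it does not) and uses go mod download to fetch modules in their own layer.

```dockerfile
FROM golang:1.23 AS builder

WORKDIR /app

# Dependency manifests change rarely, so this layer and the download
# below stay cached across most source edits
COPY go.mod go.sum ./
RUN go mod download

# Source code changes often; only the layers from here on are rebuilt
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o myapp .
```

With this ordering, editing main.go invalidates only the last two layers, and the module download is reused from cache.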
