Why Should You Care About Docker Image Size?
Let me paint a picture I’ve seen play out dozens of times in my career. A team pushes a 1.2 GB Docker image to production. Deployments take forever. Autoscaling is sluggish because new nodes spend minutes just pulling the image. The security team runs a vulnerability scanner and finds 400+ CVEs — most of them in packages that have nothing to do with the actual application.
Every unnecessary tool, library, and file in your Docker image is a liability. It’s a larger attack surface for security vulnerabilities. It’s wasted bandwidth every time you push or pull the image. It’s slower CI/CD pipelines, slower deployments, and slower recovery when things go wrong.
Docker multi-stage builds solve this elegantly. They let you use one stage to build your application (with all the compilers, build tools, and dependencies that requires) and a separate stage to run it (with only the bare minimum). The result? Images that are often 90% smaller, dramatically more secure, and faster to deploy everywhere.
This isn’t a niche optimization. This is a fundamental best practice that every DevOps engineer and developer should understand. Let’s break it down step by step.
Understanding the Problem: Single-Stage Builds
Before we appreciate multi-stage builds, let’s see what life looks like without them, using a typical single-stage build of a simple Go application.
First, create the application. Create a file called main.go:
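package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Respond to every request with a fixed greeting.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "Hello from a tiny container!")
	})

	fmt.Println("Server starting on :8080")
	// ListenAndServe only returns on failure, so log the error and exit.
	log.Fatal(http.ListenAndServe(":8080", nil))
}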
Initialize the Go module:
go mod init myapp
Now here’s the naive, single-stage Dockerfile. Create a file called Dockerfile.single:
# Single-stage build — DON'T do this in production
FROM golang:1.23
WORKDIR /app
COPY go.mod ./
COPY main.go ./
RUN go build -o myapp .
EXPOSE 8080
CMD ["./myapp"]
Build it and check the size:
docker build -f Dockerfile.single -t myapp:single-stage .
docker images myapp:single-stage
Expected output:
REPOSITORY TAG IMAGE ID CREATED SIZE
myapp single-stage a1b2c3d4e5f6 10 seconds ago 862MB
862 MB for an application that compiles down to roughly 6 MB. That image includes the entire Go toolchain, the standard library sources, git, gcc, and a full Debian-based operating system. None of that is needed at runtime: a Go binary can be built as a fully static executable that runs on its own.
What Are Multi-Stage Builds?
Multi-stage builds, introduced in Docker 17.05, allow you to use multiple FROM statements in a single Dockerfile. Each FROM starts a new stage. You can selectively copy artifacts from one stage into another, leaving behind everything you don’t need.
Think of it like a construction site. You need heavy machinery, scaffolding, and raw materials to build a house. But once the house is finished, you remove all of that. The people living in the house don’t need a crane in the living room. Multi-stage builds work the same way.
Your First Multi-Stage Build
Let’s rewrite that Dockerfile using multi-stage builds. Create a file called Dockerfile:
# Stage 1: Build
FROM golang:1.23 AS builder
WORKDIR /app
COPY go.mod ./
COPY main.go ./
RUN CGO_ENABLED=0 GOOS=linux go build -o myapp .
# Stage 2: Run
FROM alpine:3.20
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app
COPY --from=builder /app/myapp .
USER appuser
EXPOSE 8080
CMD ["./myapp"]
Let’s break down what’s happening:
- Stage 1 (builder): Uses the full Go image to compile the application. The AS builder gives this stage a name we can reference later. We set CGO_ENABLED=0 to produce a fully static binary that doesn’t depend on C libraries.
- Stage 2: Starts fresh from alpine:3.20, a minimal Linux distribution that’s only about 7 MB. It copies only the compiled binary from the builder stage using COPY --from=builder. The entire Go toolchain is left behind.
- Security bonus: We create a non-root user and run the application as that user. This is a critical security practice.
Build and compare:
docker build -t myapp:multi-stage .
docker images myapp
Expected output:
REPOSITORY TAG IMAGE ID CREATED SIZE
myapp multi-stage f6e5d4c3b2a1 5 seconds ago 14.5MB
myapp single-stage a1b2c3d4e5f6 2 minutes ago 862MB
From 862 MB down to 14.5 MB. That’s a 98% reduction. And we can go even smaller.
Going Even Smaller with Scratch and Distroless
Using scratch (The Absolute Minimum)
The scratch image is an empty image — literally nothing in it. No shell, no utilities, no OS. For statically compiled languages like Go and Rust, this is perfect:
# Stage 1: Build
FROM golang:1.23 AS builder
WORKDIR /app
COPY go.mod ./
COPY main.go ./
RUN CGO_ENABLED=0 GOOS=linux go build -o myapp .
# Stage 2: Scratch — absolute minimum
FROM scratch
COPY --from=builder /app/myapp /myapp
EXPOSE 8080
ENTRYPOINT ["/myapp"]
Save this as Dockerfile.scratch, then build it and check the size:
docker build -t myapp:scratch -f Dockerfile.scratch .
docker images myapp:scratch
Expected output:
REPOSITORY TAG IMAGE ID CREATED SIZE
myapp scratch b3c4d5e6f7a8 3 seconds ago 6.5MB
6.5 MB. That’s essentially just the binary. However, there are tradeoffs — you can’t docker exec into the container to debug because there’s no shell. There are no CA certificates for HTTPS calls unless you copy them in. For production services that make outbound HTTPS requests, you’d add:
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
Using Distroless (The Practical Middle Ground)
Google’s distroless images give you a practical middle ground — they include essential runtime files like CA certificates and timezone data, but no shell, package manager, or other tools:
# Stage 1: Build
FROM golang:1.23 AS builder
WORKDIR /app
COPY go.mod ./
COPY main.go ./
RUN CGO_ENABLED=0 GOOS=linux go build -o myapp .
# Stage 2: Distroless
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /app/myapp /myapp
EXPOSE 8080
ENTRYPOINT ["/myapp"]
The :nonroot tag runs as a non-root user by default — security baked right in.
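If you want the application to double-check this at startup, a small guard like the one below is one way to do it. This is only a sketch: the helper name isRoot is made up for illustration, and it assumes a Linux container, where uid 0 is root.

```go
package main

import (
	"fmt"
	"os"
)

// isRoot reports whether the given numeric user ID is the root user.
func isRoot(uid int) bool {
	return uid == 0
}

func main() {
	uid := os.Getuid()
	if isRoot(uid) {
		fmt.Fprintln(os.Stderr, "warning: running as root (uid 0)")
		return
	}
	fmt.Printf("running as non-root uid %d\n", uid)
}
```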
Real-World Example: Node.js Application
Multi-stage builds aren’t just for compiled languages. They’re extremely valuable for Node.js, where node_modules and build tools can bloat your image dramatically.
Create a package.json:
{
  "name": "myapp",
  "version": "1.0.0",
  "scripts": {
    "start": "node server.js"
  },
  "dependencies": {
    "express": "^4.21.0"
  }
}
Create server.js:
const express = require('express');
const app = express();

app.get('/', (req, res) => {
  res.json({ message: 'Hello from a lean container!' });
});

app.listen(8080, () => {
  console.log('Server running on port 8080');
});
Here’s a multi-stage Dockerfile for Node.js:
# Stage 1: Install dependencies
FROM node:22-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json* ./
# --omit=dev is the current form of the deprecated --only=production flag
RUN npm ci --omit=dev
# Stage 2: Run
FROM node:22-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY package.json ./
COPY server.js ./
USER appuser
EXPOSE 8080
CMD ["node", "server.js"]
Key points for the Node.js example:
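- npm ci is used instead of npm install because it’s faster, more reliable, and respects the lockfile exactly. This is the standard for CI/CD environments. Note that npm ci fails if there is no package-lock.json, so commit your lockfile; the trailing * in the COPY line only stops the copy step from failing when the file is missing.
- Production-only installs skip devDependencies (testing frameworks, linters, build tools) that aren’t needed at runtime. The flag is --omit=dev in current npm releases; --only=production is the older, deprecated spelling.
- We start from node:22-alpine rather than node:22. The Alpine variant is roughly 180 MB versus over 1 GB.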
For a more complex project with a build step (like a TypeScript or React app), you’d add a build stage:
# Stage 1: Install ALL dependencies and build
FROM node:22-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json* ./
RUN npm ci
COPY . .
RUN npm run build
# Stage 2: Production dependencies only
FROM node:22-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json* ./
# --omit=dev is the current form of the deprecated --only=production flag
RUN npm ci --omit=dev
# Stage 3: Run
FROM node:22-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY package.json ./
USER appuser
EXPOSE 8080
CMD ["node", "dist/server.js"]
Three stages: one for building, one for gathering production dependencies, and one lean final image that contains only compiled output and runtime dependencies.
Real-World Example: Python Application
Python is interpreted, so there’s no compiled binary to copy out, and multi-stage builds work a bit differently. The key strategy is to install dependencies into an isolated directory in one stage and copy that directory into a clean stage:
# Stage 1: Install dependencies into /install
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
# Stage 2: Run
FROM python:3.12-slim
RUN groupadd -r appgroup && useradd -r -g appgroup appuser
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
USER appuser
EXPOSE 8080
CMD ["python", "app.py"]
The --prefix=/install flag tells pip to install packages into a specific directory, making it easy to copy just those packages into the final image.
The Security Argument: Smaller Means Safer
Image size reduction isn’t just about speed. It’s directly tied to security, and typical scan results make the point. You can scan images for known vulnerabilities using tools like Docker Scout or Trivy:
# Install Trivy (macOS)
brew install trivy
# Scan the single-stage image
trivy image myapp:single-stage
# Scan the multi-stage image
trivy image myapp:multi-stage
Typical results you’ll see:
| Image | Size | Critical CVEs | High CVEs | Total CVEs |
|---|---|---|---|---|
| myapp:single-stage (golang:1.23) | 862 MB | 5-15 | 20-50 | 200-400+ |
| myapp:multi-stage (alpine:3.20) | 14.5 MB | 0-1 | 0-3 | 0-10 |
| myapp:scratch | 6.5 MB | 0 | 0 | 0 |
(Exact numbers vary over time as new CVEs are discovered and patched. The pattern is consistent.)
Fewer packages means fewer things that can have vulnerabilities. The scratch image has zero OS packages, so the only vulnerabilities possible are in your application code and the Go standard library.
Common Mistakes Beginners Make
1. Not leveraging the Docker build cache
Order matters in Dockerfiles. Copy files that change less frequently first:
# GOOD — dependency file copied first, changes less often
COPY go.mod go.sum ./
R