August 1st, 2024 - Alexander Mattoni, Head of Engineering

The Secret To Blazing Fast Docker Builds

It's no overstatement to say that Dockerfiles are the underpinning of modern DevOps. Writing a simple Dockerfile that 'works' is relatively straightforward, but a handful of tricks and tips can significantly improve the build speed and efficiency of your container images.

If your current Dockerfiles copy in multi-gigabyte contexts, reinstall dependencies on every build, or use only a single stage, we need to talk.

Dockerfiles Are Like Onions - They Have Layers

A Dockerfile is essentially a 'script', with instructions on how to put together an image that can be used to launch containerized processes. Each new instruction in the Dockerfile creates an immutable, hashed layer; this is what makes Docker builds fast and efficient. As long as the relevant part of the build context (the files and directories involved in the build) doesn't change, the layer won't be recreated. This concept will play a crucial role in optimizing our Dockerfiles.

Layers are also reusable, in a sense. Images that depend on the same base layers can use pre-existing ones instead of recreating them, which speeds up building images that share a common base.
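To see this in action, build a tiny two-instruction Dockerfile and inspect the result with docker history, which lists the layer each instruction produced (the alpine base image and the layer-demo tag below are just placeholders for illustration):

# Dockerfile
FROM alpine:3.19
RUN apk add --no-cache curl

# build it, then list its layers
docker build -t layer-demo .
docker history layer-demo

Run the build a second time without changing anything and every step should come straight from the cache.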

Key Components of Dockerfile Efficiency

There are three concepts to learn that, when combined, will help you build better optimized container images.

Context

The build context is the set of files and directories Docker takes into consideration when building an image. Changes to these files can trigger one or more layers to be rebuilt, so understanding and minimizing the build context is crucial for efficiency.

Intelligent Layer Ordering

When it comes to the instructions inside your Dockerfile, order matters. By carefully crafting which files are built into which layers, you can significantly increase build speed by avoiding unnecessary cache invalidations.

Multi-Stage Builds

By taking advantage of build stages, we can segment each Dockerfile into separate parts, and only target and rebuild those parts that we need. These are also key to reducing the size of the final image - a necessary step in improving deployment speed across your infrastructure.

Reducing Context Size

When you run a docker build command, you specify the context to build it in:

An Absolute Path:

docker build -t my-image /var/lib

Current Directory:

docker build -t my-image . (notice the dot (.) at the end - this points to our current directory)

A Sub-Directory:

docker build -t my-image src/

The one requirement is that the directory must contain ALL the files needed to compile the final image; it's not possible to specify two separate contexts. It's easy to accidentally include a large swath of files, gigabytes in size, that then need to be transferred to the Docker daemon and factored into the caching algorithm. Take a simple node app, for example. Oftentimes you'll have the node_modules folder, tests folder, etc. next to your src folder, with the necessary package.json in the root that needs to be included in the build. How can we avoid copying this extra context so that our builds can do a proper, fresh npm install?

.dockerignore To The Rescue

We're in luck. Much like .gitignore, Docker has its own flavor, aptly titled .dockerignore. And much like its git counterpart, .dockerignore allows us to exclude files or directories matching its patterns from the context. By adding node_modules, we can exclude it both from the context and the base layer, reducing spin-up time AND our final image size (we can do much more purposeful, production-only npm installs inside the Dockerfile itself).
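A minimal .dockerignore for our node app might look something like the following; the exact entries are assumptions you'd adjust to your own project layout:

# dependencies get a clean install inside the image
node_modules
# build output from previous local runs
dist
# version control history and local noise
.git
*.log
# test suites aren't needed to build the production image
tests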

Q: Why shouldn't I just copy my locally-built artifacts into the image and be done with it?

This is a great question and something we often see. Why not just run npm run build locally and copy dist/ into the image as a single layer? Or, if you're coming from Go or Rust, why not build the binary locally and copy it into the image in a single step? While that may work just fine for you, there are ultimately a few issues with this approach.

The benefit of using containers is that your applications are atomic. That means your container image …contains… everything that your application needs, and is recreated when changes are made, instead of modified in place. This guarantees that when you build your application, it works the same for you as when someone else on your team builds it.

When you copy your node_modules folder or binaries directly into your image, you're removing the guarantee that it was built the same way and will work the same on someone else's machine.

For example, if someone on your team uses Linux (amd64), and another team member uses a Mac with an ARM processor (arm64), your node_modules may not be the same! When copied into the container, the person building on the Mac may encounter runtime errors or other oddities, breaking the guarantees that come from building the image in a predefined environment - like our Dockerfile.

This also extends to CI/CD, where we may have much less control over the servers doing our builds. Even differences between versions of the same operating system and architecture can lead to issues that are difficult to track down, and cost your team time and energy.
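If you need a specific architecture no matter whose machine (or which CI runner) kicks off the build, one option is to pin the target platform explicitly with BuildKit. The sketch below assumes the buildx plugin is available and that linux/amd64 is the platform you deploy to:

# build for a fixed platform, regardless of the host's architecture
docker buildx build --platform linux/amd64 -t my-image .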

Minimizing Cache Invalidation With Layer Organization

Each layer is built on the previous one, so invalidating a layer higher up in your Dockerfile has rippling effects, causing every subsequent layer to be invalidated as well. Therefore, it's critical to put frequently modified layers toward the bottom of your Dockerfile and time-intensive layers (like installing dependencies) at the top. Let's see how to achieve this with our example node application.

FROM node:slim
WORKDIR /app

# First, copy in our dependency lists.
# Our package files aren't modified often, and we want to avoid
# doing an intensive npm install every time we build this image.
COPY package.json package-lock.json ./
RUN npm install

# Now, we copy in the src folder, where our code changes frequently
# between builds. This way, modifying our code won't invalidate the
# dependency layer, whereas if we copied in the entire context
# initially, any file change would have invalidated it.
COPY ./src ./src
RUN npm run build

CMD ["node", "./dist/index.js"]

With this order, we've managed to minimize the impact of code changes to not trigger a reevaluation of dependencies every build. This alone is a huge time saver. Depending on the tools and languages you're working with, you may be able to identify other places where you can take advantage of image layering to reduce build times.
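It's easy to verify this behavior yourself (my-node-app is just an example tag): build once, change a file under src/, and build again - the dependency layers should be reported as cached, while only the source copy and build steps re-run.

docker build -t my-node-app .
# edit a file under src/, then rebuild:
docker build -t my-node-app .
# expect: COPY package.json / RUN npm install are served from cache,
# while COPY ./src and RUN npm run build execute again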

Focus Images With Multi-Stage Builds

With multi-stage builds, we can reuse a common base image to build other images off of, in the same Dockerfile. What's the advantage of doing this, you ask?

First, we're able to utilize a common stage for installing our dependencies and building, for example, a binary of our application. From there, we can copy only the absolutely necessary compiled pieces into our final image, shrinking its size. If you're building a compiled binary in a language like Rust or Go, you can install all the build dependencies in the first stage, compile your static binary, and simply copy that binary into a FROM scratch (no base image) container that holds nothing but that single binary, for the ultimate minimal size (see the sketch after the next point).

Second, we can create other targets within our Dockerfile that utilize that same base. Imagine a target for local development that has extra debugging tools built into it. Or, you could have a 'test' stage that builds a container explicitly for integration testing.
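To make the first point concrete, here's a rough sketch of a scratch-based build for a small Go service; the golang:1.22 tag and the ./cmd/server package path are assumptions you'd adapt to your project, and a Rust build follows the same shape:

# Stage 1 - build a static binary with the full Go toolchain
FROM golang:1.22 AS build
WORKDIR /src
# cache module downloads, just like package.json in the node example
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 produces a statically linked binary that can run on scratch
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server

# Stage 2 - the final image contains nothing but the binary
FROM scratch
COPY --from=build /out/server /server
ENTRYPOINT ["/server"]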

Let's take a look at how to enhance our earlier node image with multi-stage builds.

# Stage 1 - base
# Notice the 'AS base' - this is how we set up multi-stage builds.
# This base stage sets up common settings for all our other images.
FROM node:slim AS base
WORKDIR /app
COPY package.json package-lock.json ./

# Stage 2 - dependencies
# Time to cache our dependencies. Notice how we build from base,
# meaning we get everything from that stage as our base layer.
FROM base AS dependencies
RUN npm install

# Stage 3 - builder
# Builds the artifacts for the final image.
FROM dependencies AS builder
# this can be tailored based on your directory structure
COPY . .
RUN npm run build

# Stage 4 - test
# Run linters, setup, and tests.
FROM dependencies AS test
# this can be tailored based on your directory structure
COPY . .
RUN npm test

# Stage 5 - release
# We work off the base image only, so as to keep the final
# image size minimal.
FROM base AS release
# copy node_modules from the dependencies stage
COPY --from=dependencies /app/node_modules ./node_modules
# copy build artifacts from the builder stage
COPY --from=builder /app/dist /app/dist
EXPOSE 80
CMD ["node", "./dist/index.js"]

Whew! That's a lot, but hopefully it's clear how you can reuse earlier stages to build and tailor images for dev, testing, and production, giving each exactly what it needs to run correctly while keeping the image size small.
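Once the stages are defined, you pick the one you want at build time with the --target flag (the tags here are just examples):

# build and run the test stage
docker build --target test -t my-app:test .

# build only what's needed for production
docker build --target release -t my-app:release .

With BuildKit, stages the chosen target doesn't depend on are skipped entirely, so the test stage never slows down a production build.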

Level Up With Organization-Wide Layer Caching, feat. Depot.dev

Going beyond our mere local machines, we can take our image layers to the cloud and share them with the rest of our team and our CI/CD pipelines. After all, if I've already built the image layers, why should everyone else have to? Especially when it comes to rapid iteration in CI/CD, sharing layers can be the difference between a 10-minute build and a 1-minute build.
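If you're wiring this up yourself, BuildKit can already push and pull the layer cache through a container registry. The sketch below shows one way to do it, with registry.example.com/my-app standing in for your own image and cache locations:

docker buildx build \
  --cache-from type=registry,ref=registry.example.com/my-app:buildcache \
  --cache-to type=registry,ref=registry.example.com/my-app:buildcache,mode=max \
  -t registry.example.com/my-app:latest \
  --push .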

One of the premier tools out there for organization-wide image building is Depot.dev. The Depot CLI works the same as docker build, so you can simply replace docker with depot to get blazing fast image builds that get faster the more your team uses it. You get some other things we didn't touch on much in this article, like cross-architecture compilation (which can be painfully slow) and native integrations into tools to speed up deployments.
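Since the CLI is a drop-in replacement, the earlier commands barely change (my-image is still just a placeholder tag):

# same flags as docker build, backed by a shared, persistent layer cache
depot build -t my-image .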

Faster Docker Deployments on Cycle.io

The ultimate goal with building a Docker image is to deploy and run it somewhere. At Cycle.io, we excel at running your containers on your own infrastructure, whether that be AWS, GCP, On-Prem hardware, or more (at the same time!). We recently announced our integration with Depot.dev, so you can take advantage of your cloud-cached layers directly in the Cycle platform.

Combining our tips on reducing image build times and size with Depot.dev, going from code change to world-spanning production applications has never been faster. Throw in Cycle's zero-downtime rainbow deployments, and your SREs will be sleeping peacefully the night before a production push.

We've put a lot of thought into DevOps so you don't have to. That's why today, Cycle is the leading LowOps platform for building platforms.

💡 Interested in trying the Cycle platform? Create your account today! Want to drop in and have a chat with the Cycle team? We'd love to have you join our public Cycle Slack community!