It's no exaggeration to say Dockerfiles are the underpinning of modern DevOps. Writing a simple Dockerfile that 'works' is relatively straightforward, but there are several tricks and tips that can significantly improve the build speed and efficiency of your container images.
If your current Dockerfiles copy in multi-gigabyte contexts, reinstall dependencies on every build, or use only a single stage, we need to talk.
A Dockerfile is essentially a 'script': a set of instructions on how to put together an image that can be used to launch containerized processes. Each instruction in the Dockerfile creates an immutable, hashed layer; this is what makes Docker builds fast and efficient. As long as the build context (the files and directories involved in the build) doesn't change, the layer won't be recreated. This concept plays a crucial role in optimizing our Dockerfiles.
Layers are also reusable, in a sense. Images that depend on the same base layers can use pre-existing ones instead of recreating them, which also speeds up building images that share a common base.
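As a quick illustration (the file name here is hypothetical), each instruction in a Dockerfile like this produces one layer, and rebuilding without touching `index.js` reuses every one of them from cache:

```dockerfile
FROM node:slim             # base image layers, pulled once and shared
WORKDIR /app               # small metadata layer
COPY index.js .            # invalidated only when index.js changes
CMD ["node", "index.js"]   # metadata only; no filesystem change
```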
There are three concepts to learn that, when combined, will help you build better-optimized container images.
The build context is the set of files and directories Docker takes into consideration when building an image. Changes to these files can trigger one or more layers to be rebuilt, so understanding and minimizing the build context is crucial for efficiency.
When it comes to the instructions inside your Dockerfile, order matters. By carefully crafting which files are built into which layers, you can significantly increase build speed by avoiding unnecessary cache invalidations.
By taking advantage of build stages, we can segment each Dockerfile into separate parts, and target and rebuild only the parts we need. These are also key to reducing the size of the final image - a necessary step in improving deployment speed across your infrastructure.
When you run a `docker build` command, you specify the context to build it in:

```shell
docker build -t my-image /var/lib
docker build -t my-image .     # notice the dot (.) at the end - this points to our current directory
docker build -t my-image src/
```
The one requirement is that the directory must contain ALL the files needed to compile the final image; it's not possible to specify two separate contexts. It's easy to accidentally include a large swathe of files, gigabytes in size, that then need to be transferred to the Docker daemon and factored into the caching algorithm. Take a simple node app, for example. Oftentimes you'll have the `node_modules` folder, tests folder, etc. sitting next to your `src` folder, while the necessary `package.json` lives in the root and must be included in the build. How can we avoid copying this extra context so that our builds can do a proper fresh `npm install`?
We're in luck. Much like `.gitignore`, Docker has its own flavor, aptly titled `.dockerignore`. And much like its git counterpart, `.dockerignore` allows us to exclude files or directories matching its patterns from the context. By adding `node_modules`, we can exclude it both from the context and the base layer, reducing spin-up time AND our final image size (we can do much more purposeful, production-only npm installs inside the Dockerfile itself).
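For our node example, a `.dockerignore` placed next to the Dockerfile might look like this (the entries beyond `node_modules` are illustrative - tailor them to your repo):

```
node_modules
dist
tests
.git
*.log
```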
This is a great question and something we often see. Why not just run `npm run build` locally and copy `dist/` into the image as a single layer? Or, if coming from Go or Rust, why not build the binary locally and copy it into the image in a single step? While that may work just fine for you, there are ultimately a few issues with this approach.
The benefit of using containers is that your applications are atomic. That means your container image *contains* everything that your application needs, and is recreated when changes are made instead of modified in place. This guarantees that when you build your application, it works the same for you as it does for anyone else on your team who builds it.
When you copy your `node_modules` folder or binaries directly into your image, you remove the guarantee that it was built the same way and will work the same on someone else's machine.
For example, if someone on your team uses Linux (amd64) and another team member uses a Mac with an ARM processor (arm64), your `node_modules` may not be the same! When those modules are copied into the container, the person building on the Mac may encounter runtime errors or other oddities, breaking the guarantees that come from building the image in a predefined environment - like our Dockerfile.
This also extends to CI/CD, where we may have much less control over the servers doing our builds. Even differences between versions of the same operating system and architecture can lead to issues that are difficult to track down, and cost your team time and energy.
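If your team is split across architectures, one mitigation (a hedged sketch - it requires BuildKit, and building a non-native platform relies on emulation, which is slower) is to pin the target platform explicitly so everyone produces the same kind of image:

```shell
# Build an amd64 image even on an arm64 Mac
docker build --platform linux/amd64 -t my-image .
```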
Each layer is built on the previous layer, so invalidating a layer higher up in your Dockerfile has rippling effects, causing each subsequent layer to also be invalidated. Therefore, it's critical to put frequently modified layers toward the bottom of your Dockerfile, and time-intensive layers (like installing dependencies) at the top. Let's see how to achieve this with our example `node` application.
```dockerfile
FROM node:slim
WORKDIR /app

# First, copy in our dependency lists.
# Our package files aren't modified often, and we
# want to avoid doing an intensive npm install every time
# we build this image.
COPY package.json package-lock.json ./
RUN npm install

# Now, we copy in the src folder, where our code changes
# frequently between builds. Modifying our code won't
# invalidate the dependency layers above, whereas if we had
# copied in the entire context initially, any file change
# would have invalidated them.
COPY ./src ./src
RUN npm run build

CMD ["node", "./dist/index.js"]
```
With this ordering, we've ensured that code changes no longer trigger a re-evaluation of dependencies on every build. This alone is a huge time saver. Depending on the tools and languages you're working with, you may be able to identify other places where you can take advantage of image layering to reduce build times - one example follows below.
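One such place, offered as a hedged sketch (it requires BuildKit and the syntax directive shown): a cache mount persists npm's download cache between builds, so even when the dependency layer is invalidated, packages don't all have to be re-downloaded:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:slim
WORKDIR /app
COPY package.json package-lock.json ./
# The cache mount keeps /root/.npm (npm's download cache) around across builds
RUN --mount=type=cache,target=/root/.npm npm install
```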
With multi-stage builds, we can define a common base stage and build other images off of it, all in the same Dockerfile. What's the advantage of doing this, you ask?
First, we're able to utilize a common stage for installing our dependencies and building, e.g., a binary of our application. From there, we can copy only the absolutely necessary compiled pieces into our final image, shrinking its size. If you're building a compiled binary in a language like Rust or Go, you can install all the build dependencies in the first stage, compile your static binary, and simply copy the binary into a `FROM scratch` (no base image) container that holds only that single binary, for the ultimate minimal size.
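For Go, that pattern can look like this (a hedged sketch - the file names and Go version are placeholders, and `CGO_ENABLED=0` is what makes the binary static enough to run in `scratch`):

```dockerfile
FROM golang:1.22 AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Statically link so the binary needs no libc from a base image
RUN CGO_ENABLED=0 go build -o /bin/app .

# Final image: no OS, no shell - just the binary
FROM scratch
COPY --from=builder /bin/app /app
ENTRYPOINT ["/app"]
```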
Second, we can create other targets within our Dockerfile that utilize that same base. Imagine a target for local development that has extra debugging tools built into it. Or, you could have a 'test' stage that builds a container explicitly for integration testing.
Let's take a look at how to enhance our earlier node image with multi-stage builds.
```dockerfile
# Notice the 'AS base' - this is how we set up multi-stage builds.
# This base stage will set up common settings for all our other images.
FROM node:slim AS base
WORKDIR /app
COPY package.json package-lock.json ./

# Stage 2 - time to cache our dependencies.
# Notice how we build FROM base, meaning we get everything
# from that stage as our starting point.
FROM base AS dependencies
RUN npm install

# Stage 3 - builder
# builds our artifacts for the final image
FROM dependencies AS builder
# this can be tailored based on your directory structure
COPY . .
RUN npm run build

# Stage 4 - tests
# run linters, setup, and tests
FROM dependencies AS test
# this can be tailored based on your directory structure
COPY . .
RUN npm test

# Stage 5 - production
# We're working off the base stage only, so as to keep
# the final image size minimal.
FROM base AS release
# copy node_modules from the dependencies stage
COPY --from=dependencies /app/node_modules ./node_modules
# copy build artifacts from the builder stage
COPY --from=builder /app/dist ./dist
EXPOSE 80
CMD ["node", "./dist/index.js"]
```
Whew! That's a lot, but hopefully it's clear how you can reuse earlier stages to build and tailor your images for dev, testing, and production, focusing each with exactly what it needs to run correctly, while maintaining a small image size.
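To build or rebuild just one of these stages, pass `--target` to `docker build` (the stage names below match the Dockerfile above):

```shell
# Run only the test stage, e.g. in CI
docker build --target test -t my-app:test .

# Build just the minimal production image
docker build --target release -t my-app:latest .
```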
Going beyond our mere local machines, we can take our image layers to the cloud and share them with our other team members and CI/CD. After all, if I've already built the image layers, why should everyone else have to? Especially when it comes to rapid iteration in CI/CD, sharing layers can be the difference between a 10 minute build and a 1 minute build.
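With plain Docker, one way to share layers is a registry-backed cache (a hedged sketch - it requires buildx, and the registry refs are placeholders for one your team and CI can both reach):

```shell
docker buildx build \
  --cache-from type=registry,ref=registry.example.com/my-app:buildcache \
  --cache-to type=registry,ref=registry.example.com/my-app:buildcache,mode=max \
  -t registry.example.com/my-app:latest --push .
```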
One of the premier tools out there for organization-wide image building is Depot.dev. The Depot CLI works the same as `docker build`, so you can simply replace `docker` with `depot` to get blazing-fast image builds that get faster the more your team uses it. You also get some things we didn't touch on much in this article, like cross-architecture compilation (which can be painfully slow locally) and native integrations into tools to speed up deployments.
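In practice, the swap is a one-word change (assuming you've installed the Depot CLI and configured a project):

```shell
# Before
docker build -t my-image .
# After - same flags, shared remote cache
depot build -t my-image .
```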
The ultimate goal with building a Docker image is to deploy and run it somewhere. At Cycle.io, we excel at running your containers on your own infrastructure, whether that be AWS, GCP, On-Prem hardware, or more (at the same time!). We recently announced our integration with Depot.dev, so you can take advantage of your cloud-cached layers directly in the Cycle platform.
Combining our tips on reducing image build times and size with Depot.dev, going from code change to world-spanning production applications has never been faster. Throw in Cycle's zero-downtime rainbow deployments, and your SREs will be sleeping peacefully the night before a production push.
We've put a lot of thought into DevOps so you don't have to. That's why today, Cycle is the leading LowOps platform for building platforms.
💡 Interested in trying the Cycle platform? Create your account today! Want to drop in and have a chat with the Cycle team? We'd love to have you join our public Cycle Slack community!