Docker Image Cookbook¶
Docker is, essentially, a way to send a self-contained computer called a container to another person. You define the software that goes into the container, and then anyone with Docker installed on their own computer (the “host”) can run your container and access the software inside without that sofware being installed on the host. This is an enormous advantage in distributed computing, where it can be difficult to ensure that software that your own software depends on (“dependencies”) are installed on the computers your code actually runs on.
To use Docker, you write a Dockerfile which tells Docker how to generate an image, which is a blueprint to construct a container. The Dockerfile is a list of instructions, such as shell commands or instructions for Docker to copy files from the build environment into the image. You then tell Docker to “build” the image from the Dockerfile.
For use with HTMap, you then upload this image to Docker Hub, where it can then be downloaded to execute nodes in an HTCondor pool. When your HTMap component lands on an execute node, HTCondor will download your image from Docker Hub and run your code inside it using HTMap.
The following sections describe, roughly in order of increasing complexity, different ways to build Docker images for use with HTMap. Each level of complexity is introduced to solve a more advanced dependency management problem. We recommend reading them in order until reach one that works for your dependencies (each section assumes knowledge of the previous sections).
More detailed information on how Dockerfiles work can be found in the Docker documentation itself This page only covers the bare minimum to get started with HTMap and Docker.
Attention
This recipe only covers using Docker for execute-side dependency management. You still need to install dependencies submit-side to launch your map in the first place!
Can I use HTMap’s default image?¶
HTMap’s default Docker image is htcondor/htmap-exec,
which is itself based on`continuumio/anaconda3 <https://hub.docker.com/r/continuumio/anaconda3/>`_.
It is based on Python 3 and has many useful packages pre-installed, such as numpy
, scipy
, and pandas
.
If your software only depends on packages included in the Anaconda distribution,
you can use HTMap’s default image and won’t need to create your own.
I depend on Python packages that aren’t in the Anaconda distribution¶
Attention
Before proceeding, install Docker on your computer and make an account on Docker Hub.
Let’s pretend that there’s a package called foobar
that your Python function depends on,
but isn’t part of the Anaconda distribution.
You will need to write your own Dockerfile to include this package in your Docker image.
Docker images are built in layers.
You always start a Dockerfile by stating which existing Docker image you’d like to use as your base layer.
A good choice is the same Anaconda image that HTMap uses as the default,
which comes with both the conda
package manager and the standard pip
.
Create a file named Dockerfile
and write this into it:
# Dockerfile
FROM continuumio/anaconda3:latest
ENV PATH=/opt/conda/bin/:${PATH}
RUN pip install --no-cache-dir htmap
ARG USER=htmap
RUN groupadd ${USER} \
&& useradd -m -g ${USER} ${USER}
USER ${USER}
Each line in the Dockerfile starts with a short, capitalized word which tells Docker what kind of build instruction it is.
FROM
means “start with this base image”.RUN
means “execute these shell commands in the container”.ARG
means “set build argument” - it acts like an environment variable that’s only set during the image build.
Lines that begin with a #
are comments in a Dockerfile.
The above lines say that we want to inherit from the image continuumio/anaconda3:latest
and build on top of it.
To be compatible with HTMap, we install htmap
via pip
.
We also set up a non-root user to do the execution, which is important for security.
Naming that user htmap
is arbitrary and has nothing to do with the htmap
package itself.
Now we need to tell Docker to run a shell command during the build to install foobar
by adding one more line to the bottom of the Dockerfile.
# Dockerfile
FROM continuumio/anaconda3:latest
ENV PATH=/opt/conda/bin/:${PATH}
RUN pip install --no-cache-dir htmap
ARG USER=htmap
RUN groupadd ${USER} \
&& useradd -m -g ${USER} ${USER}
USER ${USER}
# if foobar can be install via conda, use these lines
RUN conda install -y foobar \
&& conda clean -y --all
# if foobar can be installed via pip, use these lines
RUN pip install --no-cache-dir foobar
Some notes on the above:
If you need to install some packages via
conda
and some viapip
, you may need to use both types of lines.The
conda clean
and--no-cache-dir
instructions forconda
andpip
respectively just help keep the final Docker image as small as possible.The
-y
options for theconda
commands are the equivalent of answering “yes” to questions thatconda
asks on the command line, since the Docker build is non-interactive.A trailing
\
is a line continuation, so that first command is equivalent to runningconda install -y foobar && conda clean -y --all
, which is justbash
shorthand for “do both of these things”.
If you need install many packages, we recommend writing a requirements.txt
file (see the Python packaging docs)
and using
# Dockerfile
FROM continuumio/anaconda3:latest
ENV PATH=/opt/conda/bin/:${PATH}
RUN pip install --no-cache-dir htmap
ARG USER=htmap
RUN groupadd ${USER} \
&& useradd -m -g ${USER} ${USER}
USER ${USER}
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
The COPY
build instruction tells Docker to copy the file requirements.txt
(path relative to the build directory, explained below)
to the path requirements.txt
inside the image.
Relative paths inside the container work the same way they do in the shell; the image has a “working directory” that you can set using the WORKDIR
instruction.
Now that we have a Dockerfile, we can tell Docker to use it to build an image.
You’ll need to choose a descriptive name for the image, ideally something easy to type that’s related to your project (like qubits
or gene-analysis
).
Wherever you see <image>
below, insert that name.
You’ll also want to version your images by adding a “tag” after a :
, like <image>:v1
, <image>:v2
, <image>:v3
, etc.
You can use any string you’d like for the tag.
You’ll also need to know your Docker Hub username.
Wherever you see <username>
below, insert your username, and wherever you see <tag>
, insert your chosen version tag.
At the command line, in the directory that contains Dockerfile
, run
$ docker build -t <username>/<image>:<tag> .
You should see the output of the build process, hopefully ending with
Successfully tagged <username>/<image>:<tag>
<username>/<image>:<tag>
is the universal identifier for your image.
Now we need to push the image up to Docker Hub. Run
$ docker push <username>/<image>:<tag>
You’ll be asked for your credentials, and then all of the data for your image will be pushed up to Docker Hub. Once this is done, you should be able to use the image with HTMap. Change your HTMap settings (see DOCKER) to point to your new image, and launch your maps!
I don’t need most of the Anaconda distribution and want to use a lighter-weight base image¶
Instead of using the full Anaconda distribution, use a base Docker image that only includes the conda
package manager:
# Dockerfile
FROM continuumio/miniconda3:latest
ENV PATH=/opt/conda/bin/:${PATH}
RUN pip install --no-cache-dir htmap
ARG USER=htmap
RUN groupadd ${USER} \
&& useradd -m -g ${USER} ${USER}
USER ${USER}
From here, install your particular dependencies as above.
If you prefer to not use conda
, an even-barer-bones image could be produced from
# Dockerfile
FROM python:latest
RUN pip install --no-cache-dir htmap
ARG USER=htmap
RUN groupadd ${USER} \
&& useradd -m -g ${USER} ${USER}
USER ${USER}
We use python:latest
as our base image, so we don’t have conda
anymore.
I want to use a Python package that’s not on PyPI or Anaconda¶
Perhaps you’ve written a package yourself, or you want to use a package that is only available as source code on GitHub or a similar website.
There are multiple way to approach this, most of them roughly equivalent.
The first step for all of them is to write a setup.py
file for your package.
Some instructions for writing a setup.py
can be found here.
Once you have a working setup.py
, there are various ways to proceed, in reverse order of complexity:
Upload your package to PyPI and
pip install <package>
as in previous sections. This is the least flexible because you’ll need to upload to PyPI every time your update your package. If you don’t own the package, you shouldn’t do this!Upload your package to a publicly-accessible version control repository and use pip’s VCS support to install it (for example, if your package is on GitHub, something like
pip install git+https://github.com/<UserName>/<package>.git
).Use the
COPY
build instruction to copy your package directly into the Docker image, thenpip install <path/to/dir/containing/setup.py>
as aRUN
instruction. Note that your package will need to be in the Docker build context (see the docs for details).
I want to use a base image that doesn’t come with Python pre-installed¶
Say you have an existing Docker image that you need to use (maybe it includes non-Python dependencies that you aren’t sure how to install yourself).
You need to add Python to this image so that you can run your own code in it.
We recommend adding miniconda
to the image by adding these lines to your Dockerfile:
# Dockerfile
# see discussion below
FROM ubuntu:latest
RUN apt-get -y update \
&& apt-get install -y wget
# Docker build arguments
# use the Python version you need
# default to latest version of miniconda (which can then install any version of Python)
ARG PYTHON_VERSION=3.6
ARG MINICONDA_VERSION=latest
# set install location, and add the Python in that location to the PATH
ENV CONDA_DIR=/opt/conda
ENV PATH=${CONDA_DIR}/bin:${PATH}
# install miniconda and Python version specified in config
# (and ipython, which is nice for debugging inside the container)
RUN cd /tmp \
&& wget --quiet https://repo.continuum.io/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh \
&& bash Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -f -b -p $CONDA_DIR \
&& rm Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh \
&& conda install python=${PYTHON_VERSION} \
&& conda clean -y -all
After this, you can install HTMap and any other Python packages you need as in the preceeding sections.
Note that in this example we based the image on Ubuntu’s base image and installed wget
,
which we used to download the miniconda
installer.
Depending on your base image, you may need to use a different package manager
(for example, yum
) or different command-line file download tool (for example, curl
).
I want to build an image for use on the Open Science Grid¶
First, read through OSG’s Singularity documentation.
Based on that, our goal will be to build a Docker image and have OSG convert
it to a Singularity image that can be served by OSG.
The tricky part of this is that Docker’s ENV
instruction won’t carry over to
Singularity, which is the usual method of etting python3
on the PATH
inside the container.
To remedy this, we will create a special directory structure that Singularity
recognizes and uses to execute instructions with specified environments.
This is not a Singularity tutorial, so the simplest thing to do is copy the entire singularity.d directory that htmap-exec uses: https://github.com/htcondor/htmap/tree/master/htmap-exec/singularity.d
Anything you need to specify for your environment should be done in
singularity.d/env/90-environment.sh
.
This file will be “sourced” (run) when the image starts, before HTMap executes.
In your Dockerfile, you must copy this directory to the correct location inside the image:
# Dockerfile snippet
COPY <path/to/singularity.d> /.singularity.d
Note the path on the right: a hidden directory at the root of the filesystem.
This is just a Singularity convention.
The left path is just the location of the singularity.d
directory you made.
Note that if you FROM
an htmap-exec
image, this setup will already be embedded
in the image for you.