Content from Introducing Containers
Last updated on 2024-09-24
Estimated time: 20 minutes
Overview
Questions
- What are containers, and why might they be useful to me?
Objectives
- Show how software depending on other software leads to configuration management problems.
- Identify the problems that software installation can pose for research.
- Explain the advantages of containerization.
- Explain how using containers can solve software configuration problems.
Learning about Docker Containers
The Australian Research Data Commons has produced a short introductory video about Docker containers that covers many of the points below. Watch it before or after you go through this section to reinforce your understanding!
How can software containers help your research? Australian Research Data Commons, 2021. DOI: 10.5281/zenodo.5091260
Scientific Software Challenges
What’s Your Experience?
Take a minute to think about challenges that you have experienced in using scientific software (or software in general!) for your research. Then, share with your neighbors and try to come up with a list of common gripes or challenges.
You may have come up with some of the following:
- you want to use software that doesn’t exist for the operating system (Mac, Windows, Linux) you’d prefer.
- you struggle with installing a software tool because you have to install a number of other dependencies first. Those dependencies, in turn, require other things, and so on (i.e. a combinatorial explosion).
- the software you’re setting up involves many dependencies and only a subset of all possible versions of those dependencies actually works as desired.
- you’re not actually sure what version of the software you’re using because the install process was so circuitous.
- you and a colleague are using the same software but get different results because you have installed different versions and/or are using different operating systems.
- you installed everything correctly on your computer but now need to install it on a colleague’s computer/campus computing cluster/etc.
- you’ve written a package for other people to use but a lot of your users frequently have trouble with installation.
- you need to reproduce a research project from a former colleague and the software used was on a system you no longer have access to.
A lot of these characteristics boil down to one fact: the main program you want to use likely depends on many, many, different other programs (including the operating system!), creating a very complex, and often fragile system. One change or missing piece may stop the whole thing from working or break something that was already running. It’s no surprise that this situation is sometimes informally termed “dependency hell”.
Software and Science
Again, take a minute to think about how the software challenges we’ve discussed could impact (or have impacted!) the quality of your work. Share your thoughts with your neighbors. What can go wrong if our software doesn’t work?
Unsurprisingly, software installation and configuration challenges can have negative consequences for research:
- you can’t use a specific tool at all, because it’s not available or installable.
- you can’t reproduce your results because you’re not sure what tools you’re actually using.
- you can’t access extra/newer resources because you’re not able to replicate your software set up.
- others cannot validate and/or build upon your work because they cannot recreate your system’s unique configuration.
Thankfully there are ways to get underneath (a lot of) this mess: containers to the rescue! Containers provide a way to package up software dependencies and access to resources such as files and communications networks in a uniform manner.
What is a Container? What is Docker?
Docker is a tool that allows you to build what are called “containers.” It’s not the only tool that can create containers, but it is the one we’ve chosen for this workshop. But what is a container?
To understand containers, let’s first talk briefly about your computer.
Your computer has some standard pieces that allow it to work – often what’s called the hardware. One of these pieces is the CPU or processor; another is the amount of memory or RAM that your computer can use to store information temporarily while running programs; another is the hard drive, which can store information over the long-term. All these pieces work together to do the “computing” of a computer, but we don’t see them because they’re hidden from view (usually).
Instead, what we see is our desktop, program windows, different folders, and files. These all live in what’s called the filesystem. Everything on your computer – programs, pictures, documents, the operating system itself – lives somewhere in the filesystem.
NOW, imagine you want to install some new software but don’t want to take the chance of making a mess of your existing system by installing a bunch of additional stuff (libraries/dependencies/etc.). You don’t want to buy a whole new computer because it’s too expensive. What if, instead, you could have another independent filesystem and running operating system that you could access from your main computer, and that is actually stored within this existing computer?
Or, imagine you have two tools you want to use in your groundbreaking research on cat memes: PurrLOLing, a tool that does AMAZINGLY well at predicting the best text for a meme based on the cat species, and WhiskerSpot, the only tool available for identifying cat species from images. You want to send cat pictures to WhiskerSpot, and then send the species output to PurrLOLing. But there’s a problem: PurrLOLing only works on Ubuntu and WhiskerSpot is only supported for OpenSUSE, so you can’t have them on the same system! Again, we really want another filesystem (or two) on our computer that we could use to chain together WhiskerSpot and PurrLOLing in a “pipeline”…
Container systems, like Docker, are special programs on your computer that make it possible! The term “container” can be usefully considered with reference to shipping containers. Before shipping containers were developed, packing and unpacking cargo ships was time consuming and error prone, with high potential for different clients’ goods to become mixed up. Just like shipping containers keep things together that should stay together, software containers standardize the description and creation of a complete software system: you can drop a container into any computer with the container software installed (the ‘container host’), and it should “just work”.
Virtualization
Containers are an example of what’s called virtualization – having a second “virtual” computer running and accessible from a main or host computer. Another example of virtualization is the virtual machine (VM). A virtual machine typically contains a whole copy of an operating system in addition to its own filesystem and has to get booted up in the same way a computer would. A container is considered a lightweight version of a virtual machine; underneath, the container is (usually) using the Linux kernel and simply has some flavour of Linux + the filesystem inside.
One final term: while the container is an alternative filesystem layer that you can access and run from your computer, the container image is the ‘recipe’ or template for a container. The container image has all the required information to start up a running copy of the container. A running container tends to be transient and can be started and shut down. The container image is more long-lived, as a definition for the container. You could think of the container image like a cookie cutter – it can be used to create multiple copies of the same shape (or container) and is relatively unchanging, where cookies come and go. If you want a different type of container (cookie) you need a different container image (cookie cutter).
Putting the Pieces Together
Think back to some of the challenges we described at the beginning. The many layers of scientific software installations make it hard to install and re-install scientific software – which ultimately, hinders reliability and reproducibility.
But now, think about what a container is – a self-contained, complete, separate computer filesystem. What advantages are there if you put your scientific software tools into containers?
This solves several of our problems:
- documentation – there is a clear record of what software and software dependencies were used, from bottom to top.
- portability – the container can be used on any computer that has Docker installed – it doesn’t matter whether the computer is Mac, Windows or Linux-based.
- reproducibility – you can use the exact same software and environment on your computer and on other resources (like a large-scale computing cluster).
- configurability – containers can be sized to take advantage of more resources (memory, CPU, etc.) on large systems (clusters) or less, depending on the circumstances.
The rest of this workshop will show you how to download and run containers from pre-existing container images on your own computer, and how to create and share your own container images.
Use cases for containers
Now that we have discussed a little bit about containers – what they do and the issues they attempt to address – you may be able to think of a few potential use cases in your area of work. Some examples of common use cases for containers in a research context include:
- Using containers solely on your own computer to use a specific software tool or to test out a tool (possibly to avoid a difficult and complex installation process, to save your time or to avoid dependency hell).
- Creating a Dockerfile that generates a container image with software that you specify installed, then sharing a container image generated using this Dockerfile with your collaborators for use on their computers or a remote computing resource (e.g. cloud-based or HPC system).
- Archiving the container images so you can repeat analysis/modelling using the same software and configuration in the future – capturing your workflow.
Key Points
- Almost all software depends on other software components to function, but these components have independent evolutionary paths.
- Small environments that contain only the software that is needed for a given task are easier to replicate and maintain.
- Critical systems that cannot be upgraded, due to cost, difficulty, etc. need to be reproduced on newer systems in a maintainable and self-documented way.
- Virtualization allows multiple environments to run on a single computer.
- Containerization improves upon the virtualization of whole computers by allowing efficient management of the host computer’s memory and storage resources.
- Containers are built from ‘recipes’ that define the required set of software components and the instructions necessary to build/install them within a container image.
- Docker is just one software platform that can create containers and the resources they use.
Content from The Docker Hub
Last updated on 2024-09-24
Estimated time: 20 minutes
One of the reasons why Docker is useful is that there is a lot of content (container images) already prepared to be used. Each container image represents a complete software installation that you can use and access without any extra work!
Overview
Questions
- What is the Docker Hub, and why is it useful?
Objectives
- Explore the Docker Hub webpage for a popular Docker container image.
- Find the list of tags for a particular Docker container image.
- Identify the three components of a container image’s identifier.
A lot of these images are hosted in the Docker Hub.
Introducing the Docker Hub
The Docker Hub is an online repository of container images, a vast number of which are publicly available. A large number of the container images are curated by the developers of the software that they package. Also, many commonly used pieces of software that have been containerized into images are officially endorsed, which means that you can trust the container images to have been checked for functionality, stability, and that they don’t contain malware.
Docker can be used without connecting to the Docker Hub
Note that while the Docker Hub is well integrated into Docker functionality, the Docker Hub is certainly not required for all types of use of Docker containers. For example, some organizations may run container infrastructure that is entirely disconnected from the Internet.
Exploring an example Docker Hub page
As an example of a Docker Hub page, let’s explore the page for the official Python language container images. Open your web browser to https://hub.docker.com
We will need to register for the later stages of this course, so feel free to do so now if you are not already.
In the search bar type “python” and hit enter.
You should see a list of images related to python. We can immediately get a feel for the sheer number of container images hosted here. There are upwards of 10,000 images related to python alone.
There is also some useful information that can help us choose the image that we want, which we will point out in a moment.
For now, let’s go to the top result, the python container image (which is endorsed by the Docker team), to see what is on a typical Docker Hub software page.
The top-left provides information about the name, short description, popularity (i.e., more than a billion downloads in the case of this container image), and endorsements.
The top-right provides the command to pull this container image to your computer.
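For the python container image, that pull command is just the image name (a minimal illustration; we will run commands like this from a terminal in a later episode):

BASH
$ docker pull python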
The main body of the page contains many useful headings, such as:
- Which tags (i.e., container image versions) are supported.
- Summary information about where to get help, which computer architectures are supported, etc.
- A longer description of the container image.
- Examples of how to use the container image.
- The license that applies.
The “How to use the image” section of most container images’ pages will provide examples that are likely to cover your intended use of the container image.
Container Image versions and owners
A single Docker Hub page can have many different versions of container images, based on the version of the software inside. These versions are indicated by “tags”. When referring to the specific version of a container image by its tag, you use a colon, :, like this:

CONTAINER_IMAGE_NAME:TAG
If we click the version tag for alpine3.18 of this image, Docker Hub shows it as python:alpine3.18. The default tag (which is used if you don’t specify one) is called latest.
This container image is a “Docker Official Image”, which means that the Docker team maintains and vouches for it. However, it’s equally common to use container images that have been produced by individual owners or organizations. Container images that you create and upload to Docker Hub would fall into this category, as would the container images maintained by organizations like ContinuumIO (the folks who develop the Anaconda Python environment) or community groups like rocker, a group that builds community R container images.
In these cases, the “owner” – that is, the group or individual that manages the container image – is part of the image name, and it is given in the format:
OWNER/CONTAINER_IMAGE_NAME:TAG
as can be seen in examples such as continuumio/anaconda3 or rocker/rstudio.
Repositories
The technical name for the contents of a Docker Hub page is a “repository.” The tag indicates the specific version of the container image that you’d like to use from a particular repository. So a slightly more accurate version of the above example is:
OWNER/REPOSITORY:TAG
Choosing Container Images on Docker Hub
Note that anyone can create an account on Docker Hub and share container images there, so it’s important to exercise caution when choosing a container image on Docker Hub. These are some indicators that a container image on Docker Hub is consistently maintained, functional and secure:
- The container image is updated regularly.
- The container image is associated with a well-established company, community, or other group that is well-known. Docker helps with badges that mark official images, verified publishers and sponsored open source software.
- There is a Dockerfile or other listing of what has been installed to the container image.
- The container image page has documentation on how to use the container image.
- The container image is used by the wider community. The graph on the right at the search page can help with this.
If a container image is never updated, created by a random person, and does not have a lot of metadata, it is probably worth skipping over. Even if such a container image is secure, it is not reproducible and not a dependable way to run research computations.
Other sources of Container Images
Although many of the container images made for Docker are hosted on the Docker Hub, there are other places where they can be distributed, including (but not limited to):
- GHCR from GitHub.
- Quay from Red Hat.
- Artifact Registry from Google.
- GLR from GitLab.
- ECR from Amazon.
- ACR from Azure (Microsoft).
Key Points
- The Docker Hub is an online repository of container images.
- Many Docker Hub container images are public, and may be officially endorsed.
- Each Docker Hub page about a container image provides structured information and subheadings.
- Most Docker Hub pages about container images contain sections that provide examples of how to use those container images.
- Many Docker Hub container images have multiple versions, indicated by tags.
- The naming convention for Docker container images is:
OWNER/CONTAINER_IMAGE_NAME:TAG
Content from Docker Desktop
Last updated on 2024-09-24
Estimated time: 20 minutes
This episode is meant to be demonstrative, that is, you do not need to follow along.
Overview
Questions
- What is Docker Desktop?
- What can it be used for?
- Why can’t it replace the CLI?
Objectives
- Show Docker Desktop and its components.
- Understand what images and containers are.
- Visualize the process of image acquisition, container execution and where it ends.
- Understand the ephemeral nature of containers.
- Have a glimpse at containers that allow interaction.
- Understand the importance of cleaning up in docker.
- Understand the limitations of Docker Desktop.
We will present the Docker Desktop dashboard, as it will be useful to understand key concepts of Docker, such as images and containers. However, it is important to note that while it is mostly free, some features are offered at a premium. Also, it is not fully functional on all operating systems; it can produce conflicts with the Docker engine on Linux, for example.
Getting images
Setting up docker in Windows or Mac will have installed Docker Desktop by default. If you open the application you will likely see something like this:
You’ll notice that the panel on the left has a tab for ‘Images’ and another for ‘Containers’. These will be the focus for the episode, and we will ignore most other features.
On the top you’ll also find a search icon, which links to Docker Hub, and allows us to search for the images we saw in the previous episode directly from here.
Note that there are two tabs, one for containers and one for images. Make sure that you select the right tab when you search!
In Docker Desktop you can either search by name only,
or include the owner. You can then select the tag from the dropdown menu.
Once you find the image you were looking for, you can either download it (pull), or directly run it.
We’ll start by downloading the latest versions of hello-world, docker/getting-started and alpine.
Inspecting images
The ‘Images’ tab on the left panel will show all the images in your system, so you will be able to see them here.
From this tab we can see some information about the images on disk and run them, but we can also inspect the images. Clicking on an image will open a window with information on how the image is built, and let us examine its packages and vulnerabilities.
The hello-world image does not seem too interesting from here. If you go to Docker Hub you’ll find links to the GitHub site, where you’ll see that the image is not as simple as it looks. Nevertheless, this is a very nice and quick way to explore an image.
If we now inspect the docker/getting-started image, for example, we can see that it detects some vulnerabilities:
You can inspect the vulnerable layers further by looking at the command used to create each layer.
This all looks rather scary, and it is important that we are careful with the images that we download. It is therefore quite useful to be able to analyze them like this. This image, in particular, is from a verified publisher (Docker Inc., no less!), and has been downloaded over 10M times, so it is likely not malicious.
Running containers
The images that we just downloaded are immutable files, they are snapshots of an environment, distributed to be used as templates to create ‘containers’. The containers are, essentially, images being run. They are executions of the image, and because they are running, they are no longer ‘static’.
Let’s run the hello-world image by clicking the ‘Run’ button in the ‘Actions’ column of the ‘Images’ tab.
A prompt will ask you to confirm ‘Run’ or modify some optional settings.
The optional settings allow you to modify the container’s name, so that you can easily identify it afterwards. Let’s add an appropriate name and confirm with the ‘Run’ button.
You will likely be taken to a ‘Logs’ tab inside the container that you just ran. The logs show the output of this particular image, ‘Hello from Docker!’ among other things.
If you look carefully, the ‘Containers’ tab on the left is highlighted. We are looking at a container now, not an image, and so we were re-located.
Exploring the ‘Inspect’ tab will show us some information, but for now we are more interested in what the ‘Terminal’ and ‘Stats’ tabs have to say. They both seem to indicate that we need to run or start the container.
Indeed, if we look carefully, we will find an ‘Exited (0)’ status under the container name, and a ‘Start’ button near the top-right corner. However, if we click on that button we will see the output duplicated in the logs, and the ‘Exited (0)’ status again.
If we go back to the ‘Images’ tab and run the image again (let’s not bother giving it a name this time), we’ll see that the same thing happens. We get the ‘Hello from Docker!’, and the container exits.
The nature of most containers is ephemeral. They are meant to execute a process, and when the process is completed, they exit. We can confirm this by clicking on the ‘Containers’ tab on the left. This will exit the container inspection and show us all the containers. Both containers in the list have a status ‘Exited’.
You may be wondering why, if we have only run the hello-world image, you can see there are two containers. One of the containers we named, and the other has some gibberish as a name (Docker generated this randomly). As mentioned before, the image is used as a template, and as many containers as we want can be created from it. If we go back to the ‘Images’ tab and run hello-world again, we’ll see a new container appear.
Interacting with containers
Not all containers are as short-lived as the one we’ve been using. If we run the docker/getting-started image that we had pulled earlier, we will see something different happen. You can immediately notice that the status under the container name is ‘RUNNING’ now.

The ‘Logs’ tab is not too informative, but the ‘Inspect’ tab already shows more information. A process called ‘nginx’ is running. The ‘Terminal’ and ‘Stats’ tabs changed the most. Since the container is still running, the stats get shown, and we are able to launch a terminal inside the container.
Before trying to do anything in the terminal, let’s look at the container list by clicking on the ‘Containers’ tab on the left. You’ll see the green icon of the container indicating that it is still live, and an indication of how long it’s been running for.
Clicking on the container name again will take us back to the ‘Logs’ tab in the container. Let’s try to interact with the terminal inside the container. If you print the working directory with pwd you’ll get the base directory: /. You can also list the contents with ls, and the docker-entrypoint files are a dead giveaway that we are inside the container. At this point this container is very much like a VM. We can modify things, for example making a directory with mkdir, and see it has been created with ls again.
But we can do more than that: we can install things. For example, you’ll notice that htop is not installed. Since the getting-started image is based on Alpine, we can install it using apk add htop, and we can now use it.
The container does not need to stay alive forever, and you can see that there is a ‘stop’ icon on the top right. If we stop the container, we get a familiar empty tab in ‘Terminal’ and ‘Stats’. The ‘Containers’ tab on the left will also show the container status as ‘Exited’.
Now let’s say I want to run the getting-started image again. So I go to the ‘Images’ tab, and click run. Now let’s go to the ‘Terminal’ tab, and try to find our directory with ls. The directory is not there. We’d also installed htop, so let’s have a go at running it. Not there either.
When we re-ran the image, we created a new container. The new container is created from the template saved in the image, and so ‘our’ changes have vanished. This becomes very clear when we go back to the ‘Containers’ tab on the left. We can see that the first container we created from the getting-started image is there, next to the new container (which is still running, by the way).
Reviving containers
We can get the old container running again, although this is rarely something we’d want to do. In Docker Desktop, all we need to do is click on the ‘Start’ button from the ‘Containers’ list. The terminal will appear empty, because it is a new session, but you will even be able to ‘recall’ commands.
Cleaning up
The hello-world image was nice and useful to test that Docker was working, but it is now rather useless. If I want to delete it, the ‘Images’ tab on the left has a convenient bin icon to do so. Clicking on it will prompt you for confirmation, but it will fail.
You’ll probably notice that the status of the image is ‘In use’. That seems strange though, given that all the containers from that image exited immediately.
Let’s have a look at the ‘Containers’ tab. It shows a list of 5 containers now. Three of them came from the hello-world image, and are stopped. Two of them came from the getting-started image, and are running.
We’ve only been using Docker for 15 minutes though! You may see how this can become a problem… particularly so because we were a bit sloppy and did not name the containers. Let’s try to get rid of the containers then. We can conveniently select them all with the tickbox at the top, and an option to ‘Delete’ shows up. Clicking on it will prompt for confirmation, and we can go ahead and accept.
All our containers are now gone. Forever.
Warning: You have to be careful here, this action deleted even the containers that were running. You can filter the containers before you select them “all”.
On the up-side, the ‘Images’ tab shows both the hello-world and the getting-started images as ‘Unused’ now. For Docker, an image is ‘In use’ as long as at least one container has been created from it. We have just deleted all the containers created from either of these images. This tells Docker that they are no longer being used, and can therefore be safely deleted.
Limitations - Why not Docker Desktop?
We have seen many of the neat and functional bits of Docker Desktop, and it can be mighty appealing, particularly if you lean towards the use of graphical interfaces. However, we’ve not touched on its weaknesses. We’ll just need to point at one to feel the need to throw everything overboard.
Let’s go ahead and run the remaining image we have already pulled: alpine.
That was fast, and uneventful. Not even a single output to the ‘Logs’. No way to open a terminal inside Alpine.
Just to be clear though, this Docker image does contain the whole Alpine OS. In Docker Desktop, however, there is no way to interact with it.
Let’s try something different. There’s a program called cowsay that lets you print messages as if a cow were saying them. Searching for that image shows that there is one by beatrixxx32 with a reasonable number of downloads. So let’s pull that image and run it.
We do get a cow this time, but it is not saying anything; it does not know what to say. Going back to the cowsay image search, you may notice that the ‘Usage’ section shows a command line asking for “your message”. We are not using a command though, we just clicked run. Maybe we missed something in the optional settings!
No, it does not seem like it. No matter what we do, we cannot make the cow say anything from here.
Are the alpine and cowsay images useless? No, definitely not. However, they are expecting some sort of input or command, which we cannot provide from Docker Desktop.
This is the case for most images, and so Docker Desktop (as it is now) cannot really be used for much more than as a nice dashboard.
In the next episode, we will use Docker from the command line, and all of the advantages it brings will become apparent.
Key Points
- Docker Desktop is a great dashboard that allows us to understand and visualize the lifecycle of images and containers.
- Images are snapshots of an environment, easily distributable and ready to be used as templates for containers.
- Containers are executions of the images, often with configuration added on top, and usually meant for single use.
- Running a container usually implies creating a new copy, so it is important to clean up regularly.
- Docker Desktop could potentially be all you need to use if you only consume images out of the box.
- However, it is very limited in most cases (even for consumers), and rarely allows the user to configure and interact with the containers adequately.
Content from Using the Docker command line
Last updated on 2024-09-24
Estimated time: 198 minutes
So far, we have seen how easily you can find images and run containers in Docker Desktop. However, we have also seen Docker Desktop’s limitations: why couldn’t we do anything useful with our Alpine container?
Overview
Questions
- What actually happens when I run a container?
- How can I control the behaviour of containers?
Objectives
- Learn the lifecycle of Docker containers
- Learn how to use the Docker command line to perform tasks we learned to do in Docker Desktop
- Learn to perform more advanced Docker tasks, only possible in the command line
To understand this, we need to learn about the lifecycle of Docker containers.
Container lifecycle
So, what happens when we run a Docker container?
Startup
When we run a Docker container, the first set of actions occur during the “startup” phase:
- First, Docker checks if you already have the container image, and downloads it if you don’t (known as a pull)
- Next, the container image is used to create the container and start the container’s runtime environment
Default Command
Once the container has started, it executes its default command.
The default command is specified by the image’s creator and is formed of two parts:
- Entrypoint: The base command for the container (may be omitted, default: "")
- Command: Parameters for the base command (or, if entrypoint omitted, the whole default command)
When the container runs, Entrypoint and Command are concatenated to form the default command which is then executed.
Shutdown
Whether a container shuts down depends on the default command. A container will not shut down until its default process has finished.
Therefore, a command that runs to completion will allow a container to shut down, but a service that runs indefinitely will not.
Examples
Let’s take a look at some examples.
Example 1
In example 1, an Entrypoint of “echo” is given along with a Command of “hello world” to the container.
This works in exactly the same way as running the following in a standard terminal:
$ echo hello world
Example 2
In example 2, an Entrypoint of “sleep” is given along with a Command of “infinity” to the container.
This causes the container to run the sleep command indefinitely, meaning that the default command never finishes, and the container stays alive without shutting down.
This example may seem artificial, but it is common to have containers that run indefinitely. A common example would be a webserver which renders a website, running until stopped or changed.
Return to Alpine
So, how does this explain why the Alpine container seemed so uninteresting?
First, let’s apply what we have learned about the lifecycle of a Docker container. Let’s find out what the Entrypoint and Command for the Alpine container are. To do this we will use the Docker command line interface.
Docker command line syntax
The general syntax for the Docker command line explained in this diagram:
There is a base command, always “docker”; a specialising command, which specifies the type of object you wish to act on; an action command, which selects the actual process that will be performed; and finally the name of the object you wish to act on. You may also need to add extra arguments and switches.
Inspecting images
Applying this, we want to inspect the image of Alpine to find out the Entrypoint and Command. We can do this by running the command below, where we have specified an extra argument -f, which is a string specifying the output format of the command (without it we would get a large JSON dump describing the container).
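A format string along these lines reproduces the output shown (our reconstruction; the original command was lost in conversion):

BASH
$ docker image inspect alpine -f "Entrypoint: {{.Config.Entrypoint}} Command: {{.Config.Cmd}}"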
OUTPUT
Entrypoint: [] Command: [/bin/sh]
So, we can see from this command that Alpine specifies:
- Entrypoint: null (unspecified)
- Command: /bin/sh
Therefore, the default command for Alpine is: “/bin/sh”.
What is wrong with this? Shouldn’t this just open a terminal prompt?
Not in this case! Running a command in a Docker container is not quite like running it in your terminal.
When you use your terminal, you are able to type commands and interact with the underlying system. By default, however, Docker does not allow containers to do this. Without special measures, a Docker container has no way of receiving interactions and instead feeds an empty input stream to the sh process. sh is programmed to exit when it reaches the end of its input, and therefore finishes its process.
$ /bin/sh              # you might expect Docker to be running the default command like this
$ /bin/sh < /dev/null  # but actually, it is like this!
This is why the Alpine container seemed so uninteresting: in Docker Desktop there was no way for us to provide an interactive session to the container! It is possible to change this behavior, but not using Docker Desktop. Instead, we now need to dig in to the more powerful and flexible Docker Command Line Interface.
This behavior can also be seen on other shell-like containers, for example those with “bash” or “python” as default commands.
Running a container
So, how does the Docker command line address this? Can we use the Docker CLI to interact with our containers?
Let’s now focus on running Docker containers using the CLI.
Again, we will use the docker base command, this time specialising this command with ‘run’ and the name of an image; to tell Docker that we want to start up the specified container image.
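With alpine as the image name, the command is simply:

BASH
$ docker run alpine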
You will likely see output similar to the below:
OUTPUT
> Unable to find image 'alpine:latest' locally
> latest: Pulling from library/alpine
> f56be85fc22e: Already exists
> Digest: sha256:124c7d2707904eea7431fffe91522a01e5a861a624ee31d03372cc1d138a3126
> Status: Downloaded newer image for alpine:latest
This output shows docker recognising that it does not have the Alpine image locally, and beginning the process of downloading and storing the image.
But is something missing? Why don’t we see any other output? Let’s run the command again.
OUTPUT
>
This time there is no output at all; with the image already downloaded, it is able to run without pulling anything. But why is no output produced? This is for the same reason as when we tried running Alpine in Docker Desktop (as explained above): the docker run command is not sending any commands, and the default command exits automatically.
Why don’t we finally try and make Alpine do something?
Overriding the default command
To get the Alpine image to do something interesting; we will want to overwrite the default command.
This can be done in the Docker CLI, by specifying the desired command, after the docker run command and image name.
For example, if we wanted the Alpine container to print “hello world”, we could use the standard Unix command, echo.
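The command goes straight after the image name (a reconstruction matching the output below):

BASH
$ docker run alpine echo "hello world"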
OUTPUT
> hello world
Great! We have finally managed to do something interesting with the Alpine image.
This is a big step, but how can we evolve from here? What if we tried to generalise our printing command?
We can do this by modifying the entrypoint portion of the default command (remember: the entrypoint is the portion of the default command which prefixes the command that comes after the image name).
This way we can have just the text we want to print after the image name.
We do this by adding the --entrypoint switch to our Docker command, followed by the command we want to become the entrypoint.
BASH
$ docker run --entrypoint echo alpine "hello world"
$ docker run --entrypoint echo alpine "lovely day"
OUTPUT
> hello world
> lovely day
Wonderful! But this needn’t be limited to just echo, let’s use this approach to get some more information about the container, by printing the contents of a special file called ‘os-release’.
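One reconstruction of the elided command, overriding the command portion with cat and its argument (the exercise below explores doing this via --entrypoint instead):

BASH
$ docker run alpine cat /etc/os-release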
OUTPUT
NAME="Alpine Linux"
ID=alpine
VERSION\_ID=3.17.3
PRETTY\_NAME="Alpine Linux v3.17"
HOME\_URL="[https://alpinelinux.org/](https://alpinelinux.org/)"
BUG\_REPORT\_URL="[https://gitlab.alpinelinux.org/alpine/aports/-/issues](https://gitlab.alpinelinux.org/alpine/aports/-/issues)"
Make the container use cat as a base command
Can you make the Alpine container use cat as its base command? Can you print the os-release information using this approach?
Print the users present in the Alpine container
Can you make the Alpine container print the contents of the /etc/passwd file? (which contains details of the registered users of a Linux system)
We can see from the output that we are indeed running in an Alpine container and now have a powerful way to execute single commands in a Docker container.
Running a container interactively
But what if we wanted to run a series of commands? What if we wanted to be able to explore within a container?
We will look now at how we can create interactive sessions for using Docker containers.
Luckily this is straightforward: we simply need to modify our standard Docker command to include two new flags.
We need to add:
- ‘-i’ to enable interactivity
- ‘-t’ to enable a TTY connection (the ability to type things!)
Let’s try this with the Alpine image:
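BASH
$ docker run -i -t alpine    # the two flags are often combined as -it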
OUTPUT
/ #
You are now inside the container! Let’s try some familiar commands.
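For example (a command consistent with the output below):

BASH
/ # cat /etc/os-release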
OUTPUT
NAME="Alpine Linux"
ID=alpine
VERSION\_ID=3.17.3
PRETTY\_NAME="Alpine Linux v3.17"
HOME\_URL="[https://alpinelinux.org/](https://alpinelinux.org/)"
BUG\_REPORT\_URL="[https://gitlab.alpinelinux.org/alpine/aports/-/issues](https://gitlab.alpinelinux.org/alpine/aports/-/issues)"
You can see we get the same output as earlier, but this time the container remains live and we remain inside the container.
You can now exit the container by running:
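BASH
/ # exit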
Using containers in this interactive way can be useful, but we have to be careful as (by default) our changes will not persist.
You can see this for yourself with the following process:
BASH
$ docker run -i -t alpine
/# touch hello-world
/# echo "hi everyone" > hello-world
/# cat hello-world
/# exit
OUTPUT
> hi everyone
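If we then start a fresh container from the same image and look for the file (a reconstruction of the elided step):

BASH
$ docker run -i -t alpine
/# cat hello-world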
OUTPUT
> cat: can't open 'hello-world': No such file or directory
You can see that, while we were still interacting with the container, our file continued to exist. Once we left, it was gone!
Getting files into the container
Now that we are figuring out how to work with containers, let’s imagine a situation. We are working in a data science lab, and our supervisor has asked us to run some tools that they have developed; which they are distributing in Docker.
The first tool, called Random, generates a series of random numbers. The second tool, named Datamash, performs simple statistical analyses on single columns of data. Let’s figure out how to use these tools and some of the hurdles we will have to overcome.
First, let’s download the tools:
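The images live on GitHub’s container registry; their full names are taken from the listing further below:

BASH
$ docker pull ghcr.io/uomresearchit/random
$ docker pull ghcr.io/uomresearchit/datamash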
Great! But… how can we check what has happened? How can we know we have the images? Let’s check with:
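BASH
$ docker image ls    # lists the images stored locally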
OUTPUT
REPOSITORY TAG IMAGE ID CREATED SIZE
ghcr.io/uomresearchit/datamash latest a0143c45d868 5 minutes ago 49.7MB
ghcr.io/uomresearchit/random latest b39bc463abfd 6 weeks ago 7.04MB
You should see output similar to the above, with the images we just pulled listed.
Now we have the tools let’s try running them. We’ll start by generating our data:
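BASH
$ docker run ghcr.io/uomresearchit/random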
OUTPUT
[...]
[Lots of numbers!]
[...]
22307
21278
28211
21151
9209
We can see this produces a lot of output. Let’s write it to file instead:
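BASH
$ docker run ghcr.io/uomresearchit/random > local_data.txt    # redirect the output to a file on the host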
We now have a datafile we can use on our other tool. Our supervisor has told us a couple of helpful things about this container, apparently all you have to do is give the container the filename you want to analyse.
Let’s try that!
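BASH
$ docker run ghcr.io/uomresearchit/datamash local_data.txt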
OUTPUT
Traceback (most recent call last):
File "//mean.py", line 7, in <module>
with open(args.filename) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'local_data.txt'
We gave the right filename… So why hasn’t this worked? The answer is simple, but has profound implications.
The file we specified is present on our machine, but not inside the container! The container has its own, completely isolated filesystem.
This can really help: the container’s files are stored in a nice, safe, isolated place perfectly tailored for the application. But then, how can we give our file to the container, in order for it to be analysed?
Using mounts and volumes
Docker’s answer to this is to allow you to selectively expose files and folders on your machine to the container you are running.
Docker calls these bind mounts and they take the following form:
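Schematically, with placeholders for the host path, the container path and the image (a sketch rather than a runnable command):

BASH
$ docker run --mount type=bind,source=<absolute path on host>,target=<path in container> <image> [command]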
Using this command we specify the type of mount we want (more on this later!), the name of a local file (or directory) to mount to the container and the path and filename which we would like inside the container.
$PWD

You will notice we used an environment variable (denoted by $) to specify the location of our local file. This is because the docker command needs the full path to a file to mount it correctly; it won’t work if you use ./local_file.txt!
Let’s try this out with our datamash container:
BASH
$ docker run --mount type=bind,source="$PWD"/local_data.txt,target=/local_data.txt ghcr.io/uomresearchit/datamash local_data.txt
OUTPUT
Data file data.txt, selected.
Number of entries: 100.0
Sum: 1681522.0
Mean: 16815.22
Wonderful! The container works now. But that syntax is very clunky… Can it be shortened? The answer is yes, but it requires care - let’s investigate.
Try out the following command:
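BASH
$ docker run -v "$PWD"/local_data.txt:/local_data.txt ghcr.io/uomresearchit/datamash local_data.txt    # our reconstruction of the shorthand -v syntax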
It works just like before! But there is a hidden danger here.
So far we have only discussed bind mounts. These are a simple mechanism for binding parts of your local filesystem into containers. There is another method for getting files into a container, called volumes.
Volumes are the isolated, persistent filesystems for containers, they can be shared between containers and are very useful for retaining information between container runs.
However, they are not particularly useful for getting files from host to container!
Can you spot the difference?
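Here the source was given as a bare name rather than an absolute path (a reconstruction of the failing command):

BASH
$ docker run -v data.txt:/data.txt ghcr.io/uomresearchit/datamash data.txt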
OUTPUT
Traceback (most recent call last):
File "//mean.py", line 7, in <module>
with open(args.filename) as f:
IsADirectoryError: [Errno 21] Is a directory: 'data.txt'
Why has this not worked? The answer is that Docker has created a volume, not mounted our file! This is because the Docker command requires absolute paths; if a path doesn’t evaluate correctly then it assumes you are naming a new volume.
See this bit of the Docker man page:
-v|--volume[=[[HOST-DIR:]CONTAINER-DIR[:OPTIONS]]]
Create a bind mount. If you specify, -v /HOST-DIR:/CONTAINER-DIR, Docker
bind mounts /HOST-DIR in the host to /CONTAINER-DIR in the Docker
container. If 'HOST-DIR' is omitted, Docker automatically creates the new
volume on the host.
For this reason, we would recommend using the full (clunky) syntax of the mount command.
Running services in Docker
This impermanence can be a challenge! What if we
Long running processes
docker container with python analysis built in and simple upload portal
Docker Exec
Copying files
docker cp
Docker ps
Managing the system
docker stats
Sum up
docker run --rm --name mydocker --entrypoint thing --mount
in compose
Conclusion
We have learned about the lifecycle of Docker containers, what happens when they are started, and how their behaviour is controlled. We have learned to run containers using the Docker command line interface and how to interact with them directly.
Key Points
- Containers run a default command, formed from an entrypoint and a command; the container exits when that command finishes.
- The Docker CLI can override the command and entrypoint, and the -i and -t flags enable interactive sessions.
- Changes made inside a container do not persist between runs; bind mounts make host files visible inside a container.
Content from Creating Your Own Container Images
Last updated on 2024-09-24
Estimated time: 35 minutes
There are lots of reasons why you might want to create your own Docker container image.
Overview
Questions
- How can I make my own Docker container images?
- How do I document the ‘recipe’ for a Docker container image?
Objectives
- Explain the purpose of a Dockerfile and show some simple examples.
- Demonstrate how to build a Docker container image from a Dockerfile.
- Compare the steps of creating a container image interactively versus with a Dockerfile.
- Create an installation strategy for a container image.
- Demonstrate how to upload (‘push’) your container images to the Docker Hub.
- Describe the significance of the Docker Hub naming scheme.
- You can’t find a container image with all the tools you need on Docker Hub.
- You want to have a container image to “archive” all the specific software versions you ran for a project.
- You want to share your workflow with someone else.
Interactive installation
Before creating a reproducible installation, let’s experiment with installing software inside a container. Start a container from the alpine container image we used before, interactively:
Because this is a basic container, there are a lot of things not installed – for example, python3.
OUTPUT
sh: python3: not found
Inside the container, we can run commands to install Python 3. The Alpine version of Linux has an installation tool called apk that we can use to install Python 3.
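BASH
/ # apk add --update python3 py3-pip python3-dev    # the same packages appear in the Dockerfile below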
We can test our installation by running a Python command:
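BASH
/ # python3 --version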
Once Python is installed, we can add Python packages using the pip package installer:
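BASH
/ # pip install cython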
Exercise: Searching for Help
Can you find instructions for installing R on Alpine Linux? Do they work?
Once we exit, these changes are not saved to a new container image by default. There is a command that will “snapshot” our changes, but building container images this way is not easily reproducible. Instead, we’re going to take what we’ve learned from this interactive installation and create our container image from a reproducible recipe, known as a Dockerfile.
If you haven’t already, exit out of the interactively running container.
Put installation instructions in a Dockerfile
A Dockerfile is a plain text file with keywords and commands that can be used to create a new container image.
From your shell, go to the folder you downloaded at the start of the lesson and print out the Dockerfile inside:
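BASH
$ cat Dockerfile    # from inside the lesson folder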
OUTPUT
FROM <EXISTING IMAGE>
RUN <INSTALL CMDS FROM SHELL>
RUN <INSTALL CMDS FROM SHELL>
CMD <CMD TO RUN BY DEFAULT>
Let’s break this file down:

- The first line, FROM, indicates which container image we’re starting with. It is the “base” container image we are going to start from.
- The next two lines, RUN, indicate installation commands we want to run. These are the same commands that we used interactively above.
- The last line, CMD, indicates the default command we want a container based on this container image to run, if no other command is provided. It is recommended to provide CMD in exec-form (see the CMD section of the Dockerfile documentation for more details). It is written as a list which contains the executable to run as its first element, optionally followed by any arguments as subsequent elements. The list is enclosed in square brackets ([]) and its elements are double-quoted (") strings which are separated by commas. For example, CMD ["ls", "-lF", "--color", "/etc"] would translate to ls -lF --color /etc.
shell-form and exec-form for CMD
Another way to specify the parameter for the CMD instruction is the shell-form. Here you type the command as you would call it from the command line. Docker then silently runs this command in the image’s standard shell. CMD cat /etc/passwd is equivalent to CMD ["/bin/sh", "-c", "cat /etc/passwd"].

We recommend the more explicit exec-form because it lets us create more flexible container image command options and makes sure complex commands are unambiguous.
Exercise: Take a Guess
Do you have any ideas about what we should use to fill in the sample Dockerfile to replicate the installation we did above?
Based on our experience above, edit the Dockerfile (in your text editor of choice) to look like this:
FROM alpine
RUN apk add --update python3 py3-pip python3-dev
RUN pip install cython
CMD ["python3", "--version"]
The recipe provided by the Dockerfile shown in the solution to the preceding exercise will use Alpine Linux as the base container image, add Python 3 and the Cython library, and set a default command to request Python 3 to report its version information.
Create a new Docker image
So far, we only have a text file named Dockerfile – we do not yet have a container image. We want Docker to take this Dockerfile, run the installation commands contained within it, and then save the resulting container as a new container image. To do this we will use the docker image build command.
We have to provide docker image build with two pieces of information:

- the location of the Dockerfile
- the name of the new container image. Remember the naming scheme from before? You should name your new image with your Docker Hub username and a name for the container image, like this: USERNAME/CONTAINER_IMAGE_NAME.
All together, the build command that you should run on your computer, will have a similar structure to this:
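BASH
$ docker image build -t USERNAME/CONTAINER_IMAGE_NAME .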
The -t option names the container image; the final dot indicates that the Dockerfile is in our current directory.
For example, if my user name was alice and I wanted to call my container image alpine-python, I would use this command:
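BASH
$ docker image build -t alice/alpine-python .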
Build Context

Notice that the final input to docker image build isn’t the Dockerfile – it’s a directory! In the command above, we’ve used the current working directory (.) of the shell as the final input to the docker image build command. This option provides what is called the build context to Docker – if there are files being copied into the built container image (more details in the next episode), they’re assumed to be in this location. Docker expects to see a Dockerfile in the build context also (unless you tell it to look elsewhere).
Even if it won’t need all of the files in the build context directory, Docker does “load” them before starting to build, which means that it’s a good idea to have only what you need for the container image in a build context directory, as we’ve done in this example.
Exercise: Review!
Think back to earlier. What command can you run to check if your container image was created successfully? (Hint: what command shows the container images on your computer?)
We didn’t specify a tag for our container image name. What tag did Docker automatically use?
What command will run a container based on the container image you’ve created? What should happen by default if you run such a container? Can you make it do something different, like print “hello world”?
- To see your new image, run docker image ls. You should see the name of your new container image under the “REPOSITORY” heading.
- In the output of docker image ls, you can see that Docker has automatically used the latest tag for our new container image.
- We want to use docker container run to run a container based on a container image.
The following command should run a container and print out our default message, the version of Python:
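BASH
$ docker container run alice/alpine-python    # substitute your own image name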
To run a container based on our container image and print out “Hello world” instead:
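BASH
$ docker container run alice/alpine-python echo "Hello world"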
While it may not look like you have achieved much, you have already effected the combination of a lightweight Linux operating system with your specification to run a given command that can operate reliably on macOS, Microsoft Windows, Linux and on the cloud!
Boring but important notes about installation
There are a lot of choices when it comes to installing software – sometimes too many! Here are some things to consider when creating your own container image:
- Start smart, or, don’t install everything from scratch! If you’re using Python as your main tool, start with a Python container image. Same with R. We’ve used Alpine Linux as an example in this lesson, but it’s generally not a good container image to start with for initial development and experimentation because it is a less common distribution of Linux; Ubuntu, Debian and CentOS are all good options for scientific software installations. The program you’re using might recommend a particular distribution of Linux, and if so, it may be useful to start with a container image for that distribution.
- How big? How much software do you really need to install? When you have a choice, lean towards using smaller starting container images and installing only what’s needed for your software, as a bigger container image means longer download times to use.
- Know (or Google) your Linux. Different distributions of Linux often have distinct sets of tools for installing software. The apk command we used above is the software package installer for Alpine Linux. The installers for various common Linux distributions are listed below:
  - Ubuntu: apt or apt-get
  - Debian: apt or apt-get
  - CentOS: yum
  Most common software installations are available to be installed via these tools. A web search for “install X on Y Linux” is usually a good start for common software installation tasks; if something isn’t available via the Linux distribution’s installation tools, try the options below.
- Use what you know. You’ve probably used commands like pip or install.packages() before on your own computer – these will also work to install things in container images (if the basic scripting language is installed).
- README. Many scientific software tools have a README or installation instructions that lay out how to install software. You want to look for instructions for Linux. If the install instructions include options like those suggested above, try those first.
In general, a good strategy for installing software is:
- Make a list of what you want to install.
- Look for pre-existing container images.
- Read through instructions for software you’ll need to install.
- Try installing everything interactively in your base container – take notes!
- From your interactive installation, create a Dockerfile and then try to build the container image from that.
Share your new container image on Docker Hub
Container images that you release publicly can be stored on the Docker Hub for free. If you name your container image as described above, with your Docker Hub username, all you need to do is run the opposite of docker image pull – docker image push.
Make sure to substitute the full name of your container image!
In a web browser, open https://hub.docker.com, and on your user page you should now see your container image listed, for anyone to use or build on.
Logging In
Technically, you have to be logged into Docker on your computer for this to work. Usually it happens by default, but if docker image push doesn’t work for you, run docker login first, enter your Docker Hub username and password, and then try docker image push again.
What’s in a name? (again)
You don’t have to name your container images using the USERNAME/CONTAINER_IMAGE_NAME:TAG naming scheme. On your own computer, you can call container images whatever you want, and refer to them by the names you choose. It’s only when you want to share a container image that it needs the correct naming format.
You can rename container images using the `docker image tag` command. For example, imagine someone named Alice has been working on a workflow container image and has called it `workflow-test` on her own computer. She now wants to share it in her `alice` Docker Hub account with the name `workflow-complete` and a tag of `v1`. Her `docker image tag` command would look like this:
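BASH
$ docker image tag workflow-test alice/workflow-complete:v1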
She could then push the renamed container image to Docker Hub, using `docker image push alice/workflow-complete:v1`.
Key Points
- `Dockerfile`s specify what is within Docker container images.
- The `docker image build` command is used to build a container image from a `Dockerfile`.
- You can share your Docker container images through the Docker Hub so that others can create Docker containers from your container images.
Content from Creating More Complex Container Images
Last updated on 2024-09-24 | Edit this page
Estimated time: 60 minutes
In order to create and use your own container images, you may need more information than our previous example provided. You may want to use files from outside the container that are not included within the container image, either by copying the files into the container image or by making them visible within a running container from their existing location on your host system. You may also want to learn a little bit about how to install software within a running container or a container image. This episode will look at these more advanced aspects of running a container or building a container image. Note that the examples will get gradually more complex – most day-to-day use of containers and container images can be accomplished using the first 1–2 sections on this page.
Overview
Questions
- How can I make more complex container images?
Objectives
- Explain how you can include files within Docker container images when you build them.
- Explain how you can access files on the Docker host from your Docker containers.
Using scripts and files from outside the container
In your shell, change to the `sum` folder in the `docker-intro` folder and look at the files inside.
This folder has both a `Dockerfile` and a Python script called `sum.py`. Let’s say we wanted to try running the script using a container based on our recently created `alpine-python` container image.
Running containers
What command would we use to run Python from the `alpine-python` container?
If we try running the container and Python script, what happens?
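BASH
$ docker container run alice/alpine-python python3 sum.py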
OUTPUT
python3: can't open file 'sum.py': [Errno 2] No such file or directory
What does the error message mean? Why might the Python inside the container not be able to find or open our script?
The problem here is that the container and its filesystem are separate from our host computer’s filesystem. When the container runs, it can’t see anything outside itself, including any of the files on our computer. In order to use Python (inside the container) and our script (outside the container, on our host computer), we need to create a link between the directory on our computer and the container.
This link is called a “mount” and is what happens automatically when a USB drive or other external hard drive gets connected to a computer – you can see the contents appear as if they were on your computer.
We can create a mount between our computer and the running container by using an additional option to `docker container run`. We’ll also use the variable `${PWD}`, which will substitute in our current working directory. The option will look like this:

--mount type=bind,source=${PWD},target=/temp

What this means is: make my current working directory (on the host computer) – the source – visible within the container that is about to be started, and inside this container, name the directory `/temp` – the target.
Types of mounts
You will notice that we set the mount `type=bind`; there are other types of mount that can be used in Docker (e.g. `volume` and `tmpfs`). We do not cover the other types of mounts or the differences between these mount types in this course, as it is more of an advanced topic. You can find more information on the different mount types in the Docker documentation.
Let’s try running the command now:
BASH
$ docker container run --mount type=bind,source=${PWD},target=/temp alice/alpine-python python3 sum.py
But we get the same error!
OUTPUT
python3: can't open file 'sum.py': [Errno 2] No such file or directory
This final piece is a bit tricky – we really have to remember to put ourselves inside the container. Where is the `sum.py` file? It’s in the directory that’s been mapped to `/temp` – so we need to include that in the path to the script. This command should give us what we need:
BASH
$ docker container run --mount type=bind,source=${PWD},target=/temp alice/alpine-python python3 /temp/sum.py
Note that if we create any files in the `/temp` directory while the container is running, these files will appear on our host filesystem in the original directory and will stay there even when the container stops.
Other Commonly Used Docker Run Flags
Docker run has many other useful flags to alter its function. A couple that are commonly used include `-w` and `-u`.

The `--workdir`/`-w` flag sets the working directory, i.e. it runs the command being executed inside the specified directory. For example, the following code would run the `pwd` command in a container started from the latest `ubuntu` image in the `/home/alice` directory and print `/home/alice`. If the directory doesn’t exist in the image, Docker will create it.
BASH
$ docker container run -w /home/alice/ ubuntu pwd
The `--user`/`-u` flag lets you specify the user you would like to run the container as. This is helpful if you’d like to write files to a mounted folder and not write them as `root` but rather with your own user identity and group. A common example of the `-u` flag is `--user $(id -u):$(id -g)`, which will fetch the current user’s ID and group and run the container as that user.
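Putting these flags together with the bind mount from above, a command along the following lines (a sketch reusing this episode’s `alice/alpine-python` image and `/temp` mount) would run the script from inside the mounted folder as your own user, so any output files are owned by you rather than `root`:

BASH
$ docker container run --user $(id -u):$(id -g) --mount type=bind,source=${PWD},target=/temp --workdir /temp alice/alpine-python python3 sum.py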
Exercise: Explore the script
What happens if you use the `docker container run` command above and put numbers after the script name?
This script comes from the Python Wiki and is set to add all numbers that are passed to it as arguments.
Exercise: Checking the options
Our Docker command has gotten much longer! Can you go through each piece of the Docker command above and explain what it does? How would you characterize the key components of a Docker command?
Here’s a breakdown of each piece of the command above:
- `docker container run`: use Docker to run a container
- `--mount type=bind,source=${PWD},target=/temp`: connect my current working directory (`${PWD}`) as a folder inside the container called `/temp`
- `alice/alpine-python`: the name of the container image to use to run the container
- `python3 /temp/sum.py`: the command to run in the container
More generally, every Docker command will have the form:
docker [action] [docker options] [docker container image] [command to run inside]
Exercise: Interactive jobs
Try using the directory mount option but run the container interactively. Can you find the folder that’s connected to your host computer? What’s inside?
The docker command to run the container interactively is:
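BASH
# -it attaches an interactive terminal; sh is the shell available in Alpine
$ docker container run --mount type=bind,source=${PWD},target=/temp -it alice/alpine-python sh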
Once inside, you should be able to navigate to the `/temp` folder and see that its contents are the same as the files on your host computer:
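BASH
# run these inside the container
$ cd /temp
$ ls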
Mounting a directory can be very useful when you want to run the software inside your container on many different input files. In other situations, you may want to save or archive an authoritative version of your data by adding it to the container image permanently. That’s what we will cover next.
Including your scripts and data within a container image
Our next project will be to add our own files to a container image – something you might want to do if you’re sharing a finished analysis or just want to have an archived copy of your entire analysis, including the data. Let’s assume that we’ve finished with our `sum.py` script and want to add it to the container image itself.
In your shell, you should still be in the `sum` folder in the `docker-intro` folder.
Let’s add a new line to the `Dockerfile` we’ve been using so far to create a copy of `sum.py`. We can do so by using the `COPY` keyword.
COPY sum.py /home
This line will cause Docker to copy the file from your computer into the container’s filesystem. Let’s build the container image like before, but give it a different name:
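BASH
# build from the Dockerfile in the current directory, naming the image alpine-sum
$ docker image build -t alpine-sum .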
The Importance of Command Order in a Dockerfile
When you run `docker build`, it executes the build in the order specified in the `Dockerfile`. This order is important for rebuilding, and you typically will want to put your `RUN` commands before your `COPY` commands.
Docker builds the layers of commands in order. This becomes important when you need to rebuild container images. If you change layers later in the `Dockerfile` and rebuild the container image, Docker doesn’t need to rebuild the earlier layers but will instead use a stored (called “cached”) version of those layers.
For example, imagine you wanted to copy `multiply.py` into the container image instead of `sum.py`. If the `COPY` line came before the `RUN` line, Docker would need to rebuild the whole image. If the `COPY` line came second, Docker would use the cached `RUN` layer from the previous build and only rebuild the `COPY` layer.
Exercise: Did it work?
Can you remember how to run a container interactively? Try that with this one. Once inside, try running the Python script.
This `COPY` keyword can be used to place your own scripts or data into a container image that you want to publish or use as a record. Note that it’s not necessarily a good idea to put your scripts inside the container image if you’re constantly changing or editing them. In that case, referencing the scripts from outside the container is a good idea, as we did in the previous section. You also want to think carefully about size – if you run `docker image ls` you’ll see the size of each container image all the way on the right of the screen. The bigger your container image becomes, the harder it will be to download.
Security warning

Login credentials including passwords, tokens, secure access tokens, or other secrets must never be stored in a container. If secrets are stored, they are at high risk of being found and exploited when made public.
Copying alternatives
Another trick for getting your own files into a container image is to use the `RUN` keyword to download the files from the internet. For example, if your code is in a GitHub repository, you could include this statement in your `Dockerfile` to download the latest version every time you build the container image:
RUN git clone https://github.com/alice/mycode
Similarly, the `wget` command can be used to download any file publicly available on the internet:
RUN wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.10.0/ncbi-blast-2.10.0+-x64-linux.tar.gz
Note that the above `RUN` examples depend on commands (`git` and `wget`, respectively) that must be available within your container: Linux distributions such as Alpine may require you to install such commands before using them within `RUN` statements.
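For example, on an Alpine-based container image you would need to install `git` before cloning – a minimal sketch, reusing the repository URL from above:

FROM alpine
# the base Alpine image does not include git
RUN apk add --update git
RUN git clone https://github.com/alice/mycode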
More fancy `Dockerfile` options (optional, for presentation or as exercises)
We can expand on the example above to make our container image even more “automatic”. Here are some ideas:
Make the `sum.py` script run automatically
FROM alpine
RUN apk add --update python3 py3-pip python3-dev
COPY sum.py /home
# Run the sum.py script as the default command
CMD ["python3", "/home/sum.py"]
Build and test it:
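BASH
$ docker image build -t alpine-sum .
$ docker container run alpine-sum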
You’ll notice that you can run the container without arguments just fine, resulting in `sum = 0`, but this is boring. Supplying arguments, however, doesn’t work:
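BASH
$ docker container run alpine-sum 10 11 12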
results in
OUTPUT
docker: Error response from daemon: OCI runtime create failed:
container_linux.go:349: starting container process caused "exec:
\"10\": executable file not found in $PATH": unknown.
This is because the arguments `10 11 12` are interpreted as a command that replaces the default command given by `CMD ["python3", "/home/sum.py"]` in the image.
To achieve the goal of having a command that always runs when a container is run from the container image and that can be passed the arguments given on the command line, use the keyword `ENTRYPOINT` in the `Dockerfile`.
FROM alpine
COPY sum.py /home
RUN apk add --update python3 py3-pip python3-dev
# Run the sum.py script as the default command and
# allow people to enter arguments for it
ENTRYPOINT ["python3", "/home/sum.py"]
# Give default arguments, in case none are supplied on
# the command-line
CMD ["10", "11"]
Build and test it:
BASH
$ docker image build -t alpine-sum:v2 .
# Most of the time you are interested in the sum of 10 and 11:
$ docker container run alpine-sum:v2
# Sometimes you have more challenging calculations to do:
$ docker container run alpine-sum:v2 12 13 14
Overriding the ENTRYPOINT
Sometimes you don’t want to run the image’s `ENTRYPOINT`. For example, if you have a specialized container image that does only sums but you need an interactive shell to examine the container:
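BASH
$ docker container run -it alpine-sum:v2 /bin/sh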
will yield
OUTPUT
Please supply integer arguments
You need to override the `ENTRYPOINT` statement in the container image like so:
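BASH
$ docker container run -it --entrypoint /bin/sh alpine-sum:v2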
Add the `sum.py` script to the `PATH` so you can run it directly:
FROM alpine
RUN apk add --update python3 py3-pip python3-dev
COPY sum.py /home
# set script permissions
RUN chmod +x /home/sum.py
# add /home folder to the PATH
ENV PATH /home:$PATH
Build and test it:
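BASH
# the v3 tag is illustrative, continuing from v2 above; running sum.py directly
# this way requires the script to start with a shebang line such as #!/usr/bin/env python3
$ docker image build -t alpine-sum:v3 .
$ docker container run alpine-sum:v3 sum.py 10 11 12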
Best practices for writing Dockerfiles
Take a look at Nüst et al.’s “Ten simple rules for writing Dockerfiles for reproducible data science” [1] for some great examples of best practices to use when writing Dockerfiles. The GitHub repository associated with the paper also has a set of example `Dockerfile`s demonstrating how the rules highlighted by the paper can be applied.
[1] Nüst D, Sochat V, Marwick B, Eglen SJ, Head T, et al. (2020) Ten simple rules for writing Dockerfiles for reproducible data science. PLOS Computational Biology 16(11): e1008316. https://doi.org/10.1371/journal.pcbi.1008316
Key Points
- Docker allows containers to read and write files from the Docker host.
- You can include files from your Docker host in your Docker container images by using the `COPY` instruction in your `Dockerfile`.