Content from Introducing Containers



Estimated time: 20 minutes

Overview

Questions

  • What are containers, and why might they be useful to me?

Objectives

  • Show how software depending on other software leads to configuration management problems.
  • Identify the problems that software installation can pose for research.
  • Explain the advantages of containerization.
  • Explain how using containers can solve software configuration problems.

Learning about Docker Containers

The Australian Research Data Commons has produced a short introductory video about Docker containers that covers many of the points below. Watch it before or after you go through this section to reinforce your understanding!

How can software containers help your research? Australian Research Data Commons, 2021. DOI: 10.5281/zenodo.5091260

Scientific Software Challenges

What’s Your Experience?

Take a minute to think about challenges that you have experienced in using scientific software (or software in general!) for your research. Then, share with your neighbors and try to come up with a list of common gripes or challenges.

You may have come up with some of the following:

  • you want to use software that doesn’t exist for the operating system (Mac, Windows, Linux) you’d prefer.
  • you struggle with installing a software tool because you have to install a number of other dependencies first. Those dependencies, in turn, require other things, and so on (i.e., a combinatorial explosion).
  • the software you’re setting up involves many dependencies and only a subset of all possible versions of those dependencies actually works as desired.
  • you’re not actually sure what version of the software you’re using because the install process was so circuitous.
  • you and a colleague are using the same software but get different results because you have installed different versions and/or are using different operating systems.
  • you installed everything correctly on your computer but now need to install it on a colleague’s computer/campus computing cluster/etc.
  • you’ve written a package for other people to use but a lot of your users frequently have trouble with installation.
  • you need to reproduce a research project from a former colleague and the software used was on a system you no longer have access to.

A lot of these characteristics boil down to one fact: the main program you want to use likely depends on many, many, different other programs (including the operating system!), creating a very complex, and often fragile system. One change or missing piece may stop the whole thing from working or break something that was already running. It’s no surprise that this situation is sometimes informally termed “dependency hell”.

Software and Science

Again, take a minute to think about how the software challenges we’ve discussed could impact (or have impacted!) the quality of your work. Share your thoughts with your neighbors. What can go wrong if our software doesn’t work?

Unsurprisingly, software installation and configuration challenges can have negative consequences for research:

  • you can’t use a specific tool at all, because it’s not available or installable.
  • you can’t reproduce your results because you’re not sure what tools you’re actually using.
  • you can’t access extra/newer resources because you’re not able to replicate your software set up.
  • others cannot validate and/or build upon your work because they cannot recreate your system’s unique configuration.

Thankfully there are ways to get underneath (a lot of) this mess: containers to the rescue! Containers provide a way to package up software dependencies and access to resources such as files and communications networks in a uniform manner.

What is a Container? What is Docker?


Docker is a tool that allows you to build what are called “containers.” It’s not the only tool that can create containers, but it is the one we’ve chosen for this workshop. But what is a container?

To understand containers, let’s first talk briefly about your computer.

Your computer has some standard pieces that allow it to work – often what’s called the hardware. One of these pieces is the CPU or processor; another is the amount of memory or RAM that your computer can use to store information temporarily while running programs; another is the hard drive, which can store information over the long-term. All these pieces work together to do the “computing” of a computer, but we don’t see them because they’re hidden from view (usually).

Instead, what we see is our desktop, program windows, different folders, and files. These all live in what’s called the filesystem. Everything on your computer – programs, pictures, documents, the operating system itself – lives somewhere in the filesystem.

NOW, imagine you want to install some new software but don’t want to take the chance of making a mess of your existing system by installing a bunch of additional stuff (libraries/dependencies/etc.). You don’t want to buy a whole new computer because it’s too expensive. What if, instead, you could have another independent filesystem and running operating system that you could access from your main computer, and that is actually stored within this existing computer?

Or, imagine you have two tools you want to use in your groundbreaking research on cat memes: PurrLOLing, a tool that does AMAZINGLY well at predicting the best text for a meme based on the cat species and WhiskerSpot, the only tool available for identifying cat species from images. You want to send cat pictures to WhiskerSpot, and then send the species output to PurrLOLing. But there’s a problem: PurrLOLing only works on Ubuntu and WhiskerSpot is only supported for OpenSUSE so you can’t have them on the same system! Again, we really want another filesystem (or two) on our computer that we could use to chain together WhiskerSpot and PurrLOLing in a “pipeline”…

Container systems, like Docker, are special programs on your computer that make this possible! The term “container” can be usefully considered with reference to shipping containers. Before shipping containers were developed, packing and unpacking cargo ships was time consuming and error prone, with high potential for different clients’ goods to become mixed up. Just like shipping containers keep things together that should stay together, software containers standardize the description and creation of a complete software system: you can drop a container into any computer with the container software installed (the ‘container host’), and it should “just work”.

Virtualization

Containers are an example of what’s called virtualization – having a second “virtual” computer running and accessible from a main or host computer. Another example of virtualization is the virtual machine or VM. A virtual machine typically contains a whole copy of an operating system in addition to its own filesystem and has to get booted up in the same way a computer would. A container is considered a lightweight version of a virtual machine; underneath, the container is (usually) using the Linux kernel and simply has some flavour of Linux + the filesystem inside.

One final term: while the container is an alternative filesystem layer that you can access and run from your computer, the container image is the ‘recipe’ or template for a container. The container image has all the required information to start up a running copy of the container. A running container tends to be transient and can be started and shut down. The container image is more long-lived, as a definition for the container. You could think of the container image like a cookie cutter – it can be used to create multiple copies of the same shape (or container) and is relatively unchanging, whereas cookies come and go. If you want a different type of container (cookie) you need a different container image (cookie cutter).

Putting the Pieces Together


Think back to some of the challenges we described at the beginning. The many layers of scientific software installations make it hard to install and re-install scientific software – which ultimately, hinders reliability and reproducibility.

But now, think about what a container is – a self-contained, complete, separate computer filesystem. What advantages are there if you put your scientific software tools into containers?

This solves several of our problems:

  • documentation – there is a clear record of what software and software dependencies were used, from bottom to top.
  • portability – the container can be used on any computer that has Docker installed – it doesn’t matter whether the computer is Mac, Windows or Linux-based.
  • reproducibility – you can use the exact same software and environment on your computer and on other resources (like a large-scale computing cluster).
  • configurability – containers can be sized to take advantage of more resources (memory, CPU, etc.) on large systems (clusters) or less, depending on the circumstances.

The rest of this workshop will show you how to download and run containers from pre-existing container images on your own computer, and how to create and share your own container images.

Use cases for containers


Now that we have discussed a little bit about containers – what they do and the issues they attempt to address – you may be able to think of a few potential use cases in your area of work. Some examples of common use cases for containers in a research context include:

  • Using containers solely on your own computer to use a specific software tool or to test out a tool (possibly to avoid a difficult and complex installation process, to save your time or to avoid dependency hell).
  • Creating a Dockerfile that generates a container image with software that you specify installed, then sharing a container image generated using this Dockerfile with your collaborators for use on their computers or a remote computing resource (e.g. cloud-based or HPC system).
  • Archiving the container images so you can repeat analysis/modelling using the same software and configuration in the future – capturing your workflow.

Key Points

  • Almost all software depends on other software components to function, but these components have independent evolutionary paths.
  • Small environments that contain only the software that is needed for a given task are easier to replicate and maintain.
  • Critical systems that cannot be upgraded, due to cost, difficulty, etc. need to be reproduced on newer systems in a maintainable and self-documented way.
  • Virtualization allows multiple environments to run on a single computer.
  • Containerization improves upon the virtualization of whole computers by allowing efficient management of the host computer’s memory and storage resources.
  • Containers are built from ‘recipes’ that define the required set of software components and the instructions necessary to build/install them within a container image.
  • Docker is just one software platform that can create containers and the resources they use.

Content from The Docker Hub



Estimated time: 20 minutes

One of the reasons why Docker is useful is that there is a lot of content (container images) already prepared to be used. Each container image represents a complete software installation that you can use and access without any extra work!

Overview

Questions

  • What is the Docker Hub, and why is it useful?

Objectives

  • Explore the Docker Hub webpage for a popular Docker container image.
  • Find the list of tags for a particular Docker container image.
  • Identify the three components of a container image’s identifier.

A lot of these images are hosted in the Docker Hub.

Introducing the Docker Hub


The Docker Hub is an online repository of container images, a vast number of which are publicly available. A large number of the container images are curated by the developers of the software that they package. Also, many commonly used pieces of software that have been containerized into images are officially endorsed, which means that you can trust the container images to have been checked for functionality and stability, and to be free of malware.

Docker can be used without connecting to the Docker Hub

Note that while the Docker Hub is well integrated into Docker functionality, the Docker Hub is certainly not required for all types of use of Docker containers. For example, some organizations may run container infrastructure that is entirely disconnected from the Internet.

Exploring an example Docker Hub page


As an example of a Docker Hub page, let’s explore the page for the official Python language container images. Open your web browser to https://hub.docker.com

Dockerhub_landing

We will need to register for the later stages of this course, so feel free to do so now if you have not already.

In the search bar type “python” and hit enter.

Dockerhub_search

You should see a list of images related to Python. We can immediately get a feel for the sheer number of container images hosted here: there are upwards of 10,000 images related to Python alone.

There is also some useful information that can help us choose the image that we want, which we will point out in a moment.

For now, let’s go to the top result, the python container image (which is endorsed by the Docker team), to see what is on a typical Docker Hub software page.

Dockerhub_python

The top-left provides information about the name, short description, popularity (i.e., more than a billion downloads in the case of this container image), and endorsements.

The top-right provides the command to pull this container image to your computer.

The main body of the page contains a number of useful sections, such as:

  • Which tags (i.e., container image versions) are supported.
  • Summary information about where to get help, which computer architectures are supported, etc.
  • A longer description of the container image.
  • Examples of how to use the container image.
  • The license that applies.

The “How to use the image” section of most container images’ pages will provide examples that are likely to cover your intended use of the container image.

Container Image versions and owners


A single Docker Hub page can have many different versions of container images, based on the version of the software inside. These versions are indicated by “tags”. When referring to the specific version of a container image by its tag, you use a colon, :, like this:

CONTAINER_IMAGE_NAME:TAG

If we click the version tag for alpine3.18 of this image, Docker Hub shows it as python:alpine3.18

Dockerhub_python

The default tag (which is used if you don’t specify one) is called latest.

This container image is a “Docker Official Image”, which means that the Docker team maintains and guarantees it. However, it’s equally common to use container images that have been produced by individual owners or organizations. Container images that you create and upload to Docker Hub would fall into this category, as would the container images maintained by organizations like ContinuumIO (the folks who develop the Anaconda Python environment) or community groups like rocker, a group that builds community R container images.

In these cases, the “owner” (that is, the group or individual that manages the container image) is part of the image name, which is given in the format:

OWNER/CONTAINER_IMAGE_NAME:TAG

as can be seen in these examples:


rocker/tidyverse:latest
mathworks/matlab
continuumio/anaconda3

Repositories

The technical name for the contents of a Docker Hub page is a “repository.” The tag indicates the specific version of the container image that you’d like to use from a particular repository. So a slightly more accurate version of the above example is:

OWNER/REPOSITORY:TAG

Choosing Container Images on Docker Hub


Note that anyone can create an account on Docker Hub and share container images there, so it’s important to exercise caution when choosing a container image on Docker Hub. These are some indicators that a container image on Docker Hub is consistently maintained, functional and secure:

  • The container image is updated regularly.
  • The container image is associated with a well-established company, community, or other well-known group. Docker helps here with badges that mark official images, verified publishers, and sponsored open source software.
  • There is a Dockerfile or other listing of what has been installed to the container image.
  • The container image page has documentation on how to use the container image.
  • The container image is used by the wider community. The graph on the right of the search results page can help with this.

If a container image is never updated, created by a random person, and does not have a lot of metadata, it is probably worth skipping over. Even if such a container image is secure, it is not reproducible and not a dependable way to run research computations.

Other sources of Container Images


Although many of the container images made for Docker are hosted in the Docker Hub, there are other places where they can be distributed, including (but not limited to):

  • the GitHub Container Registry (ghcr.io), which we will use later in this lesson.
  • Quay (quay.io), a registry run by Red Hat.
  • registries run by individual institutions or projects, such as a GitLab instance’s container registry.

Key Points

  • The Docker Hub is an online repository of container images.
  • Many Docker Hub container images are public, and may be officially endorsed.
  • Each Docker Hub page about a container image provides structured information under consistent subheadings.
  • Most Docker Hub pages about container images contain sections that provide examples of how to use those container images.
  • Many Docker Hub container images have multiple versions, indicated by tags.
  • The naming convention for Docker container images is: OWNER/CONTAINER_IMAGE_NAME:TAG

Content from Docker Desktop



Estimated time: 20 minutes

This episode is meant to be demonstrative, that is, you do not need to follow along.

Overview

Questions

  • What is Docker Desktop?
  • What can it be used for?
  • Why can’t it replace the CLI?

Objectives

  • Show Docker Desktop and its components.
  • Understand what images and containers are.
  • Visualize the process of image acquisition, container execution and where it ends.
  • Understand the ephemeral nature of containers.
  • Have a glimpse at containers that allow interaction.
  • Understand the importance of cleaning up in docker.
  • Understand the limitations of Docker Desktop.

We will present the Docker Desktop dashboard, as it will be useful for understanding key concepts of Docker, such as images and containers. However, it is important to note that while it is mostly free, some features are offered at a premium. Also, it is not fully functional on all operating systems; it can produce conflicts with the Docker engine on Linux, for example.

Getting images


Setting up Docker on Windows or Mac will have installed Docker Desktop by default. If you open the application you will likely see something like this: Docker Desktop being opened for the first time.

You’ll notice that the panel on the left has a tab for ‘Images’ and another for ‘Containers’. These will be the focus for the episode, and we will ignore most other features.

On the top you’ll also find a search icon, which links to Docker Hub, and allows us to search for the images we saw in the previous episode directly from here.

Note that there are two tabs, one for containers and one for images. Make sure that you select the right tab when you search! Search window.

In Docker Desktop you can either search by name only, Search by name.

or include the owner. You can then select the tag from the dropdown menu. Search with owner and select a tag.

Once you find the image you were looking for, you can either download it (pull), or directly run it.

We’ll start by downloading the latest versions of hello-world, docker/getting-started and alpine.

Inspecting images


The ‘Images’ tab on the left panel will show all the images in your system, so you will be able to see them here. Images list, hello-world, getting-started and alpine.

From this tab we can see some information about the images on disk and run them, but we can also inspect the images. Clicking on an image will open a window with information on how the image is built, and lets you examine its packages and vulnerabilities. Inspecting image hello-world.

The hello-world image does not seem too interesting from here. If you go to Docker Hub you’ll find links to the GitHub site, where you’ll see that the image is not as simple as it looks. Nevertheless, this is a very nice and quick way to explore an image.

If we now inspect the docker/getting-started image, for example, we can see that it detects some vulnerabilities: Inspecting image getting-started.

You can even further inspect the vulnerable layers by looking at the command. Inspecting image command in getting-started.

This all looks rather scary, and it is important that we are careful with the images that we download. It is therefore quite useful to be able to analyse them like this. This image, in particular, is from a verified publisher (Docker Inc. no less!), and has been downloaded over 10M times, so the container is likely not malicious.

Running containers


The images that we just downloaded are immutable files: they are snapshots of an environment, distributed to be used as templates for creating ‘containers’. The containers are, essentially, images being run. They are executions of the image, and because they are running, they are no longer ‘static’.

Let’s run the hello-world image by clicking the ‘Run’ button in the ‘Actions’ column of the Images tab. Run button from Images tab.

A prompt will ask you to confirm ‘Run’ or modify some optional settings. Run confirmation prompt.

The optional settings allow you to modify the container’s name, so that you can easily identify it afterwards. Let’s add an appropriate name and confirm with the ‘Run’ button. Run optional settings.

You will likely be taken to a ‘Logs’ tab inside the container that you just ran. The logs show the output of this particular image, ‘Hello from Docker!’ among other things.

If you look carefully, the ‘Containers’ tab on the left is highlighted. We are looking at a container now, not an image, and so we were re-located.

Exploring the ‘Inspect’ tab will show us some information, but for now we are more interested in what the ‘Terminal’ and ‘Stats’ tabs have to say. They both seem to indicate that we need to run or start the container.


Logs tab in container from hello-world image.
Inspect tab in container from hello-world image.
Terminal tab in container from hello-world image.
Stats tab in container from hello-world image.

Indeed, if we look carefully, we will find an ‘Exited (0)’ status under the container name, and a ‘Start’ button near the top-right corner. However, if we click on that button we will see the output duplicated in the logs, and the ‘Exited (0)’ status again. Clicking Start on the already run hello-world container.

If we go back to the images tab and run the image again (let’s not bother giving it a name this time), we’ll see that the same thing happens. We get the ‘Hello from Docker!’, and the container exits. Running hello-world image for a second time.

The nature of most containers is ephemeral. They are meant to execute a process, and when the process is completed, they exit. We can confirm this by clicking on the ‘Containers’ tab on the left. This will exit the container inspection and show us all the containers. Both containers in the list have a status ‘Exited’. Containers list.

You may be wondering why there are two containers when we have only run the hello-world image. One of the containers we named, and the other has some gibberish as a name (Docker generated this randomly). As mentioned before, the image is used as a template, and as many containers as we want can be created from it. If we go back to the ‘Images’ tab and run hello-world again, we’ll see a new container appear.

Interacting with containers


Not all containers are as short lived as the one we’ve been using. If we run the docker/getting-started image that we had pulled earlier, we will see something different happen. You can immediately notice the status under the container name is ‘RUNNING’ now.

The ‘Logs’ tab is not too informative, but the ‘Inspect’ tab already shows more information: a process called ‘nginx’ is running. The ‘Terminal’ and ‘Stats’ tabs changed the most. Since the container is still running, the stats are shown, and we are able to launch a terminal inside the container.


Logs tab in container from getting-started image.
Inspect tab in container from getting-started image.
Terminal tab in container from getting-started image.
Stats tab in container from getting-started image.

Before trying to do anything in the terminal, let’s look at the container list by clicking on the ‘Containers’ tab on the left. You’ll see the green icon of the container indicating that it is still live, and an indication of how long it’s been running for. Containers list, getting-started still running.

Clicking on the container name again will take us back to the ‘Logs’ tab in the container. Let’s try to interact with the terminal inside the container. If you print the working directory with pwd you’ll get the base directory: /. You can also list the contents with ls, and the docker-entrypoint files are a dead giveaway that we are inside the container. At this point this container is very much like a VM. We can modify things, for example making a directory with mkdir, and see that it has been created with ls again. Terminal, mkdir and ls inside getting-started container.

But we can do more than that, we can install things. For example, you’ll notice that htop is not installed. Since the getting-started image is based on Alpine, we can install it using apk add htop, and we can now use it.


Terminal, installing htop inside getting-started container.
Terminal, running htop inside getting-started container.
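
For reference, the session above looks something like this when typed into the container’s ‘Terminal’ tab (the directory name my-dir is just an arbitrary example):

BASH

/# pwd
/# ls
/# mkdir my-dir
/# ls
/# apk add htop
/# htop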

The container does not need to stay alive forever, and you can see that there is a ‘stop’ icon on the top right. If we stop the container, we get a familiar empty tab in ‘Terminal’ and ‘Stats’. The ‘Containers’ tab on the left will also show the container status as ‘Exited’.


Terminal tab in container from stopped getting-started image.
Stats tab in container from stopped getting-started image.
Container list after stopping getting-started image.

Now let’s say I want to run the getting-started image again. So I go to the ‘Images’ tab, and click run. Now let’s go to the ‘Terminal’ tab, and try to find our directory with ls. The directory is not there. We’d also installed htop, so let’s have a go at running it. Not there either. Terminal in fresh getting-started image.

When we re-ran the image, we created a new container. The new container is created from the template saved in the image, and so ‘our’ changes have vanished. This becomes very clear when we go back to the ‘Containers’ tab on the left. We can see that the first container we created from the getting-started image is there, next to the new container (which is still running, by the way). Containers after new run of getting-started image.

Reviving containers


We can get the old container running again, although this is rarely something we’d want to do. In Docker Desktop, all we need to do is click on the ‘Start’ button from the ‘Containers’ list. The terminal will appear empty, because it is a new session, but you will even be able to ‘recall’ commands. Reviving container getting-started.

Cleaning up


The hello-world image was nice and useful to test docker was working, but it is now rather useless. If I want to delete it, the ‘Images’ tab on the left has a convenient bin icon to do so. Clicking on it will prompt you for confirmation, but it will fail. Failing to delete image.

You’ll probably notice that the status of the image is ‘In use’. That seems strange though, given that all the containers from that image exited immediately.

Let’s have a look at the ‘Containers’ tab. It shows a list of 5 containers now. Three of them came from the hello-world image, and are stopped. Two of them came from the getting-started image, and are running.

We’ve only been using Docker for 15 minutes though! You may see how this can become a problem… Particularly so because we were a bit sloppy and did not name the containers. Let’s try and get rid of the containers then. We can conveniently select them all with the tickbox at the top, and an option to ‘Delete’ shows up. Clicking on it will prompt for confirmation, and we can go ahead and accept. Deleting containers.

All our containers are now gone. Forever.

Warning: You have to be careful here: this action deleted even the containers that were running. You can filter the containers before you select them “all”.

On the up-side, the ‘Images’ tab shows both the hello-world and the getting-started images as ‘Unused’ now. For Docker, an image is ‘In use’ as long as at least one container has been created from it. We have just deleted all the containers created from either of these images. This tells Docker that they are no longer being used, and can therefore be safely deleted. Successfully deleting images.

Limitations - Why not Docker Desktop?


We have seen many of the neat and functional bits of Docker Desktop, and it can be mighty appealing, particularly if you lean towards the use of graphical interfaces. However, we’ve not touched on its weaknesses. We’ll just need to point at one to feel the need to throw everything overboard.

Let’s go ahead and run the only image we have already pulled, alpine. Logs tab in container from alpine image.

That was fast, and uneventful. Not a single line of output in the ‘Logs’, and no way to open a terminal inside Alpine.


Logs tab in container from alpine image.
Inspect tab in container from alpine image.
Terminal tab in container from alpine image.
Stats tab in container from alpine image.

Just to be clear though, this Docker image does contain the whole Alpine OS. In Docker Desktop, however, there is no way to interact with it.

Let’s try something different. There’s a program called cowsay that lets you print messages as if a cow was saying them. Searching for that image shows that there is one by beatrixxx32 with a reasonable number of downloads. Search of cowsay image.

So let’s pull that image and run it.


Logs tab in container from cowsay image.
Inspect tab in container from cowsay image.
Terminal tab in container from cowsay image.
Stats tab in container from cowsay image.

We do get a cow this time, but it is not saying anything; it does not know what to say. Going back to the cowsay image search, you may notice that in ‘Usage’ the command line asks for “your message”. We are not using a command though, we just clicked run. Maybe we missed something in the optional settings! Optional settings for cowsay.

No, it does not seem like it. No matter what we do, we cannot make the cow say anything from here.

Are the alpine and cowsay images useless? No, definitely not. However, they are expecting some sort of input or command, which we cannot provide from Docker Desktop.

This is the case for most images, and so Docker Desktop (as it is now) cannot really be used for much more than as a nice dashboard.

In the next episode, we will use Docker from the command line, and all of the advantages it brings will become apparent.

Key Points

  • Docker Desktop is a great dashboard that allows us to understand and visualize the lifecycle of images and containers.
  • Images are snapshots of an environment, easily distributable and ready to be used as templates for containers.
  • Containers are executions of the images, often with configuration added on top, and usually meant for single use.
  • Running a container usually implies creating a new copy, so it is important to clean up regularly.
  • Docker Desktop could potentially be all you need to use if you only consume images out of the box.
  • However, it is very limited in most cases (even for consumers), and rarely allows the user to configure and interact with the containers adequately.

Content from Using the Docker command line



Estimated time: 198 minutes

So far, we have seen how easily you can find images and run containers in Docker Desktop. However, we have also seen Docker Desktop’s limitations: why couldn’t we do anything useful with our Alpine container?

Overview

Questions

  • What actually happens when I run a container?
  • How can I control the behaviour of containers?

Objectives

  • Learn the lifecycle of Docker containers
  • Learn how to use the Docker command line to perform tasks we learned to do in Docker Desktop
  • Learn to perform more advanced Docker tasks, only possible in the command line

To understand this, we need to learn about the lifecycle of Docker containers.

Container lifecycle


So, what happens when we run a Docker container? A flowchart showing the lifecycle of a Docker container.

Startup

When we run a Docker container, the first set of actions occur during the “startup” phase:

  • First, Docker checks if you already have the container image, and downloads it if you don’t (known as a pull)
  • Next, the container image is used to create the container and start the container’s runtime environment

Default Command

Once the container has started, it executes its default command.

The default command is specified by the image’s creator and is formed of two parts:

  • Entrypoint: The base command for the container (may be omitted, default: "")
  • Command: Parameters for the base command (or, if entrypoint omitted, the whole default command)

When the container runs, Entrypoint and Command are concatenated to form the default command which is then executed.

Shutdown

Whether a container shuts down depends on the default command. A container will not shut down until its default process has finished.

Therefore, a command that runs to completion will allow a container to shut down, but a service that runs indefinitely will not.

Examples

Let’s take a look at some examples.

Example 1

A flowchart showing the lifecycle of a Docker container, with an example.

In example 1, an Entrypoint of “echo” is given along with a Command of “hello world” to the container.

This works in exactly the same way as running the following in a standard terminal:

$ echo hello world

Example 2

A flowchart showing the lifecycle of a Docker container, with an example.

In example 2, an Entrypoint of “sleep” is given along with a Command of “infinity” to the container.

This causes the container to run the sleep command indefinitely, meaning that the default command never finishes, and the container stays alive without shutting down.

This example may seem artificial, but it is common to have containers that run indefinitely. A common example would be a webserver which renders a website, running until stopped or changed.

Return to Alpine


So, how does this explain why the Alpine container seemed so uninteresting?

First, let’s apply what we have learned about the lifecycle of a Docker container. Let’s find out what the Entrypoint and Command for the Alpine container are. To do this we will use the Docker command line interface.

Docker command line syntax

The general syntax for the Docker command line is explained in this diagram:

A diagram showing the syntactic structure of a Docker command

There is a base command, which is always docker; a specialising command, which specifies the type of object you wish to act on; an action command, which selects the actual process that will be performed; and finally the name of the object you wish to act on. You may also need to add extra arguments and switches.
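
As a sketch, here is how one such command decomposes (many commands also have a shorthand form that omits the specialising command, e.g. docker pull alpine):

BASH

$ docker image pull alpine
# docker -> the base command
# image  -> the specialising command (the type of object)
# pull   -> the action command (the process to perform)
# alpine -> the name of the object to act on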

Inspecting images

Applying this, we want to inspect the image of Alpine to find out the Entrypoint and Command. We can do this by running the command below, where we have specified an extra argument -f which is a string specifying the output format of the command (without it we would get a large JSON dump describing the container).

BASH

docker inspect alpine -f "Entrypoint: {{.Config.Entrypoint}} Command: {{.Config.Cmd}}"

OUTPUT

Entrypoint: [] Command: [/bin/sh]

So, we can see from this command that Alpine specifies:

  • Entrypoint: null (unspecified)
  • Command: /bin/sh

Therefore, the default command for Alpine is: “/bin/sh”.

What is wrong with this? Shouldn’t this just open a terminal prompt?

Not in this case! Running a command in a Docker container is not quite like running it in your terminal.

When you use your terminal, you are able to type commands and interact with the underlying system. By default, however, Docker does not allow containers to do this. Without special measures, a Docker container has no way of receiving interactions, and instead sends an empty input to the sh process. sh is programmed to exit when it receives an empty input, and therefore finishes its process.

$ /bin/sh               # you might expect Docker to be running the default command like this
$ /bin/sh < /dev/null   # but actually, it is like this!

This is why the Alpine container seemed so uninteresting: in Docker Desktop there was no way for us to provide an interactive session to the container! It is possible to change this behavior, but not using Docker Desktop. Instead, we now need to dig in to the more powerful and flexible Docker command line interface.

This behavior can also be seen on other shell-like containers, for example those with “bash” or “python” as default commands.
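
For instance, if you have pulled the official python image, the same inspection trick shows a shell-like default command (the exact output may vary between tags):

BASH

$ docker inspect python -f "Entrypoint: {{.Config.Entrypoint}} Command: {{.Config.Cmd}}"

OUTPUT

Entrypoint: [] Command: [python3]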

Running a container


So, how does the Docker command line address this? Can we use the Docker CLI to interact with our containers?

Let’s now focus on running Docker containers using the CLI.

Again, we will use the docker base command, this time specialising it with run and the name of an image, to tell Docker that we want to start up the specified container image.

BASH

$ docker run alpine

You will likely see output similar to the below:

OUTPUT

> Unable to find image 'alpine:latest' locally
> latest: Pulling from library/alpine
> f56be85fc22e: Already exists
> Digest: sha256:124c7d2707904eea7431fffe91522a01e5a861a624ee31d03372cc1d138a3126
> Status: Downloaded newer image for alpine:latest

This output shows docker recognising that it does not have the Alpine image locally, and beginning the process of downloading and storing the image.

But is something missing? We don’t see any other output. Let’s run the command again.

BASH

$ docker run alpine

OUTPUT

>

This time there is no output at all; with the image already downloaded, the container can run straight away. But why is no output produced? This is for the same reason as when we tried running Alpine in Docker Desktop (as explained above): the docker run command is not sending any input, and the default command exits automatically.

Downloading containers

When we run docker run, the image we ask for is automatically downloaded. If you wanted to download an image without running it, this is also possible. You can use:

BASH

$ docker pull alpine

Why don’t we finally try and make Alpine do something?

Overriding the default command


To get the Alpine image to do something interesting, we will want to override the default command.

This can be done in the Docker CLI, by specifying the desired command, after the docker run command and image name.

For example, if we wanted the Alpine container to print “hello world”, we could use the standard Unix command, echo.

BASH

$ docker run alpine echo "hello world"

OUTPUT

> hello world

Great! We have finally managed to do something interesting with the Alpine image.

This is a big step, but how can we evolve from here? What if we tried to generalise our printing command?

We can do this by modifying the entrypoint portion of the default command (remember: the entrypoint is the portion of the default command which prefixes the command that comes after the image name).

This way we can have just the text we want to print after the image name.

We do this by adding the --entrypoint switch to our Docker command, followed by the command we want to become the entrypoint.

BASH

$ docker run --entrypoint echo alpine "hello world"
$ docker run --entrypoint echo alpine "lovely day"

OUTPUT

> hello world
> lovely day

Wonderful! But this needn’t be limited to just echo, let’s use this approach to get some more information about the container, by printing the contents of a special file called ‘os-release’.

BASH

$ docker run alpine cat /etc/os-release

OUTPUT

NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.17.3
PRETTY_NAME="Alpine Linux v3.17"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"

Make the container use cat as a base command

Can you make the Alpine container use cat as its base command? Can you print the os-release information using this approach?

BASH

$ docker run --entrypoint cat alpine /etc/os-release

BASH

$ docker run --entrypoint cat alpine /etc/passwd

or

BASH

$ docker run alpine cat /etc/passwd

We can see from the output that we are indeed running in an Alpine container and now have a powerful way to execute single commands in a Docker container.

Running a container interactively


But what if we wanted to run a series of commands? What if we wanted to be able to explore within a container?

We will look now at how we can create interactive sessions for using Docker containers.

Luckily this is straightforward: we simply need to modify our standard Docker command to include two new flags.

We need to add:

  • ‘-i’ to enable interactivity
  • ‘-t’ to enable a TTY connection (the ability to type things!)

Let’s try this with the Alpine image:

BASH

$ docker run -i -t alpine

OUTPUT

/ #

You are now inside the container! Let’s try some familiar commands.

BASH

/# cat /etc/os-release

OUTPUT

NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.17.3
PRETTY_NAME="Alpine Linux v3.17"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"

You can see we get the same output as earlier, but this time the container remains live and we remain inside the container.

You can now exit the container by running:

BASH

/# exit

Using containers in this interactive way can be useful, but we have to be careful as (by default) our changes will not persist.

You can see this for yourself with the following process:

BASH

$ docker run -i -t alpine
/# touch hello-world
/# echo "hi everyone" > hello-world
/# cat hello-world
/# exit

OUTPUT

> hi everyone

BASH

$ docker run -i -t alpine
/# cat hello-world

OUTPUT

> cat: can't open 'hello-world': No such file or directory

You can see that, while we were still interacting with the container, our file continued to exist. Once we left, it was gone!

Getting files into the container


Now that we are figuring out how to work with containers, let’s imagine a situation. We are working in a data science lab, and our supervisor has asked us to run some tools that they have developed, which they are distributing as Docker container images.

The first tool, called Random, generates a series of random numbers. The second tool, named Datamash, performs simple statistical analyses on single columns of data. Let’s figure out how to use these tools and some of the hurdles we will have to overcome.

First, let’s download the tools:

BASH

$ docker pull ghcr.io/uomresearchit/random
$ docker pull ghcr.io/uomresearchit/datamash

Great! But… how can we check what has happened? How can we know we have the images? Let’s check with:

BASH

$ docker image ls

OUTPUT

REPOSITORY                                       TAG                            IMAGE ID       CREATED         SIZE
ghcr.io/uomresearchit/datamash                   latest                         a0143c45d868   5 minutes ago   49.7MB
ghcr.io/uomresearchit/random                     latest                         b39bc463abfd   6 weeks ago     7.04MB

You should see output similar to the above, with the images we just pulled listed.

Now we have the tools let’s try running them. We’ll start by generating our data:

BASH

$ docker run ghcr.io/uomresearchit/random

OUTPUT

[...]
[Lots of numbers!]
[...]
22307
21278
28211
21151
9209

We can see this produces a lot of output. Let’s write it to file instead:

BASH

$ docker run ghcr.io/uomresearchit/random > local_data.txt

We now have a data file we can use with our other tool. Our supervisor has told us a couple of helpful things about this container; apparently all you have to do is give the container the filename you want to analyse.

Let’s try that!

BASH

$ docker run ghcr.io/uomresearchit/datamash local_data.txt

OUTPUT

Traceback (most recent call last):
File "//mean.py", line 7, in <module>
with open(args.filename) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'local_data.txt'

We gave the right filename… So why hasn’t this worked? The answer is simple, but has profound implications.

The file we specified is present on our machine, but not inside the container! The container has its own, completely isolated filesystem.

This can really help: the container’s files are stored in a nice, safe, isolated place perfectly tailored for the application. But then, how can we give our file to the container, in order for it to be analysed?

Using mounts and volumes


Docker’s answer to this is to allow you to selectively expose files and folders on your machine to the container you are running.

Docker calls these bind mounts and they take the following form: A breakdown of the command for mounting a file into a container.

Using this command we specify the type of mount we want (more on this later!), the name of a local file (or directory) to mount into the container, and the path and filename we would like it to have inside the container.

$PWD

You will notice we used an environment variable (denoted by $) to specify the location of our local file. This is because the docker command needs the full path to a file to mount it correctly; it won’t work if you do ./local_data.txt!

Let’s try this out with our datamash container:

BASH

$ docker run --mount type=bind,source="$PWD"/local_data.txt,target=/local_data.txt ghcr.io/uomresearchit/datamash local_data.txt

OUTPUT

Data file local_data.txt, selected.
Number of entries: 100.0
Sum: 1681522.0
Mean: 16815.22

Wonderful! The container works now. But that syntax is very clunky… Can it be shortened? The answer is yes, but it requires care - let’s investigate.

Try out the following command:

BASH

$ docker run -v $PWD/local_data.txt:/local_data.txt ghcr.io/uomresearchit/datamash local_data.txt

It works just like before! But there is a hidden danger here.

So far we have only discussed bind mounts. These are a simple mechanism for binding bits of your local filesystem into containers. There is another method for getting files into containers, called volumes.

Volumes are the isolated, persistent filesystems for containers, they can be shared between containers and are very useful for retaining information between container runs.

However, they are not particularly useful for getting files from host to container!

Can you spot the difference?

BASH

$ docker run -v local_data.txt:/local_data.txt ghcr.io/uomresearchit/datamash local_data.txt

OUTPUT

Traceback (most recent call last):
File "//mean.py", line 7, in <module>
with open(args.filename) as f:
IsADirectoryError: [Errno 21] Is a directory: 'local_data.txt'

Why has this not worked? The answer is that Docker has created a volume, not mounted our file! This is because the Docker command requires absolute paths; if a path doesn’t evaluate correctly then it assumes you are naming a new volume.

See this bit of the Docker man page:

-v|--volume[=[[HOST-DIR:]CONTAINER-DIR[:OPTIONS]]]
Create a bind mount. If you specify, -v /HOST-DIR:/CONTAINER-DIR, Docker
bind mounts /HOST-DIR in the host to /CONTAINER-DIR in the Docker
container. If 'HOST-DIR' is omitted, Docker automatically creates the new
volume on the host.

For this reason, we would recommend using the full (clunky) syntax of the mount command.

Running services in Docker


This impermanence can be a challenge! What if we want a container to keep running, for example to provide a service, rather than exit as soon as its default command completes?

Long running processes


As we saw in the container lifecycle examples, a container keeps running for as long as its default command does. Long-running containers (for example, one with an analysis tool built in, or a simple upload portal) are typically started in the background using the -d (detached) switch.
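
A minimal sketch, reusing the sleep infinity trick from Example 2 (the container name sleeper is just an example):

BASH

$ docker run -d --name sleeper alpine sleep infinity
$ docker ps

The container keeps running in the background, and docker ps confirms that it is alive.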

Docker Exec
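
The docker exec command runs an additional command inside an already-running container; combined with the -i and -t flags it gives us an interactive shell. A sketch, assuming the sleeper container from above is still running:

BASH

$ docker exec -it sleeper sh
/# ls
/# exit

Unlike exiting an interactive docker run session, exiting a shell started with docker exec leaves the container running.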


Copying files


The docker cp command copies files between the host and a container, in either direction.
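
For example, to copy our data file into the sleeper container and back out again (the paths are illustrative):

BASH

$ docker cp local_data.txt sleeper:/tmp/local_data.txt
$ docker cp sleeper:/tmp/local_data.txt copied_back.txt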

Docker ps
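
The docker ps command is the command line equivalent of Docker Desktop’s ‘Containers’ tab: it lists running containers, and with the -a switch it lists stopped ones too.

BASH

$ docker ps
$ docker ps -a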


Managing the system


The docker stats command gives a live view of the resources (memory, CPU, etc.) used by each running container, much like the ‘Stats’ tab in Docker Desktop.
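
For example:

BASH

$ docker stats

Press Ctrl+C to leave the live display.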

Removing images

Images can be removed with docker image rm, as long as no container (running or stopped) created from them remains.
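
For example, to remove the hello-world image once all of its containers are gone:

BASH

$ docker image rm hello-world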

Removing containers

Containers are removed with docker rm. A running container must be stopped first (or force-removed with the -f switch).
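
For example, to stop and remove the sleeper container from earlier:

BASH

$ docker stop sleeper
$ docker rm sleeper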

Removing images and containers

To clean up in bulk, docker system prune removes all stopped containers (along with other unused resources such as networks and build cache), and docker system prune -a additionally removes all images not used by any container.
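
BASH

$ docker system prune
$ docker system prune -a

Both commands ask for confirmation before deleting anything.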

Sum up


Many of the switches we have met can be combined in a single docker run command. One more worth knowing is --rm, which automatically removes the container once it exits, saving us some cleaning up. (Collections of options like these can also be recorded declaratively, for example in a Docker Compose file, but that is beyond this lesson.)
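
A sketch putting several of these options together (the container name mydocker is arbitrary):

BASH

$ docker run --rm --name mydocker --entrypoint cat \
    --mount type=bind,source="$PWD"/local_data.txt,target=/local_data.txt \
    alpine /local_data.txt

This prints our mounted data file using cat as the entrypoint, and --rm ensures that no stopped container is left behind afterwards.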

Conclusion


We have learned about the lifecycle of Docker containers, what happens when they are started and how their behaviour is controlled. We have learned to run containers in the Docker command line interface and how to interact with them directly.

Key Points

  • A container runs its default command, formed from an entrypoint and a command, and shuts down when that command finishes.
  • docker run starts a container, pulling the image first if necessary; the command after the image name overrides the default command, and --entrypoint overrides the base command.
  • The -i and -t flags give an interactive session inside a container, but changes made there do not persist between runs.
  • Bind mounts (--mount type=bind,...) expose host files to a container; the shorter -v syntax silently creates a volume if the host path is not absolute.

Content from Creating Your Own Container Images



Estimated time: 35 minutes

There are lots of reasons why you might want to create your own Docker container image.

  • You can’t find a container image with all the tools you need on Docker Hub.
  • You want to have a container image to “archive” all the specific software versions you ran for a project.
  • You want to share your workflow with someone else.

Overview

Questions

  • How can I make my own Docker container images?
  • How do I document the ‘recipe’ for a Docker container image?

Objectives

  • Explain the purpose of a Dockerfile and show some simple examples.
  • Demonstrate how to build a Docker container image from a Dockerfile.
  • Compare the steps of creating a container image interactively versus a Dockerfile.
  • Create an installation strategy for a container image.
  • Demonstrate how to upload (‘push’) your container images to the Docker Hub.
  • Describe the significance of the Docker Hub naming scheme.

Interactive installation


Before creating a reproducible installation, let’s experiment with installing software inside a container. Start a container from the alpine container image we used before, interactively:

BASH

$ docker container run -it alpine sh

Because this is a basic container, there’s a lot of things not installed – for example, python3.

BASH

/# python3

OUTPUT

sh: python3: not found

Inside the container, we can run commands to install Python 3. The Alpine version of Linux has an installation tool called apk that we can use to install Python 3.

BASH

/# apk add --update python3 py3-pip python3-dev

We can test our installation by running a Python command:

BASH

/# python3 --version

Once Python is installed, we can add Python packages using the pip package installer:

BASH

/# pip install cython

Exercise: Searching for Help

Can you find instructions for installing R on Alpine Linux? Do they work?

A quick search should hopefully show that the way to install R on Alpine Linux is:

BASH

/# apk add R

Once we exit, these changes are not saved to a new container image by default. There is a command that will “snapshot” our changes, but building container images this way is not easily reproducible. Instead, we’re going to take what we’ve learned from this interactive installation and create our container image from a reproducible recipe, known as a Dockerfile.
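
For reference, the snapshotting command is docker commit. A minimal sketch (the image name here is illustrative), though we won’t use this approach in the rest of this lesson:

BASH

$ docker container ls -a    # find the ID of the exited container
$ docker commit CONTAINER_ID USERNAME/alpine-python-interactive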

If you haven’t already, exit out of the interactively running container.

BASH

/# exit

Put installation instructions in a Dockerfile


A Dockerfile is a plain text file with keywords and commands that can be used to create a new container image.

From your shell, go to the folder you downloaded at the start of the lesson and print out the Dockerfile inside:

BASH

$ cd ~/Desktop/docker-intro/basic
$ cat Dockerfile

OUTPUT

FROM <EXISTING IMAGE>
RUN <INSTALL CMDS FROM SHELL>
RUN <INSTALL CMDS FROM SHELL>
CMD <CMD TO RUN BY DEFAULT>

Let’s break this file down:

  • The first line, FROM, indicates which container image we’re starting with. It is the “base” container image we are going to start from.
  • The next two lines RUN, will indicate installation commands we want to run. These are the same commands that we used interactively above.
  • The last line, CMD, indicates the default command we want a container based on this container image to run, if no other command is provided. It is recommended to provide CMD in exec-form (see the CMD section of the Dockerfile documentation for more details). It is written as a list which contains the executable to run as its first element, optionally followed by any arguments as subsequent elements. The list is enclosed in square brackets ([]) and its elements are double-quoted (") strings which are separated by commas. For example, CMD ["ls", "-lF", "--color", "/etc"] would translate to ls -lF --color /etc.

shell-form and exec-form for CMD

Another way to specify the parameter for the CMD instruction is the shell-form. Here you type the command as you would call it from the command line. Docker then silently runs this command in the image’s standard shell. CMD cat /etc/passwd is equivalent to CMD ["/bin/sh", "-c", "cat /etc/passwd"]. We recommend the more explicit exec-form because it lets us create more flexible container image command options and makes sure complex commands are unambiguous in this format.

Exercise: Take a Guess

Do you have any ideas about what we should use to fill in the sample Dockerfile to replicate the installation we did above?

Based on our experience above, edit the Dockerfile (in your text editor of choice) to look like this:

FROM alpine
RUN apk add --update python3 py3-pip python3-dev
RUN pip install cython
CMD ["python3", "--version"]

The recipe provided by the Dockerfile shown in the solution to the preceding exercise will use Alpine Linux as the base container image, add Python 3 and the Cython library, and set a default command to request Python 3 to report its version information.

Create a new Docker image


So far, we only have a text file named Dockerfile – we do not yet have a container image. We want Docker to take this Dockerfile, run the installation commands contained within it, and then save the resulting container as a new container image. To do this we will use the docker image build command.

We have to provide docker image build with two pieces of information:

  • the location of the Dockerfile
  • the name of the new container image. Remember the naming scheme from before? You should name your new image with your Docker Hub username and a name for the container image, like this: USERNAME/CONTAINER_IMAGE_NAME.

All together, the build command that you should run on your computer, will have a similar structure to this:

BASH

$ docker image build -t USERNAME/CONTAINER_IMAGE_NAME .

The -t option names the container image; the final dot indicates that the Dockerfile is in our current directory.

For example, if my user name was alice and I wanted to call my container image alpine-python, I would use this command:

BASH

$ docker image build -t alice/alpine-python .

Build Context

Notice that the final input to docker image build isn’t the Dockerfile – it’s a directory! In the command above, we’ve used the current working directory (.) of the shell as the final input to the docker image build command. This option provides what is called the build context to Docker – if there are files being copied into the built container image (more details in the next episode), they’re assumed to be in this location. Docker expects to see a Dockerfile in the build context also (unless you tell it to look elsewhere).

Even if it won’t need all of the files in the build context directory, Docker does “load” them before starting to build, which means that it’s a good idea to have only what you need for the container image in a build context directory, as we’ve done in this example.
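
If you cannot keep the build context directory minimal, a standard alternative is a .dockerignore file placed in the same directory, listing paths that Docker should exclude from the build context (the contents below are illustrative):

BASH

$ cat .dockerignore

OUTPUT

large-dataset/
*.log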

Exercise: Review!

  1. Think back to earlier. What command can you run to check if your container image was created successfully? (Hint: what command shows the container images on your computer?)

  2. We didn’t specify a tag for our container image name. What tag did Docker automatically use?

  3. What command will run a container based on the container image you’ve created? What should happen by default if you run such a container? Can you make it do something different, like print “hello world”?

  1. To see your new image, run docker image ls. You should see the name of your new container image under the “REPOSITORY” heading.

  2. In the output of docker image ls, you can see that Docker has automatically used the latest tag for our new container image.

  3. We want to use docker container run to run a container based on a container image.

The following command should run a container and print out our default message, the version of Python:

BASH

$ docker container run alice/alpine-python

To run a container based on our container image and print out “Hello world” instead:

BASH

$ docker container run alice/alpine-python echo "Hello World"

While it may not look like you have achieved much, you have already combined a lightweight Linux operating system with your own specification to run a given command – a combination that can operate reliably on macOS, Microsoft Windows, Linux, and in the cloud!

Boring but important notes about installation


There are a lot of choices when it comes to installing software – sometimes too many! Here are some things to consider when creating your own container image:

  • Start smart, or, don’t install everything from scratch! If you’re using Python as your main tool, start with a Python container image. Same with R. We’ve used Alpine Linux as an example in this lesson, but it’s generally not a good container image to start with for initial development and experimentation because it is a less common distribution of Linux; Ubuntu, Debian, and CentOS are all good options for scientific software installations. The program you’re using might recommend a particular distribution of Linux, and if so, it may be useful to start with a container image for that distribution.
  • How big? How much software do you really need to install? When you have a choice, lean towards using smaller starting container images and installing only what’s needed for your software, as a bigger container image means longer download times to use.
  • Know (or Google) your Linux. Different distributions of Linux often have distinct sets of tools for installing software. The apk command we used above is the software package installer for Alpine Linux. The installers for various common Linux distributions are listed below:
    • Ubuntu: apt or apt-get
    • Debian: apt or apt-get (individual .deb package files are installed with dpkg)
    • CentOS: yum
    Most common software is available to be installed via these tools. A web search for “install X on Y Linux” is usually a good start for common software installation tasks; if something isn’t available via the Linux distribution’s installation tools, try the options below.
  • Use what you know. You’ve probably used commands like pip or install.packages() before on your own computer – these will also work to install things in container images (if the basic scripting language is installed).
  • README. Many scientific software tools have a README or installation instructions that lay out how to install software. You want to look for instructions for Linux. If the install instructions include options like those suggested above, try those first.

In general, a good strategy for installing software is:

  • Make a list of what you want to install.
  • Look for pre-existing container images.
  • Read through instructions for software you’ll need to install.
  • Try installing everything interactively in your base container – take notes! (A sketch of this step follows below.)
  • From your interactive installation, create a Dockerfile and then try to build the container image from that.
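For example, the interactive-installation step might look like this, using the Alpine base image and the same packages we installed earlier – note down each command that succeeds so you can transfer it to your Dockerfile:

BASH

$ docker container run -it alpine sh
/# apk add --update python3 py3-pip python3-dev
/# pip install cython
/# exit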

Share your new container image on Docker Hub


Container images that you release publicly can be stored on the Docker Hub for free. If you name your container image as described above, with your Docker Hub username, all you need to do is run the opposite of docker image pull: docker image push.

BASH

$ docker image push alice/alpine-python

Make sure to substitute the full name of your container image!

In a web browser, open https://hub.docker.com, and on your user page you should now see your container image listed, for anyone to use or build on.

Logging In

Technically, you have to be logged into Docker on your computer for this to work. Usually it happens by default, but if docker image push doesn’t work for you, run docker login first, enter your Docker Hub username and password, and then try docker image push again.
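For reference, logging in is a single command, which will prompt for your Docker Hub username and password:

BASH

$ docker login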

What’s in a name? (again)


You don’t have to name your container images using the USERNAME/CONTAINER_IMAGE_NAME:TAG naming scheme. On your own computer, you can call container images whatever you want, and refer to them by the names you choose. It’s only when you want to share a container image that it needs the correct naming format.

You can rename container images using the docker image tag command. For example, imagine someone named Alice has been working on a workflow container image and called it workflow-test on her own computer. She now wants to share it in her alice Docker Hub account with the name workflow-complete and a tag of v1. Her docker image tag command would look like this:

BASH

$ docker image tag workflow-test alice/workflow-complete:v1

She could then push the renamed container image to Docker Hub:
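BASH

$ docker image push alice/workflow-complete:v1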

Key Points

  • Dockerfiles specify what is within Docker container images.
  • The docker image build command is used to build a container image from a Dockerfile.
  • You can share your Docker container images through the Docker Hub so that others can create Docker containers from your container images.

Content from Creating More Complex Container Images


Last updated on 2024-09-24 | Edit this page

Estimated time: 60 minutes

In order to create and use your own container images, you may need more information than our previous example provided. You may want to use files from outside the container that are not included within the container image, either by copying the files into the container image or by making them visible within a running container from their existing location on your host system. You may also want to learn a little bit about how to install software within a running container or a container image. This episode will look at these more advanced aspects of running a container or building a container image. Note that the examples get gradually more complex – most day-to-day use of containers and container images can be accomplished using the first 1–2 sections on this page.

Overview

Questions

  • How can I make more complex container images?

Objectives

  • Explain how you can include files within Docker container images when you build them.
  • Explain how you can access files on the Docker host from your Docker containers.

Using scripts and files from outside the container


In your shell, change to the sum folder in the docker-intro folder and look at the files inside.

BASH

$ cd ~/Desktop/docker-intro/sum
$ ls

This folder has both a Dockerfile and a Python script called sum.py. Let’s say we wanted to try running the script using a container based on our recently created alpine-python container image.

Running containers

What command would we use to run Python from the alpine-python container?

If we try running the container and Python script, what happens?

BASH

$ docker container run alice/alpine-python python3 sum.py

OUTPUT

python3: can't open file 'sum.py': [Errno 2] No such file or directory

No such file or directory

What does the error message mean? Why might the Python inside the container not be able to find or open our script?

The problem here is that the container and its filesystem are separate from our host computer’s filesystem. When the container runs, it can’t see anything outside itself, including any of the files on our computer. In order to use Python (inside the container) and our script (outside the container, on our host computer), we need to create a link between the directory on our computer and the container.

This link is called a “mount” and is what happens automatically when a USB drive or other external hard drive gets connected to a computer – you can see the contents appear as if they were on your computer.

We can create a mount between our computer and the running container by using an additional option to docker container run. We’ll also use the variable ${PWD}, which will substitute in our current working directory. The option will look like this:

--mount type=bind,source=${PWD},target=/temp

What this means is: make my current working directory (on the host computer) – the source – visible within the container that is about to be started, and inside this container, name the directory /temp – the target.

Types of mounts

You will notice that we set the mount type=bind; there are other types of mounts that can be used in Docker (e.g. volume and tmpfs). We do not cover the other mount types or the differences between them in this course, as they are a more advanced topic. You can find more information on the different mount types in the Docker documentation.

Let’s try running the command now:

BASH

$ docker container run --mount type=bind,source=${PWD},target=/temp alice/alpine-python python3 sum.py

But we get the same error!

OUTPUT

python3: can't open file 'sum.py': [Errno 2] No such file or directory

This final piece is a bit tricky – we really have to remember to put ourselves inside the container. Where is the sum.py file? It’s in the directory that’s been mapped to /temp – so we need to include that in the path to the script. This command should give us what we need:

BASH

$ docker container run --mount type=bind,source=${PWD},target=/temp alice/alpine-python python3 /temp/sum.py

Note that if we create any files in the /temp directory while the container is running, these files will appear on our host filesystem in the original directory and will stay there even when the container stops.

Other Commonly Used Docker Run Flags

The docker container run command has many other useful flags that alter its behaviour. Two commonly used ones are -w and -u.

The --workdir/-w flag sets the working directory, i.e. the command being executed runs inside the specified directory. For example, the following command would run pwd in a container started from the latest ubuntu image, with /home/alice as the working directory, and print /home/alice. If the directory doesn’t exist in the image, Docker will create it.

BASH

$ docker container run -w /home/alice/ ubuntu pwd

The --user/-u flag lets you specify which user the container should run as. This is helpful if you’d like to write files to a mounted folder not as root but with your own user identity and group. A common invocation is --user $(id -u):$(id -g), which fetches the current user’s ID and group and runs the container as that user.
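As a sketch that reuses the image name from earlier and a bind mount like the one above, creating a file in the mounted folder under your own user and group (output.txt is just an illustrative name) might look like this:

BASH

$ docker container run --mount type=bind,source=${PWD},target=/temp --user $(id -u):$(id -g) alice/alpine-python touch /temp/output.txt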

Exercise: Explore the script

What happens if you use the docker container run command above and put numbers after the script name?

This script comes from the Python Wiki and is set to add all numbers that are passed to it as arguments.

Exercise: Checking the options

Our Docker command has gotten much longer! Can you go through each piece of the Docker command above and explain what it does? How would you characterize the key components of a Docker command?

Here’s a breakdown of each piece of the command above:

  • docker container run: use Docker to run a container
  • --mount type=bind,source=${PWD},target=/temp: connect my current working directory (${PWD}) as a folder inside the container called /temp
  • alice/alpine-python: name of the container image to use to run the container
  • python3 /temp/sum.py: what commands to run in the container

More generally, every Docker command will have the form: docker [action] [docker options] [docker container image] [command to run inside]

Exercise: Interactive jobs

Try using the directory mount option but run the container interactively. Can you find the folder that’s connected to your host computer? What’s inside?

The docker command to run the container interactively is:

BASH

$ docker container run --mount type=bind,source=${PWD},target=/temp -it alice/alpine-python sh

Once inside, you should be able to navigate to the /temp folder and see that its contents are the same as the files on your host computer:

BASH

/# cd /temp
/# ls

Mounting a directory can be very useful when you want to run the software inside your container on many different input files. In other situations, you may want to save or archive an authoritative version of your data by adding it to the container image permanently. That’s what we will cover next.

Including your scripts and data within a container image


Our next project will be to add our own files to a container image – something you might want to do if you’re sharing a finished analysis or just want to have an archived copy of your entire analysis including the data. Let’s assume that we’ve finished with our sum.py script and want to add it to the container image itself.

In your shell, you should still be in the sum folder in the docker-intro folder.

BASH

$ pwd

OUTPUT

/Users/yourname/Desktop/docker-intro/sum

Let’s add a new line to the Dockerfile we’ve been using so far to create a copy of sum.py. We can do so by using the COPY keyword.

COPY sum.py /home

This line will cause Docker to copy the file from your computer into the container’s filesystem. Let’s build the container image like before, but give it a different name:

BASH

$ docker image build -t alice/alpine-sum .

The Importance of Command Order in a Dockerfile

When you run docker image build, Docker executes the instructions in the order given in the Dockerfile. This order matters for rebuilding, and you will typically want to put your RUN commands before your COPY commands.

Docker builds the layers of commands in order. This becomes important when you need to rebuild container images. If you change layers later in the Dockerfile and rebuild the container image, Docker doesn’t need to rebuild the earlier layers but will instead use a stored (“cached”) version of those layers.

For example, imagine you wanted to copy multiply.py into the container image instead of sum.py. If the COPY line came before the RUN line, Docker would need to rebuild the whole image. If the COPY line came second, Docker would reuse the cached RUN layer from the previous build and only rebuild the COPY layer.
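A sketch of this ordering, using the hypothetical multiply.py from the example above; editing the script now invalidates only the final COPY layer, while the slower installation layer is reused from the cache:

FROM alpine
# this installation layer is cached across rebuilds
RUN apk add --update python3 py3-pip python3-dev
# only this layer is rebuilt when the script changes
COPY multiply.py /home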

Exercise: Did it work?

Can you remember how to run a container interactively? Try that with this one. Once inside, try running the Python script.

You can start the container interactively like so:

BASH

$ docker container run -it alice/alpine-sum sh

You should be able to run the python command inside the container like this:

BASH

/# python3 /home/sum.py

The COPY keyword can be used to place your own scripts or data into a container image that you want to publish or use as a record. Note that it’s not necessarily a good idea to put your scripts inside the container image if you’re constantly changing or editing them; in that case, referencing the scripts from outside the container, as we did in the previous section, is a better approach. You also want to think carefully about size – if you run docker image ls you’ll see the size of each container image at the far right of the output. The bigger your container image becomes, the harder it will be to download.

Callout

Security warning: login credentials, including passwords, tokens, secure access tokens, or other secrets, must never be stored in a container image. If secrets are stored there, they are at high risk of being found and exploited when the image is made public.

Copying alternatives

Another trick for getting your own files into a container image is by using the RUN keyword and downloading the files from the internet. For example, if your code is in a GitHub repository, you could include this statement in your Dockerfile to download the latest version every time you build the container image:

RUN git clone https://github.com/alice/mycode

Similarly, the wget command can be used to download any file publicly available on the internet:

RUN wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.10.0/ncbi-blast-2.10.0+-x64-linux.tar.gz

Note that the above RUN examples depend on commands (git and wget respectively) that must be available within your container: Linux distributions such as Alpine may require you to install such commands before using them within RUN statements.
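For example, on an Alpine-based container image you would install git before cloning – a minimal sketch reusing the repository URL above:

FROM alpine
# the Alpine base image does not include git
RUN apk add --update git
RUN git clone https://github.com/alice/mycode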

More fancy Dockerfile options (optional, for presentation or as exercises)


We can expand on the example above to make our container image even more “automatic”. Here are some ideas:

Make the sum.py script run automatically

FROM alpine
RUN apk add --update python3 py3-pip python3-dev
COPY sum.py /home

# Run the sum.py script as the default command
CMD ["python3", "/home/sum.py"]

Build and test it:

BASH

$ docker image build -t alpine-sum:v1 .
$ docker container run alpine-sum:v1

You’ll notice that you can run the container without arguments just fine, resulting in sum = 0, but this is boring. Supplying arguments, however, doesn’t work:

BASH

$ docker container run alpine-sum:v1 10 11 12

results in

OUTPUT

docker: Error response from daemon: OCI runtime create failed:
container_linux.go:349: starting container process caused "exec:
\"10\": executable file not found in $PATH": unknown.

This is because the arguments 10 11 12 are interpreted as a command that replaces the default command given by CMD ["python3", "/home/sum.py"] in the image.

To achieve the goal of having a command that always runs when a container is run from the container image and can be passed the arguments given on the command line, use the keyword ENTRYPOINT in the Dockerfile.

FROM alpine

RUN apk add --update python3 py3-pip python3-dev
COPY sum.py /home

# Run the sum.py script as the default command and
# allow people to enter arguments for it
ENTRYPOINT ["python3", "/home/sum.py"]

# Give default arguments, in case none are supplied on
# the command-line
CMD ["10", "11"]

Build and test it:

BASH

$ docker image build -t alpine-sum:v2 .
# Most of the time you are interested in the sum of 10 and 11:
$ docker container run alpine-sum:v2
# Sometimes you have more challenging calculations to do:
$ docker container run alpine-sum:v2 12 13 14

Overriding the ENTRYPOINT

Sometimes you don’t want to run the image’s ENTRYPOINT. For example, if you have a specialized container image that only does sums but you need an interactive shell to examine the container:

BASH

$ docker container run -it alpine-sum:v2 /bin/sh

will yield

OUTPUT

Please supply integer arguments

You need to override the ENTRYPOINT statement in the container image like so:

BASH

$ docker container run -it --entrypoint /bin/sh alpine-sum:v2

Add the sum.py script to the PATH so you can run it directly:

FROM alpine

RUN apk add --update python3 py3-pip python3-dev

COPY sum.py /home
# set script permissions
RUN chmod +x /home/sum.py
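# (running sum.py by name also relies on the script itself beginning
# with a shebang line such as #!/usr/bin/env python3)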
# add /home folder to the PATH
ENV PATH=/home:$PATH

Build and test it:

BASH

$ docker image build -t alpine-sum:v3 .
$ docker container run alpine-sum:v3 sum.py 1 2 3 4

Best practices for writing Dockerfiles

Take a look at Nüst et al.’s “Ten simple rules for writing Dockerfiles for reproducible data science” [1] for some great examples of best practices to use when writing Dockerfiles. The GitHub repository associated with the paper also has a set of example Dockerfiles demonstrating how the rules highlighted by the paper can be applied.

[1] Nüst D, Sochat V, Marwick B, Eglen SJ, Head T, et al. (2020) Ten simple rules for writing Dockerfiles for reproducible data science. PLOS Computational Biology 16(11): e1008316. https://doi.org/10.1371/journal.pcbi.1008316

Key Points

  • Docker allows containers to read and write files from the Docker host.
  • You can include files from your Docker host into your Docker container images by using the COPY instruction in your Dockerfile.