Using Container Software

One of the most useful current trends in software development is “containerization”. The term describes setting up a self-contained environment on a host computer that can run a separate operating system (OS) and hold data and software not otherwise available on the host. I find this very appealing, e.g. for

  • testing a software package that is not available for my usual OS
  • testing my own software on a different OS
  • developing or running an analysis in fully reproducible settings
  • sharing software or analysis with reproducible settings

The main players in the field are Docker and Singularity. Each has pros and cons; Singularity might be better suited for shared environments, as you can run the containers with standard user rights.
The general idea of containerization is similar to virtual machines; this Docker page explains the differences.

Docker

To install Docker on an Ubuntu system, the following commands currently do the trick:

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce

Installing Docker on Mac OS X is straightforward: download the free Community Edition (CE) from here. Make sure you either add the software as a login item or start it manually before you attempt to use it. (You will see the little Docker whale in the menu bar.)

Some key Docker commands are:

# show all images available locally:
docker images
# show all containers that are already running:
docker ps
# start a new container from an image that can be fetched from the remote Docker repository.
# a simple test:
docker run hello-world
# or a small Linux system:
docker run --rm -it phusion/baseimage:0.9.22 /sbin/my_init -- bash 

A good way to move data in and out of the container is to mount a specific host directory into it, e.g. /home/felix/testproject onto the container's /tmp folder:

docker run --rm -it --mount type=bind,source=/home/felix/testproject,target=/tmp phusion/baseimage:0.9.22 /sbin/my_init -- bash
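The bind mount above needs an absolute source path. As a minimal sketch, the helper below (`bind_mount_opt` is a hypothetical name, not part of Docker) resolves a possibly relative host directory and prints a matching `--mount` option. It also adds `readonly`, on the assumption that the container should only read the data; drop that if you want to write results back to the host.

```shell
# Hypothetical helper: print a --mount option for a host directory.
# $1 = host directory (may be relative), $2 = absolute path inside the container
bind_mount_opt() {
  # Resolve the directory in a subshell so the caller's working directory is unchanged.
  ( cd "$1" && printf 'type=bind,source=%s,target=%s,readonly' "$(pwd)" "$2" )
}

# Example (needs Docker): mount the current directory read-only at /data.
# docker run --rm -it --mount "$(bind_mount_opt . /data)" phusion/baseimage:0.9.22 /sbin/my_init -- bash
```

The target /data is only an example; any absolute path inside the container works.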

The standard way to create new images is to define all installation steps in a Dockerfile. However, to share a pre-built environment with your own data, it is sometimes easier to freeze the container you have built up. You can do this by starting a base system, e.g. the phusion/baseimage mentioned above, installing all the software and data you like, and exporting the container:

# find container ID:
docker ps
CONTAINER ID        IMAGE                      COMMAND                  CREATED       
c24c11050fc5        phusion/baseimage:0.9.22   "/sbin/my_init -- ..."   4 seconds ago 
# export the current state in a compressed archive:
docker export c24c11050fc5 | gzip > my_new_image.tar.gz
# import and run the image later or on a different computer:
docker import my_new_image.tar.gz my_container
docker run -it my_container /sbin/my_init -- bash
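For comparison, the Dockerfile route mentioned above could look roughly like the sketch below. The directory name `my_image_build`, the image tag `my_image` and the installed package (curl) are placeholders chosen for illustration, not something from my own setup.

```shell
# Write a minimal Dockerfile into a scratch directory (names are hypothetical).
mkdir -p my_image_build
cat > my_image_build/Dockerfile <<'EOF'
FROM phusion/baseimage:0.9.22
# Example installation step; replace curl with the software you need.
RUN apt-get update && apt-get install -y curl
# phusion/baseimage expects its init system as the default command.
CMD ["/sbin/my_init"]
EOF
```

You can then build and start the image with `docker build -t my_image my_image_build` followed by `docker run --rm -it my_image /sbin/my_init -- bash`. Unlike an exported container, the Dockerfile records every installation step, which makes the image easier to reproduce.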

Singularity

Installation on Mac OS X requires Vagrant and VirtualBox to be installed first. Instead of using the often recommended brew system, I found it better to install the dmg packages directly from the Vagrant site and the Oracle VirtualBox site. Good instructions for different systems are also given here.
After installation you can start the VirtualBox Manager and set up an Ubuntu image:

# get the image:
vagrant init ubuntu/trusty64
# start the virtual machine:
vagrant up
# get to the command line:
vagrant ssh

or directly start a Vagrant box that ships with Singularity:

vagrant init singularityware/singularity-2.4
vagrant up
vagrant ssh

You can stop a VM running in the background with:

vagrant halt

Singularity can also use Docker images. You can pull one from Docker Hub and build a local image:

sudo singularity build phusion-singularity-image.simg docker://phusion/baseimage:0.9.22
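Once the image is built, a quick check is to run a single command inside it. The sketch below assumes the image file from the build step sits in the current directory and simply skips the check if Singularity or the image is missing.

```shell
IMAGE=phusion-singularity-image.simg

# Run one command inside the image, if Singularity and the image file are present.
if command -v singularity >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
  # Print the OS release information from inside the container.
  singularity exec "$IMAGE" cat /etc/os-release
else
  echo "singularity or $IMAGE not available; skipping" >&2
fi
```

For an interactive session, `singularity shell phusion-singularity-image.simg` opens a shell inside the image instead.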

This should get you started with containers.
Make sure to keep an eye on disk consumption; Docker data in particular seems to grow significantly in the background (see issue here). I prefer to move Docker (the “qcow2” file) to a fast external disk.
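To see where the space goes, Docker has a built-in usage summary. The wrapper below is a small sketch (`docker_disk_report` is a hypothetical name) that calls it only when the docker command is installed.

```shell
# Report Docker's disk usage, if the docker CLI is installed.
docker_disk_report() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found" >&2
    return 1
  fi
  # Summary of space used by images, containers, local volumes and build cache.
  docker system df
}
```

If the numbers look too large, `docker system prune` removes stopped containers, unused networks, dangling images and build cache after asking for confirmation. Check that nothing you still need falls into those categories first.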