Containerization¶
When installing software, you may come across applications that have complex chains of dependencies that are challenging to compile and install. Some software may require very specific versions of libraries that may not be available on CCR's systems or conflict with libraries needed for other applications. You may also need to move between several workstations or HPC platforms, which often requires reinstalling your software on each system. Containers are a good way to tackle all of these issues and more.
Watch this "Intro to using containers at CCR" workshop to gain an understanding of containers and how to utilize them at CCR:
Warning
Do not use containers with preinstalled system modules. Any software you may need should be installed within your container.
Containerization Fundamentals¶
Containers build upon an idea that has long existed within computing: hardware can be emulated through software. Virtualization simulates some or all components of computation through a software application. Virtual machines use this concept to generate an entire operating system as an application on a host system. Containers follow the same idea at a much smaller scale: rather than emulating a full operating system, they share the host system's kernel.
Containers are portable compartmentalizations of an operating system, software, libraries, data, and/or workflows. Containers offer portability and reproducibility.
- Portability: containers can run on any system equipped with the container manager they were built for.
- Reproducibility: because containers are instances of prebuilt, isolated software, that software executes the same way every time.
Containers distinguish themselves through their low computational overhead and their ability to utilize all of a host system’s resources. Building containers is a relatively simple process that starts with a container engine.
Container engines¶
Docker is the most widely used container engine, and can be used on any system where you have administrative privileges. Docker cannot be run on high-performance computing (HPC) platforms because users do not have administrative privileges.
Apptainer (formerly Singularity) is a container engine that does not require administrative privileges to execute and was developed specifically for HPC environments. Therefore, it is safe to run on CCR's HPC platforms. Docker images are widely available for many software packages, so a common use case is to use Apptainer to run Docker images. Users can also build Apptainer containers to run on CCR's clusters.
Apptainer¶
Apptainer is a containerization software package that does not require users to have administrative privileges when running containers, and can thus be safely used on CCR's resources. Much like Docker, Apptainer is designed around compartmentalization of applications, libraries, and workflows. This is done through the creation of compressed images in the .sif format which can be run as ephemeral containers. Unlike Docker, however, Apptainer does not manage images, containers, or volumes through a central application. Instead, Apptainer generates saved image files that can either be mutable or immutable based on compression.
Setting up Temporary Storage Directories¶
Images generated by Apptainer can be large. It is common for containers to exceed 10GB. By default, Apptainer stores cache files in /user/[YourCCRusername]/.apptainer/cache. This means you can easily exceed your 25GB home directory quota while pulling a couple of containers! In addition to quota problems, you may also see error messages when pulling containers from remote repositories stating that /tmp is not writable.
These problems and more can be avoided by setting an environment variable before pulling containers. Below are several examples of how to do so. The first demonstrates how to set the Apptainer cache directory to a subdirectory in your group's shared project directory. NOTE: your project directory may not be in /projects/academic as shown in this example, and you'll need to make sure this subdirectory is created before running Apptainer:
export APPTAINER_CACHEDIR=/projects/academic/[YourGroupName]/[CCRusername]/cache
Alternatively, you can set the cache directory to your job's Slurm temporary directory, which gets set to /scratch/$JOBID and is automatically deleted when your job ends. This may result in faster container downloads, but if your group is using the same container for multiple builds, you'll want to use your shared project directory for the APPTAINER_CACHEDIR location as shown above.
export APPTAINER_CACHEDIR=$SLURMTMPDIR
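The two options above can be wrapped in a small helper. This is a minimal sketch rather than a CCR-provided tool: the function name `set_apptainer_cache` is hypothetical. It uses the directory you pass in, falls back to `$SLURMTMPDIR` when no argument is given, and creates the directory before exporting the variable.

```shell
# Hypothetical helper: point Apptainer's cache at a directory that exists.
# Usage: set_apptainer_cache [directory]
#   With an argument, that directory is used (for example a project-space path).
#   Without one, the job's $SLURMTMPDIR is used, if Slurm has set it.
set_apptainer_cache() {
    local cache_dir
    cache_dir="${1:-${SLURMTMPDIR:-}}"
    if [ -z "$cache_dir" ]; then
        echo "set_apptainer_cache: no directory given and SLURMTMPDIR is unset" >&2
        return 1
    fi
    # The cache directory should exist before Apptainer runs, so create it.
    mkdir -p "$cache_dir" || return 1
    export APPTAINER_CACHEDIR="$cache_dir"
}
```

For a shared cache you might call `set_apptainer_cache /projects/academic/[YourGroupName]/[CCRusername]/cache`; inside a job, calling it with no argument uses the job's temporary directory.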
Pulling Images¶
Pulling images from public repositories is often the easiest method of using a containerized application. Be aware that large containers can take a long time to download. We recommend you pull containers on a compute node in a running job. The compile nodes can be used as well; however, very large containers may not successfully build. Apptainer is not available on the CCR login nodes. See more on node types in CCR's Cluster documentation.
We can use the apptainer pull command to remotely download our chosen image file and convert it to the Apptainer .sif format. The command requires the container registry we would like to use, followed by the repository’s name:
apptainer pull <localname>.sif <container-registry>://<repository-name>
Where <localname>.sif is the name you choose for the Apptainer image.
A container registry is simply a server that manages uploaded containers. Docker Hub is the most widely used registry. To pull a container image from Docker Hub:
apptainer pull docker://another:example
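When no local name is given, as in the Docker Hub command above, Apptainer derives the file name from the image name and tag. The NVIDIA example later on this page, for instance, ends up with pytorch_25.08-py3.sif after pulling docker://nvcr.io/nvidia/pytorch:25.08-py3. The sketch below predicts that name. The helper `default_sif_name` is hypothetical, it assumes the `<name>_<tag>.sif` convention with a tag of `latest` when none is given, and it does not handle digest references.

```shell
# Hypothetical helper: predict the file name written by
# `apptainer pull <registry>://<repository>:<tag>` when no local name is given.
# Assumes the <name>_<tag>.sif convention and "latest" as the default tag.
default_sif_name() {
    local ref name tag
    ref="${1#*://}"      # drop the registry scheme, e.g. "docker://"
    ref="${ref##*/}"     # keep only the last path component
    case "$ref" in
        *:*) name="${ref%%:*}"; tag="${ref#*:}" ;;
        *)   name="$ref";       tag="latest" ;;
    esac
    printf '%s_%s.sif\n' "$name" "$tag"
}

default_sif_name docker://nvcr.io/nvidia/pytorch:25.08-py3
# prints pytorch_25.08-py3.sif
```

Knowing the name in advance is handy in job scripts that pull an image and then immediately run it.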
Running a SIF image as a container¶
SIF images can be run as containers much like Docker images. Apptainer commands, however, follow a bit more nuanced syntax depending on what you’d like to do. After pulling your image from Docker Hub you can run the image by using the apptainer run command. Type:
apptainer run <image-name>
Running a container will execute the default program that the container developer specified in the container's definition file. To execute specific programs in your container, we can use the apptainer exec command, and then specify the program:
apptainer exec <image-name> <program>
Much like specifying an application in Docker, this allows you to execute any program that is installed within your container. Unlike Docker, however, you do not need to specify a shell application to shell into the container. We can simply use the apptainer shell command:
apptainer shell <image-name>
Example:
Say we have an image that contains python 3.7 as the default software, and we want to run python from the container. We can do this with the command:
apptainer run python-cont.sif
If the default application for the image is not python we could run python as follows:
apptainer exec python-cont.sif python
File Access¶
By default, only your home directory (/user/$USER) and the directory from which the container was started ($PWD) are available within a container. Any other required folders must be bound to the container's directory tree.
To bind any additional folders or files to your container, you can utilize the -B flag in your Apptainer run, exec, and shell commands:
apptainer run -B /source/directory:/target/directory sample-image.sif
For example, to bind your group's project directory to /projects inside the container:
apptainer run -B /projects/academic/[YourGroupName]:/projects sample-image.sif
Alternatively, you can bind directories by utilizing the APPTAINER_BINDPATH environment variable. Simply export a list of directory pairs you would like to bind inside your container:
export APPTAINER_BINDPATH=/source/directory1:/target/directory1,\
/source/directory2:/target/directory2
Then run, execute, or shell into the container as normal.
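When the list of bind pairs grows, writing the comma-separated value by hand becomes error-prone. A minimal sketch of one way to assemble it follows; `join_bind_paths` is a hypothetical helper name, and the directories are the same placeholders used above.

```shell
# Hypothetical helper: join "source:target" pairs with commas, the list
# format used for APPTAINER_BINDPATH in the example above.
join_bind_paths() {
    local IFS=,
    # "$*" joins all arguments using the first character of IFS.
    printf '%s\n' "$*"
}

APPTAINER_BINDPATH="$(join_bind_paths \
    /source/directory1:/target/directory1 \
    /source/directory2:/target/directory2)"
export APPTAINER_BINDPATH
echo "$APPTAINER_BINDPATH"
# prints /source/directory1:/target/directory1,/source/directory2:/target/directory2
```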
Building Images with Apptainer¶
Compute Node Use Only
We recommend you build and pull containers on a compute node in a running job. Apptainer is not available on the CCR login nodes and some features may not work on the compile nodes. See more on node types in CCR's Cluster documentation.
In the event that a container is unavailable for a given application, you may need to build your own container from scratch. Apptainer allows a user to build images using a definition file. Just like a Dockerfile, this file has a variety of directives that allow for the customization of your image. A sample definition file would look something like this:
Bootstrap: docker
From: ubuntu:24.04
%files
source-file /opt/destination-file
%post
apt-get update -y
apt-get install -y nano
apt-get install -y gcc
%environment
export PATH="/opt/my-program/bin:$PATH"
%runscript
echo “hello! I am a container!”
%labels
Author Your Name
ContactEmail myemailaddress@institution.edu
Name My First Container
%help
I am help text!
This definition file contains examples for the following sections:
| Section | Description |
|---|---|
| Header | Choose an existing container to start from. |
| %files | Copy files from your system into the container. |
| %post | List of steps to install and configure software. |
| %environment | Set environment variables within the container to help find and run installed software. |
| %runscript | Set an optional default behavior to run with the container. |
| %labels | Add information or metadata to help identify container and its contents. |
| %help | Add text to help other users run the container. |
Header¶
The first section is called the header. This must be the first section of your definition file and is required to build a container. There are several configurations you can use for the header. The example above demonstrates how to use a Docker container, in this case Ubuntu, as the base image.
Files section¶
The %files section is used to copy files from the machine that is running Apptainer (the “host”) into the container that Apptainer is building. This section is typically used when you have the source code saved on the host and want to extract/compile/install it inside of the container image.
The syntax for the files section is:
%files
file_on_host file_in_container
Here file_on_host is in the same directory as the .def definition file, and file_in_container will be copied to the container’s root (/) by default. You can instead provide absolute paths to the files on the host or in the container, or both. For example:
%files
/home/username/my-source-code.tar.gz /opt/my-program-build/my-source-code.tar.gz
Post section¶
The %post section contains any and all commands to be executed when building the container. Typically this involves first installing packages using the operating system’s package manager and then compiling/installing your custom programs. Environment variables can be set as well, but they will only be active during the build (use the %environment section if you need them active during run time).
The example definition file uses an Ubuntu container as its base, so all additional software is installed with apt-get. The additional -y flag is used to automatically accept any interactive dialogue, as container builds are non-interactive. Without this flag, the container build will hang and eventually fail.
After installing dependencies, you can proceed with the necessary steps to install and configure your software. If using the default installation procedure, your program should be installed in a location that the operating system can detect. If not, you may need to manually set environment variables so that your program can be found.
Build failures
At the moment not all commands in the post section can run successfully on CCR's cluster due to privilege issues. These errors may be resolved with the --ignore-fakeroot-command flag when using apptainer build, though in many cases this will not work. If you are running into build failures due to this issue, you will need to build your Apptainer container on your local machine and then transfer it to the cluster.
Environment section¶
The %environment section can be used to automatically set environment variables when the container is actually started.
For example, if you installed your program in a custom location /opt/my-program and the binaries are in the bin/ folder, you could use this section to add that location to your PATH environment variable. The example definition file demonstrates this syntax.
Runscript section¶
The %runscript section allows you to set optional default run behavior for the container. The commands in this section are placed in a special file within the container which is executed when using the apptainer run command. This default behavior of a container is overridden when using the apptainer exec command. It is not required to create a runscript to successfully build or run software with a container, though it may be helpful in some cases.
Labels section¶
The %labels section can be used to provide custom metadata about the container, which can make it easier for yourself and others to identify the nature and provenance of a container.
The syntax for this section is:
%labels
LabelNameA LabelValueA
LabelNameB LabelValueB
You can inspect the labels of a container with the command apptainer inspect my_container.sif.
Help section¶
The %help section can be used to provide custom help text about how to use the container. This can make it easier for yourself and others to interact and use the container. You can inspect the help text for a container with the command apptainer run-help my-container.sif.
Additional sections¶
Apptainer definition files support more sections than the ones covered here. The above example highlights a small subset to help you get started building Apptainer containers. For a full accounting, as well as more thorough guidance on using the sections covered above, please refer to Apptainer's documentation.
Building the container¶
Once you have written your Apptainer definition file, you can build the image with the apptainer build command, as follows:
apptainer build <localname>.sif <recipe-name>.def
Did your build fail?
Not all containers will build on our systems. If you run into an error, you may need to install Apptainer on your local machine and create your container there. Then upload it to CCR's systems. Instructions to do this can be found in the documentation for Apptainer.
Now, run the container you just built:
apptainer run test.sif
“hello! I am a container!”
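The write, build, and run cycle described above can be sketched end to end. The helper `write_minimal_def` is hypothetical and writes only a header and a runscript, a cut-down version of the sample definition file. The build and run commands are left as comments because they need a compute node with Apptainer available.

```shell
# Hypothetical helper: write a minimal definition file to the given path.
write_minimal_def() {
    cat > "$1" <<'EOF'
Bootstrap: docker
From: ubuntu:24.04

%runscript
    echo "hello! I am a container!"
EOF
}

# Write the file into a scratch directory and show its contents.
def_file="$(mktemp -d)/test.def"
write_minimal_def "$def_file"
cat "$def_file"

# On a compute node, the image would then be built and run with:
#   apptainer build test.sif "$def_file"
#   apptainer run test.sif
```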
Building MPI-enabled images¶
MPI-enabled Apptainer containers can be deployed on CCR's systems with the caveat that the MPI software within the container must be a similar (not necessarily identical) version to the MPI software available on the system. This requirement diminishes the portability of MPI-enabled containers, as they may not run on other systems without compatible MPI software. Regardless, MPI-enabled containers can still be a very useful option in many cases.
Here we provide an example of using a gcc compiler with OpenMPI. CCR's system uses an InfiniBand interconnect. In order to use an Apptainer container with OpenMPI (or any MPI) on the cluster, OpenMPI needs to be installed both inside and outside of the container. More specifically, the version installed inside the container should match the version outside it, or at least be very similar; you can sometimes get away with two different minor versions (e.g. 2.1 and 2.0).
Once you’ve built the container with one of the methods outlined above, you can place it in your home or project directory and run it on a compute node. The following is an example of running a gcc/OpenMPI container with Apptainer. The syntax is a normal MPI run where multiple instances of an Apptainer image are run. The following example runs mpi_hello_world with MPI from a container.
module load gcc/11.2.0
module load openmpi/4.1.1
mpirun -np 4 apptainer exec openmpi.sif mpi_hello_world
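Since a version mismatch is the usual reason an MPI container fails to run, a quick comparison before launching can save a job submission. The sketch below is only a rough check: `mpi_versions_similar` is a hypothetical helper, and treating "same major version" as compatible is an assumption drawn from the 2.1 versus 2.0 example above, not a guarantee.

```shell
# Hypothetical check: succeed when two OpenMPI version strings share a
# major version. Treating "same major version" as compatible is an assumption;
# the text above only says the versions should be the same or very similar.
mpi_versions_similar() {
    local host_major container_major
    host_major="${1%%.*}"
    container_major="${2%%.*}"
    [ -n "$host_major" ] && [ "$host_major" = "$container_major" ]
}

# Example with the module version used above and a made-up container version.
if mpi_versions_similar 4.1.1 4.1.5; then
    echo "versions look compatible"
else
    echo "versions differ; the container may not run" >&2
fi
```

The two version strings have to be supplied by you, for example from the module name on the host and from the definition file used to build the container.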
GPU-enabled Containers with Apptainer¶
It is possible to run GPU workloads within Apptainer containers. To do so, merely add the --nv flag when you use run or exec commands like so:
apptainer run --nv <image_name>.sif
NVIDIA also hosts a number of containers as part of their own container library. This will allow you to run more up-to-date versions of software like PyTorch which may have older versions installed in CCR's software environment. These can be pulled and run by Apptainer, though the pulling process can take several hours depending on the image size. As stated previously, we recommend you do this on a compute node in a running job.
NOTE: The URL format for using the NVIDIA container registry with Apptainer is:
docker://nvcr.io/nvidia/name:version
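As a concrete instance of this format, the short sketch below fills in the name and version for the PyTorch image used later on this page. The helper name `nvcr_uri` is hypothetical.

```shell
# Hypothetical helper: build a pull URI in the NVIDIA registry format above.
# Usage: nvcr_uri <name> <version>
nvcr_uri() {
    printf 'docker://nvcr.io/nvidia/%s:%s\n' "$1" "$2"
}

nvcr_uri pytorch 25.08-py3
# prints docker://nvcr.io/nvidia/pytorch:25.08-py3

# On a compute node the result can be passed straight to Apptainer:
#   apptainer pull "$(nvcr_uri pytorch 25.08-py3)"
```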
ARM64 Containers¶
NVIDIA makes containers available for the ARM64 CPU architecture. CCR has ARM64 processors available in the arm64 partition of the UB-HPC cluster. Research groups interested in utilizing these nodes must request an allocation in ColdFront for this partition. When attempting to pull ARM64 containers (these are named with the suffix -igpu) from NVIDIA's container library, you must do this from an ARM64 node or the build will fail.
Tip
When running jobs on the arm64 partition, please be aware of additional environment setup required.
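Because AMD64 and ARM64 images are not interchangeable, it can help to confirm a node's architecture before pulling. This is a minimal sketch, assuming `uname -m` reports `x86_64` on AMD64 nodes and `aarch64` on ARM64 nodes (the usual Linux values, not confirmed here for CCR's nodes). `node_arch` is a hypothetical helper name.

```shell
# Hypothetical helper: map a machine string to an architecture label.
# With no argument it inspects the current node via `uname -m`.
node_arch() {
    case "${1:-$(uname -m)}" in
        x86_64|amd64)  echo AMD64 ;;
        aarch64|arm64) echo ARM64 ;;
        *)             echo unknown ;;
    esac
}

# Report the architecture of the node this runs on.
node_arch
```

If this reports ARM64, the -igpu images are the ones to pull; on AMD64 nodes, choose the regular tags.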
Example GPU container workflow¶
Check out this video for an overview of using Python in containers, virtual environments and containers, and containers from NVIDIA:
The Python-based software applications that we get the most requests for (i.e. Pytorch and Torch Lightning) are updated frequently and aren't particularly easy to install with Easybuild. For this reason, we recommend utilizing the NVIDIA containers that are available for free, are updated frequently, and work on CCR's systems. NVIDIA provides a Framework Containers Support Matrix which has information on what software versions are included in the containers as well as what type of prerequisites the containers may have. The deep learning framework container packages follow a naming convention that is based on the year and month of the image release. We recommend using the 24.xx and 23.xx versions but the 22.xx containers should also work; they just have older versions of Ubuntu and Python.
Tips on how to navigate the NVIDIA catalog:
- Looking at the Pytorch information, we see a description of the package on the main page that includes instructions on how to run the container, what is in the container, and more on the Pytorch software itself (NOTE: container run commands are in Docker format which is slightly different than Apptainer)
- On the left is the latest version name (tag) and when it was published. This tag name is the version that you'll input in the pull, run, and exec commands
- If you want a different version than the latest, click on the Tags tab at the top of the page
- Make sure to select a version that supports AMD64 architecture (not the containers named -igpu), unless you're using an arm64 node
CCR currently provides Pytorch 1.13.1 as a software module, which is from October 2022. There have been many releases since then and much interest in Pytorch 2. For this example, we will pull a container with Pytorch 2.8 installed.
1: Log in to CCR's HPC environment
2: Request resources to run an interactive job. These containers can be quite large and this process can take quite a long time. Make sure to request a job walltime of at least a few hours.
3: Once on the compute node, you'll run the following to set your cache directory and pull a container from the NVIDIA library:
export APPTAINER_CACHEDIR=/projects/academic/[YourGroupName]/[CCRusername]/cache
cd /projects/academic/[YourGroupName]/[CCRusername]/container-directory #This should be whatever directory you want to store your container in
apptainer pull docker://nvcr.io/nvidia/pytorch:25.08-py3
4: When this completes you should see a file in your directory named like the container version you've downloaded with the .sif extension. Let's shell into the container. Once in the container, check out the version of python you've got, load pytorch and check the version:
apptainer shell --nv pytorch_25.08-py3.sif
Apptainer> which python
/usr/bin/python
Apptainer> python --version
Python 3.12.3
Apptainer> python
Python 3.12.3 (main, Nov 6 2024, 18:32:19) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
2.8.0a0+34c6371d24.nv25.08
>>> exit()
Apptainer> exit
exit
5: Now you've got an updated container, running a much newer version of pytorch than what CCR provides! That's great but what do you do if you want to install additional software? We recommend using a python virtual environment. You can store this virtual environment in your group's project space so that it is outside the container and backed up. If you're not familiar with virtual environments, check out CCR's virtual environment documentation. For this example, we'll install a popular python package imageio in a virtual environment and access it in the container. This time when we start our container, we're going to bind mount our group's project directory so we can access it in the container.
Planning to use this in Jupyter?
In the case of creating a virtual environment to use with a Jupyter Notebook, it is important to bind mount your project directory in the container using the full path. If that isn't done, when attempting to access the kernel in a Jupyter notebook, the virtual environment will not be accessible.
apptainer shell --nv -B /projects/academic/[YourGroupName] pytorch_25.08-py3.sif
Apptainer> cd /projects/academic/[YourGroupName]
Apptainer> python3 -mvenv --system-site-packages myenv
Apptainer> source myenv/bin/activate
(myenv) Apptainer> pip install imageio
... installation messages ...
Successfully installed imageio-2.37.0 numpy-2.2.2 pillow-11.1.0
(myenv) Apptainer> python
Python 3.12.3 (main, Nov 6 2024, 18:32:19) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import imageio
>>> print(imageio.__version__)
2.37.0
>>> exit()
Apptainer> exit
NOTE: In CCR's virtual environment documentation, we do NOT use the --system-site-packages option when creating a virtual environment in our example. Here we ARE using this option because we want the virtual environment to use all of the Python packages that come pre-installed in the NVIDIA container. We do NOT want to do this when using CCR's software environment modules because we may see conflicts between the different Python packages that get installed.
6: Optional: Do you need to use this container and virtual environment with the OnDemand Jupyter app? If so, you'll need to install ipykernel in the virtual environment and create a kernel. See CCR's Jupyter documentation for instructions.
7: Test this out using a GPU node:
apptainer shell --nv -B /projects/academic/[YourGroupName] pytorch_25.08-py3.sif
Apptainer> source /projects/academic/[YourGroupName]/myenv/bin/activate
(myenv) Apptainer> python
Python 3.12.3 (main, Nov 6 2024, 18:32:19) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import imageio
>>> print(torch.__version__)
2.8.0a0+34c6371d24.nv25.08
>>> print(imageio.__version__)
2.37.0
>>> num_of_gpus = torch.cuda.device_count()
>>> print(num_of_gpus)
1
>>> exit()
IMPORTANT NOTES:
This virtual environment is accessible outside of the container as well as inside, but some key things to keep in mind:
- Make sure to create the virtual environment IN the container.
- The virtual environment includes system site packages which will only work inside the container. So although you can access it outside of the container, it can really only be used inside.
- Make sure to specify the full path of the virtual environment python executable in any scripts you run; otherwise, they will automatically use your container's python. In this example that would be: /projects/academic/[YourGroupName]/myenv/bin/python3
- Not all packages will install correctly inside a virtual environment. See CCR's virtual environment documentation for more information.
- Because you are intentionally attempting to use already installed python packages with your virtual environment, you may run into conflicts. In that event, you may need to create your own container from scratch installing all the packages you need rather than using a pre-built one from NVIDIA. However, you can start with the NVIDIA container of your choice as the base image.
- If you're using a container, we do NOT recommend also using modules from CCR's software environment. These will most likely conflict.
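To illustrate the note about using the full path to the virtual environment's python, the sketch below assembles a non-interactive command for running a script inside the container. The function name `container_python_command` and the script name my_script.py are hypothetical, and the other values are the placeholders used throughout this example. The command is printed rather than executed so it can be reviewed first.

```shell
# Sketch: print the command that runs a script with the virtual
# environment's interpreter inside the container.
# Usage: container_python_command <project_dir> <image> <script>
container_python_command() {
    local project_dir="$1" image="$2" script="$3"
    # Bind the project directory by its full path and call the venv's python3.
    printf 'apptainer exec --nv -B %s %s %s/myenv/bin/python3 %s\n' \
        "$project_dir" "$image" "$project_dir" "$script"
}

container_python_command "/projects/academic/[YourGroupName]" \
    pytorch_25.08-py3.sif my_script.py
```

Running the printed command on a GPU node, with your real group name substituted, should use the packages from both the container and the virtual environment.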