We help UGent automate the installation of various scientific software packages on their High-Performance Computing (HPC) platform, using the EasyBuild framework. We primarily add support for new software based on the current needs of their scientists. So far, we have added (or kept up to date) 215 packages and helped implement new features in the EasyBuild framework itself.
What is EasyBuild?
EasyBuild is a package management tool for building and installing software. Its primary focus is on managing scientific software on HPC systems. The image below shows an example of a High-Performance Computing system at UGent.
This HPC system belongs to UGent’s central IT department and is used by researchers from UGent, industry, and various other knowledge institutes.
EasyBuild currently supports about 3,400 software packages in multiple versions. In total, there are almost 20,000 software recipes, covering essential scientific software such as TensorFlow and AlphaFold, optimized for high performance on HPC clusters.
Unlike conventional package managers (Apt, DNF, or similar), which install pre-compiled software, EasyBuild compiles each package directly from source code. While slower and more error-prone, compiling from source code is often necessary on HPC systems.
Scientific computations are usually very demanding. Recall the last time your computer froze after you clicked the wrong button and you sat waiting with a spinning cursor: scientific software is just like that, except that you are often left waiting for the result overnight or over the weekend.
And while HPC hardware is already fast, each piece of software needs to be compiled with custom settings to use the full capabilities of the machines.
Each cluster of machines in the HPC center is optimized for a different purpose and needs different settings. Configuring and correctly compiling software for a specific system is a large part of what EasyBuild does. These customizations often include:
using the best CPU instruction set for vectorization (SIMD, e.g., AVX-512),
accelerating numeric computations using the fastest matrix calculation library (Intel MKL, OpenBLAS),
making the software run in parallel on many processors and nodes (OpenMP, MPI),
or delegating calculations to GPUs (CUDA, OpenCL).
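In EasyBuild, these choices are largely expressed through toolchains, which bundle a compiler with matching MPI and math libraries (for example, `foss` combines the GCC compilers with OpenMPI and open-source linear algebra and FFT libraries). The Python below is a simplified, hypothetical sketch, not EasyBuild’s own code. It turns a description of one cluster into compiler settings covering the four customizations listed above. The `ClusterProfile` fields, the `build_settings` helper, and the `-DUSE_CUDA` define are invented for illustration. The flags themselves are ordinary GCC and MPI compiler-wrapper options.

```python
from dataclasses import dataclass


@dataclass
class ClusterProfile:
    """Hypothetical description of the hardware of one HPC cluster."""
    name: str
    march: str        # CPU target for GCC's -march, e.g. "skylake-avx512"
    blas: str         # matrix library: "openblas" or "mkl"
    use_openmp: bool  # threads within a node
    use_mpi: bool     # processes across nodes
    has_gpu: bool     # enable a (hypothetical) CUDA code path


# Link flags per matrix library; -lmkl_rt is MKL's single dynamic library.
BLAS_LIBS = {"openblas": ["-lopenblas"], "mkl": ["-lmkl_rt"]}


def build_settings(profile: ClusterProfile) -> dict:
    """Translate a cluster profile into a compiler, compiler flags and libraries."""
    # 1. Vectorization: target the cluster's CPU instruction set.
    cflags = ["-O2", f"-march={profile.march}"]
    # 2. Numeric library: link the chosen BLAS implementation.
    libs = list(BLAS_LIBS[profile.blas])
    # 3. Parallelism: OpenMP via a compiler flag, MPI via the mpicc wrapper.
    if profile.use_openmp:
        cflags.append("-fopenmp")
    cc = "mpicc" if profile.use_mpi else "gcc"
    # 4. GPUs: switch on a hypothetical CUDA code path and link the CUDA runtime.
    if profile.has_gpu:
        cflags.append("-DUSE_CUDA")
        libs.append("-lcudart")
    return {"CC": cc, "CFLAGS": " ".join(cflags), "LIBS": " ".join(libs)}


if __name__ == "__main__":
    gpu_cluster = ClusterProfile("gpu-cluster", "skylake-avx512", "mkl",
                                 use_openmp=True, use_mpi=True, has_gpu=True)
    print(build_settings(gpu_cluster))
```

The same source code ends up with different settings on each cluster, which is why a single pre-compiled binary rarely makes full use of every machine.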
EasyBuild fully automates the installation process. When installing a software package, it first checks that it has build instructions for the package and for all of its dependencies. It then downloads the source code for every component and runs the build procedure for each one, often using build tools such as Make, CMake, pip, or others. Finally, it runs each package’s tests and checks that the installation succeeded.
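The ordering problem behind this process can be sketched with a topological sort: every dependency must be built before the packages that need it. Here is a minimal illustration under our own assumptions, using Python’s standard `graphlib`. The dependency graph and the `install_order` and `install` helpers are hypothetical, and the "steps" are only logged, not executed.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency graph: each package maps to the packages it needs.
DEPENDENCIES = {
    "MyApp": {"SciPy-bundle", "HDF5"},
    "SciPy-bundle": {"Python", "OpenBLAS"},
    "HDF5": {"zlib"},
    "Python": {"zlib"},
    "OpenBLAS": set(),
    "zlib": set(),
}


def install_order(target: str, deps: dict) -> list:
    """Return target plus all its transitive dependencies, dependencies first."""
    needed = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        # An unknown package raises KeyError: there is no recipe to build it.
        needed[name] = deps[name]
        stack.extend(needed[name])
    # TopologicalSorter treats the values as predecessors, so deps come first.
    return list(TopologicalSorter(needed).static_order())


def install(target: str, deps: dict) -> list:
    """Simulate the per-component steps; a real tool would run commands here."""
    log = []
    for component in install_order(target, deps):
        for step in ("fetch source", "build", "run tests"):
            log.append(f"{step}: {component}")
    return log


for line in install("MyApp", DEPENDENCIES):
    print(line)
```

Because every package in the graph is a (transitive) dependency of `MyApp`, `MyApp` is always the last component built, while packages such as `zlib` come first.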
EasyBuild is increasingly becoming a reference tool for installing scientific software in HPC centers around the globe. Notably, LUMI, the consortium around Europe’s most powerful supercomputer and a top-3 HPC system globally, recently announced that it will systematically rely on EasyBuild as its primary installation tool.
EasyBuild is a complex tool, and there are always improvements to make: adding new functions, improving the tool’s ergonomics, or supporting new types of software. For example, the ongoing EasyStack project aims to manage all the software on an HPC cluster centrally, from a single configuration file.
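EasyStack files themselves are written in YAML, and their exact format is best taken from the EasyBuild documentation. The Python sketch below only illustrates the idea behind them, under our own assumptions. One central list describes the whole software stack, and comparing that list with what is already installed shows what still needs building. The stack entries are illustrative. The hypothetical `easyconfig_filename` helper follows EasyBuild’s usual `<name>-<version>-<toolchain>.eb` naming pattern.

```python
# Hypothetical central stack definition: (name, version, toolchain) per package.
DESIRED_STACK = [
    ("zlib", "1.2.13", "GCCcore-12.3.0"),
    ("Python", "3.11.3", "GCCcore-12.3.0"),
    ("OpenBLAS", "0.3.23", "GCC-12.3.0"),
    ("HDF5", "1.14.0", "gompi-2023a"),
]


def easyconfig_filename(name: str, version: str, toolchain: str) -> str:
    """Build a file name following the <name>-<version>-<toolchain>.eb pattern."""
    return f"{name}-{version}-{toolchain}.eb"


def missing_from(desired: list, installed: list) -> list:
    """Return easyconfig file names for every desired package not yet installed."""
    installed_set = set(installed)
    return [easyconfig_filename(*entry)
            for entry in desired if entry not in installed_set]


already_installed = [("zlib", "1.2.13", "GCCcore-12.3.0")]
print(missing_from(DESIRED_STACK, already_installed))
```

Keeping the whole stack in one place means that adding, updating, or rebuilding software for a cluster starts from a single, reviewable file instead of scattered individual requests.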
These improvements make EasyBuild a better tool for scientists and package maintainers, but developing them takes time and resources. Inuits helps UGent and the EasyBuild maintainers with the day-to-day work of packaging new scientific software, which lets them focus on issues related to the HPC infrastructure and on the strategic work of improving EasyBuild itself.
EasyBuild and the software managed by EasyBuild use a wide variety of technologies. Working on them requires someone who has a broad base of knowledge and who is not afraid to dive deep into unknown and sometimes ancient source code. At Inuits, we pride ourselves on our expertise in various technologies, and we are always eager to learn more.
Our most common task is preparing new software for installation. When a scientist needs a package that EasyBuild does not yet know how to install, we receive the request as an issue in a shared GitHub repository. What follows is a period of research and investigation, where we need to figure out how best to install it.
This is relatively straightforward for recent software with up-to-date installation instructions. Sometimes, however, the request is for something like a Fortran package that is a dozen years old. Such software requires adapting its installation instructions, diving deep into its source code, and changing (patching) it so that it works on today’s systems.
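Source changes like these are usually kept as patch files, which EasyBuild applies to the unpacked sources (an easyconfig lists them in its `patches` parameter). Here is a minimal sketch, assuming a hypothetical file `oldcode.f` that uses the `PAUSE` statement, a feature deleted from the Fortran 95 standard. Python’s standard `difflib` can generate such a patch in unified diff format.

```python
import difflib

# Hypothetical excerpt of an old fixed-form Fortran source file.
original = [
    "      PROGRAM OLDCODE\n",
    "      PRINT *, 'Press Enter to continue'\n",
    "      PAUSE\n",
    "      END\n",
]

# The same file with PAUSE replaced by a plain READ that waits for input.
patched = original.copy()
patched[2] = "      READ (*, *)\n"

# The a/ and b/ prefixes follow the common convention for patches applied with -p1.
patch = "".join(difflib.unified_diff(
    original, patched, fromfile="a/oldcode.f", tofile="b/oldcode.f"))
print(patch)
```

In practice, such a patch is written by hand or with `diff`/`git diff` against the real sources. The point is that the change is stored as a small, reviewable file rather than as edits to a downloaded tarball.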
Once we have figured out the instructions, we need to translate them into a form that EasyBuild can process. The installation instructions consist of two parts:
EasyBlock: a generic set of instructions that usually has the commands for a single build tool.
EasyConfig: contains the specifics of a single software package: the name, version, source code location, and dependencies.
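EasyConfig files use Python syntax. The sketch below shows the general shape of one for a hypothetical package, `example-tool`. Its name, version, download URL, and installed file are made up, and a real easyconfig would also list `checksums` for its sources. It relies on the generic `ConfigureMake` EasyBlock, which runs the classic `configure`, `make`, `make install` sequence, so the file itself only has to describe this one package.

```python
# Illustrative easyconfig for a hypothetical package "example-tool".
easyblock = 'ConfigureMake'  # the generic EasyBlock that does the actual building

name = 'example-tool'
version = '1.0.0'

homepage = 'https://example.org/example-tool'
description = "A hypothetical tool used to illustrate the easyconfig format."

# Compiler toolchain to build with.
toolchain = {'name': 'GCC', 'version': '12.3.0'}

# Where to download the sources; EasyBuild fills in the %(name)s/%(version)s templates.
source_urls = ['https://example.org/downloads/']
sources = ['%(name)s-%(version)s.tar.gz']

# Other packages that must be installed first, as (name, version) pairs.
dependencies = [('zlib', '1.2.13')]

# Files and directories that must exist after installation.
sanity_check_paths = {
    'files': ['bin/example-tool'],
    'dirs': [],
}

moduleclass = 'tools'
```

Swapping `easyblock` for another generic EasyBlock (for example, one for CMake-based projects) changes how the package is built, while the package-specific details stay in the same easyconfig structure.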
An example: let’s make soup
To use a cooking metaphor, let’s assume we want to make and serve pumpkin cream soup. We would use a Soup EasyBlock, which would contain the generic instructions: we need a pot, a stove, and a knife; we need to chop most ingredients, cook them in the pot for a certain amount of time, and pour the result into a bowl before serving.
A PumpkinCreamSoup EasyConfig, on the other hand, would tell us that this specific soup requires a pumpkin, some onions, garlic cloves, and some seasonings. We need to cook it for 40 minutes, and there is an additional step: after cooking, we use a blender to turn it into a cream soup. To make sure everything went well, we taste the soup and check that it is a little sweet, but not too much.
An EasyBlock, then, is a general instruction template for a broad class of software, which we later fill in with the specifics in an EasyConfig. Most of the time, we only create new EasyConfig files, as software that requires entirely custom installation instructions is quite rare.
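To make the split concrete, here is the soup metaphor as a small, hypothetical Python sketch. `SoupEasyBlock` plays the role of the generic EasyBlock, and the `PUMPKIN_CREAM_SOUP` dictionary plays the role of an EasyConfig that fills in the specifics. The class, method, and key names are our own and greatly simplify EasyBuild’s real classes.

```python
class SoupEasyBlock:
    """Generic instructions shared by every soup (the 'EasyBlock')."""

    def __init__(self, config: dict):
        self.cfg = config  # the specifics (the 'EasyConfig')
        self.log = []

    def prepare_step(self):
        self.log.append("get a pot, a stove and a knife")

    def chop_step(self):
        # Chop most ingredients; the recipe says which ones to leave whole.
        skip = set(self.cfg.get("not_chopped", []))
        for ingredient in self.cfg["ingredients"]:
            if ingredient not in skip:
                self.log.append(f"chop {ingredient}")

    def cook_step(self):
        self.log.append(f"cook for {self.cfg['cook_minutes']} minutes")

    def extra_steps(self):
        # Soup-specific additions, e.g. blending a cream soup.
        for step in self.cfg.get("extra_steps", []):
            self.log.append(step)

    def sanity_check_step(self):
        # Taste before serving, if the recipe says what to look for.
        if "taste_check" in self.cfg:
            self.log.append(f"taste: {self.cfg['taste_check']}")

    def serve_step(self):
        self.log.append("pour into a bowl and serve")

    def run(self) -> list:
        for step in (self.prepare_step, self.chop_step, self.cook_step,
                     self.extra_steps, self.sanity_check_step, self.serve_step):
            step()
        return self.log


PUMPKIN_CREAM_SOUP = {
    "name": "PumpkinCreamSoup",
    "ingredients": ["pumpkin", "onions", "garlic cloves", "seasonings"],
    "not_chopped": ["seasonings"],
    "cook_minutes": 40,
    "extra_steps": ["blend into a cream"],
    "taste_check": "a little sweet, but not too much",
}

for line in SoupEasyBlock(PUMPKIN_CREAM_SOUP).run():
    print(line)
```

A tomato soup would only need a new dictionary, not a new class. This mirrors why new EasyConfigs are far more common than new EasyBlocks.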
How we work together
We share our work in progress with UGent’s HPC team in a common GitHub repository, and we collaborate on more challenging tasks when necessary. When there are fewer software requests, we are sometimes invited to contribute to the EasyBuild framework itself. Most recently, we contributed to the EasyStack project mentioned above.
The first question a researcher asks when beginning work on an HPC system is “Is my software available?” If it is not, the researcher would have to invest time in getting the software up and running themselves. At UGent, we aim to provide central software installation as a service to our researchers: we install as much software centrally, and as quickly, as possible, so that researchers can save time.
Living up to that philosophy requires tremendous effort, however: the UGent HPC systems now have more than 3,000 distinct scientific software packages installed centrally, and the number of requests from researchers is growing steadily. It is difficult to get ahead of the ever-increasing demand.