12 Jun 2025 - Douglas Thain
The Floability Project is an NSF-funded research project to enable the rapid and portable deployment of notebooks expressing complex scientific workflows across a wide range of cyberinfrastructure. Our research team is a collaboration between the University of Notre Dame, the University of Missouri-Columbia, and the University of Illinois.
I recently gave this talk at the University of Wisconsin to introduce Floability and the key concept of a Backpack:
The Floability project aims to address the gap that exists between two worlds. Interactive notebooks are widely used to develop, visualize, and share data analysis codes: they are convenient and interactive, but ultimately limited in computational power. HPC and HTC clusters provide scalable computational power, but are accessed in a batch-oriented manner through command line tools. Given that a wide swath of scientific investigation begins in the notebook phase, how can we connect these two worlds so that a notebook workflow developed on a laptop can scale up on a cluster?
Deployment of complex notebooks is a significant challenge for even experienced users. A notebook containing a complex workflow depends on a rather complicated environment that is provisioned outside of the notebook itself: a notebook might require a set of Python libraries, a body of executables, the notebook software itself, large scale data needed for the analysis, and specific computing resources (CPUs, GPUs, memory, disk, etc.) in order to succeed. Simply moving a notebook file from here to there won’t work.
The backpack is the key organizing principle to solve this problem. A backpack contains a notebook along with a declaration of all the other resources needed to execute that notebook: workflow, software, data, and cluster resources. You can see some examples of backpacks in the https://github.com/floability/floability-examples repository.
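To make the idea of a declaration concrete, here is a minimal sketch of what a backpack might record. This is not the Floability backpack format: the class names, field names, and the check below are all hypothetical, chosen only to mirror the categories listed above (notebook, software, data, and cluster resources). See the examples repository for the real layout.

```python
from dataclasses import dataclass, field


@dataclass
class Resources:
    """Per-worker resource request (hypothetical field names)."""
    cores: int = 1
    gpus: int = 0
    memory_mb: int = 1024
    disk_mb: int = 1024


@dataclass
class Backpack:
    """Illustrative stand-in for a backpack; NOT the Floability format."""
    notebook: str
    software: list = field(default_factory=list)  # e.g. conda or pip package specs
    data: list = field(default_factory=list)      # e.g. dataset paths or URLs
    resources: Resources = field(default_factory=Resources)

    def problems(self):
        """Return a list of problems; an empty list means nothing obvious is missing."""
        found = []
        if not self.notebook.endswith(".ipynb"):
            found.append("notebook should be an .ipynb file")
        if not self.software:
            found.append("no software dependencies declared")
        if self.resources.cores < 1:
            found.append("at least one core is required")
        return found


# Made-up example values, for illustration only.
example = Backpack(
    notebook="analysis.ipynb",
    software=["python=3.11", "numpy"],
    data=["inputs/events.csv"],
    resources=Resources(cores=4, memory_mb=8192),
)
print(example.problems())
```

The point of the sketch is only that everything the notebook needs is written down next to it, so a tool can check the declaration and act on it before anything is moved to a cluster.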
The Floability tool takes a backpack specification and deploys it onto a cluster. This requires instantiating an environment (e.g. Conda environment, Docker container, virtual machine) on a cluster head node, containing the notebook and its software and data dependencies. It also includes provisioning multiple worker nodes in the cluster, each with its own software and computing resources. These nodes run workers from a framework such as TaskVine, Dask, or Parsl, and communicate back to the head node to form a complete system. Once deployed, the user can interact with the notebook in the normal manner.
A particular challenge when constructing a backpack can be determining what the actual software and data dependencies of the application are.
The SciUnit technology from the DICE Lab at Missouri is the key to solving this problem. A notebook with unknown dependencies can be run in an audit mode to trace the libraries and files actually used. These can be moved into the backpack verbatim, or converted into explicit pip or conda dependencies.
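As a rough illustration of the second option, the sketch below turns a list of traced file paths into a list of top-level Python package names. It is not how SciUnit works and does not use its output format: the function name, the input shape (a plain list of POSIX paths), and the sample paths are all assumptions made for this example.

```python
from pathlib import PurePosixPath


def packages_from_trace(traced_paths):
    """Return sorted top-level package names found under site-packages.

    Hypothetical helper: assumes the trace is a plain list of file paths.
    """
    names = set()
    for raw in traced_paths:
        parts = PurePosixPath(raw).parts
        if "site-packages" not in parts:
            continue  # not an installed Python library (data file, stdlib, ...)
        idx = parts.index("site-packages")
        if idx + 1 >= len(parts):
            continue  # the path is the site-packages directory itself
        top = parts[idx + 1]
        # Skip packaging metadata directories such as numpy-1.26.4.dist-info.
        if top.endswith((".dist-info", ".egg-info")):
            continue
        # Single-file modules appear as name.py.
        if top.endswith(".py"):
            top = top[:-3]
        names.add(top)
    return sorted(names)


# Made-up trace, for illustration only.
trace = [
    "/env/lib/python3.11/site-packages/numpy/__init__.py",
    "/env/lib/python3.11/site-packages/numpy/core/multiarray.py",
    "/env/lib/python3.11/site-packages/six.py",
    "/env/lib/python3.11/site-packages/numpy-1.26.4.dist-info/METADATA",
    "/data/inputs/events.csv",
]
print(packages_from_trace(trace))  # ['numpy', 'six']
```

Note that import names do not always match distribution names (for example, the `sklearn` module is installed from the `scikit-learn` distribution), so a real conversion needs an extra mapping step, and files outside site-packages, such as the data file above, would still have to travel in the backpack as-is.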
The Floability project is still in the early stages. We have made our first software release, which you can install via Conda, and you can check out some example backpacks containing applications in geosciences and high energy physics. We are continuing to develop the technology and applications, so stay tuned for more developments.