March 26, 2019 - by Tim Robinson, CSCS
Traditionally, high-performance computing users have interacted with supercomputers through command-line interfaces, with compute-intensive jobs scheduled in “batch mode” via workload managers such as Slurm. Driven in part by the emergence of data science as a dominating field – and the convergence of Big Data and HPC – CSCS has recently seen increased efforts to bridge the gap between scientific computing on the desktop and in the supercomputing worlds. One such effort – the Jupyter Notebook – has gained significant interest and adoption in recent years both as a research tool and as a teaching aid.
The Jupyter Notebook is an open-source web-app that allows users to write portable documents containing executable code, narrative text, and equations, and to visualize the results of running the code directly in the web browser. The name comes from a combination of the three core programming languages of Jupyter – Julia, Python and R – though Jupyter is not limited to these languages. Jupyter ships with IPython as its default programming language (“kernel” in Jupyter parlance), but developers have contributed kernels for dozens of other languages, ranging from Haskell to Co-Array Fortran.
The Jupyter Notebook is maintained by a nonprofit organization called Project Jupyter, who develops open-source software and services for interactive computing. Project Jupyter is also responsible for JupyterHub – software that spawns and manages notebooks in a multi-user environment – and Jupyter Lab, which is the next-generation version of the Jupyter Notebook. CSCS has provided a JupyterHub service since the middle of last year, giving users the ability to spawn Jupyter Notebooks on the compute nodes of our flagship Cray XC40/ XC50 supercomputer, Piz Daint.
CSCS is pleased to announce a number of enhancements to the JupyterHub service, accessible at https://jupyter.cscs.ch. Firstly, CSCS has made JupyterLab the default notebook format spawned from JupyterHub. JupyterLab uses the same notebook document format as the classic Jupyter Notebook, but – amongst many other advantages – it offers the ability to work with multiple documents (or other activities) side by side in the work area using tabs and splitters. Secondly, the ability to use MPI (Message-Passing Interface) across single or multiple-node notebooks via IPyParallel and MPI for Python (mpi4py) has been added. Finally, support for Distributed Dask – a dynamic task scheduler for parallel, distributed computing in Python – has been enabled.