March 26, 2025 – Interview by Santina Russo

Torsten Hoefler, first of all, congratulations! What a phenomenal thing for you to win the ACM Prize. What was your first reaction when you were told?

I was completely blown away. Truth be told, I had trouble believing it at first. I needed a few days for the reality to sink in, and my mind is still catching up. But it became more real with each official email related to the Prize, such as one from the official Nobel Prize photographer who will take my picture for the list of laureates.

What does the Prize mean for you personally — and can you already estimate what it will change for you professionally?

I am of course very proud. It is a very satisfying feeling to have been chosen for this distinction. But it’s still not easy for me to wrap my head around its full significance. A friend told me: Use it wisely. What that means or will mean for me specifically is an interesting question in itself. I still have to find out. Concerning the near future, I know that winning the prize also means travelling a lot. I will give presentations and speak to fellow scientists all over the world to disseminate the advancements we made in artificial intelligence (AI) and high-performance computing (HPC). But I believe the biggest professional impact of the prize will be for the members of my group.

What were the achievements that earned you the prize?

It was not one thing but many pieces, mostly developed collaboratively with other scientists or my group members, all of which advanced and accelerated the training of AI models on HPC systems by a factor of 10 to 1000. Considering that training machine learning (ML) models on vast computing infrastructures is one of the most substantial investments humanity is currently making, with countries and IT companies spending tens of billions of dollars, it’s evident that speeding up the training of ML models has a huge impact.

Could you explain the principles of these developments?

The earliest developments concerned network design and programming, which are crucial because the machines for AI training are the largest worldwide — like ‘Alps’ at CSCS, which is currently the biggest AI-capable supercomputer in Europe. The real key to building these machines is the question of how best to connect their thousands of GPU accelerators to train and run AI models as efficiently as possible. I have spent a large part of my career building interconnection networks for HPC, and some of the principles we developed are now the basis of supercomputers like ‘Alps’. Another part of connecting GPUs is programming them at the full system level. Together with others, I developed key parts of a collective communications library for a programming interface called the Message Passing Interface (MPI), which enables many processors to coordinate their work on one application.
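To give a flavour of what a collective operation does, here is a minimal pure-Python sketch of the recursive-doubling pattern that collective communication libraries (for example, an MPI allreduce) commonly use. This simulates all ranks in a single loop; a real MPI program runs one process per rank and exchanges the partial results over the interconnect. The function name and structure here are illustrative, not part of any actual MPI API.

```python
def allreduce_sum(values):
    """Simulate a recursive-doubling allreduce over P = len(values) ranks.

    In each of log2(P) rounds, rank r exchanges its partial sum with
    rank r XOR step; after the last round, every rank holds the global
    sum. This simple variant assumes a power-of-two number of ranks.
    """
    ranks = list(values)
    p = len(ranks)
    assert p & (p - 1) == 0, "this sketch assumes a power-of-two rank count"
    step = 1
    while step < p:
        # Each rank adds the partial sum held by its partner rank.
        ranks = [ranks[r] + ranks[r ^ step] for r in range(p)]
        step *= 2
    return ranks

# Example: 8 ranks each contribute their rank index (0..7).
# After 3 rounds, every rank holds the total 0+1+...+7 = 28.
print(allreduce_sum(list(range(8))))
```

The point of the pattern is that the global result is reached in logarithmically many communication rounds rather than by funnelling all data through one processor — the kind of coordination MPI collectives provide at full system scale.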

What about accelerating ML training, as you said?

Here, the idea is to reduce complexity where it is not needed. The first of these principles concerns quantization — that is, the level of detail of the individual numbers in the vectors that make up ML models. The question is: how many bits are required to store these numbers, and how accurate do they really need to be? The default is 32 bits, sometimes even 64 bits. For many parameters, however, this accuracy is not needed and is consequently wasteful. Incidentally, our brain works with only about 4 bits. The solution we pioneered together with collaborators in 2017 was to selectively represent some parameters using only 4 or 8 bits, resulting in a factor-of-10 acceleration of an ML model’s training. The second idea involves sparsification: instead of including the entire neural network, the algorithm is streamlined to create connections only between neurons that are relevant for solving the specific problem. That often involves only a small part of the network; the rest can be neglected. So instead of activating the entire network, the relevant parameters are distributed sparsely across the model, which again saves computing resources. Together, these individual pieces of work have substantially advanced AI training, and these ideas are now even applied to accelerate scientific computing.
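The two ideas can be illustrated with a minimal sketch — not the actual methods from the 2017 work, just the underlying principles in pure Python: mapping 32-bit float weights to 8-bit integers with a shared scale factor, and keeping only the k largest-magnitude weights.

```python
def quantize_int8(weights):
    """Map float weights to 8-bit integers sharing one scale factor.

    Each weight then needs 8 bits of storage instead of 32; the scale
    is kept once for the whole group of weights.
    """
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the 8-bit representation."""
    return [q * scale for q in quantized]

def sparsify_topk(weights, k):
    """Keep only the k largest-magnitude weights; zero out the rest."""
    keep = set(sorted(range(len(weights)),
                      key=lambda i: abs(weights[i]),
                      reverse=True)[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

w = [0.8, -0.05, 0.02, -0.6, 0.01, 0.3]
q, s = quantize_int8(w)
print(q)                    # small integers in [-128, 127]
print(dequantize(q, s))     # close to the original weights
print(sparsify_topk(w, 3))  # only the 3 largest-magnitude weights survive
```

In practice the quantization error for such parameters is small enough not to hurt model quality, while the storage, memory bandwidth, and compute savings are substantial — and the zeroed-out entries of a sparsified model can be skipped entirely during training and inference.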

You are the first scientist working in mainland Europe to ever win the ACM Prize for Computing. Considering the rapid developments in AI worldwide — what does you winning the prize mean for AI in Switzerland and Europe?

It validates our ideas and ambitions in not letting the United States and China be the only ones driving the field of AI. And it certainly shows that we can do excellent and influential work in Switzerland. The prize also reflects on CSCS, as it was awarded for exactly the things that are now most important there, especially the ‘Alps’ supercomputer’s powerful AI capabilities, which will have a strong impact on the entire Swiss computer science and research landscape.

Last question: What challenges are you tackling next?

A big one is the jump from large language models (LLMs) like ChatGPT to reasoning language models. The fact is that we will soon run out of data to train LLMs with and will need to rely on synthetic data to train and build natural language models. For models to reason based on synthetic data, they need to be capable of learning completely by themselves, with little or no human supervision. We are at the beginning of this exciting innovation, and my group and I look forward to contributing. Another big area is AI for science. In my group we are working on how to improve weather and climate simulations using AI, for example. Maybe an even bigger topic is AI for health. We are working on a foundation model driven by medical imaging data that captures the entire biomechanics of the human skeleton — as a basis for surgeons to more systematically understand what they need to do. With AI we can reach the next level in these and many other fields. Now is the time to figure out how we can accelerate scientific and social progress using these new data-driven methods. Maybe the biggest challenge lies in accepting and getting used to the fact that we don’t need to fully understand how these methods achieve their results for them to work. They will still support science and accelerate progress.

Torsten Hoefler

is a Professor of Computer Science at ETH Zurich, where he directs the Scalable Parallel Computing Laboratory (SPCL), and the Chief Architect for AI and Machine Learning at the Swiss National Supercomputing Centre (CSCS). He received his PhD in Computer Science from Indiana University in 2008 and took his first professorial appointment at the University of Illinois Urbana-Champaign. Since 2012, he has been at ETH Zurich, where he was granted tenure in 2017 and promoted to full professor in 2020. He is an ACM Fellow, an IEEE Fellow, and a member of Academia Europaea. Among other distinctions, he received the ACM Gordon Bell Prize in 2019.

The ACM Prize in Computing

The ACM Prize in Computing is awarded by the Association for Computing Machinery (ACM) to recognize individuals for early- to mid-career fundamental and innovative contributions to computing. Honourees are selected for achievements that, through their depth, impact, and broad implications, exemplify the highest accomplishments in the discipline. The award includes a prize of 250,000 US dollars. Alongside the Turing Award, it is considered one of the highest distinctions in the field of computing, and the highest recognition given to mid-career scientists.

Read the news article by ETH Zürich on Torsten Hoefler winning the ACM Prize in Computing.