Primeur Magazine: Can you tell us a little bit about 20 years ago? How was it then with supercomputing? Do you still remember?
Erich Strohmaier: Certainly, I remember quite a few things. Looking back and looking at today, I actually see a lot of similarities between the situation in the early nineties, when we started the TOP500 project in 1993, and today. In the early nineties we had the "attack of the killer micros" and a lot of changes in the landscape of computers. There were a lot of newcomers, new companies building systems like the Thinking Machines Connection Machine, the Intel iPSC hypercube, and the Kendall Square machines. A lot of architectural innovation was going on. The Cray Y-MP and the T90 were still the old guard of the old type of architectures, but there was a lot of innovation going on. It was driven by the old technology, with ECL as a chip technology, having reached its limits.
Today we are in a very similar situation in some ways. In the mid-2000s, Dennard scaling broke down, and now we are approaching the end of Moore's law. The effect is the same as in the early nineties. There is a lot of innovation going on in terms of architectures. We see a lot of companies developing chips - nowadays often specialized for applications like artificial intelligence and machine learning - but again, there is a lot of innovation going on. This is reflected in the TOP500, where we again see a lot of new architectures coming into the list. Customers, especially research laboratories, are willing to try out these architectures to gain an advantage, to advance science, and to produce more cycles for science.
Primeur Magazine: At ISC 2020 Digital, you presented the latest TOP500 and especially the Top 10 was completely shaken up with four newcomers and a new machine on position one.
Erich Strohmaier: Yes, we have four newcomers in the Top 10. The number one position has been captured by the Fugaku supercomputer, which was co-developed by the Japanese research institute RIKEN R-CCS and by Fujitsu. The new number one is almost three times as fast as the previous number one, the Summit system at Oak Ridge National Laboratory, built by IBM with NVIDIA chips. Fugaku is interesting in another respect as well: not only is it a Japanese system, but it is clearly a new architecture which in some ways is very reminiscent of old architectures. It uses Arm cores on the node as the compute cores, but they are enhanced. The developers have added very wide vector units to those cores, which provide a lot of performance to the chip overall. That is reminiscent of the architectures we had in the eighties and nineties. The basic architecture is very well understood. Fujitsu has combined it with the third-generation Tofu Interconnect D, Fujitsu's proprietary interconnect, whose name stands for "torus fusion". Between the Tofu network, the high-bandwidth memory that Fugaku uses to achieve very high memory bandwidth, and the vector-enabled Arm cores, Fugaku achieved almost half an exaflop on the Linpack benchmark.
The system is also very interesting because the vector lanes are built in such a way that they can use reduced precision to improve performance. They have demonstrated this by running a new benchmark, the HPL-AI, which is geared towards machine learning. Fugaku is very well positioned to be used for machine learning and artificial intelligence.
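The idea of trading precision for speed can be illustrated with iterative refinement, the general technique that mixed-precision solvers build on: solve the linear system cheaply in low precision, then correct the answer using residuals computed in full precision. The following is a minimal sketch under stated assumptions, not the benchmark's actual code. The function names are hypothetical, 32-bit arithmetic is emulated by rounding through the standard `struct` module, and for brevity the elimination is repeated in every step, whereas a real solver would factor the matrix once and reuse the factors.

```python
import struct


def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_low_precision(A, rhs):
    """Gaussian elimination with partial pivoting, rounding every step to float32."""
    n = len(A)
    # Augmented matrix [A | rhs], stored in emulated float32.
    M = [[to_f32(v) for v in row] + [to_f32(rhs[i])] for i, row in enumerate(A)]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k onto the diagonal.
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = to_f32(M[r][k] / M[k][k])
            for c in range(k, n + 1):
                M[r][c] = to_f32(M[r][c] - to_f32(f * M[k][c]))
    # Back substitution, also rounded to float32.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for c in range(i + 1, n):
            s = to_f32(s - to_f32(M[i][c] * x[c]))
        x[i] = to_f32(s / M[i][i])
    return x


def residual(A, b, x):
    """Compute b - A*x in full 64-bit precision."""
    return [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]


def refine(A, b, iterations=5):
    """Low-precision solve followed by full-precision residual corrections."""
    x = solve_low_precision(A, b)
    for _ in range(iterations):
        r = residual(A, b, x)            # accurate residual (float64)
        d = solve_low_precision(A, r)    # cheap correction (float32)
        x = [xi + di for xi, di in zip(x, d)]
    return x


def max_error(x, reference):
    return max(abs(a - b) for a, b in zip(x, reference))


# Small, well-conditioned example system (illustrative values only).
A = [[4.1, 1.3, 2.7], [1.3, 5.9, 1.1], [2.7, 1.1, 6.3]]
x_true = [1.0, -2.0, 3.0]
b = [sum(a * x for a, x in zip(row, x_true)) for row in A]

err_low = max_error(solve_low_precision(A, b), x_true)
err_refined = max_error(refine(A, b), x_true)
print("float32 only:", err_low)
print("refined     :", err_refined)
```

On hardware where low-precision arithmetic runs much faster than 64-bit arithmetic, the expensive solve is done at the fast rate while the final answer still reaches roughly 64-bit accuracy, provided the system is reasonably well conditioned.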
Primeur Magazine: Fugaku also has a special interconnect?
Erich Strohmaier: Yes, it has a new version of the proprietary Fujitsu interconnect called Tofu. Fujitsu developed Tofu for the K computer and has been using it in its own line of supercomputers. Tofu D is the newest version, and it is tightly integrated on the nodes of the Fugaku supercomputer and the corresponding Fujitsu commercial product. It is one of the reasons that Fugaku can achieve such high performance and efficiency in a lot of codes.
Primeur Magazine: There are also other new machines in the Top 10. Can you tell us a bit about them?
Erich Strohmaier: One system which comes to mind is Selene at rank 7, which was built by and is installed at NVIDIA in the United States. This already gives a first indication, because NVIDIA is so big in the machine learning market these days. The system is probably geared towards machine learning as one of its applications: they are using it not just for traditional floating-point computations but also for machine learning applications, where NVIDIA GPUs are certainly very prominent in the market these days.
Erich Strohmaier: Indeed, listed just before the NVIDIA machine is the second new commercial system in the list, the HPC5 in Italy at rank 6. It is installed at a commercial company, Eni SpA. It is one of the highest positions any system installed at a commercial customer has ever achieved in the list.
Primeur Magazine: If we take a broader look, how are installations at commercial sites doing overall?
Erich Strohmaier: Over the years, the commercial market has become the bread-and-butter market for large-scale computing systems. However, at the very top end, in the top 50 positions, we have mostly research installations. If they can, research centres buy larger systems than industry typically does. Industry is more oriented towards cost efficiency. Companies do not buy machines for prestige or to really push to the edge as much as government-funded research institutions typically do. The two marketplaces, the industrial and commercial marketplace on the one hand and the government, research and academic marketplace on the other, have very different profiles: they tend to buy different architectures. They tend to buy from different companies as well, although the ratio between those two markets can be very different from country to country.
Primeur Magazine: If you look at the vendors for those systems, you can see quite a big difference there, can't you?
Erich Strohmaier: Yes, the commercial market nowadays is totally dominated by the three big Chinese manufacturers: Lenovo, Inspur and Huawei. These three are producing the lion's share of systems for the commercial markets. At the same time, most of their systems are installed in China, which makes China these days the biggest consumer of supercomputers in terms of number of systems. That is very different from the research market, where the United States, Japan and Europe are still very dominant in terms of number of systems. They tend to buy from American, European and Japanese manufacturers, not from Chinese manufacturers.
Primeur Magazine: The number one is also the first Arm-based system in the Top 10. Is that a change in architecture or is it just exchanging Intel for Arm?
Erich Strohmaier: The Arm architecture allowed Fujitsu to build a very cost-efficient base architecture, which they could then accelerate and improve by adding their custom vector units to the Arm core itself. Having an alternative to the very tightly controlled Intel-based chip architecture is certainly very appealing to a lot of vendors in the market. The side effect is that the vector units in the Arm chip are very tightly integrated with the CPU itself. This is something which is much harder or not possible to do if you stick with Intel.
Primeur Magazine: Do you think that there will be a growth so that we see more and more of those Arm-based systems?
Erich Strohmaier: There have been rumours about it for some time, so I would not be surprised, but currently we only see four Arm-based systems in the TOP500, three of which are based on the new Arm chip by Fujitsu. These are Fugaku; a commercial system from Fujitsu's new product line, which is equivalent to Fugaku, installed at Osaka University; and a smaller test system at Fujitsu itself. The fourth one is actually an interesting one because it is installed in the United States and is based on the ThunderX2 Arm implementation. It is an American-built and American-installed system. It is an alternative to the Fujitsu system.
Primeur Magazine: In Europe they are working on a processor in the European Processor Initiative (EPI), which is also based on Arm but uses a RISC-V processor to get speed. Have you had a chance to look into that architecture?
Erich Strohmaier: I am not familiar with that in detail, but a particular appeal of the Arm solution is that you can integrate whatever units you use for achieving your top speed and performance. You can integrate them very tightly with the processor, be that vector units like Fujitsu did, be that something like an advanced RISC core, or other things. You are not forced to put your extra CPU on the other side of the PCI bridge or anything like that. You can put it much closer to the core, and that helps a lot with optimizing and reducing data movement. That is a big appeal of the Arm architecture and one of its big advantages. This makes me believe that we are going to see more of these solutions in the future.
Primeur Magazine: There is a lot of shaking going on in the Top 10. Does that hold for the whole TOP500?
Erich Strohmaier: This time, actually, no. This list is really special in the sense that we had a record low turnover. If you look at the end of the list and count how many systems from the last list did not make it this time, because they did not meet the threshold and landed beyond position 500, that number for this list is 51. This is almost a factor of two smaller than anything we have ever seen in the past. It used to be 200 or 250 systems. Over the last five years or so, it has been more like 100 systems that fell off the list. That slowdown is due to the breakdown of Moore's law and how that affected the market. This time, however, we really saw a net decrease in the turnover in the list. Of course, the interpretation is open, and it is hard to be sure about an interpretation with only one data point. We think it is the coincidence of a continuing slowdown in the increase in chip speeds, which slows down procurement cycles, together with delays in installations or even outright cancellations of new installations due to the current COVID-19 pandemic.
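Turnover in this sense is simply the number of systems on the previous list that no longer appear on the current one; with 500 entries, the 51 systems mentioned above amount to roughly 10% of the list. The following minimal sketch shows the calculation, assuming each system has a stable identifier. The function name and the toy identifiers are hypothetical and are not real TOP500 data.

```python
def turnover(previous_list, current_list):
    """Count systems on the previous list that are missing from the current one."""
    return len(set(previous_list) - set(current_list))


# Hypothetical toy lists of system identifiers (not real TOP500 data).
previous = ["sys-a", "sys-b", "sys-c", "sys-d", "sys-e"]
current = ["sys-new", "sys-a", "sys-b", "sys-c", "sys-d"]

dropped = turnover(previous, current)
share = dropped / len(previous)
print(f"{dropped} of {len(previous)} systems dropped off ({share:.0%})")
```

Because the list has a fixed length, the number of systems that drop off equals the number of new entries, so either count can serve as the turnover figure.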
Primeur Magazine: The Fugaku system and other new systems are also geared towards artificial intelligence and especially machine learning. What are the developments there?
Erich Strohmaier: The general trend, of course, is that in the commercial market as well as in the research market, deep neural networks have caused a sort of revolution which started a little more than 10 years ago. It has really picked up speed in the last five years. Nowadays, many centres and many commercial customers are very interested in the topic. They want to be prepared for it from the hardware side. That is why the ability of an architecture to really accelerate machine learning algorithms has become very important. One aspect of that is that architectures which can reduce the precision of their calculations, and gain speed when they do, are very advantageous. That is a general trend we are seeing in the market. Fugaku is one example and Selene, the NVIDIA system, is another example of this type of architecture.
The installation sites are very much getting ready for machine learning at large scale, even though the architectures might currently still be different. Overall in the marketplace there are installations such as the clouds of Google or Amazon, where companies have their own custom accelerators. You can only access them if you sign into their cloud. That is a different business model; it is basically a hardware-as-a-service business model. We are going to have to see how this plays out in the future and which of the business models, outright ownership or hardware-as-a-service, is going to win out in this market. Big Data, and particularly machine learning, is certainly one of the very hot topics in high-performance computing in general these days.
Primeur Magazine: Thank you very much for this interview.