Primeur Magazine: People are always interested in the big machines and the top machines. What is new on this front?
Jack Dongarra: I guess the big exciting news is a new number one system, the Japanese machine called Fugaku. It is made by Fujitsu and sits in Kobe at the RIKEN Institute. This machine will be the new number one in the TOP500. It is a big machine based on the Fujitsu implementation of the ARM processor. It has a Linpack number of 415 Petaflops. Its peak performance is 514 Petaflops. It has 152,000 nodes in it. Each node is composed of one of these ARM processors. There are no accelerators. It is not an accelerated machine. It is based on ARM, but this version of ARM has a vector extension associated with it to do vector operations. It is a very impressive system in a number of ways. It has achieved a very high efficiency for Linpack. It has a very high efficiency for other benchmarks as well. Just looking at the number one machine, Fugaku with 415 Petaflops, we see that the number two machine is the Summit machine at Oak Ridge. This machine by comparison has 149 Petaflops for the Linpack benchmark. Thus, the difference in performance between number one and number two is 415 compared to 149 Petaflops.
There is another benchmark, which we have called HPCG. HPCG is much more memory-intensive. It stresses the memory of these machines a great deal. For the HPCG benchmark, Fugaku achieved 13 Petaflops, which puts it at number one on the HPCG benchmark. By comparison, Summit, which was previously number one there as well, achieved 2.93 Petaflops. The difference is 13 compared to 2.93 Petaflops between the number one and number two machines. That is a big difference. Fugaku achieved 2.5% of its theoretical peak performance for HPCG. Summit achieved 1.3% of its peak performance for HPCG. Those are the high-end benchmark numbers.
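The ratios quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the rounded figures given in the interview; the dictionary layout and the function name `percent_of_peak` are illustrative choices, not part of any benchmark tooling. Summit's peak performance is not stated here, so its fraction of peak is not recomputed.

```python
# Figures as quoted in the interview, all in petaflops.
FUGAKU = {"peak": 514.0, "linpack": 415.0, "hpcg": 13.0}
SUMMIT = {"linpack": 149.0, "hpcg": 2.93}


def percent_of_peak(achieved, peak):
    """Return achieved performance as a percentage of theoretical peak."""
    return 100.0 * achieved / peak


# How much of Fugaku's theoretical peak each benchmark reaches.
linpack_eff = percent_of_peak(FUGAKU["linpack"], FUGAKU["peak"])
hpcg_eff = percent_of_peak(FUGAKU["hpcg"], FUGAKU["peak"])

# How far ahead Fugaku is of Summit on each benchmark.
linpack_ratio = FUGAKU["linpack"] / SUMMIT["linpack"]
hpcg_ratio = FUGAKU["hpcg"] / SUMMIT["hpcg"]

print(f"Fugaku Linpack: {linpack_eff:.1f}% of peak")
print(f"Fugaku HPCG:    {hpcg_eff:.1f}% of peak")
print(f"Fugaku vs Summit, Linpack: {linpack_ratio:.2f}x")
print(f"Fugaku vs Summit, HPCG:    {hpcg_ratio:.2f}x")
```

The gap between the two efficiency figures is the point of running both benchmarks: Linpack is dominated by dense floating-point work, while HPCG is limited by memory traffic.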
There is one more benchmark that we have run on this machine, which is called the HPL-AI benchmark. HPL-AI is intended to stress short precision. It runs the Linpack benchmark. It carries out a factorization using 16-bit floating-point arithmetic and then uses some mathematical techniques to bring the accuracy of the result up to that of 64-bit arithmetic. The performance of Fugaku for the HPL-AI benchmark was 1.4 Exaflops. One has actually reached over an Exaflop for that benchmark. By comparison, Summit achieved 0.55 Exaflops. This is quite a difference between those two machines: roughly a factor of three in terms of performance for that benchmark.
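The idea of factoring in short precision and then recovering 64-bit accuracy can be illustrated with classic iterative refinement. The sketch below is an assumption-laden toy, not the HPL-AI code: it emulates 16-bit arithmetic by rounding through the standard library's `struct` half-precision format, uses LU without pivoting on a small diagonally dominant matrix, and applies plain iterative refinement as a stand-in for the unnamed "mathematical techniques". All function names and the test matrix are hypothetical.

```python
import struct


def to_half(x):
    """Round a float to IEEE 754 half precision (16-bit) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def lu_half(a):
    """LU factorization without pivoting, every operation rounded to 16 bits."""
    n = len(a)
    lu = [[to_half(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] = to_half(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = to_half(lu[i][j] - to_half(lu[i][k] * lu[k][j]))
    return lu


def lu_solve(lu, b):
    """Forward and back substitution in 64-bit using the low-precision factors."""
    n = len(lu)
    y = list(b)
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def solve_mixed(a, b, iters=10):
    """Solve a x = b: 16-bit factorization, then 64-bit iterative refinement."""
    lu = lu_half(a)
    x = lu_solve(lu, b)
    for _ in range(iters):
        # Residual computed in 64-bit arithmetic against the original matrix.
        r = [bi - sum(aij * xj for aij, xj in zip(row, x))
             for row, bi in zip(a, b)]
        d = lu_solve(lu, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x


# Small diagonally dominant test system with known solution [1, 2, 3, 4].
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
x_true = [1.0, 2.0, 3.0, 4.0]
b = [sum(aij * xj for aij, xj in zip(row, x_true)) for row in A]


def max_error(x):
    return max(abs(xi - ti) for xi, ti in zip(x, x_true))


print("error without refinement:", max_error(solve_mixed(A, b, iters=0)))
print("error with refinement:   ", max_error(solve_mixed(A, b, iters=10)))
```

The expensive step, the factorization, runs entirely in the cheap 16-bit format; only the residual and the solution update are kept in 64-bit, which is why hardware with fast 16-bit units can post much higher numbers on this kind of benchmark.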
The Japanese have put together a very impressive computer. It is a large-scale machine, it uses no accelerators, and it is very good for a number of applications, with a very fast and very good interconnect between the processors. It has very high bandwidth and quite a bit of fast memory. It is up and running today. The people in Japan have a number of applications that have actually been run on this machine already. It just started full operation in May. Just in the past months the machine has come up fully, and they are getting real applications to run on the system. It is a very impressive machine and we are looking forward to seeing how it performs on real scientific problems.
Primeur Magazine: Yes, it does indeed sound like a really impressive machine. In the US, if we understand correctly, we have to wait until next year for a faster machine. In Europe, we have the EuroHPC machines, the pre-exascale machines. When we look at their specifications, we don't expect that they will be able to jump to first place in the TOP500.
Jack Dongarra: When the real exascale machines come online, they will perhaps be in a position to be ahead of the Fugaku machine. The US Department of Energy (DOE) has three exascale computers planned. They have the Aurora machine, the one that is going into Argonne; Frontier, which is coming to Oak Ridge; and another machine, El Capitan, going into Lawrence Livermore. Those are the three exascale machines, but they are not due until next year, so we have at least one year to wait for them.
Primeur Magazine: The three big EuroHPC machines in Europe are also not due to come into production before the end of the year, or perhaps early next year. Did you look at the specifications for those machines?
Jack Dongarra: I have not looked in detail at the specs for the European systems. The specs are interesting but until the machine is actually put in place and running, and ready to test out on a benchmark, it is difficult to get a full appreciation for how well it is going to perform.
Primeur Magazine: Absolutely. You already mentioned that the Fugaku system is quite interesting from an architectural point of view. There are also lots of other architectural changes in HPC in general. Can you tell us a little bit about that?
Jack Dongarra: What we see are large numbers of nodes. Many machines use accelerators. That is continuing. The interconnects are getting faster. There is more use of high-speed memory and, associated with it, stacked memory. The Fugaku machine is using stacked memory. All memory is on-chip and stacked, with very high bandwidth associated with it. We see these machines with a very good injection bandwidth across the networks. We now see machines with not only 64-bit arithmetic and 32-bit arithmetic but also 16-bit arithmetic. Fugaku is a machine that, like Summit, has native 16-bit floating-point arithmetic, which is potentially useful for some applications, not all, but some. It is there primarily because of the move towards machine learning, which can get away with using that short precision. I would say the movement is in the direction of providing different kinds of arithmetic: floating-point arithmetic and also integer arithmetic. I believe the Fugaku machine goes even lower in precision for integer-based arithmetic, again to address the needs of machine learning and AI. All of these machines are very large and have considerable memory and considerable energy requirements associated with that. This is something else that has to be taken into consideration.
Primeur Magazine: You already mentioned three benchmarks. Will it remain the trend to have several separate benchmarks to look at the different aspects of machines?
Jack Dongarra: I think it makes sense to have many benchmarks. The most important benchmarks are the applications that you run on your machines. So, getting benchmarks that match applications is the right thing. Some of the benchmarks that we have are historical. Linpack is an example of a historic benchmark. It provides some insight, but it should not be used as the sole measure of how fast the machine is going to run modern applications. To do that, we need to have smaller versions of the applications that we can run on these machines to get a better handle on how they are going to perform. Things like Linpack give just a hint of what the capabilities are. We may see quite different trends when we have real applications running on these machines.
Primeur Magazine: We don't know what the cost of the new Japanese machine is. Perhaps this will be revealed at some point, but we can remember the time when the definition of a supercomputer was a machine that cost 25 million dollars. Now it seems that if you don't have five hundred million dollars, you cannot buy a supercomputer. Is that also a trend, or is this only true for the big machines, and is the real high performance computing going on somewhere else?
Jack Dongarra: If you want to be number one I think that is a good round number to think about. The three DOE machines that I mentioned have a price tag of six hundred million dollars each. That is 1.8 billion dollars that is going to be spent on those three machines, and that is just the hardware. This does not include the software. It does not include the applications. It does not include the ongoing running of those machines. That has to be taken into account. Being number one is an expensive proposition but these machines have tremendous capability. You buy a machine so you can carry out leading-edge science.
Having the capability to carry out leading-edge science is really what is driving the requirements and the needs for these computers. I equate these machines to things like our most sophisticated instruments. We think about things like the Hubble telescope. That is a machine which is up there in space. I don't know the exact price tag but it's probably on the order of a billion dollars, and it is providing a tremendous wealth of information. New science is being learned and discovered as a result of that. The same thing is true of these supercomputers. We should think of them as very large instruments which we use to push back the frontiers of science. Paying that price is just part of the cost of doing science at the leading edge.
Jack Dongarra: It is an adjustment, I have to say. I was known for doing quite a bit of traveling but now my travel speed is zero miles per hour. I used to have a very high miles-per-hour number. It is a matter of getting used to it. Can I do the work? Sure, I can do the work from home. I am sitting in my office in my home in Oak Ridge, Tennessee. I have been here since the end of February. I have been able to be productive, I would say, in this position. How long it is going to last, how long it is going to keep on, I cannot predict of course. None of us can, but I am prepared to stay here until it becomes safe outside and we can again resume travel. I am sure at some point I will again resume travel and perhaps even be in Frankfurt next year.
Primeur Magazine: We are looking forward to seeing you again in Frankfurt or in some other place. Virtual meetings are nice but also pose a lot of problems. Besides, it is much nicer to see you in person.
Jack Dongarra: A lot more happens in person, I have to say, so I look forward to seeing you again too.
Primeur Magazine: Thank you very much.