4 Feb 2019 Frankfurt - Each year, Primeur Magazine sits together with Thomas Sterling and Satoshi Matsuoka to discuss the state of exascale computing in the world, and perhaps make some predictions for the upcoming year. We recap the current year but do not look back, only forward. How close are we to exascale? When will it be reached? The bumpy road seemed a bit less bumpy this year. Satoshi Matsuoka is now at Riken, overseeing the Post-K supercomputer developments in Japan. Thomas Sterling pointed to Summit as a possible prototype architecture, a blueprint for a real exascale system. So, is the finish line in sight? Let us listen to the experts.
Primeur Magazine: So we are here together to discuss another year on the road to exascale. We started eight years ago, and then we thought we would be there by now. But luckily it looks like we still have some years to come. Let us start with what happened during the past year. I think several things have happened, so we do have something to talk about.
Satoshi Matsuoka: Personally, I have become the director of Riken CCS, the former AICS, which is responsible for the K-computer and the next-generation Post-K supercomputer. So, that has been a big change for me. Now, I am responsible for one of "the" exascale machines, the Post-K. Concerning exascale: the impression is that what exascale will look like has finally been revealed. The Summit machine is not completely exascale, but it is almost there. So you can speculate that many of the exascale machines will look very much like an extension of the Summit in some way. Of course, there will be architectural differences. So, I think we are finally going to get there: exascale will be achieved. We have very good confidence.
Thomas Sterling: From the US point of view: we went forward and backward at the same time. I actually think this is constructive. I don't see it as a problem. Forward certainly: Summit is a prototype of one possible exascale machine, and I know the people at Oak Ridge. Nobody is more qualified to try it out, to use it, to benefit from it, and to take out the bugs. But at the same time, Aurora, which was supposed to be a simpler machine of a very different architecture, was, in some sense, a failure. The project was canceled, although it was supposed to be a 200 Petaflop/s peak system as well. This was due in part to one of the vendors, Intel, which was providing key parts but canceled the technology base, the Knights Hill processor. Rumor has it that they could not get the ten nanometer parts in the required time and at the needed yield. This will have to be confirmed by them and by Argonne National Laboratory, the intended deployer. Another possibility, though I have not heard an announcement, is that they do not intend to continue the Xeon Phi family at all. So that is unknown. If so, this is very significant.
I would say there is a deeper seriousness in writing programs. I think there has been an increased investment in the software environments that will be needed. But when it comes to the actual programming model of future machines, I think we have yet to determine what that will be. So that is a future project. There are many incremental steps to take in moving forward. I agree with Satoshi that Summit is a template for one likely type of exascale system in the near future.
Satoshi Matsuoka: One thing I may add is the convergence of AI and HPC; the resurrection of AI is, of course, enabled by HPC. There is significant penetration of AI into the overall workflow of HPC, and that is signified by the fact that Summit itself is a perfect AI machine. But we will get to that later.
Thomas Sterling: Another observation is about the use of GPUs. Their penetration in the TOP500 was at one time little more than 10%, but now it is approaching 40% in a relatively short time. That is significant growth.
Primeur Magazine: The Summit is a US machine. In China, the Tianhe-2A was upgraded with new processors. Can you say something about that?
Satoshi Matsuoka: The Tianhe-2A is an upgrade. What they did was to remove the Intel Knights Corner processors and put the Chinese-designed Matrix 2000 processor in their place. This was precipitated by the fact that there was a US embargo on the Knights processors, and their plan had been to use those. So, because they were not able to obtain the Intel Knights processors, they had to develop one on their own. In some sense, for the Chinese, this may have been beneficial, because it forced them to develop their own technology. If you look at the performance, of course, they have boosted their performance on Linpack, but if you look at the power consumption, the Gflops/W is very poor. It is only about 4 Gflops per Watt, despite the fact that it is a very Flops-oriented machine. They lost memory bandwidth, because they had to go with conventional DDR technology. The actual memory bandwidth has decreased. For running real applications, that means a lot. It remains to be seen whether these kinds of impediments will be resolved for the next exascale machines. They had to settle for this chip, with its significant shortcomings, as a replacement because they had to put it together in a relatively short time. Whether that endangers their exascale plans or not remains to be seen.
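The efficiency figure Matsuoka cites is simply Linpack Rmax divided by power draw. As a back-of-the-envelope sketch, the snippet below computes it for Tianhe-2A and, for contrast, Summit; the Rmax and power numbers are approximate public TOP500 figures from mid-2018, used here only as illustrative assumptions.

```python
def gflops_per_watt(rmax_tflops: float, power_kw: float) -> float:
    """Energy efficiency in GFlop/s per Watt.

    Converts Rmax from TFlop/s to GFlop/s and power from kW to W,
    then divides; the two factors of 1000 cancel, but the explicit
    conversion keeps the units visible.
    """
    return (rmax_tflops * 1e3) / (power_kw * 1e3)

# Approximate mid-2018 TOP500 figures (assumptions for illustration):
tianhe_2a = gflops_per_watt(rmax_tflops=61_444.5, power_kw=18_482)
summit = gflops_per_watt(rmax_tflops=122_300.0, power_kw=8_806)

print(f"Tianhe-2A: {tianhe_2a:.1f} GFlop/s/W")  # ~3.3
print(f"Summit:    {summit:.1f} GFlop/s/W")     # ~13.9
```

The roughly 4x gap is what Matsuoka is pointing at: a Flops-oriented machine that still lands in the low single digits of Gflops/W.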
Thomas Sterling: TaihuLight was a major step for the Chinese program from where they had been previously. All that Satoshi said is true. Compared to where they were with the Tianhe-2, this was a significant event. They built it with almost no international technology. That was the giant step. If they make the same degree of innovation in their next generation, and they have multiple threads going forward and leapfrogging each other (I am not making any predictions), then certainly one could see that momentum; it could work out for them. The biggest problem, as is the case for the rest of us, is programming. They are rapidly improving their culture of applying these machines to real-world problems. There appears to be a positive slope there that has to continue.
Satoshi Matsuoka: But still, these machines have almost zero software ecosystem. They have to develop everything on their own. That can be a huge impediment. At the end of the day, all the hardware is possible, but it is the software that makes everything work. Especially with the TaihuLight, not the Tianhe, they are making a significant investment. But still, compared to the US or Japanese systems, the lack of an ecosystem is thoroughly evident. Whether that will be an impediment is something they will have to resolve.
Thomas Sterling: Just a comment: they have a lot of money and a big workforce.
Satoshi Matsuoka: Lots of workforce: very bright people.
Primeur Magazine: The Japanese machine will be an ARM-based system. Does this not lack an ecosystem too?
Satoshi Matsuoka: The Post-K is designed as an ARM system. It took a long time for us to say "we need an ARM". There were a lot of fights about it, but at the end of the day we were able to push ARM through. The fact is that it is not just ARM: it is ARM with SVE, the new Scalable Vector Extension instruction set, proposed initially by ARM and then refined with the help of Fujitsu, who is also building the Post-K. SVE is a new vector instruction set that is very much refined. It is a real vector instruction set, unlike AVX-512. The machine also has a lot of technical innovations which will make it very fast. It has high-performance stacked memory, a very high bandwidth interconnect, very high memory bandwidth, and, of course, high Flop/s. It has an extremely low-power design, and it also accommodates some new features for AI and Big Data: for example, short-precision floating-point arithmetic, like the FP16 you see in GPUs, and 8-bit integers.
The Post-K processor can be described as a multi-core CPU with 48 cores. However, performance-wise it is really more like a GPU: very high throughput. In real-world processing we expect that it will exhibit significant speed-up over standard CPUs, like other ARMs for example. By significant, I mean not by percentages but by factors, while retaining the ease of use. It is a large CPU at the end of the day. You can program it like any other CPU. You can port over any ARM, Power, or other HPC workflow, and it is very good. And the machine itself is homogeneous. It is a very, very large system. It will be the largest ARM system in the world. Maybe the largest supercomputer ever, period.
Primeur Magazine: What do you mean by "large"?
Satoshi Matsuoka: In number of nodes. It will be the largest system in the world. Look forward to this in 2020-2021. We already have the chips.
Primeur Magazine: But what about the ecosystem?
Satoshi Matsuoka: As I said, it is ARM. ARM licenses the IP, and there are 21 billion ARM chips produced every year. So the ecosystem is, of course, there. The mission is to build a high-end, server-based ecosystem, especially centered around SVE, and there are a lot of players in there, fueled by the fact that Cavium now has a credible chip. There are multiple ARM systems that will be coming online, like the one at Sandia National Laboratories and the one in Bristol. The Astra system at Sandia will be 2.3 Petaflop/s. So, the ARM ecosystem is finally moving. But these systems are still mid-sized; they are a couple of Petaflop/s each. There will be a significant build-up towards Post-K: we are collaborating with all the major partners, ARM and other players, to work on the ecosystem.
Thomas Sterling: I have nothing essentially to add, and, as an outsider, I am perhaps not even well informed, but I want to make the following comment: underlying this progress is a philosophy and a culture of deliberate attention to detail, high standards of quality, and learning from past experiences. There is a continuous sequence of machines delivered at the highest end in Japan for high-performance computing. Each has ordinarily been an unprecedented class of machine, and I expect no less from Post-K.
Satoshi Matsuoka: Perhaps we can give some more details about the Post-K next year.
Primeur Magazine: So, we have to be back next year. Basically, you see ARM processors now starting to deliver some real supercomputing power. You see the Chinese processors bringing HPC performance. Will there be more of those, apart from Intel and AMD?
Satoshi Matsuoka: Going back to the overview of the year: 2018 has been an enormously interesting, productive year, because now we have a tremendous number of new ideas about chips that populate the overall HPC ecosystem. AMD has made a comeback. They have very credible chips, both CPU and GPU. Their plans are very aggressive and attractive. Intel itself, although it ditched the Knights line, has some alternative plans which could be very interesting, especially its Deep Learning chip. NVIDIA has the Volta, and this has proven to be very good. It is in the Summit and, I hope, in the ABCI. They are all very highly ranked, with diverse use cases, not just HPC. For ARM processors: Cavium and Fujitsu made lots of announcements. There now are multiple ARM chips that are high-performance. So, it has been a very productive year so far, and this will continue forward with all this diversity, allowing for significant progress both through competition and through the availability of these very attractive chips.
Thomas Sterling: I would say there has been a tremendous amount of coalescence and quality engineering progressing forward. I think that if we are going to see something new that differs or deviates from what has already been produced, it will be because some new model, some new approach emerges, which is not out of the question. But obviously we are not going to predict that. I think that there is concern at Intel right now about their future, and I also think there is still an undecided question: will it be extreme heterogeneity or will it be extreme homogeneity? I tended to favor the latter, but the success of the former in the short term is a strong indicator.
This is part I of the interview.
The interview was conducted during ISC 2018 in Frankfurt.