Primeur Magazine:I think that would be a good point to move over from hardware to software and applications. Are there any new developments there?
Satoshi Matsuoka:As I may already have said, there is significant penetration of AI, of machine learning, to augment the conventional processing of simulation, which is more ab initio: you have the governing equations, you discretize them, and you solve them with traditional simulation. We have been doing this for many years. Learning with AI, by contrast, is more empirical. Here is the difference: take a glass, fill it with water, shake it, and ask a six-year-old child what it looks like. The child will say: well, it is water. It is very likely that a six-year-old does not know the Navier-Stokes equations. So why does the child say it is water? Because the child learned it empirically. There is some form of function that describes this motion as water, and it manifests in the brain through a screening process of some neural network. That is a very powerful paradigm. Now, with the resurgence of deep learning thanks to HPC, people are trying to augment these traditional disciplines of ab initio simulation. There are various ways to do it, but people are trying them and getting fantastic results by combining empirical and first-principles, ab initio results. From my perspective, that is the correct science, because both are very important to attain continuous speed-up.
Thomas Sterling:I think all of that is true. There is an inflationary period here of inherent influx and driven work to expand and explore the possibilities of machine learning and deep learning methodologies applied to large data sets, and I look forward to it running freely until it kind of peters out.
Satoshi Matsuoka:And even to simulation. There is a lot of work that tries to replace traditional simulation with this empirically based training. As I said, the child can tell from the motion of the water that it is water, by empirical means. Or take an artist: ask the artist to draw some leaves, and he will do it. That is the power of empirical training. Empirical methods may not be usable for every physical problem, but they are still a very powerful tool.
Thomas Sterling:That is an important point. While it is a Brave New World for many things and really needs to be pursued, even though we cannot see all the consequences, it has already been shown that it does not apply to everything. One area in which it has been very successful, at first glance, is language translation. It is very good: on the order of 80+% accuracy. But most languages are actually low-resource languages, which means there is too little training material to use these techniques effectively. That is true for the majority of languages in the world; even ones with a large population of speakers are not currently documented in a way that lets these techniques be applied.
The other problem is in the area of false negative and false positive results. The results we get today are very good: in the range of 80-90% accuracy. The errors are not that important, except in the case of false negatives. Now, I am moving out of the language area to facial recognition, another supposed success. When the extremes matter, when you need the 99% match, it turns out that other techniques sometimes have to be considered. This is not in any way putting down machine learning or deep learning; it is simply pointing out that there are limitations because of the statistical nature of the methodology.
Satoshi Matsuoka:Another way of putting this is that you cannot reproduce what you have not learned. So the other way of using machine learning has been as a filtering process. If you have lots of datasets and you do lots of parametric studies with these simulations, it overwhelms you in terms of computational requirements. So you use machine learning to do the initial filtering: given the past experience on these datasets, what are the likely candidates? Then you narrow down your candidates, and only then do you run your first-principles calculations. That is another technique for combining machine learning with traditional simulation. Again, it is the interplay that is very important. It is not that one is a replacement for the other. It is really the combined methods that will be very, very effective.
Primeur Magazine:The discussion is also: is AI driving HPC? Or is HPC driving AI? You have those pictures of a dog in which either AI or HPC is shown as the tail. But is one really driving the other? Or are they complementary?
Thomas Sterling:I think they are mutual enablers. I think the awareness of the potential of machine learning motivates reconfiguration of existing and near-term technologies. It is driving the design of special-purpose devices. There are any number, tens or so, of custom chips being created right now to deal either with machine learning in the broad sense or with particular special workloads. I think the reverse is even more true: the ability of HPC to handle a level of data and a complexity of data structures that the human mind simply cannot deal with is expanding the interest in, and the use of, machine learning techniques into areas that formerly had much more primitive approaches. I think that is a quid pro quo.
Primeur Magazine:Does it also get into areas where previously people were working? In some areas, algorithms decide on what people do.
Thomas Sterling:This is very much work in progress.
Satoshi Matsuoka:Empirically based methods have their merits and their drawbacks. You have to understand that. Coming back to my original example, people will not be scared if a child can say that water is water. That is what it is supposed to do. I think there is an overreaction to the capabilities of machine learning from people who have used ab initio calculations. They fear inaccuracy. Of course there will be inaccuracies, but again, it is a complementary thing. You should not neglect the power of machine learning, of empirically based methods, just because it may not be accurate. It is up to smart application people to exploit that property.
The other aspect, from the infrastructure standpoint, is that although, as Thomas mentioned, there are lots of special processors, right now machine learning is done mostly on fairly general-purpose chips: general-purpose CPUs and GPUs. We know they perform well on most HPC workloads. But if you take, for example, Google Tensor processors, it is pretty clear that this processor is very hard to use for general HPC workloads, especially for ab initio calculations. As we go to more specialized architectures, the question is whether hardware that is extremely specialized for machine learning can be exploited if it achieves volume, or whether it is so specialized that it cannot be used for first-principles calculations, as was possible with GPUs. That paints a picture of the whole ecosystem: how much do you want the specialized hardware to differ? The jury is still out on this. It will be a very interesting development over the next several years: if we specialize the hardware architecture for machine learning, can the software exploit it, or is it too alien, which makes it good only for machine learning? In that case we can of course use it, but not in general, such as for first-principles calculations.
Primeur Magazine:Some time ago, a few years ago, you said the stuff we are doing with social media and the like, was not really Big Data, because if you look at what gets in and out of a machine, a supercomputer was bigger than the whole Internet. Of course, at that time it was true, but with all this Edge computing that we have now, is that still the case? Or is there now real Big Data outside of the science field?
Satoshi Matsuoka:It has completely turned into a matter of supercomputing power. Let me give you an example. There is a Chinese company called Sensetime, which does facial recognition. That is facial recognition for smartphones, but they also do facial recognition for surveillance. That is their expertise. You might think it is not an HPC problem. But because they have to do considerable training of their deep learning networks, that company at one point bought a system dedicated just to this. They have a system with 8,000 GPUs. That is like one of the TOP10 supercomputers. So a single company, working on something very much related to social networks, has a supercomputer. And even the social networks themselves now work with images, videos, and other types of correlated data. The web of information is becoming tremendously complex compared to the past. Now you get a new requirement for supercomputing power to resolve this. In fact, that is exactly what Facebook and others are doing. They have supercomputing in their clouds to take care of all this.
Thomas Sterling:I do not think I have anything to contribute to this, other than in the case of facial recognition. I happen to know something about it, being involved in a project that uses alternatives to AI techniques, with a different kind of modelling, for the set of conditions that we are targeting: casinos and schools. For these problems we find that alternative techniques are actually more accurate, faster, and require fewer resources. You cannot extrapolate in any way from that. I do not mean to imply anything other than that there is still room for differentiation. I look forward to seeing the evolution of these ideas, created by motivated people for an ever-expanding problem set, and I think this is an experiment in its own right, as Satoshi implies. And I am not going to preordain the outcome.
Primeur Magazine:One thing we need to talk about is international cooperation.
Satoshi Matsuoka:As always, we foster international cooperation. This year was no exception. In fact, every year I see increasing progress in international collaboration on all fronts, not only in the application science disciplines but also, even more, in the computer science and engineering part. For example, with the Post-K we are developing a machine, but we have high expectations that both the Post-K chip and the technology therein will have significant worldwide penetration via collaboration with international partners. A machine like ABCI was also achieved through significant collaboration both inside and outside Japan: collaboration among different companies, and collaboration with different parties that provided different AI expertise to design and procure the machine.
Of course, there is continuous collaboration there. For example, there is the DK Panda group in the United States, because they do MPI for GPUs. We need their collaboration and expertise. If we have a very large machine like ABCI, with several thousand GPUs but dedicated to AI and machine learning, what kind of high-performance communication do you need? Admittedly, I think there is significant competition, especially for exascale. Again, there will be next generations of machines which will be orders of magnitude faster. There will be significant competition in that regard, but on the other hand, at the application level and at the software level, there is significant collaboration.
I see no impediment to countries collaborating on hardware and architectures in the future. One reason is that these machines are getting very expensive. With particle accelerators or large telescopes there is significant international collaboration, and we believe that supercomputing is no different. There will have to be international collaboration to design the machine, to facilitate the machine. The second reason is that science is all about collaboration, and there is an increased need for it. Even now, the adoption of these high-end computing systems in science is on the increase. It is surprising how many disciplines are still not exploiting these capabilities, but the number that do is increasing every day. Some of the fields that you might think would not use supercomputing at all are now using it, not just for ab initio simulation but also, for example, for IoT, AI, or pattern recognition. That is greatly broadening the application of supercomputing power to many disciplines.
Actually, again it is the nature of science to collaborate. So on the infrastructure side we collaborate as well.
As a very concrete measure, RIKEN has collaborations with DOE centers in the US: explicit government-to-government development collaboration. We have another government collaboration with CEA in France. We also have a set of collaborations with various organizations, including, for example, NCSA in the US and the Barcelona Supercomputing Center in Spain. We are also trying to establish other, more ground-level collaborations with centers like Oak Ridge and CSCS in the context of ADAC, which we are also trying to have RIKEN become part of. I think collaboration is being fostered everywhere; it is increasing, and I think that is driving science and the progression of these architectures.
Primeur Magazine:What are the exascale expectations for the coming year?
Satoshi Matsuoka:Coming back to exascale: next year a lot of details will be revealed about our future machines. Certainly, Summit is one template. As Thomas mentioned, there are many other types of architectures, many types of machines running. A lot of this information will be revealed over the course of the next year. We will have a very clear vision of our exascale options and how to move beyond them. As Thomas said, this was a pivotal year, and it continues to be so. It is very exciting, also because there is so much outside influence and collaboration with AI, Big Data, Quantum, and Neuromorphic. There are so many subject matters. Also, we are getting credible results that would not have been possible without high-end supercomputing. All this combined is leading to very exciting times.
The only problem is that we train people in HPC, but they get snapped up by Silicon Valley companies that pay three times the salary. Talent retention has become a big problem. Having said that, when there are significant salaries and opportunities, the field becomes very attractive to a young generation that can envision itself in it. There is clear evidence of this. For example, in Japan, if you ask a 13-year-old in middle school, 'What do you want to become?', in the past the answer was a sports player, like a soccer player, or a profession like medical doctor. Now the number one answer is IT, and I think that is because the younger generations feel that IT is exciting, provides lots of opportunities, changes the world, and also pays well. So we are getting lots of interest, and we expect significant incoming talent in these areas. That will only help the field progress, because we will have lots of smart people entering it.
Thomas Sterling:This was an exciting year, and it laid the foundation for establishing progress in more directions next year of even higher quality. I really think a corner has been turned. I hope I am not too optimistic - it will be a pivotal year. Seriously, I think this year is going to be really important and international.
Primeur Magazine:Thanks a lot, and talk to you next year.
This is part III of the interview: