25 Sep 2017 Frankfurt - Each year, now for the eighth consecutive year, Primeur Magazine asks supercomputer experts Satoshi Matsuoka and Thomas Sterling how we are doing on the road to exascale. This year's interview took place at ISC'17 on June 21, 2017. What has not changed is the prediction of when we will enter the exascale era - somewhere in the early 2020s - but there is now more clarity on what the systems will look like. And this year we saw the rise of Artificial Intelligence (AI) and machine learning to levels unseen before. But does that mean we can forget about HPC and traditional exascale computing? And what will come after exascale? Could neuromorphic computing and quantum computing be an answer? Or are we just talking about those because we do not really have an answer for the post-Moore era yet? Let us hear what Thomas Sterling and Satoshi Matsuoka have to say about it. We publish this interview as a series of four articles:
Primeur Magazine: So we can go to the next topic?
Thomas Sterling: The importance of symbolic computing, AI, machine intelligence, the general body of problems that are not conventional.
Satoshi Matsuoka: In AI, there has been an explosion of interest, as we all know. Here at ISC this year there was an AI day, and I think it contributed to the increase in attendance. Obviously there is a resurgence of AI: the first rise was early neural networks in the sixties and seventies; after that there was a rise of symbolic computing with the Japanese 5th generation computing project, signified by the programming language Prolog and also seen in countries such as the UK, all of which fizzled in the nineties, resulting in twenty years of AI Winter. Now we have machines that are several orders of magnitude more powerful than those of the early days: HPC capabilities grow by three orders of magnitude every ten years, so in twenty years we get machines that are a million times faster. This has finally made expensive algorithms such as stochastic gradient descent, used to train neural networks, usable for deep neural networks. In fact, the theory was there fairly early on; researchers came up with the foundations of these theories as early as the 1970s. However, even the supercomputers of those days were powerless against the massive computational demand. Now we have the capabilities, as advances in HPC have brought these within reach, resulting in the recent resurgence.
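As an aside for readers, the stochastic gradient descent Matsuoka mentions can be sketched in a few lines. The toy example below is purely illustrative (real deep learning frameworks use mini-batches, many parameters, and hardware acceleration); it fits a simple linear model by updating parameters from one randomly chosen sample at a time, which is the "stochastic" part of the name.

```python
import random

# Toy stochastic gradient descent: recover y = 2*x + 1 from samples.
data = [(x, 2.0 * x + 1.0) for x in range(-10, 11)]

w, b = 0.0, 0.0   # model parameters, initialized at zero
lr = 0.01         # learning rate
random.seed(0)

for step in range(2000):
    x, y = random.choice(data)   # one random sample per step
    err = (w * x + b) - y        # prediction error
    w -= lr * err * x            # gradient of 0.5*err^2 w.r.t. w
    b -= lr * err                # gradient of 0.5*err^2 w.r.t. b

print(round(w, 2), round(b, 2))  # approaches 2.0 and 1.0
```

The same update rule, applied to millions of parameters over enormous data sets, is what makes deep neural network training so computationally demanding.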
Now, in modern AI we have many concrete, demonstrated capabilities: AI beating humans in Go, self-driving cars, and so on. Other applications are, to some extent, hidden under the hood. For example, Google Translate and many other machine translation services now use deep learning based algorithms - a complete change underneath that provides much greater accuracy without it being made explicit.
So, certainly the resurgence of AI has been driven by HPC. And this is bringing benefits but also some "unrest" to the HPC community, so to speak. On one hand, you see the tremendous opportunity offered by the convergence of HPC and AI, where AI is being used for analytics and Big Data, and where the synergy will benefit both sides. But workloads may also compete, and people building the infrastructure may specialize it for AI and not for HPC. We may see, for example, specialization towards very low precision arithmetic, since neural networks can use much lower precision floating point such as FP16, or even 8-bit integers, whereas standard HPC simulations assume single or double precision. This trend may be a diverging factor between HPC and AI, where the latter may have the advantage because so much of the attention and funding goes to it. In fact, in Japan and China some people see AI as having much more immediate and opportunistic value than HPC simulation. So, at the least there could be a shift in investment from pure HPC to some converged infrastructure, or even investment at the sacrifice of HPC - a troubling scenario.
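To make the low-precision point concrete, here is a minimal sketch (hypothetical values, pure Python) of the kind of 8-bit integer quantization used in deep learning inference: floating point weights are mapped onto the int8 range with a shared scale factor, trading a bounded loss of precision for much smaller and faster arithmetic than the double precision HPC simulations assume.

```python
# Toy symmetric int8 quantization of a set of weights.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]

# Map the largest-magnitude weight onto +/-127.
scale = max(abs(w) for w in weights) / 127.0
q = [round(w / scale) for w in weights]   # int8 representation
deq = [v * scale for v in q]              # values actually used in compute

# The round-trip error is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, deq))
print(q)
print(max_err <= scale / 2)
```

A simulation code needing 15-16 significant digits cannot tolerate such errors, which is exactly why AI-specialized hardware and HPC hardware may diverge.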
Primeur Magazine: When we are talking about AI systems, are we then talking about NVIDIA? Or also others?
Satoshi Matsuoka: Also others. NVIDIA is certainly leading the charge in AI-capable chips. But other companies such as Intel and Fujitsu have been quite proactive in announcing their own deep learning chips. For instance, Intel announced two chips, Knights Mill and Nervana Lake Crest, where Knights Mill is more general purpose, while Nervana Lake Crest is much more special purpose for AI. Fujitsu is designing the Post-K chip as a general purpose ARM HPC server microprocessor, but at the same time is building a completely different chip called the DLU which is 100% focused on deep learning. Google announced their Tensor Processing Unit (TPU). Some people are building chips that are meant not for deep learning but for symbolic processing such as graphs and automata, and these are very special purpose chips as well. Overall, special-purpose architectures for AI have become extremely vibrant. At the same time, these areas are so vibrant that they may be taking away some of the attention from conventional HPC.
Thomas Sterling: I do not disagree with anything that Satoshi said. But I have some caution. I too am a pattern matching creature, like a deep learning machine. I have noticed how our field in particular has cycled through one hype after another, and that does not mean there is no value in each particular topic. But we have a natural tendency to expect that if something is growing at any point in time, it will continue to grow forever. Sometimes we get trapped by our own terminology. When we say machine learning, we think we are talking about a machine actually learning something, and that implies understanding - but it does not. Furthermore, we do not understand how it works: it is a kind of curve fitting in a highly N-dimensional space, and we appreciate that, but we do not actually know precisely what it is doing in any particular case.
We must also point out that there are rather kludgy requirements, such as enormous training data sets, and that what the machine learns lies within the scope of the training set. So reducing the entire meaning of "AI" to a mere extension of neural nets is just silly. That does not mean it is not an interesting component. If one wanted to use the human brain as an analogy, you will find that most of the neurons are not used in actual knowledge gaining. They are like neural nets: filters, associative processors of especially image and sound - and, what makes it really interesting, with a time dimension as well. That is to say, there is validity to that aspect of the functionality. But then there are the higher brain functions: the symbolic brain, the point where we are able to associate things with, I am going to say, reality, meaning derived from first principles - which deep philosophers in fact think is illusory; we just think we understand something, but maybe we are in a giant Matrix with Keanu Reeves. There is a much greater part of the challenge in front of us that nonetheless matches Satoshi's notion of how machines evolve. They will be symbolic; they will not be these large "webs" waiting for results. That is good news though: there is much more of AI in front of us than behind.
Satoshi Matsuoka: I completely disagree.
Thomas Sterling: That is good. I have never seen Satoshi wrong before. But this is perhaps the first time.
Satoshi Matsuoka: I completely disagree. Why did the second resurgence of AI - the symbolic processing of Prolog, etc. - fail? Because it was purely symbolic! People thought symbolic logic would solve all the world's problems, but I can present a very simple thought experiment to prove my point. Here is Coke being poured into a glass. Now shake the glass, have someone look at it, and then ask: what is in the glass? One will see a water-like liquid, and since it is brown, one may assume it is Coke. If it is transparent, you assume it is water. If it is oil, it behaves differently because of its higher viscosity. Now, what is the governing physical equation for liquids? The Navier-Stokes equation. But then we must ask ourselves: is our brain solving the Navier-Stokes equation like an HPC machine? Probably not; in fact, most people do not even understand what it is, unless they have had good mathematical and physics training. Then why can you tell this is water? Because you have trained yourself over years of watching water: something that looks and behaves like this is water, which captures exactly the non-linear function that water's behaviour follows. People cannot describe to you in equations or in words why they recognise this as water, but the neural network in their brain is essentially replicating a solution to the Navier-Stokes equations in real time. You are not actually solving the equation; it is something you have learned. In fact, there are theories that say that neural networks can approximate any type of non-linear function. It is my opinion that the mistake the AI researchers and developers of the eighties made was to believe that symbolic processing is all we need. My counterargument is that even symbolic logic itself is a phenomenon generated from these non-linear functions we acquire in our neural nets as a lower basis. And that is my strong belief.
Thomas Sterling: It certainly is a brilliant hypothesis. Here is the fundamental reason that my very smart colleague has inadvertently gone off into bitterness over the failure of the Japanese programme - which, by the way, the US followed perfectly, right down to the failure too. It is the following: every part of his argument is anthropocentric. He assumes that the embodiment of intelligence is in fact a human manifestation. But intelligence can in fact have nothing to do with how the human brain operates. The human brain had to evolve from a period when trilobites ruled the planet 540 million years ago, and to get to where we are today it had to find an obscure path through all of that time. We do not have to do that with artificial or machine intelligence. But what we do have to do - and I think Satoshi would agree - is define intelligence, not just as a goal, but as a property. I would assert that intelligence is in fact an algorithm in the broader sense. I sensed this in Satoshi's comment, and I agree in the narrow sense, when he was talking about symbolic logic, and I would not disagree with him on this. But that is not a limitation of symbolic computing. So, understanding and defining this, why did they fail? I think - and I am not sure - not because of the underlying or simplistic symbolic manipulation, but because of the lack of an intermediate abstract architecture that represents and embodies the structures of information, both in terms of content and in terms of objective functions, with a workflow that takes advantage of that. It is easy for me to say, but impossible for me to prove at the moment, I acknowledge that. But I think there are broader spaces of consideration.
And that will make your article more interesting, because we have a true point-counterpoint. I hope to live long enough to find out which one is right.
Satoshi Matsuoka: We can continue to argue about this, but that is not the focal point of our conversation. So let us go back to a more pragmatic topic.
Thomas Sterling: I accept your key point: whatever the scale of system that will do what we are talking about, in either case it is likely to be, in some sense of the term, exascale.
Satoshi Matsuoka: So, this exploding body of work in Artificial Intelligence has significant potential. It is my firm belief that HPC of course plays a chief role, not only by providing flop/s but also through many of the supporting elements that constitute the infrastructure, including memory, interconnects, and I/O, as well as the data manipulations thereof; all combined, this can be described as HPC affecting AI.
But there is a counter-path in which AI affects HPC, in the following way. Until now, most simulations in HPC have been done under the dogma of first principles: there are some governing equations, and then you somehow discretize these equations and compute them; but we are hitting scaling limits in various ways. There is a totally different approach which is more empirical. Of course, scientific experiments were always empirical in nature, but now people are looking at simulations done by empirical means, driven by machine learning and AI, to reach much faster time to solution. For example, in the old days weather forecasting used to be empirical: we looked at the day of the year and the average temperature to predict the weather. As science progressed, people started drawing weather maps, but you still had to be an expert forecaster, looking at the weather maps, to say whether there would be rain or sunshine and what the temperature would likely be. With supercomputers, weather forecasting moved to first-principles simulation of transport equations and physics, and over time people have tried to go to ever finer resolutions to determine the weather. Now people are trying to apply AI to make forecasts, which in some sense is going back to empirical means: looking at these weather pattern trajectories, but then using the generative extrapolation capabilities of neural networks to predict future weather patterns - not from first principles, but as a function of the knowledge you have acquired through learning. Such a methodology is starting to prove successful in some domains, in some cases achieving orders of magnitude speed-up with the same precision as calculations from first principles. Of course, first-principles simulation and empirical learning and generation are not adversaries. For example, to do training you need lots of data, and simulation can provide such data.
And in some cases, even when you do the empirical calculations, we still have to check whether they are correct. So these are rather complementary, but people are now excited by the fact that AI and machine learning can provide a complementary path to acceleration, in a very different way, to counter the limits on scaling simulation speed-up. There are examples in nuclear physics, materials science and so forth.
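The surrogate pattern Matsuoka describes can be sketched abstractly: sample an expensive first-principles solver offline, then answer queries with a cheap fitted stand-in. The sketch below uses hypothetical names and simple linear interpolation in place of a neural network, so it only illustrates the workflow, not a real surrogate model.

```python
import math
from bisect import bisect_left

def expensive_simulation(x):
    # Stand-in for a costly first-principles computation.
    return math.sin(x) * math.exp(-0.1 * x)

# Offline phase: generate training data by running the simulator on a grid.
xs = [i * 0.1 for i in range(101)]          # grid on [0, 10]
ys = [expensive_simulation(x) for x in xs]

def surrogate(x):
    # Online phase: interpolate between stored samples instead of
    # re-running the simulator for every query.
    i = min(max(bisect_left(xs, x), 1), len(xs) - 1)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Check the surrogate against the simulator at a few off-grid points.
err = max(abs(surrogate(x) - expensive_simulation(x))
          for x in [0.05, 1.23, 4.56, 7.89])
print(err < 1e-2)  # surrogate tracks the simulator within tolerance
```

The validation step at the end mirrors Matsuoka's point: the empirical model must still be checked against first principles, which is why the two approaches are complementary rather than adversarial.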
Read also the third part of the interview: HPC applications towards exascale.