Primeur Magazine:Perhaps this is a good moment to move to applications and see what has happened there? Thomas, you mentioned, for instance, that it has now been claimed that the physics theory of Supersymmetry is wrong?
Thomas Sterling:I would say that this particular point is a very exciting example of the use of many different aspects of computing: Big Data, machine learning to detect patterns, first-principles modelling, dealing with the noise, and so on. My point in choosing that particular example is that I like to remind especially young people that negative results, which get much less of the glory and much less attention, are enormously important to science. A result that confirms a hypothesis strengthens it, but a negative result that opens up another scientific track is no less interesting.
In the area of applications, and this would actually support Satoshi's point, there are rapidly growing and necessary applications such as facial recognition, which, because of the inclement political times and the resulting acts of violence, becomes ever more important, and to some degree successful. There is a small body of work that my colleagues and I have been involved in on facial recognition, a task that overall has been done very effectively with conventional machine learning. However, one of the problems, and this goes beyond this particular application, is false negatives. Sometimes false negatives do not matter at all: when we look at a stream of video images or speech, a false negative does not matter unless, for whatever very noisy reason, you get a string of them. But for something like facial recognition applied to the problem of identification and tracking, it has been shown that an alternative technique, different from the standard machine learning approach of using 2D images, may give fewer false negatives. Analytical equations that capture the 3D data not only allow a significant reduction of false negatives in the noise but also provide a fast way of doing detection and association. While this is not a proof of anything, it exemplifies that there are different ways to capture truth in the broader sense, and machine learning is one of them. But there are higher-order analytical methods and models that can be applied as well.
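To make the false-negative point concrete, here is a minimal sketch of the metric being discussed. The counts below are invented for illustration; they do not come from the work Sterling describes, and neither classifier here is his actual method.

```python
# Hypothetical illustration: false negatives matter most in tracking,
# because a missed face is a lost track. All counts are invented.

def confusion_rates(tp, fp, tn, fn):
    """Return (false_negative_rate, false_positive_rate) from raw counts."""
    fnr = fn / (fn + tp)  # missed detections among actual positives
    fpr = fp / (fp + tn)  # false alarms among actual negatives
    return fnr, fpr

# A 2D-image classifier (made-up numbers): many missed detections.
fnr_2d, fpr_2d = confusion_rates(tp=850, fp=40, tn=9000, fn=150)

# A 3D analytical model (made-up numbers): far fewer misses.
fnr_3d, fpr_3d = confusion_rates(tp=980, fp=45, tn=8995, fn=20)

print(f"2D FNR = {fnr_2d:.3f}, 3D FNR = {fnr_3d:.3f}")
```

In a video stream, a single miss is recoverable from the next frame; in identification and tracking, the false negative rate is the quantity that directly breaks the association step.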
Satoshi Matsuoka:That touches on machine learning. It is true that there is an explosion of pragmatic applications of machine learning. But if I turn towards general HPC applications, I think that pragmatic progress has certainly been achieved with these multi-petascale systems too. Running applications on these multi-petascale systems is now becoming the norm rather than the exception, and in actuality involves hundreds of thousands of degrees of parallelism inside the application. In some sense this is what we had anticipated and what had been our objective, but now it has become real: even many industries run these highly parallel applications on their own multi-petascale machines, for instance in seismic exploration for oil fields and in pharmaceuticals - quite amazing. New algorithms resulting in massive parallelism of the applications have continued to progress. Applications are no longer just those for top-tier science; engineering applications are becoming more prevalent, and they also demonstrate scalability, like last year's Gordon Bell finalists. There were at least a couple of realistic engineering applications, including one from the UK that was actually programmed in Python and compiled to efficient code by a special compiler the team developed, achieving over 10 Petaflop/s of performance. As such, petascale is no longer an exception.
Thomas Sterling:A particularly important application is shock wave physics through heterogeneous reactive materials, one that we are only now beginning to address because of the complexity of the phenomenology, the highly transient nature of the behaviour, the non-linearity, and the necessity of using adaptive methods. This is very positive and hopeful and will expand, especially in the materials area but also in materials chemistry, opening new opportunities in advanced multi-faceted materials. There is another growth area which, certainly in total throughput, enters the range of exascale, but it does not support the premise for building exascale machines: the use of ensemble computing for Monte Carlo techniques. This is a significant realm in many cases, whether for high-dimensionality problems, for problems in noisy spaces, or for annealing problems where optima need to be found. That would suggest that for such applications a multiplicity of machines, such as TSUBAME 3, will, at least for that range of problems, be better than some future-generation, ultra-tightly-coupled exascale machine. And again I will bring up the notion that our focus on flop/s, or even ops, as the definition of exascale may in fact be less important for certain applications than memory capacity and memory bandwidth, which are sometimes the constraining factor.
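The ensemble argument can be sketched in a few lines. The point is structural, not numerical: each Monte Carlo member below is seeded independently and never communicates with the others, so the members could run on entirely separate machines with only the final averages combined. The pi-estimation problem is a stand-in chosen for brevity, not one of the applications mentioned in the interview.

```python
import random

def mc_pi(samples, seed):
    """One independent ensemble member: estimate pi by dart-throwing
    into the unit square and counting hits inside the quarter circle."""
    rng = random.Random(seed)  # private generator: no shared state
    hits = sum(1 for _ in range(samples)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * hits / samples

# The "ensemble": embarrassingly parallel, no coupling between members,
# so aggregate throughput scales with machine count, not interconnect.
estimates = [mc_pi(100_000, seed) for seed in range(16)]
combined = sum(estimates) / len(estimates)
print(f"combined estimate = {combined:.4f}")
```

Because the only global operation is the final reduction, a fleet of loosely coupled petascale machines serves this workload as well as one tightly coupled exascale machine, which is exactly the premise being questioned.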
Primeur Magazine:What I liked about your presentation is that you mentioned we have the TOP500 and the Green500 and these other 500s. We have lots of benchmark lists, each with a different purpose.
Satoshi Matsuoka:Yes, backtracking a bit. When you look at these lists, one may argue whether they are useful or not: the TOP500, the Green500, the HPCG, and the Graph500. Looking at these benchmarks, and although I am not putting down the Chinese efforts by any means, one can say that the Chinese efforts have only won the TOP500, which is just double-precision flop/s in dense linear algebra. In all the other metrics, even a relevant one like the Green500, it is symbolic that the top tier consists entirely of Japanese machines, and they are all AI-focused, or at least dual-purpose AI and HPC machines like our TSUBAME 3, which was number one. In metrics like the Graph500 and also HPCG, the K computer is still number one. Of course there could be other metrics like the HPC Challenge; by the way, in the HPC Challenge the K computer is still on top too. So that means that if your application is not double-precision dense matrix multiply, then the K computer may still be the best platform after 6-7 years. And this divergence gets even more serious as we move forward, in that we cannot build a machine that is number one in all metrics. So this raises the question whether pushing double-precision flop/s, which is our metric for exascale, is the right thing to do. There have been some trends that indicate this, and I think the resurgence of AI is even accelerating this phenomenon: if you need to make a choice, do you build an exascale machine or focus on AI performance? If you have a limited budget and you need to pick one, what do you do? Maybe the societal answer is that you should build a more powerful AI machine.
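A back-of-envelope sketch shows why these lists diverge: dense matrix multiply (the kernel behind the TOP500's HPL benchmark) does enormous arithmetic per byte of memory traffic, while a sparse matrix-vector product (the kind of kernel HPCG stresses) does almost none, so the two benchmarks reward different hardware. The byte counts below are simplified assumptions (double precision, 4-byte indices, no cache-reuse modelling), not figures from either benchmark's rules.

```python
# Rough arithmetic intensity (flops per byte of memory traffic) of the
# kernels behind two benchmark lists. Assumes 8-byte doubles and ignores
# cache effects; these are illustrative estimates only.

def gemm_intensity(n):
    """Dense n x n matrix multiply: 2n^3 flops over three n^2 matrices."""
    flops = 2 * n ** 3
    bytes_moved = 3 * n ** 2 * 8
    return flops / bytes_moved

def spmv_intensity(nnz):
    """Sparse matrix-vector product: 2 flops per nonzero over roughly
    12 bytes per nonzero (8-byte value + 4-byte column index),
    ignoring vector traffic for simplicity."""
    return (2 * nnz) / (12 * nnz)

print(f"dense GEMM, n=4096: {gemm_intensity(4096):.1f} flops/byte")
print(f"sparse SpMV       : {spmv_intensity(10**6):.2f} flops/byte")
```

Dense GEMM's intensity grows with the problem size, so HPL rewards raw floating-point units; SpMV's intensity is a small constant, so HPCG and Graph500 are effectively memory-bandwidth benchmarks. One machine cannot be engineered to lead both regimes at once, which is the divergence described above.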
For such decisions, I think the applications will drive these needs. The outcome is not yet clear because, on the one hand, a tremendous number of AI applications have come forward, but at the same time HPC has made tremendous progress in making petaflop/s computation the norm. New methods are emerging, while some old methods, like Monte Carlo, are being revisited. These are exciting times.