Primeur Magazine:Can we now turn to applications, and in particular to application development?
Thomas Sterling:I think one of the biggest changes we are seeing in the application arena is the emerging realization that we need much more productive environments, and better management of complex workflows, to build the multiscale, multiphysics applications of the future. The value of bigger machines, in both processing and memory capacity, is that more sophisticated models can be implemented, delivering higher-fidelity results. I am not even referring to the big data side of this; I am talking about classical efforts such as simulation. We are also seeing the maturation of a second generation of libraries that were built partly for performance portability and partly for productivity. These are more stable, and one would hope that the libraries themselves are implemented for different machines so as to offer the end user greater performance and portability across machine types, scales, and generations. In certain domains and applications in particular there are significant advances, for example in meteorology and climate modeling, not only in Europe, the US, and Japan but also in places like South Africa, where such results are critically important. There is a rapid increase in work on complex materials and their characteristics, including the complex structures of the materials themselves, combined with much more sophisticated Monte Carlo-like simulations for high-dimensional problems. The potential impact of exascale computing is increasingly evident in many application domains, possibly including controlled fusion, but also in more mundane areas such as modeling electrical storage. Energy storage is becoming one of many problem domains of increasing importance that may have staggering impact.
Satoshi Matsuoka:There is also a huge explosion in medicine, pharmaceuticals, and genetics. These workloads are not only compute- and data-intensive; they are also very diverse and very complicated. The idea that most people just want to run a single code is simply not true. Any workload that tries to capture these complex phenomena involves multiple applications used in concert. These communities are increasingly using supercomputers for realistic tasks. People have always said that, but what is changing now is the immense capacity these machines have, both in processing power and in data-handling capability, and many users are exploiting it much more judiciously than in the past. Another side of this is that, despite all the architectural diversity, the machines are much easier to program. To some extent they are more standardized, and libraries and so forth can be ported to them. The software ecosystem is richer, and partly because of that the algorithms have certainly improved, to the point that they scale more easily even for complex and irregular workloads. For example, in biology the workloads are very complex. There are many options for how to allocate the resources: codes can be made to scale better by trading off how many processors to use for a single job against how many ensemble members to run. Users have huge leverage in making these decisions, so much so that managing the resource complexity of the workflows involved is now paramount. The strong implication is that people are now able to work at unprecedented capability. At one time people worried about finding applications that could absorb the full power and resources of future exascale systems.
This concern is waning because complicated workloads have so many components in the ensemble application space that you can use all the resources you want. This is making exascale real. Given this capacity, the emphasis on capacity is creating a tremendous appetite for the resources, and of course for getting the science done as a result.
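Satoshi's point about trading processors per job against the number of ensemble members can be illustrated with a small toy model. This is a sketch under stated assumptions, not anything the speakers describe: each member's speedup is assumed to follow Amdahl's law with a hypothetical parallel fraction, members run in synchronized waves, and only power-of-two job sizes are tried. The function names are illustrative.

```python
import math


def amdahl_speedup(procs: int, parallel_fraction: float) -> float:
    """Amdahl's law: speedup of a single job running on `procs` processors."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / procs)


def ensemble_makespan(total_procs: int, procs_per_member: int,
                      members: int, parallel_fraction: float) -> float:
    """Time to finish every ensemble member, in units of one member's
    single-processor runtime. Members run in waves of
    total_procs // procs_per_member concurrent jobs."""
    concurrent = total_procs // procs_per_member
    waves = math.ceil(members / concurrent)
    return waves / amdahl_speedup(procs_per_member, parallel_fraction)


def best_allocation(total_procs: int, members: int,
                    parallel_fraction: float) -> tuple[int, float]:
    """Try power-of-two job sizes up to total_procs and return the
    (procs_per_member, makespan) pair with the shortest makespan."""
    sizes = [2 ** k for k in range(total_procs.bit_length())]
    candidates = [(p, ensemble_makespan(total_procs, p, members, parallel_fraction))
                  for p in sizes]
    return min(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    for members in (1, 64, 1024):
        procs, span = best_allocation(1024, members, parallel_fraction=0.99)
        print(f"{members:5d} members -> {procs:5d} processors each, makespan {span:.4f}")
```

With 1,024 processors and an assumed 99% parallel fraction, the model picks all 1,024 processors for a single run, 16 processors per member for 64 members, and one processor per member for 1,024 members: the larger the ensemble, the more it pays to run many small jobs rather than a few wide ones.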
We can find examples in climate modeling, genomics and medicine, and simulation of the human body, such as the heart and brain, again with coupling of organs and blood vessels and multi-resolution physics. The world is a complicated place, and it is now being mapped naturally onto the machine. For some of these problems we may find that we do not have enough resources to run them. Exascale is becoming a much more realistic and necessary target, rather than exascale just for the numbers. Ultimately, the achievement of exascale will drive the future of science and engineering, among other domains.
Thomas Sterling:As another example, my team is involved in a sponsored project on shockwave physics in reactive materials, which combines wavelet computations with finite element computations in a workflow that can only be determined at runtime because of the data-sensitive nonlinearities involved. One area we are going to find ourselves drifting back towards is the old idea of strong scaling. This is because some problems can no longer take advantage of increasing scale, into the hundreds of petaflops, using conventional practices. We find that important problems, including those in microbiology that Satoshi referred to, can take a very long time to reach steady state or the total integration time, and yet they cannot use more than a few thousand cores. On a machine with a million cores, they are off the map. Yet these are very important problems. Satoshi also mentioned FPGAs, which for at least two decades have been a tantalizing and attractive alternative to conventional practice for certain parallel algorithms. There are a number of extremely impressive examples where FPGAs make a big difference in application performance compared to conventional processors. My team and I are interested in using FPGA components not directly for the application but rather for system software, by synthesizing accelerators for it. All running applications, to whatever degree they exploit the runtime system software, will benefit from this form of FPGA acceleration. But it is not the more challenging approach that Satoshi is suggesting, in which logic constructs that accelerate each application are synthesized automatically. If that is possible, it would lead to very important breakthroughs.
Architectures like the Anton machine, a special-purpose device, certainly suggest that such structures could be extremely useful, and FPGAs are a way to build them much more rapidly and cheaply, although at lower density than ASICs. I enjoy working with FPGAs, but I am a little hesitant because of their reduced clock rates and much lower density, although the technology has grown very impressively, especially in the number of devices and the types of on-chip functional units that can be put on an FPGA, supporting capabilities like floating-point units, good communication paths, and other things. Your question was about the diversity of components and subsystems being made available. I think there is a resurgence of excitement among young people around things like the Raspberry Pi, really cheap but capable devices that let people put functional components together. It is a lot of fun, and it is an incentive. I am not sure what the effect will be on our field, but if nothing else it will attract more young people, the way building Beowulf clusters did a generation ago.
Satoshi Matsuoka:Maybe it will generate more MPI programmers.
Primeur Magazine:I think it is good that young people are enthusiastic about programming, as long as they also learn how systems are built.
Satoshi Matsuoka:The point is, there is always a trade-off. Even weak scaling is not useful if the run takes too long. As a scientist you do not run your application just once; you run it many times with different parameters. But again, the immense scale makes these ensemble runs possible. For strong scaling, a lot of algorithms are being investigated that parallelize the application in the temporal domain. This is very difficult, but in some cases it is possible, although it is not as efficient as parallelizing in the spatial domain. Strong scaling is difficult to tackle, and it may require, and will benefit from, specialized hardware or at least new architectures. That is where FPGAs come in, along with systems like Anton and others such as neuromorphic chips.
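One well-known example of parallelizing in the temporal domain is the Parareal algorithm; the speakers do not say which algorithms they have in mind, so it is used here only as an illustration. The sketch assumes a scalar test equation y' = -y and uses explicit Euler for both a cheap coarse propagator (one step per time interval) and an expensive fine one (many steps per interval). The fine solves within an iteration are independent across intervals, which is where the parallelism comes from; they run serially here to keep the example self-contained. The helper names are hypothetical.

```python
import math
from typing import Callable

Rhs = Callable[[float, float], float]  # right-hand side f(t, y) of y' = f(t, y)


def euler(f: Rhs, y: float, t0: float, t1: float, steps: int) -> float:
    """Explicit Euler integration of y' = f(t, y) from t0 to t1."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y


def parareal(f: Rhs, y0: float, t_end: float, n_intervals: int,
             iterations: int, fine_steps: int = 100) -> list[float]:
    """Parareal: correct a cheap serial coarse solve with expensive fine solves
    that are independent across time intervals. Returns the solution at the
    interval boundaries t = 0, ..., t_end."""
    ts = [t_end * n / n_intervals for n in range(n_intervals + 1)]

    def coarse(y: float, n: int) -> float:  # G: one Euler step per interval
        return euler(f, y, ts[n], ts[n + 1], 1)

    def fine(y: float, n: int) -> float:    # F: many Euler steps per interval
        return euler(f, y, ts[n], ts[n + 1], fine_steps)

    # Initial guess: a serial sweep using only the coarse propagator.
    u = [y0]
    for n in range(n_intervals):
        u.append(coarse(u[n], n))

    for _ in range(iterations):
        # Fine solves depend only on the previous iterate, so each interval
        # could run on its own processors; here they run one after another.
        f_prev = [fine(u[n], n) for n in range(n_intervals)]
        g_prev = [coarse(u[n], n) for n in range(n_intervals)]
        # Cheap serial correction: U_new[n+1] = G(U_new[n]) + F(U[n]) - G(U[n]).
        u_new = [y0]
        for n in range(n_intervals):
            u_new.append(coarse(u_new[n], n) + f_prev[n] - g_prev[n])
        u = u_new
    return u


if __name__ == "__main__":
    def decay(t: float, y: float) -> float:
        return -y

    serial_fine = euler(decay, 1.0, 0.0, 2.0, 10 * 100)
    for k in (0, 1, 3, 10):
        u = parareal(decay, 1.0, 2.0, n_intervals=10, iterations=k)
        print(f"iterations={k:2d}  y(2)={u[-1]:.8f}  error={abs(u[-1] - serial_fine):.2e}")
    print(f"exact exp(-2) = {math.exp(-2.0):.8f}")
```

After as many iterations as there are intervals, Parareal reproduces the serial fine solution, so any speedup depends on converging in far fewer iterations. The repeated fine solves are one reason temporal parallelism tends to be less efficient than spatial decomposition.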
Thomas Sterling:That is a good point, and I had not considered the relationship between strong-scaling and FPGAs.
Satoshi Matsuoka:But they are important, and they do not require a lot of resources to implement. My point is that with exascale you will not have one application that consumes all the cycles. I think that kind of policy is going away in favor of a much healthier reality.
Thomas Sterling:One important area that supports your premise is the set of algorithm properties that distinguish where weak scaling is truly effective from where strong scaling becomes increasingly critical. For high-dimensional problems, Monte Carlo techniques become the methodology of choice; they are well supported by weak scaling and probably will be well into the exascale era. But for single-run applications, even where the data set size is dramatically increased and the application therefore appears to rely on weak scaling, retaining equivalent accuracy requires finer-grained time steps and therefore more incremental steps in the time domain. Ironically, this means that strong scaling is necessary to support what initially appears to be a weak-scaling approach if time-to-solution is to remain essentially constant.
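Why Monte Carlo fits weak scaling so well can be seen in a minimal sketch: each worker draws a fixed-size batch of independent samples, and the only coordination is averaging the batch means at the end. The integrand, dimension, batch size, and function names are illustrative assumptions, and the workers run one after another here rather than on separate nodes.

```python
import random
import statistics


def integrand(x: list[float]) -> float:
    """Example integrand on the unit hypercube; its exact mean is len(x) / 3."""
    return sum(xi * xi for xi in x)


def worker_batch(seed: int, dim: int, samples: int) -> float:
    """One worker's fixed-size share of the samples; returns the batch mean."""
    # A separately seeded generator per worker. This is a simplification:
    # production codes use RNGs designed to give independent parallel streams.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += integrand([rng.random() for _ in range(dim)])
    return total / samples


def monte_carlo(workers: int, dim: int = 10, samples_per_worker: int = 10_000) -> float:
    """Weak-scaled Monte Carlo estimate: every worker keeps the same batch size,
    so adding workers adds samples. The batches are independent (each could run
    on its own node); the only communication is the final average."""
    batch_means = [worker_batch(seed, dim, samples_per_worker) for seed in range(workers)]
    return statistics.fmean(batch_means)


if __name__ == "__main__":
    for workers in (1, 4, 16):
        print(f"{workers:3d} workers: {monte_carlo(workers):.4f} (exact {10 / 3:.4f})")
```

Because the statistical error of a Monte Carlo estimate falls roughly as one over the square root of the total number of samples, at a rate that does not depend on the dimension, adding workers at a fixed batch size improves accuracy without lengthening any worker's run.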
Satoshi Matsuoka:It is becoming more apparent because of the time-domain issue, and the problem is going to become big for high-resolution computation. You also have to refine the time resolution, and because the time step is so fine-grained, making progress requires different solvers or very different types of resources. It is the nature of the numerics and mathematics involved.
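The time-step argument can be put into rough numbers. This sketch assumes an explicit scheme whose stable time step is proportional to the grid spacing (a CFL-type limit) and perfect parallel efficiency; neither assumption comes from the interview, and real codes fall short of the ideal. Refining the grid by a factor r in d dimensions multiplies the data by r^d and the number of time steps by r. The names `Outcome` and `refine` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    processors: int             # relative to the original run's processor count
    time_to_solution: float     # relative to the original run's wall-clock time
    cells_per_processor: float  # relative to the original run's subdomain size


def refine(r: int, dim: int) -> tuple[Outcome, Outcome]:
    """Idealized cost of refining the grid spacing by a factor r in `dim` dimensions.

    Assumes an explicit scheme with a CFL-type limit (time step proportional to
    grid spacing) and perfect parallel efficiency.
    Returns (weak_scaling, constant_time) outcomes.
    """
    cells = r ** dim      # the data set grows with the number of grid cells
    steps = r             # the time step shrinks with the grid spacing
    work = cells * steps  # total work grows as r ** (dim + 1)

    # Weak scaling: processors grow with the data so each keeps its subdomain;
    # every step takes as long as before, but there are r times as many steps.
    weak = Outcome(processors=cells, time_to_solution=work / cells,
                   cells_per_processor=1.0)

    # Constant time-to-solution: processors must also absorb the extra steps,
    # so each subdomain shrinks by a factor r, which is strong scaling per step.
    constant = Outcome(processors=work, time_to_solution=1.0,
                       cells_per_processor=cells / work)
    return weak, constant


if __name__ == "__main__":
    weak, constant = refine(r=10, dim=3)
    print("weak scaling: ", weak)
    print("constant time:", constant)
```

For a tenfold refinement in three dimensions, weak scaling needs 1,000 times the processors yet takes ten times as long; holding time-to-solution constant needs 10,000 times the processors, each working on a tenth of its original subdomain. Compute grows as r^(d+1) while memory grows only as r^d.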
Thomas Sterling:It is one of the reasons that machines could scale more rapidly in processing rate than in memory capacity and still continue to be of value, though not in the near future.
Satoshi Matsuoka:The fact is that people are identifying realistic problems for which even exascale is not sufficient. That also reflects the fact that people now have a far clearer view of the utility of exascale in various important domains such as medicine, the environment, social simulation, fusion, and so on.
Thomas Sterling:We are now beginning to enter a realm where certain societal problems with hard real-time constraints come into play, such as facial recognition or other image identification tasks. A tremendous number of comparisons must be made very quickly to identify individuals of interest. I did read 1984. But it may also be an important tool in protecting a lot of people from premeditated harm.
Satoshi Matsuoka:In fact, this whole business of data-centric computing includes things like data classification, regression analysis, and prediction. In some sense these represent a more predictive, more generic, and more inductive approach to computing, and such computing challenges are becoming prevalent at these machine forms and scales. In the past, HPC applications were more grounded in first-principles physics. They were primarily analytical in nature and very domain-specific: you would not apply CFD methods to something like materials science. More data-centric approaches are giving rise to a very different, but very important, class of applications. In fact, these kinds of problems are sometimes very compute-intensive as well. For example, deep learning is extremely compute-intensive, despite being categorized as big data. It is a misconception that big data is light on computing; it is very heavy in computing as well. And as Thomas said, real-time applications such as image recognition and control require massive amounts of data, and when you combine that with simulation, as in weather prediction, they become both data- and compute-intensive. This new breed of high-end applications is founded on the need to process observed phenomena in real time and in various ways, including more data-centric ways. They are giving rise to a demand for computing that can only be served by high-end infrastructures. We are really starting to see this broadening of the application space. Google is buying up to tens of thousands of GPUs. Why would Google do such a thing? If big data processing were not HPC-like, why buy all these GPUs? They are not doing any graphics work with them. The reason is that they have a stream of HPC-like workloads and data processing.
Thomas Sterling:There are many emerging problems that require data processing and numeric processing to work together synergistically. Here are two examples. One is signal processing for gravitational-wave detection, where the signal is a displacement of a small fraction of an atomic diameter over a distance of several kilometres. Extracting the information signal from the enormous noise component requires representing the actual signal through first-principles physics simulation, which drives the correlation. The simulated signal is then matched against a very, very noisy signal to extract the information. The second example, which relates to the major European investment in the Human Brain Project, is taking empirical data derived from experiments on multilayered neural structures, comparing it to simulations of those structures, and calibrating one against the other to validate the model. Experimentation and data analysis are done together.
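In compact-binary gravitational-wave searches, the correlation Thomas describes is typically done by matched filtering: a waveform template predicted by simulation is slid along the noisy data, and the offset with the strongest correlation marks a candidate signal. The toy sketch assumes white Gaussian noise and a synthetic chirp standing in for a simulated waveform; the parameters and function names are hypothetical, and production pipelines whiten the data and search large banks of templates.

```python
import math
import random


def chirp(n: int, f0: float = 0.01, f1: float = 0.1) -> list[float]:
    """Toy template: a tapered sinusoid whose frequency sweeps linearly from f0
    to f1 (cycles per sample). A stand-in for a simulated waveform."""
    out, phase = [], 0.0
    for i in range(n):
        phase += 2 * math.pi * (f0 + (f1 - f0) * i / n)
        taper = 0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1))  # Hann window
        out.append(taper * math.sin(phase))
    return out


def matched_filter(data: list[float], template: list[float]) -> list[float]:
    """Correlate the template with the data at every offset, normalized so that
    for unit-variance white noise the output also has unit variance."""
    norm = math.sqrt(sum(t * t for t in template))
    m = len(template)
    return [sum(data[k + i] * template[i] for i in range(m)) / norm
            for k in range(len(data) - m + 1)]


def detect(data: list[float], template: list[float]) -> tuple[int, float]:
    """Return the offset with the highest filter output, and that output."""
    scores = matched_filter(data, template)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    rng = random.Random(42)
    template = chirp(400)
    data = [rng.gauss(0.0, 1.0) for _ in range(3000)]  # white noise, sigma = 1
    true_offset, amplitude = 1800, 1.5
    for i, t in enumerate(template):                   # bury the signal in noise
        data[true_offset + i] += amplitude * t
    print(detect(data, template))
```

Normalizing by the template's norm makes the filter output for pure noise have unit variance, so the peak value can be read directly as a signal-to-noise ratio.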
Primeur Magazine:Another topic I want to touch on briefly is the industrial use of HPC. For instance, in Europe, you mentioned Horizon 2020, which wants to focus very strongly on SMEs using HPC. Of course, most of the people here at the conference are from industry, doing HPC. How is that type of HPC being accomplished?
Satoshi Matsuoka:The keynote, although it was technical, gave a detailed explanation of how Mercedes-Benz is using HPC to design cars. I was really impressed, not only by the simulation of the cars themselves but also by the inside of the vehicles, the environment used to assess the car's safety. For example, when the human body hits a part of the car during a crash, how the body is affected is simulated more or less at the level of human organs. This is more precise than using dummies. Of course you cannot use real humans as test subjects, but they can now do this in simulation. That is pretty amazing, with far-reaching benefits for society.
As for the resources, they were really open about how much they are using. Some of the resources they have would match high-end computing centers. They probably have several of these high-end machines, and they have now come to the point where their simulations require thousands to tens of thousands of cores to achieve a turnaround time of about half a day, or even two days. Take, for example, these very detailed crash simulations. They said they run hundreds of them, so they must need a huge number of runs to design these vehicles. That is one example. We see many other instances in other areas, such as oil and gas, pharmaceuticals, and of course CAE and other mechanical engineering disciplines. At our center, 20% of the workload comes from industry, from very diverse areas of industrial design. Overall, I think industry, at least in the case of big companies, is adopting HPC to advance its production goals. They are using it to the extent that it has become routine production work. These are not runs at tens of petaflops, because their industrial applications do not run as a single big job. They run lots of medium-sized jobs, and their capacity requirements are immense. The keynote also alluded to the question of whether SMEs can exploit this type of technology.
I think that is really a challenge facing all countries, not just Europe, Japan, or China. How can we do technology transfer, and how can we educate people to use such new technologies? How can SMEs with high skills in various fields of engineering turn their business problems into HPC problems requiring anywhere from hundreds to tens of thousands of cores? That requires education and infrastructure, because these SMEs cannot afford big machines. It requires libraries and packages that can carry the highest-end computing being done in the labs and academia over into something usable in industry. Overall, this transition will inevitably come as we move forward in the use of these resources, because otherwise people will no longer be able to compete. Skilled HPC professionals are currently expensive to hire, as it takes a lot of money and time to train people. If HPC resources become easier and cheaper to use within 5 to 10 years, there will be a transition to SMEs.
Thomas Sterling:Certainly there are other areas that combine the numeric side with the data side, in financial and commercial applications. The money and banking market is a heavy consumer of fast-turnaround computing through financial modeling. Speed becomes imperative, measured, I am told, in microseconds, in the transaction processing of stocks, bonds, and other investment types. Also very important is the use of large systems for handling retail inventory, marketing, and advertising targeting. All of these require processing enormous amounts of data, which changes even on a daily basis as companies manage their complete inventory. Fraud detection is also becoming an extremely big business in itself. It is almost shocking how many billions of dollars or euros are lost to fraud each year, and how effectively detection mechanisms could curtail this, even in simple cases.
Satoshi Matsuoka:Fraud detection is, for instance, enormously effective for PayPal. Insurance companies are looking at these technologies to detect fraud, just like credit card companies. HPC is very cost-effective here because you cannot watch everything manually. It is not only about traders making money; it is actually being used to save money for consumers at large and to maintain a safe society. The same can be said for many other things. You already mentioned face recognition to detect, or at least track, some undesirable elements, which is a touchy subject, but it could be useful for other things, for instance when your child gets lost. There have been a number of fairly recent instances where cases have actually been resolved thanks to surveillance cameras.
Thomas Sterling:One last example is commercial aviation, not in the simulation of air services but rather in the control and management of passengers and airplanes. We have had incidents where United Airlines had to stop service for several hours because of computer problems. In effect, the air routes and airplanes are almost saturated, and deciding exactly where the planes should go and how the passengers should flow is a very complex, NP-complete problem. It is constantly changing, partly because of weather patterns. It is a butterfly effect: a pattern in one place can affect a pattern thousands of miles away.
Satoshi Matsuoka:Novel uses of the technology will certainly not be limited to big data for consumer spending; they will also include artificial intelligence. There has been mention of the Human Brain Project, part of which is work on neuromorphic computing and brain simulation. The potential of artificial intelligence could be huge, but it is very compute-intensive. What is most compute-intensive is the learning process, which will be done on big machines. Applying the learned knowledge could be done on much smaller machines, such as systems embedded in robots or in cars for autonomous driving. But the learning process itself cannot be done on the robots or in the cars. Data has to be collected, and then the learning has to be done in a centralized fashion. Again, this will likely become a prevalent use of HPC technology as we move forward. Society needs robots to free humans from mundane activities such as agricultural work and repetitive manufacturing tasks. Robots can harvest food much better than humans. They are much cheaper, and they can work 24 hours a day. We can free humans from this work and let our brains work on more productive things, like thinking about the next generation of supercomputers.
Thomas Sterling:I absolutely agree that at some point, though not in the immediate future, machine intelligence will be the single heaviest consumer of computing cycles. The structure Satoshi referred to is quite likely, even with more computing power. Such a structure will be an important part of the cloud. Ultimately we will think of the cloud as a great distributed intelligence in which all of the shared knowledge is used. There might be millions, eventually even billions, of such devices. Many of them will be robots, but not all. Truly distributed, very localized devices, such as sensor nets, will be highly integrated to form aggregated processing for data assimilation and control, and these too will be exascale in capability.
Satoshi Matsuoka:Coming back to reality, how to drive the ecosystem will be very important for these technologies. In some sense we know how to bring the high end down more directly. Today that involves reasonably sized parallel systems that can easily include thousands of processors. How to bring them down so that they are not overwhelming to ordinary engineers will be a big subject. Again, that involves developments in automation and packaging, and better user models, interfaces, and programming models. So bringing that down to the masses will be a huge challenge, which in some sense is as important as the race to exascale. There are a lot of different issues, and although some problems look different, many of them are actually the same. How you program file systems, as Thomas said, is a difficult problem. It is a problem for exascale, but it is also a problem for SMEs using HPC.
This article is part of a longer interview. The complete interview is divided into 6 articles: