This interview is published in four different episodes:
Part 2. Exascale software developments
Part 3. Knowledge processing and intelligent machines
Part 4. International Exascale Collaboration
Satoshi Matsuoka: So from a software perspective, there is a lot of research, not just exhibited here at this conference, but also at various other conferences, trying to cope with the innovations in hardware and also with the innovations in applications, for example expanding HPC to be applicable to Big Data. But there are all these underlying complexities to deal with. For instance, how do I achieve low power? You cannot achieve that with hardware alone; you need software. How do you use non-volatile memory? How do you deal with immensely asynchronous systems? Even if you do not go to extreme parallelism, you have GPU-based systems, which are enormously asynchronous: you have millions of threads in the system. How do you deal with the complexity and the massive volume of I/O? So there has been a growing, very visible trend that the primary role of software research in HPC is now hiding these complexities from the user.
Software research is extremely vibrant, and really leading, not just in our own domain, but also leading the way for other domains in IT to adopt our latest results. A lot of people in software are now doing HPC research, because this is a very demanding field, but also because the innovations in this field are replicated in other fields of IT. Software, as a research field, is very active. I think this is also reflected in the commercial tools: there are new programming models like OpenACC; there are new standards, new tools. If we talk about the applications, that is a different thing; I am just talking about the software and the standards, not the applications. But if you look at system software tools, again there is a very vibrant ecosystem forming.
Thomas Sterling: I agree with Satoshi that there is tremendous activity in the area of software development, but most of it is worthless, although I think there will be many lessons learned. The reason is twofold. First, in the metaphorical sense, a lot of it is looking under the lamp post, where there is an application that is relatively coarse-grained - and there is a lot of that - both in the commercial and in the scientific area. We need to develop strong tools for decomposability and for workflow definition. I am very sympathetic to that work. But most of that work carries the implicit assumption that there is an underlying distributed-memory software framework, namely MPI.
My admittedly narrow-minded view relates to strong scaling, which I think is increasingly important for generality and for performance portability. That requires, as I have long preached, a more adaptive, dynamic computation that addresses asynchronicity at many levels, including the near fine-grained level. For that reason, there is a different relationship than we had in the past between the user programming interface and the roles and responsibilities beneath it. This is a separation-of-concerns argument: the responsibility of the runtime system with respect to the application, and the responsibility and interrelation of the runtime and the operating system with respect to the hardware resource environment. I absolutely agree with Satoshi that regarding complexity, reliability and energy, we do not know how to address them. I personally feel that without doing so in the conceptual framework of a defined execution model, you cannot just plug these things together. We, for example, use a technique called side-path energy suppression: we control the critical path of the execution with respect to the other paths, and then modulate the performance, power and clock rates of those side paths to keep energy down. You can show that this way you get optimal energy performance.
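The side-path idea Sterling describes rests on a classic critical-path calculation. The sketch below is a toy model only - not ParalleX or any real runtime - using an invented task DAG with invented durations: it computes each task's slack, and tasks with positive slack (the side paths) could be down-clocked by up to that slack without lengthening the overall execution, which is the intuition behind suppressing their energy.

```python
# Toy critical-path/slack computation over a task DAG.
# Tasks with slack > 0 are off the critical path: slowing them (lower clock,
# lower power) by up to their slack does not change the total runtime.
from collections import defaultdict

def critical_path_slack(durations, deps):
    """durations: {task: time}; deps: {task: [predecessor tasks]}.
    Returns (makespan, {task: slack}); assumes the graph is acyclic."""
    # Topological order via depth-first visit of predecessors.
    order, seen = [], set()
    def visit(t):
        if t in seen:
            return
        seen.add(t)
        for p in deps.get(t, []):
            visit(p)
        order.append(t)
    for t in durations:
        visit(t)
    # Forward pass: earliest finish time of each task.
    earliest = {}
    for t in order:
        start = max((earliest[p] for p in deps.get(t, [])), default=0.0)
        earliest[t] = start + durations[t]
    makespan = max(earliest.values())
    # Backward pass: latest finish time that still meets the makespan.
    succs = defaultdict(list)
    for t, ps in deps.items():
        for p in ps:
            succs[p].append(t)
    latest = {}
    for t in reversed(order):
        if not succs[t]:
            latest[t] = makespan
        else:
            latest[t] = min(latest[s] - durations[s] for s in succs[t])
    # Slack: how much a task may be delayed (or stretched by down-clocking).
    slack = {t: latest[t] - earliest[t] for t in durations}
    return makespan, slack
```

For a diamond-shaped DAG where A feeds B and C, which both feed D, the shorter branch C ends up with slack while A, B and D sit on the critical path with none.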
So I know how silly I sound; do not think I am blind to this. But my conviction is real. I truly believe we are developing important software for our current-generation machines, not for next-generation machines. That is the only reason I say that ultimately, in the long term, it is going to be useless.
Satoshi Matsuoka: There is always the argument that there are two incumbents at the programming layer: there is Fortran and then there is MPI. I do not want to offend any of my friends in the area (laugh).
I am very much in Thomas's camp by upbringing. But looking at the reality of the existing situation, it is true that a lot of work is being done in these areas. Fortran 2008 is a decent language. MPI has evolved to MPI 3: there are new features, and there are other extensions for resilience and fault tolerance. All these existing trends continue to be useful. On the other hand, because of the complexity of the underlying system, and also the complexity of the applications, people are looking at other abstractions, for example at PGAS languages. Since the days of High Performance Fortran, these had never become popular, but now we might be seeing some acceptance, because people do not want to move the data explicitly. Also, there is now a growing generation of people who are used to programming in C++, or in highly asynchronous languages like CUDA. People are starting to use languages developed for the web world, like Python and its scientific extensions, as part of the arsenal for HPC, or Hadoop for Big Data processing workloads.
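The appeal of the PGAS model mentioned here is that data lives in one global index space even though it is physically partitioned, so any process can read or write remote data one-sidedly instead of pairing explicit sends and receives. The following is a minimal single-process toy of that idea; the class, its methods and the block distribution are invented for illustration and stand in for no real PGAS runtime.

```python
# Toy single-process model of a Partitioned Global Address Space (PGAS).
# A real PGAS language (UPC, Fortran coarrays, Chapel, ...) spreads the
# partitions across nodes; here they are plain Python lists, which is enough
# to show the one-sided get/put programming style.

class PGASArray:
    def __init__(self, size, nranks):
        assert size % nranks == 0, "toy model: size must divide evenly"
        self.block = size // nranks
        # partitions[r] is the slice of the global array owned by rank r.
        self.partitions = [[0] * self.block for _ in range(nranks)]

    def owner(self, i):
        """Rank owning global index i under a block distribution."""
        return i // self.block

    def put(self, i, value):
        """One-sided write: store into the owner's partition directly,
        with no matching receive posted by the owner."""
        self.partitions[self.owner(i)][i % self.block] = value

    def get(self, i):
        """One-sided read: fetch from the owner's partition directly."""
        return self.partitions[self.owner(i)][i % self.block]

# Global index 5 of an 8-element array over 4 ranks lands on rank 2;
# the writer never needs to know or care which rank that is.
a = PGASArray(size=8, nranks=4)
a.put(5, 42)
```

The point of the model is exactly the one made above: the programmer addresses data by global index and the runtime resolves ownership, so no explicit data movement appears in the application code.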
Now people are doing extensions to Hadoop to make it applicable to big supercomputers. The entire software base of HPC is really expanding. Just as people in the Cloud invented new software abstractions, new models and new tools to cope with the complexities and the new applications, the same is happening now in HPC. What is special here is that old-school HPC long resisted such change. There was a whole bunch of changes proposed from academia, but industry said: we just hack it with Fortran, then we have another hack with MPI, and that solves the problem. That was fine when the range of applications was narrow, centred on partial differential equations, and when we had only a 16-node cluster.
Now companies are buying supercomputers with thousands of cores to solve immensely complicated problems: not just a single application, but a very complicated workflow with petabytes of data to deal with. So simple, narrow tools can only be a partial solution. Where do we go from here? The jury is out! For some types of applications, especially irregular ones, these asynchronous execution models with strong scaling are very important. But there are other applications that work fine with the existing models. The fact that we have a full variety and can compose them is an important property.
Thomas Sterling: I have little to add to that; I agree with everything. One minor addition is that the combination of generality and portability is becoming ever more important, both for cost-effectiveness in software development and for the wider applicability, and therefore the market, of the underlying machines. We have not been very good at this, but we have come to understand, as we develop our software tools, which invariants of a programme, an algorithm or an application will remain across classes of machines and across generations. And what are the specific concerns that have to be machine-independent? We have to devise a framework, and therefore the software tools, that incorporate this realization via abstraction layers. That only adds to what Satoshi said.
Satoshi Matsuoka: I think this is not only our view but also that of people like Thomas Schulthess, the current director of the Swiss CSCS, who is very vocal about this. People are coming to realize this as we approach exascale. It is not just a change in execution model; rather, it is the way we are thinking, which is now more like that of people in the web or cloud world. The use of supercomputers can no longer be like Formula 1, where we have super drivers who basically drive everything at racing speed no matter what type of car it is. HPC is becoming more of an enterprise endeavour with very high diversity; thus we need a framework that is very flexible.
Thomas Sterling: Right now in the US, there is a love affair with the idea of the DSL (Domain Specific Language), but a DSL is only a stopgap measure. While in the short, narrow sense it does to some small degree address the problem, in the longer sense it makes the issue more complicated rather than simpler, because it creates these special platforms: a myriad of different DSLs that all have to be developed with the same characteristics.
Satoshi Matsuoka: What they do in the web space is take the problem one level higher: they have tools to develop DSLs generically.
Thomas Sterling: A meta DSL.
Satoshi Matsuoka: Yes, an instantiation of metaprogramming. That is highly successful in web programming. I have seen too many programming language research projects where, when the graduate student goes away, the whole language dies and there goes your ecosystem. Higher-level abstractions such as metaprogramming could help you preserve these ecosystems for a longer period of time. There is a whole range of ideas around parallel programming, and such diversity is good in terms of HPC increasingly affecting the mainstream.
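A minimal sketch of what "tools to develop DSLs generically" can mean in practice: one generic factory manufactures small pipeline DSLs from a table of domain operations, so each domain gets its own little language without its own hand-built infrastructure. Both example "domains" below are invented purely for illustration.

```python
# A toy meta-DSL: a single factory that generates tiny pipeline DSLs
# from a {name: function} table of domain operations.

def make_dsl(ops):
    """Generate a small DSL interpreter from an operation table.
    A 'program' is a list of (op_name, *args) steps applied to a value."""
    def run(program, value):
        for name, *args in program:
            value = ops[name](value, *args)
        return value
    return run

# Instantiating the same machinery for two different domains:
# an (invented) array-manipulation DSL ...
array_dsl = make_dsl({
    "scale": lambda xs, k: [x * k for x in xs],
    "shift": lambda xs, d: xs[d:] + xs[:d],
})
# ... and an (invented) text-cleanup DSL.
text_dsl = make_dsl({
    "trim":  lambda s: s.strip(),
    "upper": lambda s: s.upper(),
})

result = array_dsl([("scale", 2), ("shift", 1)], [1, 2, 3])
```

If the generating machinery survives, each individual DSL becomes cheap to recreate or extend, which is the ecosystem-preservation point made above.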
Of course the downside is that one reason such higher-level abstractions work in web programming is that they do not look at performance. So our mission is to put all these ideas, abstractions, execution models and their combinations together in an efficient package. Ultimately, performance is by nature important for HPC, so how do you preserve it? That is one of the big challenges we are facing in HPC software research focused on abstractions and ease of use. Compromising performance by orders of magnitude through high-level abstractions, compared to writing hard-core code, may be acceptable in the web programming space, but it is not acceptable in HPC.
Thomas Sterling: The idea of merely retaining performance is probably inadequate, because it points in the wrong direction. When you face the concurrent challenges of both efficiency and scaling, to recover from the burden of exponential growth in parallelism, we really have to be more aggressive than we have been, not just maintain the status quo.
Satoshi Matsuoka: Indeed, because of that there is a lot of research and experimentation being done, and new software products being produced, overall making the HPC software field very vibrant. Participation at conferences like ISC, for instance, has increased tremendously over the years, partly due to those effects. Also, paper acceptance at these conferences has become more stringent, as the population of HPC software researchers has increased dramatically, but there are only so many HPC-related, top-tier refereed conferences. So such conferences are getting tons of paper submissions, and although the papers may be competitive in quality and in interesting new ideas, getting them accepted today is extremely difficult.
Thomas Sterling: That, in fact, is a problem. A moment ago I wanted to say that I personally believe the HPC field, both in hardware and software, is in a renaissance, or at the beginning of one. But Satoshi is touching on something that is actually counter-productive: it is now harder to move forward intellectually, because there is only so much room for papers. What is happening is that more conservative papers usually dominate and take advantage, because they are more solid, more grounded in numbers, because they are incremental, and because they do that work with a high level of confidence. As a result, some of the interesting results are chopped off. I get a lot of rejections. The rejections come because we have not done it at the scale they want, or we have not compared it to the other thing, or we have not done this or that. All of which is true. But as a result there is a black market of ideas that are not being reflected, because the conferences are not looking at where the really new, thought-provoking and likely transitional and transformative ideas come from. It is easier to accept good solid papers doing incremental work. I myself have a real problem with a lot of technical programme committees for this reason. Now again, that may just be sour grapes.
Satoshi Matsuoka: Certainly, there could be difficulties like that. But conferences like ISC - these big conferences - are important. Then there are very specialized academic conferences that are very competitive, and especially in those, what Thomas mentioned could happen. But for example at ISC and SC there are a number of BoF (Birds-of-a-Feather) sessions where people present new ideas about software, and these are quite vibrant. For example, if one were to devise an extremely new asynchronous parallel software system that would be even better than ParalleX, then one could organise a BoF on asynchronous execution models and present the results there, instead of just submitting a technical paper. Big conferences are important because there are different varieties of outlets for presenting your results, and moreover, there is a significant audience to interact with.
But again, I go back to my earlier point. A lot of activity in software is needed for exascale, and we know the target problems: resiliency, power, asynchronous scaling, programming, resource management, data management, memory hierarchy, and so forth. We know the problem space, and we are witnessing the birth of all these new solutions through aggressive software research in HPC. Who could have imagined that today we would have a debugger or a performance profiler that works on a million cores? We have those for real now as we approach exascale. Twenty years ago we did not even believe in a million processors; rather we said: "That is science fiction", as at the time we had only a hundred cores here and there. So the corresponding software that would work on a million cores was blue-sky research. But it is a reality now, in both applications and system software. Towards exascale we are facing harder sets of problems with system scaling. But we have identified the issues, and although solving them is a challenge, I am sure we can overcome them, as large-scale systems infrastructure in general, not just HPC, is having to deal with scale.
Previous week: Part 1. Introduction - Exascale hardware developments
Next week: Knowledge processing and intelligent machines