Primeur magazine: One of the topics at this conference was the "missing middle". There seems to be a gap in architecture between the top machines in the TOP500 and the bulk of the systems. This is also related to the slow HPC uptake in SMEs. The uptake seems to be accelerating, but the problem is that SMEs cannot really learn from the top machines' experiences.
Satoshi Matsuoka: Yes, this has been noticed, and there are hard data to back up the trend. It can be attributed to several factors, but we are not sure which one is dominant. One factor could be that the top-ranked systems, because of their push for performance - to increase beyond Moore's law, and we all know that this is getting harder every year - use all kinds of new architectures, new techniques, and new software layers to try to keep up with the assumed pace of the TOP500.
So the top systems now have very aggressive architectures, whereas the systems in the middle tier and below are typically bought by smaller institutions or companies that are more conservative. Their uptake of such architectures might not be as vibrant as that of the top systems, and this has some effect on the ISVs, because the ISVs' major customers are the users of systems in the middle tier and below, whereas the top systems' users usually write their own programs. So it can be considered a chicken-and-egg problem.
If that is the case, there could be two possible outcomes. One is that the aggressive architectures, such as many-cores, will eventually be taken up by the mid tiers and below, which will then mimic the top systems. The other is definite divergence: the conceptual difficulty of attaining performance at the high-end would increase, with paradigm shifts in the high-end software and hardware architectures, so the divergence would become even greater and it would become even harder for the mid tiers to keep pace with the top systems. I do not know which will be the case.
The other factor could be funding, because the Cloud has demonstrated that there is an advantage in concentrating resources. The majority of the top systems can claim similar advantages, and these cost advantages result in increased funding incentives to have large systems. So that could be another reason.
Thomas Sterling: I agree with most of what Satoshi said, so I will just add a little bit of nuance to a couple of those points. The diversity is occurring now, but it is by no means the first time it has happened. As we look back over the last 40 years, we see different cases where the majority of people are working on one level with one modality and others are working at the very high end of supercomputing. This time is different because both sides are still high-performance computing. Both are still pushing the edge beyond conventional enterprise servers, laptops, or deskside machines. One of the differences is that the two are out of phase with each other. The second issue is that there are two very different objective functions. One is achieving high performance, sometimes competitively; the other is accessibility. In the case of accessibility, cost is a key issue.
The third difference is that we are now in a state of transition. The state of transition is not only that we have changed from where we were, but that we are going to continue to change to some place in the future where we will also stabilize at the high-end. We have not done that yet. In my view - and I may be wrong - anything we are doing now, we will not be doing in the future. It is not just a question of scaling up.
The analysis that my people performed surprised me a little bit: it showed that the canonical machine, the typical machine, and the machines at the very bottom of the TOP500 are practically right on top of each other. There is very little difference between those two classes of machines. Yet the majority of the machines fall in that tiny space, while the rest of this large performance space holds another group: it is a bimodality. This makes me think we will see this continue. We are also seeing a trend towards very light-weight cores, which is really a way to do light-weight threading.
The discussion today at this conference from Intel and AMD certainly reinforced that. How many threads can you have running simultaneously on a socket, and at what energy? This will continue to be pushed forward. The second trend, which has not really started yet and is still in research, is how the whole systems will interoperate. I have been predicting this for years. Maybe it will never happen, but we are seeing now that a truly dynamic method of execution will become a peer of the static, regular method of execution. Then eventually technology enhancements will cause it to supersede static means. That tells me that at the hardware level, at the software level, and at the level of the programming model, we will continue to see a divergence.
The question then - and I do not really have the answer - is whether that will have a trickle-down effect. This would mean, for instance, that MIC would end up in PCs. I do not expect this to happen and I am not predicting it; I just want to point that out.
Satoshi Matsuoka: That is a very good point. Currently, if you look at the processor architectures used in PCs and laptops they use the same cores as the big machines. Most of the processors for the consumer space now have an embedded GPU, which is essentially a many-core processor with extremely lightweight cores. Intel MIC has an origin in being designed as a GPU with the very same architectural philosophy.
The point is rather that the performance divergence between the top-tier systems and the lower part of the TOP500 can be articulated if one observes the architectural trend of consumer devices, such as smart phones, tablets and also PCs, which have long been predicted to become parallel architectures. They have gone from a single core to 2 cores, then 4 cores, and they also have GPUs. But now there is saturation - if a gizmo-loving person on the street wanted to buy an 8 core processor in a single PC, this would currently be difficult, as Intel only sells consumer CPUs with up to six cores, whereas the upcoming Xeons could have up to 15 cores for servers and supercomputers. So even here one can observe a big divergence, in that commodity consumer devices are no longer driving the high-end as they used to.
Another example: for ultrabooks, the evolution from Intel Sandy Bridge to Ivy Bridge, and now to Haswell, has resulted in fairly minimal improvement in overall performance. On the other hand, the evolution of the Xeon versions of the same cores is expected to bring a tremendous jump in performance, due to a significant increase in the number of cores, enhanced floating point, as well as improved memory bandwidth with DDR4 memory.
Thomas Sterling: Performance on a per core basis, per socket or overall?
Satoshi Matsuoka: Overall. Technology has been invested to improve the overall user experience. For consumers, the technology has been invested in prolonging battery life rather than in execution speed, so the CPUs in the commodity space are now saturated at 2 or 4 cores. The acceleration is instead happening in the GPUs embedded in those CPUs, and that is where we are observing the majority of performance boosts in consumer devices, largely driven by the rapid increase in the screen resolutions of smart phones and tablets. But maybe that could also one day be saturated, once people believe that they have sufficient graphics performance, and technology improvements will then be used not for performance in terms of speed, but rather to decrease cost, prolong battery life, etc. For consumer devices, such performance saturation is a fairly common occurrence, and it leads to divergence from the high-end. This leads to a big question of whether the big ecosystem of architecture and software development, all the way from the embedded space, smartphones and tablets, on to high-end PCs and servers, and up to the big machines, would be sustainable.
In the past there was much more alignment, when I could take several notebooks, build a small cluster, and that would perform well; however, these days this would not be sensible, as there is such a large performance gap between the low-end commodity and the high-end. Will such a trend scale upwards and cause divergence in the TOP500 as well? That is something I still do not know.
Thomas Sterling: I would say the key saturation is the bandwidth on the pins of the sockets to the memory systems. That has not grown proportionally with the number of cores. That results in performance saturation.
Satoshi Matsuoka: I was actually focusing more on the saturation of user requirements.
Thomas Sterling: I see, I misunderstood.
The interview is published in four parts: