Stephen Pawlowski opened by focusing on scaling trends: transistor density will continue to double roughly every two years, improving performance while reducing power and cost per transistor.
Additional metal layers and material improvements will continue to enable interconnect scaling. SRAM cell density scaling is also likely to continue, so we can confirm that Moore's Law is alive and well.
But going forward, scaling will be as much about material and structure innovation as about dimensional scaling, Stephen Pawlowski went on. Reaching exascale by 2020 requires performance to double every year, and addressing that performance gap depends on innovations enabled by Moore's Law.
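The "factor 2 every year" claim can be sanity-checked with a back-of-the-envelope calculation. This sketch is my own illustration, not from the talk; the ~10-petaflop baseline for a top system circa 2012 is an assumption.

```python
# How many yearly doublings take an assumed ~10 PFLOPS system
# (ca. 2012) to 1 exaflop (10^18 FLOPS)?

start_flops = 10e15   # assumed baseline: ~10 PFLOPS
target_flops = 1e18   # exascale target

years = 0
flops = start_flops
while flops < target_flops:
    flops *= 2        # performance doubles each year
    years += 1

print(years)  # 7 doublings give 2^7 = 128x, crossing the 100x gap
```

Seven consecutive doublings land comfortably inside the 2012-2020 window, which is why the talk frames exascale as feasible only if the 2x-per-year pace holds.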
Traditional scaling alone will not get us to exascale by 2020, warned Stephen Pawlowski; key innovations, such as many-core designs, have been needed to keep us on track in the past. 3D chip stacking has provided high-density chip-to-chip connections in a small form factor while combining dissimilar technologies.
But the added cost has to be taken into account, along with degraded power delivery, heat sinking, and the area impact on the lower chip. Many questions remain to be answered, stated the speaker, who suggested 3D chip stacking using through-silicon vias.
Power and energy are the two key issues, he stated while making the case for optical interconnects. Electrical interconnects might remain a solution in the short term; cost and power efficiency will determine the take-off point for optics.
System integration of a wide range of heterogeneous elements is needed for performance and power, the speaker explained, but at lower voltages we face potentially increasing leakage and transistor variation. Resiliency will continue to be a challenge, and developers will require "resiliency-aware" circuits and applications.
Stephen Pawlowski presented some holistic solutions to limit power.
He also stated that DRAM is not scaling with Moore's Law. To restore cost balance, developers would have to make the physical size of the memory, at a given capacity, much smaller, but the speaker did not expect that to happen soon. Developers also need to improve the balance by using less memory per unit of compute via threading. There is also a need to invest and innovate in new high-density memory technologies.
Stephen Pawlowski suggested a two-step process. Step 1 would be to improve thread scalability. Load/execution imbalance requires hardware and software support for thread pacing. False cache sharing should be addressed with transactional mechanisms to avoid thrashing. Start-up overheads call for dedicated hardware queues and broadcast support, while synchronization overheads can be approached with fine-grain, low-latency hardware sync support. Fast hardware reductions are also needed. The speaker pointed out that Amdahl's law sets the limits here.
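The Amdahl's law limit the speaker alluded to is easy to quantify. The following sketch is my own illustration of why shaving serial overheads matters more than raw thread count; the 1% serial fraction is an assumed example, not a figure from the talk.

```python
# Amdahl's law: speedup on n threads when a fraction s of the work
# is inherently serial (synchronization, start-up, reductions, ...).

def amdahl_speedup(serial_fraction, n_threads):
    """Ideal speedup for a workload with the given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_threads)

# Even a 1% serial fraction caps 1024 threads at roughly 91x,
# far below the ideal 1024x.
print(amdahl_speedup(0.01, 1024))
```

This is why the talk stresses attacking synchronization and start-up overheads in hardware: reducing the serial fraction raises the ceiling that no amount of extra threading can break through.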
Step 2 involves new high density memory technologies via new memory architectures and storage models.
The speaker expanded on spin transfer torque, phase change memory, and resistive memory.
In fact, there are two design options for supercomputing, Stephen Pawlowski explained: a processor with more than 10 billion transistors on a die in 2020, or a processor with fewer transistors.
Option 1, a large die with more than 10 billion transistors, involves more cache, fewer cores, and "everything integrated". This solution enables on-package memory, but beyond a certain threshold the cache size is simply not utilized by the programmer, the speaker explained.
More cores would still guarantee enough cache for HPC and keep "everything integrated", but with such a high FLOPS count on a single die, the on-package memory becomes difficult to implement, warned Stephen Pawlowski.
Option 2 provides a cost-effective die that supports on-package memory and would see broad usage. With the right memory capacity per building block, it can address a large portion of the HPC market. As for cost, the building blocks can replace the compute, the speaker explained.
The "building block" approach offers two advantages. On-package memory has 8 to 10 times the bandwidth of external memory, and at iso-cost and iso-capacity it enables 8 to 10 times more compute to be placed under the memory, the speaker explained.
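The iso-cost argument above can be sketched numerically. This is my own illustration: the 8x factor comes from the talk's 8-to-10x claim, but the baseline bandwidth and the bytes-per-FLOP balance target are placeholder assumptions.

```python
# If a balanced machine keeps bytes-per-FLOP fixed, sustainable
# compute scales directly with memory bandwidth.

external_bw = 50e9        # assumed external DRAM bandwidth, bytes/s
bw_gain = 8               # talk: on-package memory gives 8-10x bandwidth
bytes_per_flop = 0.5      # assumed balance target for HPC codes

external_flops = external_bw / bytes_per_flop
on_package_flops = external_flops * bw_gain

# Compute scales one-for-one with the bandwidth gain at fixed balance.
print(on_package_flops / external_flops)
```

Under these assumptions, the same memory capacity and cost feed 8 times the compute, which is the core of the building-block pitch.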
Will the motherboard in 2020 be just a backplane of cards? he asked the audience. A redefinition of "best" is needed, he stated, pleading for a transition to realistic application performance benchmarks while holding on to the historic rate of progress.
To end his talk, Stephen Pawlowski stated his belief that innovations driven by Moore's Law will continue to bring significant benefits to supercomputing.