What were the main HPC themes of this past decade? The speaker refreshed the audience's memory with this impressive list:
The highlights of 2004 to 2013, according to Thomas Sterling, were the international breakthroughs across the high end. Multicore has become the dominant form of parallelism and the way scaling keeps growing with Moore's Law. We also witnessed the dawning of the Age of Petascale. Turbo-charging with GPU accelerators has taken off. In the long run, however, we are facing the energy barrier.
Thomas Sterling also remarked that commodity clusters truly are the workhorse of HPC. But he raised a big question about the hard struggle to make it all matter, namely the application vacuum. Programming models are increasingly in crisis, which he summed up as MPI+X+Y+...: what is seriously going on? In addition, memory lags behind. We only get to 1 petabyte of main memory years after reaching petaflops.
Yet there have been important advances in processor architectures, according to the speaker. We have seen the rise of multi-core Intel Xeon processors, from EM64T to the Westmere line-up.
Thomas Sterling also mentioned the IBM PowerPC BlueGene family of machines with the types P and Q; the Fujitsu SPARC64, which has been evolving for several years; and NVIDIA Tesla, the GPGPU accelerator.
Meanwhile, the Intel Many Integrated Core (MIC) is the main processing component of Tianhe-2 and TACC's Stampede.
The IBM PowerXCell 8i, released in 2008 for the Roadrunner, can be considered a side path, but it had the huge merit of opening the field to new types of architecture.
If we take a look at the processor trends, we can see that power is the key driver. Clock rates have nearly flat-lined. Multi-core socket architectures are the principal source of performance gains. Accelerators show high visibility, but their full long-term adoption is still uncertain. Light-weight processors are emerging. The x86 dominates the mainstream, and Intel continues to dominate and innovate. There is also more access for introspection sensing and control, as Thomas Sterling explained.
Memory technology has evolved as well, with DDR2 in 2004, DDR3 launched in 2007, and JEDEC publishing the final DDR4 specification.
As far as Ethernet is concerned, gigabit Ethernet has been the prevalent class of interconnects during the last decade.
InfiniBand has become equally important, according to the speaker, as the scalable switched fabric communications technology. Since June 2012, InfiniBand has been the dominant interconnect in the TOP500 list.
Myrinet-2000 and Quadrics were the mature technologies, stated Thomas Sterling. Cray has its own interconnects with SeaStar, Gemini, and Aries/Dragonfly.
Intra-node communication happens through the PCI Express standard, which was defined in 2004, and via
When elaborating on the Hard Disk Drives, Thomas Sterling told the audience that since 2004, the single drive capacity grew. There have been several technology improvements, as well as an important interface evolution, mainly by the three dominant manufacturers.
The price per GB of non-volatile Solid State Drives has fallen significantly since the mid-2000s.
The Earth Simulator was the no. 1 for Japan in March 2002 and was nicknamed the 'computenik'. The system was based on SX-6 architecture. It was a good engineering machine, according to the speaker, who also stated that every 11 years, there is an increase by 3 orders of magnitude.
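The growth figure quoted by the speaker, 3 orders of magnitude every 11 years, can be turned into a yearly rate with a little arithmetic. The sketch below is only an illustration under the assumption of smooth exponential growth; the function names are hypothetical and not taken from the talk.

```python
import math

def annual_growth_factor(orders_of_magnitude: float, years: float) -> float:
    """Yearly multiplier implied by a 10**orders_of_magnitude gain over `years`."""
    return 10 ** (orders_of_magnitude / years)

def doubling_time_years(orders_of_magnitude: float, years: float) -> float:
    """Time needed to double performance at the same steady exponential rate."""
    return years * math.log(2) / (orders_of_magnitude * math.log(10))

# Figures quoted in the talk: 3 orders of magnitude every 11 years.
factor = annual_growth_factor(3, 11)
doubling = doubling_time_years(3, 11)

print(f"Implied growth per year: x{factor:.2f}")
print(f"Implied doubling time:   {doubling * 12:.1f} months")
```

Under that assumption, performance grows by a factor of a little under 1.9 per year, which corresponds to a doubling roughly every 13 months.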
The BlueGene/L, made in the USA, followed as no. 1, and in 2008 the top spot went to the first petaflops machine, the Roadrunner. In 2009, the no. 1 was the Jaguar, which Thomas Sterling referred to as the power-hungry leader. But in 2009, there was also the Tianhe-1 from China.
In 2011, Japan reached 10 petaflops with its K computer, and in 2012 the USA struck back with the Sequoia, an IBM BlueGene system. Jaguar was also avenged with Titan in 2012. And, at present, in 2013, China tops the list with the Tianhe-2, the 30-petaflops dragon.
The energy per flop is improving, stated Thomas Sterling, so we are forced to exploit more concurrency.
The trends of the decade are that HPC systems have become truly international in scope. The US, Japan, and China have all deployed number 1 systems multiple times. Flops performance growth has been sustained in the HPC realm. However, memory lags behind, with a diminishing memory capacity per core. Thomas Sterling told the audience that there is no single solution: we are witnessing both homogeneity and heterogeneity. Perhaps we are facing a period of transition machines. Still, Thomas Sterling thought that something is missing.
If we are looking for the canonical HPC system, what is the typical computer in the TOP500 list? From the architectural point of view, it is a commodity cluster. Clusters continue to dominate. The processor is a 64-bit Intel Sandy Bridge. The canonical machine is ranked at #68. It is homogeneous, with an FDR InfiniBand interconnect, and it runs SUSE Linux. Linux systems continue to outnumber other OS instances. The power consumption amounts to 538 kW, at an efficiency of 0.739 GFLOPS/W. And the system is operated by academia, according to Thomas Sterling.
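The two power figures quoted for the canonical machine, 538 kW and 0.739 GFLOPS/W, together imply a performance level. The following sketch simply multiplies them out; the variable names are illustrative, and the talk summary does not say which performance measure (for example sustained Linpack or peak) the efficiency figure is based on, so the result is only a rough indication.

```python
# Figures quoted in the talk for the "canonical" TOP500 system.
power_kw = 538.0                 # total power consumption in kilowatts
efficiency_gflops_per_w = 0.739  # energy efficiency in GFLOPS per watt

# Convert kW to W, multiply by efficiency to get GFLOPS, then scale to TFLOPS.
power_w = power_kw * 1_000
performance_gflops = power_w * efficiency_gflops_per_w
performance_tflops = performance_gflops / 1_000

print(f"Implied performance: {performance_tflops:.1f} TFLOPS")
```

The product comes out at just under 400 TFLOPS, which is plausible for a mid-list system but should not be read as an official figure.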
The speaker stated that there are two worlds in the HPC community. Clusters represent more than 80% of the machines, up from 50% in 2004, with half the aggregate performance and a vastly larger user base.
The vendors were not highlighted by Thomas Sterling but he is convinced that they, nonetheless, have a major impact on HPC. HP has the largest deployed base this time.
The major concern lies with the applications, as Thomas Sterling stated. There are selected advances in N-body codes, with a few brief cases of applications, including a simulation of 1 trillion dark matter particles to model dark matter structures in the Universe.
There are also advances in molecular dynamics, in AMR codes simulating stellar formation with 34 levels, and in Magnetic Fusion Energy research.
Thomas Sterling concluded his overview with a number of predictions. For 2023 and beyond, we will be entering the nano-scale feature-size era with 3D stacking. A new execution model will emerge and be widely adopted to guide system design, architecture, and programming models. The runtime system software will be incorporated in the system software stack and supported through architecture mechanisms. The microprocessor as we know it will disappear and will be replaced, initially, by very lightweight embedded memory processors. Declarative programming interfaces will rise and shine, as will machine intelligence for system management. The machine will have a deep understanding of what it needs to do. And finally, we will never get to zettaflops using semiconductors and discrete numeric operations. With that, Thomas Sterling finished his talk.