Big Data/AI and HPC are opposite ends of the high-performance spectrum, but the twain shall definitely meet

19 Jun 2017 Frankfurt - After Erich Strohmaier presented his analysis of the new edition of the TOP500, Satoshi Matsuoka from the Tokyo Institute of Technology shared his insights on exascale with the audience. About ten years ago, the first workshops on exascale were held, and it was said that exascale would be reachable by 2020. If we consider the projected performance development, is it really possible to have exaflop power by 2020? Energy efficiency might reach 50 GFlops/W in late 2020, but even that still remains to be seen.
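As a rough back-of-the-envelope illustration of what that efficiency figure would imply, the following minimal Python sketch (not from the talk itself) computes the power an exaflop machine would draw at 50 GFlops/W:

    # Power needed for a 1 exaflop/s system at the projected 50 GFlops/W.
    PEAK_FLOPS = 1e18        # 1 exaflop per second
    EFFICIENCY = 50e9        # 50 GFlops per watt (late-2020 projection)

    power_watts = PEAK_FLOPS / EFFICIENCY
    print(f"Required power: {power_watts / 1e6:.0f} MW")   # about 20 MW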

Satoshi Matsuoka explained that the new TSUBAME 3.0 is a BYTES-centric architecture that scales across all 2,160 GPUs, with every node able to access the entire memory hierarchy. TSUBAME 3.0 is a large-scale, general-purpose Japanese national production supercomputer with a peak performance of 12.1 PFlops. The machine will be in full operation by August 2017.

TSUBAME 3.0 consists of 15 SGI ICE-XA racks, 2 network racks, and 3 DDN storage racks, so 20 racks in total. There are 4 Terabytes of hierarchical memory per node for Big Data, a scalable high-dimensional torus or hypercube topology, and a power efficiency of 14.1 Gigaflops/W. Projected forward to 7nm+ post-Volta GPUs with about 10,000 CUDA cores and 12.55 Teraflops per chip, some 80,000 chips in total, the design rises by up to a factor of three to a 1 exaflop double-precision (DFP) peak.
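A quick sanity check of that projection, assuming the per-chip figure and chip count quoted above, takes only a couple of lines of Python:

    # Projected peak from ~80,000 post-Volta-class chips at ~12.55 TFlops each.
    CHIPS = 80_000
    TFLOPS_PER_CHIP = 12.55

    peak_pflops = CHIPS * TFLOPS_PER_CHIP / 1000
    print(f"Projected peak: {peak_pflops:.0f} PFlops")   # ~1004 PFlops, i.e. about 1 exaflop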

Satoshi Matsuoka said that the DARPA Exascale report projection turned out to be fairly accurate, but asked whether just getting FLOP/s is all that valuable.

Indeed, the projections of 10 years ago were accurate, but what is the reality? The K Computer reached no. 1 in the HPCG benchmark, even though 73% of the total execution time is spent waiting on communication. It is a bytes-rich machine, well matched to such a bytes-dominated algorithm.
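To see why that communication fraction matters, here is a simple Amdahl-style bound in Python; the K Computer peak figure used below is an assumption added for illustration, not a number from the talk:

    # If 73% of wall time is spent waiting on communication, the floating-point
    # units can be busy at most 27% of the time, which caps achievable HPCG
    # performance well below peak (memory bandwidth lowers it further still).
    COMM_WAIT_FRACTION = 0.73
    PEAK_PFLOPS = 11.3        # K Computer peak, assumed here for illustration

    upper_bound = (1 - COMM_WAIT_FRACTION) * PEAK_PFLOPS
    print(f"Compute-time upper bound: {upper_bound:.2f} PFlops")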

Satoshi Matsuoka also talked about the characteristics of Big Data and AI computing, addressing graph analytics such as for social networks. HPC and Big Data/Artificial Intelligence sit at opposite ends of the high-performance computing spectrum, but HPC simulation applications can be categorized along the same axes.

There is a need for both bandwidth and capacity (bytes) in a combined HPC-Big Data/Artificial Intelligence machine. This is most obvious for the sparse, bandwidth-dominated applications at the left-hand end of that spectrum.
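A sparse matrix-vector multiply (SpMV) illustrates why such applications are bandwidth-bound; the Python sketch below uses an arbitrary test matrix and counts only a lower bound on the memory traffic:

    import numpy as np
    from scipy.sparse import random as sparse_random

    # SpMV does roughly 2 flops per nonzero but moves at least 12 bytes per
    # nonzero (8-byte value plus 4-byte column index), so memory bandwidth,
    # not peak flops, sets the performance.
    A = sparse_random(10_000, 10_000, density=1e-3, format="csr", dtype=np.float64)
    x = np.random.rand(10_000)

    y = A @ x                                # the bandwidth-bound kernel
    flops = 2 * A.nnz
    bytes_moved = A.nnz * (8 + 4)            # values + column indices (lower bound)
    print(f"Arithmetic intensity ~ {flops / bytes_moved:.2f} flop/byte")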

The system shows leadership in bytes-to-flops ratios for tier-1 memory. The interconnect ratio, however, is still lacking, and memory capacity relative to compute is decreasing.
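That imbalance can be made concrete with publicly quoted component figures for a TSUBAME 3.0-style node; the numbers below (4 P100-class GPUs and 4 Omni-Path links per node) are assumptions added for illustration rather than figures from the talk:

    # Hypothetical bytes-per-flop ratios for a TSUBAME 3.0-style node:
    # 4 GPUs at ~5.3 TFlops DP and ~732 GB/s HBM2 each, plus 4 links of
    # ~12.5 GB/s injection bandwidth each.
    node_flops = 4 * 5.3e12
    tier1_bw   = 4 * 732e9
    inject_bw  = 4 * 12.5e9

    print(f"Tier-1 memory:  {tier1_bw / node_flops:.3f} bytes/flop")    # ~0.138
    print(f"Interconnect:   {inject_bw / node_flops:.4f} bytes/flop")   # ~0.0024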

Any Big Data in the system can be moved anywhere via RDMA at a minimum of 12.5 GBytes/s, which also supports stream processing. The TSUBAME 3.0 machine scales to all 2,160 GPUs, not just 8.
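At that minimum rate, moving a sizeable dataset is a matter of seconds to minutes; the 1 Terabyte payload in the sketch below is an arbitrary example:

    # Transfer-time estimate at the quoted minimum RDMA rate of 12.5 GBytes/s.
    RDMA_BYTES_PER_S = 12.5e9
    payload_bytes = 1e12                      # 1 TB, illustrative only

    seconds = payload_bytes / RDMA_BYTES_PER_S
    print(f"Moving 1 TB takes about {seconds:.0f} s")   # ~80 s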

The Fujitsu SPARC64 XIfx, launched in 2015, has a memory-bandwidth-to-injection-bandwidth ratio of 2:1.

The path to capable exascale requires targeting both the system architecture and the node design. There are still interconnect shortcomings when it comes to enabling enhanced topologies.

Satoshi Matsuoka also mentioned the number of TOP500-style lists in the world, including the TOP500 itself, the Green500, the HPCG benchmark, the Graph 500, and the Green Graph 500. The question is whether we need all these lists. He thinks we do, because these are the metrics that we want in our systems, he concluded.

Leslie Versweyveld