HPC in Asia 2017 - Status report from Japan

12 Jul 2017 Frankfurt - At the HPC in Asia Workshop during ISC 2017 in Frankfurt, Satoshi Matsuoka, Tokyo Tech, gave an overview of HPC developments in Japan. In the current TOP500 list there are two Japanese machines in the top 10: the Oakforest-PACS at position 7 and the K computer at position 8. Other lists paint an even stronger picture. On the HPCG list a Japanese machine, the K computer, is nr. 1 and another Japanese machine is nr. 4. On the Green500, again a Japanese machine, the TSUBAME 3.0, is nr. 1; in fact, the first four places on the Green500 are held by Japanese machines. These are, to some degree, AI-focused machines. So the most energy-efficient machines in Japan are not traditional HPC machines, but AI/Big Data systems.

Japan has a High Performance Computing Infrastructure (HPCI) that has some similarity with PRACE in Europe. The combined capacity of the supercomputers involved is about 40 Petaflop/s. There is a national HPCI allocation process through which researchers get access to the systems.

The Flagship 2020 project will lead to the next national flagship system in 2020, the Post-K system. In the meantime, smaller Tier-2 systems will take the lead. Co-design is considered key for the new flagship supercomputer.

Each of the nine supercomputer centres in Japan has a 10-year upgrade plan covering 2015 to 2025.

The Flagship 2020 project has a dual mission. First, it should develop the next flagship computer, tentatively called "Post-K". Second, it should simultaneously develop a range of application codes to run on the "Post-K" that help solve major scientific and societal issues. The design target is 100 times the capacity-computing performance of the K computer at a power consumption of 30-40 MWatt. The budget is around 110 billion JPY (around 1 billion euro). Fujitsu will invest another 30 billion JPY, and the running costs will be around 10 billion JPY per year for a 6-year period.
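
Adding up these figures gives a rough sense of the total programme cost. A minimal back-of-the-envelope sketch, assuming the three amounts are simply additive and using the article's own implied exchange rate of roughly 110 JPY per euro:

```c
#include <stdio.h>

int main(void)
{
    const double build_jpy   = 110e9;      /* government budget for development */
    const double fujitsu_jpy =  30e9;      /* additional Fujitsu investment */
    const double running_jpy =  10e9 * 6;  /* running costs, 10 billion JPY/year for 6 years */
    const double jpy_per_eur = 110.0;      /* assumed rate, consistent with 110 bn JPY ~ 1 bn euro */

    double total_jpy = build_jpy + fujitsu_jpy + running_jpy;
    printf("Total: %.0f billion JPY, roughly %.1f billion euro\n",
           total_jpy / 1e9, total_jpy / jpy_per_eur / 1e9);
    return 0;
}
```

That comes to roughly 200 billion JPY, in the order of 1.8 billion euro, over the lifetime of the programme.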

Some details of the "Post-K" computer, which is currently under design, are already known. The CPU will be ARM-based with the SVE and FP16 extensions. The peak performance will be in the multi-hundred Petaflop/s range. The memory will be 3-D stacked DRAM with Terabytes/s of bandwidth. The interconnect will be a TOFU3 CPU-integrated 6-D torus network. The machine is being designed and will be manufactured by Fujitsu. The development is led by RIKEN.
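
The programming details of the "Post-K" are not given in the talk, but the combination of SVE and FP16 points to vector-length-agnostic code, where the same binary runs on any SVE vector width. A minimal, hypothetical sketch using the ARM C Language Extensions (ACLE) for SVE - the function and array names here are illustrative, not taken from the source - could look like this:

```c
#include <arm_sve.h>   /* ACLE intrinsics for the Scalable Vector Extension */
#include <stddef.h>
#include <stdint.h>

/* Element-wise half-precision add: c[i] = a[i] + b[i].
 * Vector-length agnostic: no vector width is hard-coded anywhere. */
void add_fp16(const float16_t *a, const float16_t *b, float16_t *c, size_t n)
{
    for (size_t i = 0; i < n; i += svcnth()) {                      /* svcnth(): FP16 lanes per vector */
        svbool_t pg = svwhilelt_b16_u64((uint64_t)i, (uint64_t)n);  /* predicate masks the loop tail */
        svfloat16_t va = svld1_f16(pg, &a[i]);
        svfloat16_t vb = svld1_f16(pg, &b[i]);
        svst1_f16(pg, &c[i], svadd_f16_x(pg, va, vb));
    }
}
```

Such code compiles with a recent GCC or Clang using -march=armv8-a+sve and runs unchanged whether the hardware implements 128-bit or 2048-bit vectors.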

Unfortunately, on August 10, 2016, it was announced that there will be a 1-2 year delay: operation will now start in 2021, not 2020.

Nine priority application areas have been defined: innovative drug discovery; personalized and preventive medicine; hazards and disasters induced by earthquakes and tsunamis; manufacturing; fundamental laws of the universe; environmental predictions using Big Data; new functional devices and high-performance materials; innovative clean energy systems; and high-efficiency energy creation and usage.

There are also four exploratory application areas: neural circuits; the formation of exoplanets; frontiers of basic science; and interaction models of socio-economic phenomena.

The currently fastest Japanese machine, the Oakforest-PACS, is a very large Xeon Phi Knights Landing system. It was manufactured by Fujitsu and is the fastest Omni-Path connected machine in the world.

The TSUBAME 3.0 machine has a BYTES-centric architecture with 2160 GPUs. The machine has 540 SGI ICE XA nodes, each with two Intel Xeon CPUs, four NVIDIA Pascal GPUs and 256 GByte of memory. It has a peak performance of 47.2 AI Petaflop/s and 12.1 HPC Petaflop/s. Full operation is expected in August 2017. The system has a very dense design with 144 GPUs and 72 CPUs per rack. The average PUE is expected to be 1.033. The machine is water cooled.
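
The gap between the "AI" and "HPC" peak numbers mainly reflects the GPUs' half-precision versus double-precision rates. A rough, GPU-only sanity check, assuming the publicly listed Tesla P100 SXM2 peaks of about 21.2 Teraflop/s FP16 and 5.3 Teraflop/s FP64 (figures not given in the talk itself):

```c
#include <stdio.h>

int main(void)
{
    const int gpus = 2160;                 /* total Pascal GPUs in TSUBAME 3.0 */
    const double fp16_per_gpu = 21.2e12;   /* assumed P100 SXM2 FP16 peak, flop/s */
    const double fp64_per_gpu = 5.3e12;    /* assumed P100 SXM2 FP64 peak, flop/s */

    /* GPU-only peaks; the host Xeon CPUs account for the remaining
     * difference to the quoted 47.2 / 12.1 Petaflop/s figures. */
    printf("AI  (FP16) peak: %.1f Petaflop/s\n", gpus * fp16_per_gpu / 1e15);
    printf("HPC (FP64) peak: %.1f Petaflop/s\n", gpus * fp64_per_gpu / 1e15);
    return 0;
}
```

This lands at roughly 45.8 and 11.4 Petaflop/s, consistent with the quoted peaks once the CPU contribution is added.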

There has been a tremendous growth in interest in AI in Japan. Three national centres have been established: METI-AIRC with a budget of 200 million euro per year, MEXT-AIP with a 50 million euro budget, and MIC-NICT also with a 50 million euro budget. The latter is for brain-related AI research. There is an inter-ministry commitment of about 1 billion euro over 10 years for AI research.

The METI-AIRC is looking at how HPC can help AI research. A joint lab has been established for BD/AI research using large-scale HPC and BD/AI infrastructures. Satoshi Matsuoka is leading that lab.

Currently they are building the world's first open AI infrastructure, called ABCI: the AI Bridging Cloud Infrastructure. The AI performance will be 130-200 AI-Petaflop/s. It will use less than 3 MWatt of power and have a PUE of less than 1.1. It will become operational somewhere in Q4 2017 or Q1 2018.
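
Those targets imply a very high power efficiency at reduced (AI) precision. A quick sketch of the implied figure, assuming the 3 MWatt envelope refers to total facility power (the report does not say whether it is IT or facility power) and recalling that PUE is total facility power divided by IT equipment power:

```c
#include <stdio.h>

int main(void)
{
    const double ai_pflops_lo = 130.0, ai_pflops_hi = 200.0;  /* target AI peak, Petaflop/s */
    const double power_watt   = 3.0e6;                        /* quoted power envelope, assumed facility total */
    const double pue          = 1.1;                          /* quoted upper bound on PUE */

    /* IT power available to the compute hardware under these assumptions */
    double it_power = power_watt / pue;
    printf("%.0f - %.0f AI Gigaflop/s per Watt of IT power\n",
           ai_pflops_lo * 1e15 / it_power / 1e9,
           ai_pflops_hi * 1e15 / it_power / 1e9);
    return 0;
}
```

That works out to roughly 48-73 AI Gigaflop/s per Watt at the stated bounds.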

There is already an ABCI prototype, called AAIC. The system vendor was NEC and the system has been operational since March 2017. It has 400 NVIDIA Tesla P100s and runs at 8 AI Petaflop/s. For Big Data workloads the system runs Apache Spark.

The real purpose of the ABCI is to serve as a blueprint for an AI machine, so that copies of this machine can be built everywhere.

Looking further ahead, Japan could see its first AI exascale machine - exascale as measured in FP16 or AI flop/s - somewhere in 2019.

Ad Emmen