To build an execution environment that is efficient in terms of power consumption, productivity and usability, the application developers are involved in the design. This requires a mutual understanding of the computer architecture, the system software and the applications. The developers therefore have to look at performance predictions to find the best solution within the constraints, which include power consumption, budget and space.
There will be an international collaboration with DOE-MEXT and with JLESC, the Joint Laboratory for Extreme Scale Computing, an international, virtual organisation whose goal is to enhance the ability of member organisations and investigators to bridge the gap between Petascale and Extreme computing.
In addition to the international collaboration, there are also domestic collaboration opportunities with the universities of Tsukuba, Tokyo and Kyoto, Yutaka Ishikawa told the audience, as well as with different communities, including the HPCI consortium, the PC Cluster consortium, and OpenHPC.
The target performance of the "Post K" machine is 110 times that of the K computer for capacity computing and 50 times that of the K computer for capability computing. The power consumption will be between 30 and 40 MW, compared with 12.7 MW for the K computer.
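To put these targets in perspective, the implied gain in performance per watt can be worked out from the figures quoted above. The following is only a back-of-the-envelope sketch: the function and constant names are illustrative, and it assumes that the stated speed-ups and power figures can simply be divided, which ignores everything else that determines real efficiency.

```python
# Back-of-the-envelope estimate based only on the figures quoted in the article.
# All names here are illustrative, not part of any RIKEN or Fujitsu tool.

K_POWER_MW = 12.7                 # power consumption of the K computer
POST_K_POWER_MW = (30.0, 40.0)    # projected range for "Post K"
SPEEDUPS = {"capacity": 110.0, "capability": 50.0}


def efficiency_gain(speedup, post_k_power_mw, k_power_mw=K_POWER_MW):
    """Return the performance-per-watt improvement relative to the K computer."""
    power_ratio = post_k_power_mw / k_power_mw
    return speedup / power_ratio


def efficiency_range(speedup):
    """Return (worst case, best case) gain over the projected power range."""
    low_power, high_power = POST_K_POWER_MW
    return (efficiency_gain(speedup, high_power),
            efficiency_gain(speedup, low_power))


if __name__ == "__main__":
    for mode, speedup in SPEEDUPS.items():
        worst, best = efficiency_range(speedup)
        print(f"{mode}: {worst:.1f}x to {best:.1f}x performance per watt")
```

Under these assumptions, the capacity target corresponds to roughly 35 to 47 times the performance per watt of the K computer, and the capability target to roughly 16 to 21 times.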
The "Post K" hardware consists of a manycore architecture with a 6D mesh/torus interconnect and a 3-level hierarchical storage system, using silicon disk, magnetic disk, and storage for archiving.
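As an illustration of what a mixed mesh/torus topology means for a node, the sketch below lists the direct neighbours of a node in an n-dimensional grid in which some axes wrap around (torus) and others do not (mesh). The axis sizes and the choice of which axes wrap are hypothetical values picked for the example; the talk did not give the actual "Post K" network dimensions.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[int, ...]


def neighbors(coord: Sequence[int],
              shape: Sequence[int],
              wraps: Sequence[bool]) -> List[Coord]:
    """Return the direct neighbours of `coord` in a mixed mesh/torus grid.

    shape[i] is the number of nodes along axis i; wraps[i] is True when
    axis i is a torus (wraps around) and False when it is a plain mesh.
    """
    if not (len(coord) == len(shape) == len(wraps)):
        raise ValueError("coord, shape and wraps must have the same length")

    origin = tuple(coord)
    result: List[Coord] = []
    for axis, (size, wrap) in enumerate(zip(shape, wraps)):
        for step in (-1, 1):
            pos = origin[axis] + step
            if wrap:
                pos %= size                # torus axis: wrap around the edge
            elif not 0 <= pos < size:
                continue                   # mesh axis: no link past the edge
            candidate = origin[:axis] + (pos,) + origin[axis + 1:]
            # Skip self-links (axis of size 1) and duplicates (torus of size 2).
            if candidate != origin and candidate not in result:
                result.append(candidate)
    return result


if __name__ == "__main__":
    # Hypothetical 6D layout: three torus axes, then mesh, torus, mesh.
    shape = (4, 4, 4, 2, 3, 2)
    wraps = (True, True, True, False, True, False)
    for node in neighbors((0, 0, 0, 0, 0, 0), shape, wraps):
        print(node)
```

With this hypothetical layout, the node at the origin has ten direct neighbours: two on each torus axis and one on each of the two-node mesh axes.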
As for the system software, Yutaka Ishikawa explained that a multi-kernel consisting of Linux with a light-weight kernel will be used. There will be file I/O middleware for the 3-level hierarchical storage system and application-oriented file I/O middleware, as well as an MPI+OpenMP programming environment and a highly productive programming language and libraries.
Yutaka Ishikawa explained what has been done so far. In terms of the hardware, an instruction set architecture has been established. The team is now continuing to design a node architecture, a system configuration and storage system.
In terms of the software, an OS functional design has been developed, along with a communication functional design and a file I/O functional design. Programming languages and mathematical libraries have also been established.
The instruction set architecture is deployed by Fujitsu. Fujitsu's HPC CPU will support ARMv8. The "Post K" fully utilizes Fujitsu's proven supercomputer micro-architecture. Fujitsu, as a lead partner of ARM HPC extension development, is working to realize an ARM-powered supercomputer with high application performance. ARMv8 brings out the real strength of Fujitsu's micro-architecture, Yutaka Ishikawa explained. He promised that detailed features will be announced at Hot Chips 28, a symposium on high performance chips, which will be held August 21-23, 2016 in Cupertino, California, USA. ARMv8, a next generation vector architecture for HPC, will be introduced in the session "GPUs and HPC Processors" on August 22, 2016.
The features inherited from Fujitsu's existing designs include FMA (fused multiply-add), math acceleration primitives, an inter-core barrier, sector cache, and hardware prefetch assist.
In the international collaboration, there are more than ten research topics. The collaboration categories are collaborative development of open source software, evaluation and analysis of benchmarks and technologies, pre-standardization interface coordination, standardization of mature technologies, and collection and publication of open data.
Yutaka Ishikawa also gave an example of the system software collaboration with DOE-MEXT, which consists of memory management for a new memory hierarchy, together with Argonne National Laboratory, and of developing the MPICH and LLC communication libraries. In the MPICH software structure, CH4 is the successor of CH3, the current abstract network device interface.
In terms of collaborative development of open source software, Argonne National Laboratory will contribute the CH4 hackathon for LLC, and RIKEN AICS will contribute a part of the CH4 implementation.
RIKEN is also working on Big Data assimilation technology in order to revolutionize very short-range predictions of severe weather. Yutaka Ishikawa explained how the simulations are carried out. The results of 100 ensemble simulations are read by data assimilation processes and the data size in total is over 1.7 TB. RIKEN collaborates with Northwestern University for the I/O benchmarks and pnetCDF implementations for scientific Big Data.
McKernel (manycore Kernel) is running on Intel Xeon and Xeon Phi. It is important to understand the benefit of a lightweight kernel and the differences between McKernel and mOS. The plan is to standardize the API for the lightweight kernel. There will be two meetings per year and a researcher will visit Intel for a few months.
XMP, XcalableMP, is a directive-based language for distributed memory systems, explained Yutaka Ishikawa. It is a PGAS language for a large scale distributed memory system with an HPF-like concept and an OpenMP-like description with directives. There are two memory models: Global View and Local View. The Global View involves PGAS, with an image of a large array distributed into partial ones in the nodes. The Local View is MPI-like and a co-array notation is allowed.
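The relation between the two views can be sketched as an index mapping. The example below is not XcalableMP code: it is a small Python model, with hypothetical function names, of a one-dimensional array that is block-distributed over the nodes, assuming each node holds a contiguous chunk of ceil(n / nodes) elements and the last node may hold fewer.

```python
# Illustrative model of a block-distributed array, not XcalableMP syntax.
# Assumption: contiguous chunks of ceil(n / nodes) elements per node.

def block_size(n: int, nodes: int) -> int:
    """Number of elements in each full block (ceiling division)."""
    if n < 0 or nodes <= 0:
        raise ValueError("n must be >= 0 and nodes must be > 0")
    return -(-n // nodes)


def global_to_local(i: int, n: int, nodes: int):
    """Map a Global View index to (owning node, index in its Local View)."""
    if not 0 <= i < n:
        raise IndexError("global index out of range")
    size = block_size(n, nodes)
    return i // size, i % size


def local_to_global(node: int, local: int, n: int, nodes: int) -> int:
    """Map (node, Local View index) back to the Global View index."""
    i = node * block_size(n, nodes) + local
    if not 0 <= i < n:
        raise IndexError("local index is outside the distributed array")
    return i


def local_range(node: int, n: int, nodes: int) -> range:
    """Global indices of the partial array held by `node`."""
    size = block_size(n, nodes)
    start = min(node * size, n)
    return range(start, min(start + size, n))


if __name__ == "__main__":
    n, nodes = 10, 4
    for node in range(nodes):
        print(node, list(local_range(node, n, nodes)))
```

In this model the Global View corresponds to addressing elements by their global index and letting the mapping find the owner, while the Local View corresponds to each node working directly on its own partial array.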
RIKEN AICS, the University of Houston and the University of Tsukuba will work on the extension of the Partitioned Global Address Space (PGAS) model with language constructs of multitasking, involving multithreading, for manycore-based exascale systems (XcalableMP 2.0).
Argonne National Laboratory, RIKEN AICS, and the University of Tsukuba will work on a runtime design for PGAS communication and multitasking using the Argobots lightweight user-level thread library.
Yutaka Ishikawa concluded that Fujitsu has decided that the "Post K" CPU will be based on ARMv8 with HPC extensions. Changing the architecture will improve usability in comparison with the K computer, including broader community support.
The system software stack for the "Post K" is being designed and implemented by leveraging international collaborations. The software stack developed at RIKEN is open source, and it also runs on Intel Xeon and Xeon Phi. RIKEN would also like to contribute to OpenHPC.
McKernel will be deployed in Oakforest-PACS, which is an Intel Knights Landing-based supercomputer that will be operated by the University of Tsukuba and the University of Tokyo in 2016. The peak performance is about 25 PFlops.