The K computer delivers a performance of 10.51 PFlops, made possible by its 8-core high-performance processor. The machine has liquid cooling, a Torus network, and high-density racks, Takumi Maruyama explained.
In processor development, Fujitsu can look back on more than 60 years of continuous evolution. The company has been developing processors for mainframe, UNIX, HPC and Artificial Intelligence.
Takumi Maruyama expanded on the SPARC64 XIfx chip for HPC. This chip has 32 computing cores and 2 assistant cores. It implements HPC-ACE2, Fujitsu's ISA enhancements for HPC. It also has a sector cache, which means that the cache can be controlled by software.
The SPARC64 XII chip has been developed for UNIX with 12 cores x 8 threads and Software on Chip. It has 32 MB of L3 cache and an embedded MAC and IOC, and is built in 20 nm CMOS.
Takumi Maruyama also described Japan's Post-K computer development project. RIKEN and Fujitsu are currently developing the Post-K computer, which aims to be the most advanced general-purpose supercomputer in the world. The goals of the project are to provide application performance, low power consumption, user convenience, and the ability to produce ground-breaking results.
The Fujitsu processor that is being developed for the Post-K computer adopts the ARM ISA and an enhanced Tofu interconnect. This processor inherits and enhances the K computer's innovative features.
The Post-K processor supports FP16, according to Takumi Maruyama. It provides optimized precision for a wide range of applications with superior performance and reduces the required bandwidth and power consumption. The target applications involve existing numerical applications and brand-new applications such as Deep Learning.
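The bandwidth argument can be illustrated with a minimal sketch. It uses only Python's standard struct module to store the same values as 32-bit and as 16-bit IEEE 754 floats; the sample values and the helper name roundtrip are illustrative assumptions and do not come from the talk.

```python
import struct

def roundtrip(value, fmt):
    """Store a Python float in the given IEEE 754 format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

# Hypothetical sample data, chosen only for illustration.
values = [0.1, 1.0, 3.14159, 1000.5]

# "<f" is a 4-byte single-precision float, "<e" a 2-byte half-precision float.
bytes_fp32 = len(values) * struct.calcsize("<f")
bytes_fp16 = len(values) * struct.calcsize("<e")

if __name__ == "__main__":
    print("bytes as FP32:", bytes_fp32)
    print("bytes as FP16:", bytes_fp16)
    for v in values:
        # The FP16 column shows the rounding paid for the smaller footprint.
        print(v, roundtrip(v, "<f"), roundtrip(v, "<e"))
```

Half the bytes per value means half the memory traffic for the same number of elements, at the cost of a coarser rounding step. Whether that rounding is acceptable depends on the application.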
The upcoming AI processor developed by Fujitsu is called DLU, which stands for Deep Learning Unit. The architecture is designed for Deep Learning, with a low-power design and optimized precision. The goal is to achieve ten times the performance per watt of competitors, Takumi Maruyama announced. It will have a scalable design with Tofu interconnect technology, which allows it to handle large-scale neural networks.
The DLU design target is high Deep Learning performance per watt. However, high performance and low power are not easy to achieve at the same time, Takumi Maruyama warned the audience. In fact, these are conflicting demands: high performance calls for more transistors running at a higher frequency, whereas low power calls for fewer transistors running at a lower frequency.
This means that a new architecture is required for the DLU to achieve the target. The architecture is domain specific with optimal precision and also massively parallel, Takumi Maruyama explained, evolving from high precision to optimal precision, from sequential to massively parallel, and from general to domain specific. This requires many cores with an on-chip network.
The domain-specific cores will have a newly designed ISA with a simplified µ-architecture that is fully software visible and controllable, using heterogeneous cores, DPEs and a large register file (RF), according to Takumi Maruyama.
The combination of a few large cores and many small execution cores results in more performance with less power consumption, compared to a conventional homogeneous structure. The DPUs execute Deep Learning operations under the master core's control.
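Why many slower execution units can beat one fast unit on performance per watt can be sketched with the standard first-order model of CMOS dynamic power, P = a * C * V^2 * f. This model and all numbers below are assumptions for illustration, not figures from the talk; the sketch also assumes a perfectly parallel workload and that the slower design can run at a lower supply voltage.

```python
def dynamic_power(units, capacitance, voltage, frequency, activity=1.0):
    """First-order dynamic power of `units` identical execution units."""
    return units * activity * capacitance * voltage ** 2 * frequency

def throughput(units, frequency):
    """Idealized throughput: every unit retires one operation per cycle."""
    return units * frequency

# Hypothetical design points in arbitrary units.
fast = {"units": 1, "voltage": 1.0, "frequency": 2.0}  # one unit, high clock
wide = {"units": 2, "voltage": 0.7, "frequency": 1.0}  # two units, low clock

def perf_per_watt(design, capacitance=1.0):
    power = dynamic_power(design["units"], capacitance,
                          design["voltage"], design["frequency"])
    return throughput(design["units"], design["frequency"]) / power

if __name__ == "__main__":
    for name, design in (("fast", fast), ("wide", wide)):
        print(name, throughput(design["units"], design["frequency"]),
              perf_per_watt(design))
```

Both design points deliver the same idealized throughput, but the wide one draws less power because power grows with the square of the voltage. Real workloads are not perfectly parallel, so the gain in practice is smaller than this model suggests.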
The DPU consists of 16 DPEs connected by an on-chip network. Each DPE includes a large register file (RF) and wide SIMD execution units to realize an efficient Deep Learning engine. Unlike a cache, the RF is fully software controllable, which makes it possible to extract the full hardware potential, Takumi Maruyama explained.
Fujitsu's Deep Learning Integer achieves the accuracy needed for Deep Learning with a data size of only 16 or 8 bits. In fact, the Deep Learning Integer has shown accuracy similar to FP32 for Deep Learning.
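The talk summary does not describe how the Deep Learning Integer format works internally. As a generic illustration of why 16- or 8-bit integers can stay close to floating-point values, the following sketch applies plain symmetric linear quantization with one shared scale factor. This is a swapped-in textbook technique, not Fujitsu's format, and the function names and sample weights are hypothetical.

```python
def quantize(values, bits):
    """Map floats to signed integers of the given width using one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

def max_error(values, bits):
    """Largest absolute error introduced by a quantize/dequantize round trip."""
    codes, scale = quantize(values, bits)
    restored = dequantize(codes, scale)
    return max(abs(v - r) for v, r in zip(values, restored))

# Hypothetical weights, chosen only for illustration.
weights = [0.82, -0.41, 0.05, -0.97, 0.33]

if __name__ == "__main__":
    for bits in (16, 8):
        print(bits, "bits: max error", max_error(weights, bits))
```

The round-trip error is bounded by half the scale step, so it shrinks as the bit width grows. Whether a given error level preserves model accuracy has to be checked on the actual network.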
Takumi Maruyama said there will be multiple generations of DLUs over time, as is currently the case for the HPC, UNIX and mainframe processors.
The Fujitsu processor design style combines standard ISAs with Fujitsu enhancements and newly developed ISAs. The designs are shared and simple, with a software-visible micro-architecture. They use the latest semiconductor technology and a shared design infrastructure for circuits and methodology, as well as a shared team of people.
Takumi Maruyama explained that the Fujitsu processor direction is general purpose as well as domain specific. There will be a wider variety of processors in the future to meet different requirements. He promised that Fujitsu will continue to develop cutting-edge processors to meet the needs of a new era.