Will we have a data centre in a box by 2022? Most probably, says Satoshi Matsuoka at ISC'14


24 Jun 2014 Leipzig - At ISC'14 in Leipzig, the second keynote presentation was held by Satoshi Matsuoka from the Tokyo Institute of Technology in Japan. The topic was the convergence of supercomputing and extreme Big Data. Satoshi Matsuoka is convinced that we will evolve towards a data centre in a box, which means that the large data centre as we currently know it will become Jurassic. This data centre in a box will have 10 Petaflops of performance, 10 Petabytes of memory, 10K nodes, and 50 GB/s interconnect speed - so it will have a tremendous amount of memory. Satoshi Matsuoka predicted that the Tsubame4 edition planned for 2021-2022 will be a K computer in a box with a convergent architecture: 1/500 the size, 1/500 the power, 1/500 the cost, and 5-fold the DRAM + NVM capacity of the current state-of-the-art HPC system.

Satoshi Matsuoka began by talking about the Tsubame supercomputer. In 2010, it was the greenest supercomputer in the world, and the machine has won many awards: in 2011 it won the ACM Gordon Bell Prize, and in 2013 Tsubame 2.5 was the no. 1 in Japan in single-precision floating point, with 17 Petaflops.

With the Tsubame series of supercomputers, there has been an evolution towards exascale and extreme Big Data. Innovation is a key word, as is the green agenda, Satoshi Matsuoka told the audience. Very fast I/O is important as well.

The speaker defined Extreme Big Data as data with a storage demand of a Yottabyte per year. The Global Scientific Information and Computing Center (GSIC) at the Tokyo Institute of Technology is studying the issue.

Current Big Data are not really that big, explained Satoshi Matsuoka. The typical real definition of Big Data is the mining of people's private data to make money. Corporate data usually sit in a data-warehoused silo of limited volume that seldom amounts to Petabytes. Processing involves simple O(n) algorithms, or algorithms that can be accelerated with database-inherited indexing. Workloads executed on re-purposed commodity web servers linked by 1 Gbps networks and running Hadoop are also considered Big Data.

However, Satoshi Matsuoka warned, future extreme Big Data will be about mining not just Terabytes of silo data but Peta- to Zettabytes of data, along with ultra high-bandwidth data streams.

We will have tons of unknown genes, which will require directly sequencing uncultured microbiomes obtained from a target environment, analyzing the sequence data, and finding novel genes.

The size of metagenomic sequencing data keeps growing and can only be handled by the largest supercomputers.

Satoshi Matsuoka mentioned the example of the sequence analysis of the human oral microbiome. This required more than 1 million node-hours on the K computer and is the world's most sensitive sequence analysis, based on an amino acid similarity matrix.

There are also extremely large graphs that have recently emerged in various application fields. The benchmark for this class of problems is the Graph500 'Big Data' benchmark, which ranks systems by breadth-first search performance.
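
To make the metric concrete, here is a minimal sketch of a Graph500-style measurement in Python: run a breadth-first search over a graph and report traversed edges per second (TEPS). The random-edge generator below stands in for the benchmark's Kronecker generator, and the sizes are purely illustrative.

```python
# Minimal sketch of a Graph500-style measurement: BFS over a graph,
# reported as traversed edges per second (TEPS). Illustrative only.
import random
import time
from collections import defaultdict, deque

def random_graph(num_vertices, num_edges, seed=42):
    """Build a simple undirected adjacency list from random edges."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    for _ in range(num_edges):
        u, v = rng.randrange(num_vertices), rng.randrange(num_vertices)
        adj[u].append(v)
        adj[v].append(u)
    return adj

def bfs_teps(adj, source):
    """Run BFS from `source`; return (visited vertices, edges scanned, TEPS)."""
    visited = {source}
    frontier = deque([source])
    edges_scanned = 0
    start = time.perf_counter()
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            edges_scanned += 1
            if v not in visited:
                visited.add(v)
                frontier.append(v)
    elapsed = time.perf_counter() - start
    return len(visited), edges_scanned, edges_scanned / elapsed

if __name__ == "__main__":
    adj = random_graph(num_vertices=1 << 16, num_edges=1 << 20)
    visited, scanned, teps = bfs_teps(adj, source=0)
    print(f"visited {visited} vertices, scanned {scanned} edges, {teps / 1e6:.1f} MTEPS")
```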

Cloud architectures will almost certainly dominate a major chunk of the list, according to Richard Murphy from Sandia.

Satoshi Matsuoka told the audience that, in reality, no Cloud Internet Data Centre (IDC) comes close when the top supercomputers - the K computer and Tianhe-2 - are compared against the global IDCs. The K computer alone has 88,000 nodes and 800,000 CPU cores.

73% of the total execution time is communication time, stated Satoshi Matsuoka. A supercomputer has some 1,500 nodes of compute and storage; by comparison, a Cloud Data Centre has 8 zones with a total of 5,600 nodes and an injection bandwidth of about 1 Gbps per node.

What does 220 Tbps mean for the supercomputer? asked Satoshi Matsuoka. The Tsubame 2.0 network has twice the capacity of the global internet, which is being used by 2.1 billion users.

Moreover, the supercomputer actually uses that entire network capacity.

Satoshi Matsuoka showed some historical hierarchical IDC (Internet Data Centre) networks. They consolidate traffic from 10 Gbps down to 1 Gbps links and are driven by economics and Internet workloads. Performance is limited by incoming North-South traffic: an incoming request may create a 10-fold amount of messages but only a 3-4-fold amount of East-West internal traffic.

With Extreme Big Data this will change, predicted Satoshi Matsuoka. The IDC will grow 30-fold in 10 years with server unit sales remaining flat. HPC, however, will grow 1000-fold in 10 years, meaning a CAGR of roughly 100% for HPC, while the IDC will only have a CAGR of 30-40%.
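
As a quick arithmetic check of those figures: an N-fold increase over ten years corresponds to a compound annual growth rate of N to the power 1/10, minus 1.

```python
# Sanity check of the growth figures quoted above: an N-fold increase
# over `years` years corresponds to a CAGR of N**(1/years) - 1.
def cagr(factor, years):
    return factor ** (1.0 / years) - 1.0

print(f"HPC, 1000-fold in 10 years: CAGR ~ {cagr(1000, 10):.0%}")   # ~100%
print(f"IDC,   30-fold in 10 years: CAGR ~ {cagr(30, 10):.0%}")     # ~41%
```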

Observations at multiple times are treated simultaneously, stated Satoshi Matsuoka, when expanding on the challenge of global data sharing among weather services, with massive sensor feeds and data assimilation in weather prediction. This is very I/O intensive and will call for future non-silo extreme Big Data applications. The same goes for large-scale metagenomics and for ultra-large-scale graphs and social infrastructures.

Cloud IDCs have very low bandwidth and efficiency, stated Satoshi Matsuoka, while supercomputers are batch-oriented.

He mentioned Japanese Big Data-HPC convergence projects such as the JST CREST post-Petascale programme and the Advanced Computing and Optimization initiative led by Katsuki Fujisawa.

Satoshi Matsuoka expanded on the architecture for extreme Big Data with an overview of the storage capacity of Tsubame 2.0/2.5, which amounts to 11 Petabytes.

Japan is developing a Tsubame 3.0 prototype system with advanced next-generation cooling, including 40 compute nodes that are oil-submerged in 1200 liters of oil.

How should we design local storage for next-generation supercomputers? Satoshi Matsuoka asked. The requirements are a capacity of 4 TB and a read bandwidth of 8 GB/s.
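
As a back-of-the-envelope illustration of what those requirements imply (assuming decimal units), streaming the full 4 TB node-local store once at 8 GB/s takes on the order of 500 seconds:

```python
# Back-of-the-envelope: with the quoted node-local storage requirements
# (4 TB capacity, 8 GB/s read bandwidth), reading the full store once
# takes on the order of minutes. Decimal units are assumed.
capacity_bytes = 4e12          # 4 TB
read_bandwidth = 8e9           # 8 GB/s
seconds = capacity_bytes / read_bandwidth
print(f"full scan: {seconds:.0f} s (~{seconds / 60:.1f} minutes)")   # 500 s, ~8.3 min
```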

The Tsubame 4 planned for 2020 will combine DRAM+NVM+CPU with 3D/2.5D die stacking. This will become the ultimate convergence of Big Data and Extreme Computing, with a direct chip-to-chip interconnect.

The extreme Big Data interconnects will have non-uniform access and low-latency writes and reads, explained Satoshi Matsuoka.

The extreme Big Data algorithms will include graph processing, sorting, clustering, and spatial data handling, and will be adapted to deep memory architectures and to many-core architectures. There will also be an interactive scheduler for extreme Big Data-based analysis.

JST CREST is a sister project, stated Satoshi Matsuoka. It is about large-scale graph processing using NVM and hybrid BFS.
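
Hybrid BFS, as used in Graph500-class graph processing, switches between a top-down and a bottom-up traversal depending on the size of the frontier. The following is a minimal in-memory sketch of that switching idea; the NVM out-of-core aspect of the project and the real switching heuristics are not modelled, and the threshold is illustrative.

```python
# Minimal sketch of hybrid (direction-optimizing) BFS: traverse top-down
# while the frontier is small, switch to bottom-up when it gets large.
# The switching threshold `alpha` is illustrative.
def hybrid_bfs(adj, num_vertices, source, alpha=0.05):
    parent = [-1] * num_vertices
    parent[source] = source
    frontier = {source}
    while frontier:
        next_frontier = set()
        if len(frontier) < alpha * num_vertices:
            # Top-down step: expand every frontier vertex.
            for u in frontier:
                for v in adj[u]:
                    if parent[v] == -1:
                        parent[v] = u
                        next_frontier.add(v)
        else:
            # Bottom-up step: every unvisited vertex looks for a frontier parent.
            for v in range(num_vertices):
                if parent[v] == -1:
                    for u in adj[v]:
                        if u in frontier:
                            parent[v] = u
                            next_frontier.add(v)
                            break
        frontier = next_frontier
    return parent

# Tiny usage example: a 4-cycle, BFS from vertex 0.
adj = [[1, 3], [0, 2], [1, 3], [0, 2]]
print(hybrid_bfs(adj, num_vertices=4, source=0))   # [0, 0, 1, 0]
```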

The Green Graph500 list, whose most recent edition at the time was issued in November 2013, measures power efficiency using the TEPS/W ratio. Results have been submitted for various systems such as the Tsubame and K clusters. The June 2014 results show a 6-fold improvement over November 2013, together with a no. 1 ranking on the Graph500.

Sorting for extreme Big Data will happen by using a single node to its utmost capacity, explained Satoshi Matsuoka. The sorting will handle long, variable-length keys, with implementations for GPUs and multi-/many-core CPUs. The hybrid parallelization scheme combines data-parallel and task-parallel stages and reaches up to 100 million string keys per second.
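
A minimal single-node sketch of the splitter-based idea behind such a sorter, for variable-length string keys: sample the keys to pick splitters, partition into buckets (the data-parallel stage), then sort each bucket (the stage that could be handed to GPUs or CPU cores as tasks). The bucket count and sampling rate are illustrative, and the actual GPU kernels are not modelled.

```python
# Sketch of splitter-based (sample) sorting for variable-length string keys.
import bisect
import random

def splitter_sort(keys, num_buckets=8, oversample=4, seed=0):
    rng = random.Random(seed)
    # Stage 1: sample keys and derive splitters.
    sample = sorted(rng.sample(keys, min(len(keys), num_buckets * oversample)))
    step = max(1, len(sample) // num_buckets)
    splitters = sample[step::step][:num_buckets - 1]
    # Stage 2 (data-parallel in the real scheme): partition keys into buckets.
    buckets = [[] for _ in range(len(splitters) + 1)]
    for key in keys:
        buckets[bisect.bisect_right(splitters, key)].append(key)
    # Stage 3 (task-parallel in the real scheme): sort each bucket and concatenate.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result

keys = ["".join(random.choices("acgt", k=random.randint(4, 12))) for _ in range(10000)]
assert splitter_sort(keys) == sorted(keys)
```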

Satoshi Matsuoka also made a performance prediction. There will be a 2.2-fold speedup compared to a CPU-based implementation when the PCI bandwidth increases to 50 GB/s, and an 8.8% reduction of overall runtime when the accelerators work 4 times faster. The GPU implementation of splitter-based sorting demonstrates weak-scaling performance.

Extreme Big Data programming will be done with DSLs, libraries and APIs. There will be a software framework for large-scale supercomputers with optimizations for GPU accelerators, assigning a warp per key to avoid warp divergence in Map/Reduce, stated Satoshi Matsuoka.

There are many existing graph analytics libraries, but the aim is to create an open-source, highly scalable, large-scale graph library.

The design will be based on an extended X10, fully utilizing MPI collective communication, with native support for hybrid parallelism.

Satoshi Matsuoka also talked about XPregel and related optimizations on supercomputers, used to support domains larger than GPU device memory, together with temporal blocking for communication avoidance. Temporal blocking performs multiple updates on a small block before proceeding to the next block; with optimized temporal blocking, a 3D 7-point stencil on a K20X GPU retained single-GPU performance at a 10-fold larger domain size.
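
Temporal blocking is easiest to see on a toy example. The sketch below applies the idea to a 1D 3-point averaging stencil rather than the 3D 7-point stencil from the talk: each block is read with a halo as wide as the number of time steps, advanced several steps locally with redundant halo computation, and only its interior is written back.

```python
# Toy temporal blocking on a 1D 3-point averaging stencil (fixed boundaries).
def step(a):
    """One naive stencil sweep over the whole domain."""
    return [a[0]] + [(a[i - 1] + a[i] + a[i + 1]) / 3.0
                     for i in range(1, len(a) - 1)] + [a[-1]]

def temporally_blocked(a, steps, block):
    n = len(a)
    out = list(a)
    for b0 in range(0, n, block):
        b1 = min(b0 + block, n)
        # Read the block plus a halo of width `steps` from the original array.
        lo, hi = max(0, b0 - steps), min(n, b1 + steps)
        tile = a[lo:hi]
        for _ in range(steps):
            tile = step(tile)
        # Keep only the interior cells, which are unaffected by the cut halo.
        out[b0:b1] = tile[b0 - lo:b1 - lo]
    return out

if __name__ == "__main__":
    import random
    data = [random.random() for _ in range(64)]
    reference = data
    for _ in range(4):
        reference = step(reference)
    blocked = temporally_blocked(data, steps=4, block=16)
    assert all(abs(x - y) < 1e-12 for x, y in zip(blocked, reference))
```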

Satoshi Matsuoka warned, however, about the programming cost: communication-reducing algorithms efficiently support larger domains, but at the cost of a complex loop structure with complex border handling.

Memory hierarchy management will be done with runtime libraries; HHRT is such a runtime for GPU supercomputers.

The net result is quite impressive, stated Satoshi Matsuoka: execution remains efficient beyond GPU memory capacity and comes at a moderate programming cost.

Extreme Big Data is equally about system software and distributed objects, including distribution, instrumentation, scaling, and resilience. Extreme-scale I/O calls for burst buffers, and one has to provide POSIX I/O interfaces. IBIO uses ibverbs for communication between clients and servers, and it is necessary to exploit the network bandwidth of InfiniBand, explained Satoshi Matsuoka.
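
The burst-buffer idea can be sketched as a write-behind layer: the application writes checkpoints to fast local storage through a POSIX-like interface while a background thread drains them to the slower parallel file system. The class name and paths below are illustrative; this is not the actual IBIO API, which communicates over InfiniBand via ibverbs.

```python
# Illustrative burst-buffer write-behind layer (not the real IBIO API).
import os
import queue
import shutil
import threading

class BurstBuffer:
    def __init__(self, fast_dir, pfs_dir):
        self.fast_dir, self.pfs_dir = fast_dir, pfs_dir
        os.makedirs(fast_dir, exist_ok=True)
        os.makedirs(pfs_dir, exist_ok=True)
        self.pending = queue.Queue()
        self.drainer = threading.Thread(target=self._drain, daemon=True)
        self.drainer.start()

    def write(self, name, data: bytes):
        """Absorb the write at local-storage speed, then flush asynchronously."""
        local_path = os.path.join(self.fast_dir, name)
        with open(local_path, "wb") as f:
            f.write(data)
        self.pending.put(local_path)

    def _drain(self):
        """Background drain of buffered files to the parallel file system."""
        while True:
            local_path = self.pending.get()
            if local_path is None:
                break
            shutil.copy(local_path, self.pfs_dir)
            self.pending.task_done()

    def flush(self):
        self.pending.join()

    def close(self):
        self.flush()
        self.pending.put(None)
        self.drainer.join()

bb = BurstBuffer(fast_dir="/tmp/bb_local", pfs_dir="/tmp/bb_pfs")  # illustrative paths
bb.write("checkpoint_000.bin", os.urandom(1 << 20))
bb.close()
```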

The end results are very good: increasing the performance of the parallel file system has a clear impact on overall system efficiency.

The level-2 checkpoint/restart (C/R) overhead is a major cause of degraded efficiency, so reducing the level-2 failure rate and improving level-2 C/R is critical on future systems, stated Satoshi Matsuoka.
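
To see why the level-2 cost matters so much, one can plug illustrative numbers into a standard checkpoint/restart model (these are not figures from the talk): for each level, Young's formula gives an optimal checkpoint interval of sqrt(2 x checkpoint cost x MTBF), and the wasted fraction is roughly the checkpoint overhead plus the expected recomputation.

```python
# Simplified multi-level checkpoint/restart model (illustrative numbers,
# not from the talk): for each level, with checkpoint cost C (seconds)
# and mean time between failures M handled at that level, Young's formula
# gives interval = sqrt(2*C*M), and waste ~ C/interval + interval/(2*M).
from math import sqrt

def wasted_fraction(checkpoint_cost, mtbf):
    interval = sqrt(2.0 * checkpoint_cost * mtbf)
    return checkpoint_cost / interval + interval / (2.0 * mtbf)

l1 = wasted_fraction(checkpoint_cost=10.0, mtbf=4 * 3600.0)     # fast local (burst-buffer) checkpoints
l2 = wasted_fraction(checkpoint_cost=600.0, mtbf=24 * 3600.0)   # slow parallel-file-system checkpoints
efficiency = (1.0 - l1) * (1.0 - l2)
print(f"L1 waste {l1:.1%}, L2 waste {l2:.1%}, combined efficiency ~{efficiency:.1%}")
```

Even with these made-up numbers, the slower level-2 checkpoints dominate the lost time, which is the point Satoshi Matsuoka made about future systems.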

Several extreme Big Data international efforts are going on right now in Europe, the USA and China, and future collaborations with Japan are being planned.

The US has initiated a Department of Energy (DoE) mission with regard to Extreme Scale Science, involving genomics, high energy physics, light sources, and climate, and driven by exponential technology advances.

In China, the third phase of the supercomputer project is upcoming within the HiTech 863 programme supported by MOST. Satoshi Matsuoka said it will be a 2-year plan for 2015-2016 with a smaller budget, forming the preliminary research for an exascale system.

In Europe, the Horizon 2020 Programme is starting up with a focus on HPC and Big Data calls. The European Commission has a whole bunch of initiatives in store, concluded Satoshi Matsuoka.

Leslie Versweyveld