In 2011, the first Mont-Blanc project set out with three objectives: to develop an HPC prototype based on current mobile embedded technology; to learn from that experience and plan a future architecture; and to port real scientific applications to the prototype. The second Mont-Blanc project starts from two ideas. The first is to extend the work of Mont-Blanc I by increasing the number of scientific applications, extending support for the OmpSs programming model, developing productivity tools, and planning the next-generation Mont-Blanc architecture. The second is to explore ARM 64-bit platforms, because the market has changed in the meantime, so the team needs to learn more about the market for ARM-based platforms for mini-clusters. In addition, Mont-Blanc II will tackle the issues of fault tolerance and resiliency. All these activities are backed by the dissemination work of the project's end-user group.
The history of architectures in the TOP500 shows successive waves of technology. The most recent big wave is x86, which still dominates the TOP500. Looking more closely, according to Filippo Mantovani, we can see that commodity technology sometimes finds its way into HPC. This is true not only of x86, which started on the desktop and moved into the TOP500, but also of other examples such as Roadrunner, which was based on a variant of the Cell chip used in the PlayStation 3. The success of GPUs also originates in commodity markets such as gaming, which combine large volumes with computational capability that can match the requirements of HPC.
Sales volumes of mobile phones and tablets over the last two years have shown double-digit growth, while sales of servers and PCs have remained almost flat. As Filippo Mantovani's graph showed, the performance curves of mobile processors and conventional microprocessors are converging. Mobile chips are inexpensive because they are sold in very large volumes.
The first Mont-Blanc prototype was built out of ARM multi-core processors. The team used development kits for Android phones, integrated in HPC fashion. Most of the prototypes were small to medium-sized clusters tuned to run Linux and the Mont-Blanc software stack. At the end of last year, the team installed and tested two racks at the Barcelona Supercomputing Center. The first objective was to have a prototype up and running on which the team could develop software and applications.
The prototype is built out of mobile chips. By current standards it is fairly old technology, based on the Samsung Exynos 5 Dual system-on-chip. The chip has two CPU cores and an embedded GPU that share the same memory. Each module also has fast local storage and a network interface, and fits in a credit-card form factor. The team assembled 15 of these modules into a blade; nine blades form a chassis, and several chassis fit in a rack. In the end, the team installed 8 chassis across the two racks.
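The packaging hierarchy above determines the size of the installed system. The short Python sketch below multiplies out the figures given in the text (15 modules per blade, 9 blades per chassis, 8 chassis). It assumes one system-on-chip per module and counts only the two CPU cores of each Exynos 5 Dual, not its GPU; the function name `prototype_size` is hypothetical.

```python
# Packaging figures taken from the text; one SoC per module is an assumption.
MODULES_PER_BLADE = 15
BLADES_PER_CHASSIS = 9
CHASSIS_INSTALLED = 8
CORES_PER_MODULE = 2  # Exynos 5 Dual: two CPU cores (GPU not counted)


def prototype_size(modules_per_blade, blades_per_chassis, chassis, cores_per_module):
    """Return (blades, modules, cpu_cores) for a blade/chassis hierarchy."""
    blades = blades_per_chassis * chassis
    modules = modules_per_blade * blades
    cores = cores_per_module * modules
    return blades, modules, cores


blades, modules, cores = prototype_size(
    MODULES_PER_BLADE, BLADES_PER_CHASSIS, CHASSIS_INSTALLED, CORES_PER_MODULE
)
print(f"{blades} blades, {modules} compute modules, {cores} CPU cores")
```

With these inputs the script reports 72 blades, 1080 compute modules and 2160 CPU cores.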
At the very beginning, energy efficiency, measured in GFLOPS per watt, was very low, but the team has since improved it and is getting closer to the efficiency levels found on the Green500 list. Nevertheless, there is still a lot of room to improve energy efficiency.
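The metric behind this comparison is simple: the Green500 ranks systems by sustained floating-point performance divided by the average power drawn during the run. The sketch below computes it; the function name is hypothetical, and the input values are placeholders for illustration only, not measurements from the Mont-Blanc prototype.

```python
def gflops_per_watt(sustained_gflops: float, average_power_w: float) -> float:
    """Energy efficiency: sustained GFLOPS divided by average power in watts."""
    if average_power_w <= 0:
        raise ValueError("average power must be positive")
    return sustained_gflops / average_power_w


# Hypothetical values, chosen only to show how the metric is computed.
efficiency = gflops_per_watt(sustained_gflops=120.0, average_power_w=100.0)
print(f"{efficiency:.2f} GFLOPS/W")
```

Improving the metric means either sustaining more of the hardware's peak performance or reducing the power drawn for the same work.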
Filippo Mantovani went on to talk about the software stack that has been developed over the years. It includes a cluster management interface, a queuing system, and support for most scientific libraries on ARM architectures. There are developer tools for debugging and performance analysis, and there is support for the OmpSs programming model on this kind of platform. According to Filippo Mantovani, it is a platform that allows developers to experiment. There is also open-source software to manage the GPU part.
However, there are some limitations in the use of commodity mobile technology. The chips have only two cores, which causes problems for applications that need a large number of threads and makes it hard to overlap computation and communication. The memory has no ECC protection, and there is no DMA support. There are no standard server I/O interfaces and no network protocol off-load engine. In addition, the thermal package is not designed for sustained full-power operation. The team has to wait for the next generation of systems-on-chip to solve some of these problems. These are implementation decisions, not unsolvable problems; all that is needed is a business case to justify the cost of including the new features. In the meantime, the team is learning how to mitigate the effects of the existing limitations.
Filippo Mantovani also showed some examples of the problems that can arise when deploying the prototype. With the Alya RED application, which simulates the electromechanics of a rabbit's heart, the team ran on 512 cores and then on twice as many cores, and obtained the same performance. The team also has to carry out checks at both node level and blade level. A second example is the poor weak scalability of the Lulesh application, caused by problems in data transmission. The team therefore needs tools to monitor the platform, its temperature and power, and the behaviour of the application at a very fine grain in order to understand what is happening inside the system.
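The two examples correspond to the two standard ways of measuring scalability. In strong scaling the problem size is fixed, so doubling the core count should ideally halve the runtime; in weak scaling the problem grows with the core count, so the runtime should ideally stay constant. The sketch below computes both efficiencies. It assumes that "performance" means wall-clock time to solution; the function names and timing values are hypothetical placeholders, not Alya RED or Lulesh measurements.

```python
def strong_scaling_efficiency(t_base, cores_base, t_scaled, cores_scaled):
    """Fixed problem size: achieved speedup divided by the ideal speedup."""
    speedup = t_base / t_scaled
    ideal_speedup = cores_scaled / cores_base
    return speedup / ideal_speedup


def weak_scaling_efficiency(t_base, t_scaled):
    """Problem size grows with core count: ideal runtime stays constant."""
    return t_base / t_scaled


# Alya RED pattern: 512 -> 1024 cores with unchanged runtime (placeholder 100 s).
alya_like = strong_scaling_efficiency(100.0, 512, 100.0, 1024)
# Hypothetical weak-scaling run whose runtime grows from 10 s to 12.5 s.
weak_example = weak_scaling_efficiency(10.0, 12.5)

print(f"strong scaling efficiency: {alya_like:.0%}")
print(f"weak scaling efficiency:   {weak_example:.0%}")
```

Identical runtimes on 512 and 1024 cores give a strong-scaling efficiency of 50%: half of the added cores contribute nothing, which is why fine-grained monitoring is needed to find out where the time goes.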
The ongoing work in the Mont-Blanc project will address the modelling of next-generation architectures. The team will also investigate the ARM 64-bit mobile and server market. It aims to port new applications to the prototype and to improve the programming model. In addition, it will concentrate on monitoring the prototype in order to gain insight into fault-tolerance techniques.
Mont-Blanc is like the opening break in a game of pool, Filippo Mantovani concluded: the balls are going all over the table. The team is using mobile technology to obtain a cheap solution for high performance computing. If you drop the HPC requirement, you can build a cheap platform for scientific computation. If you move in the other direction, you can start considering the server technology that is coming out with 48 64-bit cores; these are huge chips, but this is a rather conservative approach. All this technology can also be useful in other environments, such as the automotive industry.