16 Jul 2015 Frankfurt - Following the ISC15 Conference in Frankfurt, Germany, a series of workshops was organized the day after the conference. Primeur Magazine attended the workshop on European Exascale Research organized by researchers from the Barcelona Supercomputing Center and the Jülich Supercomputing Centre. The introductory keynote was given by Gilad Shainer from the HPC Advisory Council, who talked about the various approaches now being considered to move forward in the race towards exascale. The magic word for getting real results is co-design. Gilad Shainer took us into the world of co-design to show that there are multiple pathways to follow.
Gilad Shainer presented the HPC Advisory Council as a body that tries to involve more people in HPC. This is one way to promote exascale, he told the audience. One of its initiatives is the Student Cluster Competition. At ISC15, the Student Cluster Competition featured eleven teams from China, Europe (Estonia!), Asia (India!), Africa and America, who worked for three days on Linpack and performance issues while building their own HPC systems. The award ceremony formed the apotheosis of this big event for students in Computer Science.
Discussions about exascale will go on from now until 2024, Gilad Shainer expected. The first petaflop system was Roadrunner. Today, there are about 66 systems with sustained petaflop performance in the TOP500. In the USA, the CORAL initiative recently announced three new and innovative systems.
If we take a global perspective, we see that Japan has announced it will build an exaflop system in 2020-2021. Gilad Shainer thinks that, globally, 2024-2025 is a more realistic date for reaching exascale.
He told the audience that all Indian villages will be connected with fiber. The aim is to connect 10,000 villages. India will also build HPC centres in several places. The country is entering the race very strongly.
Europe is working hard to get to exascale. It is not ready yet, but more and more effort is being made. Several things are about to change, such as the move from SMP to cluster and from single core to multi-core.
The way to go to the next level, however, is co-design. Exascale will be enabled by co-design, and it has multiple aspects. There are four different approaches: hardware-to-hardware co-design, software-to-software co-design, hardware-to-software co-design, and co-design between industry and academia. There is a lot of middleware out there. You need to unify all those elements.
In addition, there are standard and non-standard as well as closed and open source solutions, Gilad Shainer explained. Collaboration in open source is the future.
Exascale needs to be configurable and programmable. The Baidu company is working on it.
Co-design needs to start from a discrete level focus. The entire framework of applications has to respond to the system. Feature elements need to be co-designed. Every element in the data centre has to become a co-processor. If you bring elements of application into the data centre, each element can become a co-processor by itself, Gilad Shainer explained.
Virtualisation seems to involve a lot of software. It is meant for data centres that want to share their resources. Can it be the answer for resilience at exascale? By separating the software from the hardware, you become resilient to hardware errors. If you get more failures, the system needs more checkpoints and restarts. This needs to be tackled. You need to eliminate the overhead of the virtualisation layer.
Gilad Shainer presented a few co-design examples to the audience.
From the interconnect perspective, bandwidth keeps increasing. In fact, there is no limit. Latency is a different story. The latency is limited to 500 nanoseconds at this point. Even if you get that latency down to zero, that is all there is to gain. So how do we go to the next level, Gilad Shainer asked.
Latencies of tens of microseconds will still remain. You need to take the entire framework into the calculation. Some processing will run on the adapter, some on the switch, etc. In any case, you can take a framework and break it up to reduce the latency from tens of microseconds to single-digit microseconds.
In 2016, this will happen. This is going to impact all other systems.
Software-to-software co-design is being addressed in the new Unified Communication X (UCX) project, Gilad Shainer said. This software framework will replace the different software frameworks that existed before. The collaboration consists of founding members that each had their own software framework for running HPC applications.
The idea is to also do software-to-software co-design.
Now we have a new project that took all these discrete elements in order to unify them into one framework, Gilad Shainer explained. If you have more users and companies working on that same framework, this is where innovation and performance are going to come out.
There are three elements that will build the solution:
1. The creation of one unified framework instead of multiple frameworks.
2. Support for many variations of infrastructure. Here, Gilad Shainer warned against creating a high-level API, because that adds software overhead. UCX limits itself to the hardware level of each of the infrastructures. It eliminates some of the software interfaces.
3. It has to be open source to drive innovation and community development.
Another aspect of co-design is hardware-to-hardware co-design. GPUDirect is an example. It was announced several years ago. Now we are getting into the fourth generation of GPUDirect, Gilad Shainer explained. The idea is to improve the performance of applications running on top of GPUs. The starting point is that an application running on the GPU needs to communicate data in and out: the data is moved from the GPU to the CPU memory, and from the CPU memory it can be sent to another server. There it has to go through the CPU memory again before being copied to the GPU.
This was a big overhead because everything needed to be copied between the CPU and the GPU. GPUDirect was the result of a collaboration between hardware companies that enabled sharing the buffers and mapping the GPU memory as an extension of the CPU memory. The GPU can access the outside directly through the PCI Express peer-to-peer option. As a result, latencies dropped from 22 microseconds to 2 microseconds.
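The difference between the two data paths can be sketched in a few lines of Python. This is only an illustrative model of the description above, not a GPUDirect API: the hop names and the helper functions host_copies and speedup are hypothetical, and the only measured numbers are the 22 and 2 microseconds quoted in the talk.

```python
# Illustrative model only: hop names and helpers are assumptions made
# for this sketch, not part of any real GPUDirect interface.

# Path without GPUDirect: data is staged in CPU memory on both servers.
STAGED_PATH = [
    "sender GPU memory -> sender CPU memory",
    "sender CPU memory -> network adapter",
    "network adapter -> receiver CPU memory",
    "receiver CPU memory -> receiver GPU memory",
]

# Path with GPUDirect peer-to-peer: the adapter reaches GPU memory directly.
DIRECT_PATH = [
    "sender GPU memory -> network adapter (PCI Express peer-to-peer)",
    "network adapter -> receiver GPU memory (PCI Express peer-to-peer)",
]


def host_copies(path):
    """Count the hops that read from or write to CPU memory."""
    return sum("CPU memory" in hop for hop in path)


def speedup(before_us, after_us):
    """Ratio between two latencies given in microseconds."""
    return before_us / after_us


print(host_copies(STAGED_PATH))  # 4 hops touch CPU memory
print(host_copies(DIRECT_PATH))  # 0 hops touch CPU memory
print(speedup(22.0, 2.0))        # 11.0, from the latencies quoted in the talk
```

In this simplified view, the gain comes from removing every hop through CPU memory rather than from making any single hop faster.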
In fact, GPUs have moved beyond HPC into other domains as well, Gilad Shainer said.
Work is now under way on GPUDirect 4.0, which lets the GPU handle the control path directly as well, not just the data path. GPUDirect 4.0 is expected to reduce application latencies by another 20-25 percent.
The co-design approach is critical for exascale, Gilad Shainer insisted. It is important for companies, laboratories and universities to work together in order to enable those co-designs.
Gilad Shainer concluded his talk by telling a little bit more about the activities of the HPC Advisory Council.
The HPC Advisory Council is a worldwide non-profit HPC organisation of about 400 members. It bridges the gap between HPC usage and its potential. It provides best practices and a support and development centre. It explores future technologies and future developments. It also offers leading-edge solutions and technology demonstrations.
The HPC Advisory Council is now bringing HPC to music in a special project. HPC Music is an advanced project about HPC and music production, dedicated to enabling HPC in music creation. Its goal is to develop HPC cluster and Cloud solutions that further enable the future of music production and reproduction. HPC in music can replace a whole orchestra and even achieve finer results.
The HPC Advisory Council is also organizing multiple conferences all over the world. The next ones are in China in October 2015 and in South Africa in December 2015.