Back to Table of contents

Primeur weekly 2015-09-28

Special

Some perspectives on which approach to choose in the race towards exascale ...

System monitoring for energy-efficiency in the MontBlanc and DEEP-ER projects ...

Focus

In the following two years, 2016-2017, the European Union will spend over 150 million euro on HPC, ranging from HPC support for industry to exascale development ...

Quantum computing

D-Wave Systems announces multi-year agreement to provide its technology to Google, NASA and USRA's Quantum Artificial Intelligence Lab ...

Focus on Europe

PRACE to issue Newsletter 16 ...

Leibniz Supercomputing Centre in Garching near Munich to host 59th International HPC User Forum ...

Middleware

New support for CAMERA to develop computational mathematics for experimental facilities research ...

Hardware

Nor-Tech pioneers low-cost supercomputer solution ...

Curtiss-Wright launches new OpenHPEC Initiative to bring supercomputing software tools to embedded COTS systems ...

ADVA Optical Networking and PSNC trial 400G DCI technology in Research and Education Network ...

CENIC awarded International Networking Grant from NSF ...

Mellanox and Ixia demonstrate industry-first interoperability of 100Gb/s Ethernet platforms over 2km of optical fiber with silicon photonics transceivers ...

Applications

New 'stealth dark matter' theory may explain mystery of the universe's missing mass ...

One step closer to a new kind of computer ...

TACC supercomputers power RNA-seq analysis tools at summer bioinformatics workshop ...

LLNL joins Rensselaer Polytechnic Institute to promote industry adoption of supercomputing ...

Desika Narayanan unlocks the secrets to the brightest galaxies in the universe ...

Health care organisations select Mellanox InfiniBand-based Cloud ...

The Cloud

Dew helps ground Cloud computing ...

Computer scientist seeks stronger security shroud for the Cloud ...

Rescale and CTC announce strategic partnership to provide Cloud HPC Platform in Japan ...

Some perspectives on which approach to choose in the race towards exascale


16 Jul 2015 Frankfurt - The day after the ISC15 Conference in Frankfurt, Germany, a series of workshops was organized. Primeur Magazine attended the workshop on European Exascale Research organized by researchers from the Barcelona Supercomputing Center and the Jülich Supercomputing Centre. The introductory keynote was given by Gilad Shainer from the HPC Advisory Council, who talked about the various approaches now being considered to move forward in the race towards exascale. The magic word for getting real results is co-design. Gilad Shainer took us into the world of co-design and showed that there are multiple pathways to follow.

Gilad Shainer presented the HPC Advisory Council as a body that tries to involve more people in HPC. This is one way to promote exascale, he told the audience. One of its initiatives is the Student Cluster Competition. At ISC15, the Student Cluster Competition was present with eleven teams from China, Europe (Estonia!), Asia (India!), Africa and America, who worked for three days on Linpack and performance issues while trying to build their own HPC system. The award ceremony formed the apotheosis of this big event for students in Computer Science.

Discussions about exascale will go on from now until 2024, Gilad Shainer expects. The first petaflop system was Roadrunner. Today, we have about 66 petaflop-sustained systems in the TOP500. In the USA, the CORAL initiative recently announced three new and innovative systems.

If we take a global perspective, we see that Japan announced it would build an exaflop system in 2020-2021. Gilad Shainer thinks that, globally, 2024-2025 is a more realistic time frame for reaching exascale.

He told the audience that all the Indian villages will be connected with fiber. The aim is to connect 10,000 villages. India will also build HPC centres in several places. The country is entering the race very strongly.

Europe is working hard to get to exascale. It is not ready yet, but more and more effort is being made. Several things are about to change, such as the move from SMP to cluster and from single core to multi-core.

The way to go to the next level, however, is co-design. Exascale will be enabled by co-design, which has multiple aspects. There are four different approaches: hardware-to-hardware co-design, software-to-software co-design, hardware-to-software co-design, and co-design between industry and academia. There is a lot of middleware out there, and all those elements need to be unified.

In addition, there are standard and non-standard as well as closed and open source approaches, Gilad Shainer explained. Collaboration in open source is the future.

Exascale needs to be configurable and programmable. The Baidu company is working on it.

Co-design needs to start from a discrete level focus. The entire framework of applications has to respond to the system. Feature elements need to be co-designed. Every element in the data centre has to become a co-processor. If you bring elements of application into the data centre, each element can become a co-processor by itself, Gilad Shainer explained.

Virtualisation may seem like a lot of software. It is meant for data centres that want to share their resources. Can it be the answer to the resilience problem of exascale? By separating the software from the hardware, you become resilient to hardware errors. If you get more failures, however, the system goes through more checkpoints and restarts. This needs to be tackled, and the overhead of virtualisation needs to be eliminated.
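The link between failure rate and checkpoint overhead can be made concrete with a small sketch. It uses Young's first-order approximation for the checkpoint interval, which is a textbook model and was not part of the talk. The checkpoint cost and the mean time between failures (MTBF) values in the usage note are hypothetical numbers chosen only for illustration.

```python
import math

def young_interval(checkpoint_cost_s, mtbf_s):
    """Young's approximation of the checkpoint interval that minimises lost time."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

def overhead_fraction(checkpoint_cost_s, mtbf_s):
    """First-order fraction of run time lost to checkpointing and rework.

    Two terms: the time spent writing checkpoints (cost / interval) and the
    expected work redone after a failure (half an interval per MTBF).
    """
    tau = young_interval(checkpoint_cost_s, mtbf_s)
    return checkpoint_cost_s / tau + tau / (2.0 * mtbf_s)

# Hypothetical values: a 60 s checkpoint, MTBF of one day versus one hour.
CHECKPOINT_COST_S = 60.0
overhead_day = overhead_fraction(CHECKPOINT_COST_S, 24 * 3600.0)
overhead_hour = overhead_fraction(CHECKPOINT_COST_S, 3600.0)
```

Under these assumptions the lost fraction grows as failures become more frequent, which is the effect described above: the shorter the MTBF, the more time goes into checkpoints and restarts instead of useful work.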

Gilad Shainer presented a few co-design examples to the audience.

From the interconnect perspective, bandwidth keeps increasing. In fact, there is no limit. Latency is a different story. The interconnect latency is at about 500 nanoseconds at this point, so even if you bring it down to zero, that is all you gain. How do we go to the next level, Gilad Shainer asked.

Tens of microseconds of latency will still remain. You need to take the entire framework into account. Some processing will run on the adapter, some on the switch, and so on. In any case, you can take a framework and break it up to reduce the latency from tens of microseconds to a single-digit number of microseconds.

In 2016, this will happen. This is going to impact all other systems.

Software-to-software co-design is being addressed in the new Unified Communication X (UCX) project, Gilad Shainer said. This software framework will replace the different software frameworks that existed before. The collaboration consists of founding members that each had their own software framework for running HPC applications:

  • Mellanox co-designs the network interface and contributes the MXM technology.
  • Oak Ridge National Laboratory co-designs the network interface and contributes the UCCS project. The goal of this project is to establish a software interface that can map onto different levels of infrastructure, whether they are Cray systems, shared memory interfaces or standard interfaces. They are able to have something that can map into an entire system and have videos working on GPUDirect.
  • NVIDIA co-designs high-quality support for GPU devices.
  • IBM co-designs the network interface and contributes ideas and concepts from PAMI, which was used on the BlueGene machines.
  • The University of Houston and the University of Tennessee at Knoxville focus on integration with their research platforms.

The idea is to also do software-to-software co-design.

Now we have a new project that took all these discrete elements in order to unify them into one framework, Gilad Shainer explained. If you have more users and companies working on that same framework, this is where innovation and performance are going to come out.

There are three elements that will build the solution:

1. The creation of one unified framework instead of multiple frameworks.

2. Support for many variations of infrastructure. Here, Gilad Shainer warned against creating a high-level API, because that adds software overhead. UCX limits itself to the hardware level of each of the infrastructures. It eliminates some of the software interfaces.

3. It has to be open source to drive innovation and community development.

Another aspect of co-design is hardware-to-hardware co-design. GPUDirect is an example. It was announced several years ago, and we are now getting into the fourth generation of GPUDirect, Gilad Shainer explained. The idea is to improve the performance of applications running on GPUs. The starting point is that an application running on the GPU has to move its data in and out: the data is first copied from GPU memory to CPU memory, and from CPU memory it can be sent to another server. There, it has to be copied from CPU memory to the GPU again.

This was a big overhead because everything had to be copied between the CPU and the GPU. GPUDirect was the result of a collaboration between hardware companies that enabled running the buffers and mapping the GPU memory as an extension of the CPU memory. The GPU can access the outside directly through the peer-to-peer option of PCI Express. As a result, latencies dropped from 22 microseconds to 2 microseconds.
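The difference between the two data paths can be sketched as a toy model. The hop names below are illustrative labels, not a real GPUDirect or CUDA API, and the only measured figures are the two latencies quoted in the talk.

```python
# Toy model only: each hop is a (source, destination) pair of memory regions.
# The names are illustrative assumptions, not identifiers from any real API.
STAGED_PATH = [
    ("sender GPU memory", "sender CPU memory"),      # copy into host memory
    ("sender CPU memory", "receiver CPU memory"),    # transfer over the network
    ("receiver CPU memory", "receiver GPU memory"),  # copy back into GPU memory
]
DIRECT_PATH = [
    ("sender GPU memory", "receiver GPU memory"),    # adapter accesses GPU memory peer-to-peer
]

def host_copies(path):
    """Count hops that stay on one node, i.e. GPU <-> CPU memory copies."""
    return sum(1 for src, dst in path if src.split()[0] == dst.split()[0])

def speedup(before_us, after_us):
    """Ratio between the latency before and after the optimisation."""
    return before_us / after_us

# Latencies as reported in the talk, in microseconds.
REPORTED_BEFORE_US = 22.0
REPORTED_AFTER_US = 2.0
reported_speedup = speedup(REPORTED_BEFORE_US, REPORTED_AFTER_US)
```

In this model the staged path contains two copies through host memory and the direct path contains none. The reported drop from 22 to 2 microseconds corresponds to an elevenfold latency reduction.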

In fact, GPUs have also moved beyond HPC into other domains, Gilad Shainer said.

Work is now under way on GPUDirect 4.0, in which the control path, and not just the data path, is driven directly by the GPU. GPUDirect 4.0 will be able to reduce application latencies by another 20-25 percent.

The co-design approach is critical for exascale, Gilad Shainer insisted. It is important for companies, laboratories and universities to work together in order to enable those co-designs.

Gilad Shainer concluded his talk by telling a little bit more about the activities of the HPC Advisory Council.

The HPC Advisory Council is a worldwide non-profit HPC organisation of about 400 members. It bridges the gap between HPC usage and its potential. It provides best practices and a support and development centre. It explores future technologies and future developments. It also offers leading-edge solutions and technology demonstrations.

The HPC Advisory Council is now bringing HPC to music in a special project. HPC Music is an advanced project about HPC and music production, dedicated to enabling HPC in music creation. Its goal is to develop HPC cluster and Cloud solutions that further enable the future of music production and reproduction. HPC in music can replace a whole orchestra and even get finer results.

The HPC Advisory Council is also organizing multiple conferences all over the world. The next ones are in China in October 2015 and in South Africa in December 2015.

Ad Emmen
