Porting and optimizing applications is made easier by adherence to standards (MPI and OpenMP) and by extensions to the task-based OmpSs model developed by Barcelona Supercomputing Center (BSC). ParaStation MPI, part of ParTec's ParaStation ClusterSuite, has been extended into a Global MPI, the key system software component linking Cluster and Booster. The system is located at Jülich Supercomputing Centre (JSC) and is fully integrated into the on-site hardware and software infrastructure. Initial application results clearly show the system's performance and efficiency potential; JSC plans to operate the machine for several years and to make it available to external users.
Scalability, energy efficiency, programmability, and manageability are major challenges on the way to building exascale-class supercomputers. To address them, the collaborative DEEP R&D project implemented the novel Cluster-Booster concept, a heterogeneous architecture that enables applications to always run at the right level of concurrency: highly scalable code parts profit from the throughput of the many-core Booster, while code parts with limited scalability benefit from the high per-thread performance of a conventional Cluster.
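To make the idea of running each code part "at the right level of concurrency" concrete, here is a toy model, not DEEP or OmpSs software. It applies Amdahl's law to two hypothetical modules: a Cluster with few fast cores and a Booster with many slower cores. The module sizes, per-core speeds, and the helper names `Module`, `amdahl_time` and `place` are illustrative assumptions only, not DEEP hardware specifications, and the sketch does not represent how DEEP actually assigns work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """A hypothetical compute module: core count and relative per-core speed."""
    name: str
    cores: int
    per_core_speed: float  # relative to one Cluster core (= 1.0)


def amdahl_time(parallel_fraction: float, module: Module) -> float:
    """Normalised runtime of a code part on `module` under Amdahl's law.

    The serial fraction runs on a single core, the parallel fraction is
    spread evenly over all cores, and both are scaled by per-core speed.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial = 1.0 - parallel_fraction
    return (serial + parallel_fraction / module.cores) / module.per_core_speed


# Placeholder numbers chosen for illustration, not real DEEP specifications.
CLUSTER = Module("Cluster", cores=16, per_core_speed=1.0)
BOOSTER = Module("Booster", cores=128, per_core_speed=0.25)


def place(parallel_fraction: float) -> str:
    """Return the name of the module on which this code part finishes first."""
    best = min((CLUSTER, BOOSTER), key=lambda m: amdahl_time(parallel_fraction, m))
    return best.name


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99, 1.0):
        print(f"parallel fraction {p}: run on {place(p)}")
```

With these placeholder numbers, a code part that is half serial stays on the Cluster, while a perfectly parallel kernel finishes sooner on the Booster; the crossover point moves as the assumed core counts and speeds change.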
The final DEEP system is up and running at JSC. With a peak performance of 500 TFlop/s, it uses Eurotech's Aurora technology to achieve tight packaging (the whole system occupies less than two racks) and high energy efficiency through direct liquid cooling. The DEEP Booster tightly integrates 384 Intel Xeon Phi nodes that communicate over a high-performance 3D torus network based on Extoll technology. The Booster was designed by Eurotech in close collaboration with Heidelberg University and Leibniz Supercomputing Centre (LRZ) under the guidance of Intel, within the ExaCluster Lab at Jülich Supercomputing Centre. Furthermore, partner LRZ developed DEEP's novel RAS (reliability, availability and serviceability) architecture, providing advanced monitoring tools that give a holistic picture of the system status with a level of detail not previously seen in HPC machines.
To mask the relative complexity of the Cluster-Booster architecture, DEEP developed a complete software stack that features an easy-to-use and familiar programming environment for application developers, and can achieve an optimal match between hardware and application characteristics. A global MPI implementation covers both Cluster and Booster, and is based on the fully MPI-3-compliant ParaStation MPI by the Munich-based software company ParTec. On top of that, the task-based OmpSs model developed by BSC now supports the DEEP collective offload model for highly parallel kernels that use MPI. Both layers are available on a wide variety of platforms.
Last but not least, six real-world HPC applications from science and industry were optimized for the DEEP concept. The work resulted in modernized versions of the codes, which are now ready to achieve high performance across a wide range of architectures. Initial results on the final DEEP system show the performance potential and clearly demonstrate the advantages of its architecture, such as its high flexibility and efficiency in using system resources.
"At first DEEP was just an idea. A group of the most competent, dedicated and enthusiastic scientists and engineers from all over Europe, strongly supported by the European Commission, breathed life into this idea. The companies, research institutes and universities behind the consortium can all be proud of having created a unique system, which is both most generally applicable and also unimaginably scalable. The DEEP Cluster-Booster concept will become part of the future of supercomputing", stated Prof. Dr. Thomas Lippert, Head of Jülich Supercomputing Centre and Scientific Coordinator of the DEEP project.
The prototype system at JSC will be made accessible to HPC application developers outside the DEEP project. Interested researchers should contact the DEEP Project Management Team via firstname.lastname@example.org. In addition, JSC plans to complement its current JURECA system with a 10 PFlop/s Booster in 2016/2017.
For more information on the project, visit the DEEP project website and the official project brochure. Results will also be showcased at SC15, the world's largest supercomputing conference, taking place from November 15 to 20 in Austin, Texas, USA.
The DEEP project is a close collaboration of 16 partners from across Europe, coordinated by JSC at Forschungszentrum Jülich. It was co-funded by the European Commission under the Seventh Framework Programme (FP7) with grant number 287530 and a total budget of 18.5 million euros.