2017 was a very productive year for deep learning in general and for deep learning scaling in particular. Many results were published on this topic, each delivering higher performance by overcoming previous scalability challenges. As a result of its work as an Intel Parallel Computing Center (PCC), SURFsara recently published scaling results on up to 1536 Intel Xeon Phi Knights Landing nodes and 1024 Intel Xeon Skylake CPUs.
Through several optimizations, SURFsara solved in only 28 minutes on a large-scale cluster a problem that required two weeks on state-of-the-art GPUs. Moreover, SURFsara has managed to train models 10 times larger than the ones regularly used today, on datasets that are also around 10 times larger, while still achieving state-of-the-art accuracy. These results were also detailed in a recent blog post.
It was particularly exciting to see that Intel Xeon Skylake CPUs can sustain deep learning training throughput almost as high as that of state-of-the-art GPUs. Unlike a GPU, the CPU does this while also taking care of all OS and system-specific tasks. As the Intel Xeon Skylake architecture is a clear candidate to form the CPU backbone in the coming years, it is important to note that deep learning training is now competitive on high-end CPUs.
SURFsara is a subsidiary of the Dutch Cooperative SURF and the Netherlands' leading centre for high performance computing. Its core expertise lies in optimizing scientific applications for execution on large-scale HPC resources. This expertise combines an intimate knowledge of the interplay between the various hardware elements of a large-scale system with in-depth software domain knowledge, be it in numerical weather prediction, astronomy, or drug discovery.
Being closely involved in several of these domain-specific scientific projects, SURFsara has noticed that the deep learning tools developed during the Intel PCC collaboration can be adapted and applied to other scientific disciplines as well, such as the ones mentioned above. The goal of SURFsara's 2018 collaboration is therefore to apply these techniques and improve the results for important real-life scientific problems. This work will be performed in the context of the SURF Open Innovation Lab at SURFsara.
Prof. Dr. Anwar Osseyran, member of the Executive Board of SURF and CEO of SURFsara, stated: "After having reached very successful results in accelerating image classification training by improving deep learning scaling during our first year as an Intel PCC, we want to extend and adapt these efficient scaling techniques to other disciplines that traditionally use HPC, but that can benefit from deep learning as well. These can range from numerical weather predictions to molecular dynamics. This will help solve real life problems that are too big for today's analytical solutions. Moreover, by making use of modern supercomputing facilities this can be done in a fraction of the time needed on smaller systems."