The A.I. researchers at Orange in France were also able to use Caffe, the popular deep learning framework, to test the system for scalability, scaling a training job to 16 GPUs. This endeavour continues with various partners to adapt the framework to exploit all 20 GPUs in the system; the next step would be to scale to a cluster.
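For readers unfamiliar with how Caffe distributes a training job across GPUs, multi-GPU data parallelism is enabled from the command line with the `--gpu` flag. The sketch below builds such an invocation from Python; the solver path is an illustrative assumption, not the team's actual configuration.

```python
# Sketch: building a multi-GPU `caffe train` invocation.
# The solver path below is a hypothetical example; the actual
# Orange/CoCoLink configuration has not been published.

def caffe_train_command(solver_path, num_gpus):
    """Build the Caffe CLI command for a data-parallel run on num_gpus GPUs."""
    gpu_ids = ",".join(str(i) for i in range(num_gpus))
    return ["caffe", "train", f"--solver={solver_path}", f"--gpu={gpu_ids}"]

cmd = caffe_train_command("models/googlenet/solver.prototxt", 16)
print(" ".join(cmd))
# On a machine with Caffe installed, this command line would launch
# training across GPUs 0-15.
```

Caffe replicates the network on each listed device and averages gradients across them, which is why the effective batch size grows with the GPU count.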
The team - Orange Silicon Valley and CoCoLink Korea - has also upgraded the system with the latest commercially available NVIDIA GPUs: the GeForce GTX 1080, based on the Pascal architecture. They were the first to validate the GTX 1080 for deep learning, and found that these consumer-grade GPUs can run GoogLeNet on Caffe 3.5 times faster than NVIDIA's enterprise-grade Tesla K40 GPUs (unveiled in 2014), measured as the time to reach a given level of image-recognition accuracy during training.
This gives us a sense of how the efficiency of deep learning systems is increasing year over year at a faster-than-linear rate.
Having identified this disruptive price/performance value proposition, the team loaded the KLIMAX system with 10 GTX 1080 GPUs.
They were able to fire up all Pascal GPUs in overclocked (Boost) mode with a theoretical aggregate compute capability of 106 teraFLOPS (single precision). So far, the A.I. research team at Orange France has been able to scale Caffe (NVIDIA fork) to 8 GPUs with the beta release of CUDA 8.0 and cuDNN 5 (as well as cuDNN 4). The eventual objective is to scale the server to 20 Pascal GPUs with computational horsepower in excess of 200 teraFLOPS - a feat never before accomplished with consumer-grade graphics cards.
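The 106 TFLOPS figure can be sanity-checked with back-of-the-envelope arithmetic: single-precision throughput is roughly cores × clock × 2 (one fused multiply-add per core per cycle). The boost clock below is an illustrative assumption consistent with the quoted total; the team's exact overclock settings were not published.

```python
# Back-of-the-envelope check of the quoted 106 TFLOPS aggregate.
CUDA_CORES = 2560             # CUDA cores per GTX 1080
BOOST_CLOCK_GHZ = 2.07        # overclocked boost clock (assumption)
FLOPS_PER_CORE_PER_CYCLE = 2  # one fused multiply-add = 2 FLOPs
NUM_GPUS = 10

per_gpu_tflops = CUDA_CORES * BOOST_CLOCK_GHZ * FLOPS_PER_CORE_PER_CYCLE / 1000
aggregate_tflops = per_gpu_tflops * NUM_GPUS
print(f"per GPU:   {per_gpu_tflops:.1f} TFLOPS")   # ~10.6
print(f"aggregate: {aggregate_tflops:.0f} TFLOPS") # ~106
```

By the same arithmetic, 20 such GPUs would land just above 200 TFLOPS, matching the stated objective.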
A particular training job on ImageNet data that used to take Orange researchers a day and a half (36 hours) with a single NVIDIA K40 can now be accomplished in 3.5 hours using 8 NVIDIA GTX 1080 cards - a more than 10x increase in training speed.
As the world transitions towards exascale computing and A.I. becomes a global race, this experiment is a partnership between researchers in three countries - the USA, France and South Korea - working together to accelerate artificial intelligence by building a supercomputer in a single server, pushing the limits of thermodynamics, geometry and price/performance efficiency.
This currently remains a research project for Orange, and there are no plans at present to develop it into a commercial offering. Detailed benchmark data from this research will be published by the team in the near future as they make further progress on optimizing the deep learning framework in collaboration with the open source community, academia and industry partners.