The DGX-2 server builds on the success of the DGX-1 server and improves on almost every component, creating a 2-petaFLOPS (tensor operations) monster of a system. Some of the hardware highlights include:
Another tip of the hat goes to the NVIDIA GPU Cloud: the number of containers, applications, and frameworks available on the platform is growing daily. Optimised containers for Deep Learning, AI, and HPC are readily available, and vScaler used the TensorFlow container from this platform for the benchmarking exercise.
Integration was seamless: the vScaler engineers provisioned the system bare metal (not virtualised) with a preconfigured image they had been using for their DeepOps integration. This provided all the tools needed to access the NVIDIA GPU Cloud container repository, along with Kubernetes and other optimisation options, all based on Ubuntu 18.04 LTS (Bionic).
All benchmarks were run using nvidia-docker with the latest TensorFlow container from the NVIDIA GPU Cloud, using the synthetic ImageNet dataset provided as part of tf_cnn_benchmarks.
The benchmark script was obtained from GitHub, and a sweep of batch sizes was performed across the tests. Each test was run several times, and the figures reported are the averages of those runs.
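The run-several-times-and-average step can be sketched as a small post-processing script. This is a minimal illustration, not vScaler's actual tooling: the log excerpts below are hypothetical, and it assumes the `total images/sec:` summary line that tf_cnn_benchmarks prints at the end of a run.

```python
import re
from statistics import mean

def parse_images_per_sec(log_text):
    """Extract the final 'total images/sec' figure from a tf_cnn_benchmarks log."""
    match = re.search(r"total images/sec:\s*([\d.]+)", log_text)
    return float(match.group(1)) if match else None

# Hypothetical log excerpts from three repeated runs of the same configuration.
logs = [
    "Step\tImg/sec\n...\ntotal images/sec: 2011.40",
    "total images/sec: 1998.20",
    "total images/sec: 2005.10",
]

throughputs = [parse_images_per_sec(log) for log in logs]
print(f"runs: {throughputs} -> average {mean(throughputs):.2f} images/sec")
```

Averaging repeated runs like this smooths out run-to-run variance from thermal behaviour, data-loading jitter, and background activity.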
To assess the performance of the system, the vScaler engineers employed the widely used ResNet model, a common baseline for measuring training and inference performance. ResNet is short for Residual Network and, as the name suggests, it relies on residual learning, which addresses the challenges of training deep neural networks: networks become harder to train as they grow deeper, and accuracy can saturate and then degrade. They selected two common variants: ResNet-50 and ResNet-152.
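The core idea of residual learning is that each block outputs F(x) + x rather than F(x): the layers only need to learn a residual correction on top of an identity shortcut, which makes very deep networks easier to optimise. A toy NumPy sketch (an illustration of the principle, not the actual ResNet architecture):

```python
import numpy as np

def residual_block(x, w1, w2):
    """A minimal residual block: the weighted layers learn F(x), and the
    identity shortcut adds x back, so the block outputs F(x) + x."""
    h = np.maximum(0, x @ w1)   # first layer with ReLU activation
    fx = h @ w2                 # second layer: the residual function F(x)
    return fx + x               # identity shortcut

d = 4
x = np.ones(d)
# With zero-initialised weights F(x) = 0, so the block is exactly the
# identity -- a freshly stacked residual block cannot hurt the network,
# which is why accuracy does not degrade as readily with depth.
w_zero = np.zeros((d, d))
print(residual_block(x, w_zero, w_zero))  # same as x
```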
ResNet was introduced in 2015 and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2015 in image classification, detection, and localisation. There are of course many other convolutional neural network (CNN) architectures vScaler could have chosen, and in time it hopes to evaluate those as well.
Each model was run with various batch sizes to ensure that each GPU was fully utilised, demanding the highest level of performance from the system. Each combination of batch size and GPU count was tested three times over 20 epochs, and the average result was recorded.
During the tests vScaler monitored the system power draw through the onboard sensors and captured data points using ipmitool.
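Capturing power data points from ipmitool typically means parsing its text output. A minimal sketch, assuming the output format of `ipmitool dcmi power reading` on common ipmitool versions (the wattage figures below are hypothetical, not vScaler's measurements):

```python
import re

def parse_power_watts(ipmitool_output):
    """Pull the instantaneous wattage out of `ipmitool dcmi power reading`
    output. Returns None if the expected line is not present."""
    match = re.search(r"Instantaneous power reading:\s*(\d+)\s*Watts",
                      ipmitool_output)
    return int(match.group(1)) if match else None

# Hypothetical sample of the command's output during a benchmark run.
sample = """
    Instantaneous power reading:                   7820 Watts
    Minimum during sampling period:                 612 Watts
    Maximum during sampling period:                9414 Watts
"""
print(parse_power_watts(sample))  # -> 7820
```

Polling this in a loop alongside the benchmark gives a time series of power draw that can be lined up against the throughput numbers.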
More information on these benchmark runs is available at the vScaler website.