NVIDIA GPUs have been the accelerator of choice in the HPC community for over a decade, delivering trillions of floating-point operations per second (FLOPS). NVIDIA GPUs accelerate over 600 applications today and have become critical to advanced HPC workloads in fields such as computational chemistry, molecular dynamics, and deep learning.
While GPUs offer outstanding performance and power efficiency, managing GPU workloads can be challenging. Modern HPC environments are frequently powered by thousands of CPU cores along with thousands or even millions of GPU-resident cores shared among many users and applications. Efficient operation requires advanced workload management capabilities such as CPU-GPU affinity, NUMA- and topology-aware scheduling for NVIDIA NVLink and NVSwitch multi-GPU interconnects, advanced container support, and integrations with tools such as NVIDIA's Data Center GPU Manager (DCGM).
"Univa is a pioneer in support for advanced GPU workloads", stated Univa's Chief Technology Officer Fritz Ferstl citing Univa's work with Japan's AI Bridging Cloud Infrastructure (ABCI). "ABCI is a 550 AI-Petaflop, top-ten supercomputer where Univa Grid Engine manages HPC and AI workloads across over 4000 NVIDIA V100 Tensor Core GPUs. What we learn in leading-edge HPC drives innovation in Univa Grid Engine for the benefit of our commercial customers, including those running NVIDIA DGX systems. Customers realize better performance, improved utilization, and systems that are easier to manage."
Univa Grid Engine helps users simplify the management of GPU workloads. Sites can boost performance by placing workloads optimally and improve overall efficiency and productivity by reducing wait times and allowing more jobs to run simultaneously without conflict.
Univa was among the first commercial HPC software providers to support Arm-based systems, announcing Univa Grid Engine Arm support in 2013. Since then, Arm systems have made steady inroads, breaking into the TOP500 list of global supercomputers in 2018. Arm systems offer compelling price-performance, are power-efficient, and are available from multiple computer manufacturers.
"The availability of NVIDIA GPUs and software tools on Arm is an important development for the HPC community", stated Gary Tyreman, CEO of Univa. "GPUs play a central role in HPC and AI applications. For Univa, offering our advanced GPU workload management capabilities on Arm systems makes sense. It provides our customers and partners with added flexibility and new infrastructure choices."
"Deep integration with NVIDIA GPUs and support for NVIDIA NGC for HPC and AI containers is an excellent solution for customers running GPU applications locally or across their choice of Clouds", stated Duncan Poole, Director of Platform Alliances at NVIDIA.
Advanced GPU scheduling will be available immediately as part of Univa's Arm-based Univa Grid Engine distribution.