Back to Table of contents

Primeur weekly 2019-02-04

Special

A number of European countries joining forces to bid for hosting Europe's fastest supercomputer ...

Per Öster sees free cooling as asset in CSC's hosting entity bid preparing for pre-exascale ...

So you want to host a 250 million euro European pre-exascale supercomputer with EuroHPC funding? ...

Focus

2018 - Another year on the Road to Exascale - Part I ...

Quantum computing

IBM and University of Chicago collaborate to advance quantum computing ...

Quantum Xchange named a finalist for 2019 SXSW Interactive Innovation Awards ...

ColdQuanta sponsors research at University of Wisconsin to accelerate neutral atom quantum computer commercialization ...

Focus on Europe

CESGAHACK4 returns in March to accelerate the execution time of scientists' applications ...

EGI appoints Arjen van Rijn new Council Chair ...

EGI Conference 2019 issues call for abstracts ...

Middleware

Dutch reseller Chip ICT joins the Bright Partner Programme ...

Hardware

CoolIT Systems adds experienced Intellectual Property Counsel ...

Mellanox delivers record fourth quarter and annual 2018 Results and exceeded $1 billion in annual revenue in 2018 ...

Asetek delivers all-in-one liquid cooler to support high wattage processors for extreme workstations ...

Oak Ridge National Laboratory adds powerful AI appliances to computing portfolio ...

Los Alamos National Laboratory issues Request for Proposal (RFP) for new supercomputer ...

Applications

Can you imagine predicting crop yields with supercomputers and satellites? ...

Netherlands eScience Center contributes to new insights into the human body clock and health ...

Going deep: Lawrence Livermore lab employees get an introduction to world of machine learning and neural networks ...

Lawrence Livermore National Laboratory team explores electric grid modernization via HPC ...

How to escape a black hole: simulations provide new clues to what's driving powerful plasma jets ...

Berkeley Lab researcher wins machine-learning competition with code that sorts through simulated telescope data ...

Supercomputing helps study two-dimensional materials ...

The Cloud

Clowder awarded $5 million from National Science Foundation ...

WekaIO joins the ranks of prestigious machine learning and Cloud leaders to provide benchmark code for MLPerf ...

Helix Nebula Science Cloud demonstrated final results of pilot phase for hybrid Cloud model ...

Oak Ridge National Laboratory adds powerful AI appliances to computing portfolio

From left to right: Cole Freniere and Michael Reynolds of Microway, Alex Volkov of NVIDIA, and Chris Layton and Brian Zachary of ORNL pose with a newly arrived DGX-2. The NVIDIA appliances connect ORNL researchers with a platform that excels at machine learning, a type of artificial intelligence that could automate some of the time-intensive analysis inherent in scientific research.

30 Jan 2019 Oak Ridge - As home to three top-ranked supercomputers of the last decade, the US Department of Energy's (DOE) Oak Ridge National Laboratory (ORNL) has become synonymous with scientific computing at the largest scales. Getting the most out of these science machines, however, requires a willingness to experiment with problems and systems of every size and scale. This is especially important as technology vendors introduce new system architectures and as scientists' problem-solving toolkit expands to include artificial intelligence (AI) and advanced data analysis.

In that spirit, ORNL recently installed two NVIDIA DGX-2 systems, powerful GPU-accelerated appliances that will provide ORNL researchers with enhanced opportunities to conduct science - machine learning and data-intensive workloads in particular. The appliances will also provide an onramp to ORNL's Summit - the world's most powerful supercomputer - by enabling smaller and more experimental projects to be developed and tested before running on the 200-petaflop machine. The DGX-2 appliances reside in the laboratory's Compute and Data Environment for Science (CADES), which offers compute and data services for ORNL researchers.

"As Summit enters production, these DGX-2 systems supply ORNL with exploratory multipurpose computing resources", stated CADES director Arjun Shankar. "Early results suggest the DGX-2s will provide novel opportunities in data analysis, machine learning, and modeling and simulation that support the AI-driven transformation that is changing how science is conducted."

The DGX-2 represents the latest step-change in AI appliances, housing 16 fully interconnected NVIDIA Tesla V100 GPUs with increased GPU memory, a powerful combination that expands the types of problems scientists can tackle in a unified environment. In addition to a standard DGX-2, ORNL received the newly available DGX-2H, which contains upgraded CPUs and faster-clocked GPUs that offer higher performance.

Since NVIDIA debuted the DGX line in 2016, ORNL has deployed the appliances throughout the laboratory to connect researchers with a platform that excels at executing machine learning techniques with the potential to automate some of the time-intensive analysis inherent in research. This is especially relevant to ORNL's world-class experimental facilities, such as the Spallation Neutron Source, which produce large, unique datasets in need of analysis and automated data workflows.

In late 2018, Arvind Ramanathan, a staff scientist in ORNL's computer science and engineering division, and his team became one of the first groups to get extended time on the DGX-2s. The team used the opportunity to train and optimize algorithms that belong to a class of machine learning called reinforcement learning, in which an "agent" attempts to master its environment by performing actions and evaluating the results without any preexisting knowledge.
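To make that trial-and-error loop concrete, here is a minimal Python sketch of the simplest reinforcement-learning setting: a multi-armed bandit solved by an epsilon-greedy agent. It is not the ORNL team's code. The reward means, the `epsilon` value and the function name `epsilon_greedy_bandit` are illustrative assumptions. The agent starts with no knowledge of which action pays best. Most of the time it picks the action with the highest estimated reward, and now and then it picks one at random so that it keeps learning about the alternatives.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays best purely from trial and error.

    Illustrative sketch: true_means is hidden from the agent, which only
    ever sees noisy rewards sampled from the environment.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each action has been tried
    estimates = [0.0] * n_arms   # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = rng.gauss(true_means[arm], 1.0)                   # environment feedback
        counts[arm] += 1
        # Incremental mean update: no history of past rewards is stored.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


if __name__ == "__main__":
    est, cnt = epsilon_greedy_bandit([0.2, 0.5, 1.5])
    best = max(range(len(est)), key=lambda a: est[a])
    print("estimated best action:", best, "pulls per action:", cnt)
```

`epsilon` (the exploration rate) is itself a preset parameter that has to be chosen before learning starts. Real reinforcement-learning agents carry many such settings.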

Reinforcement-learning algorithms, famously showcased by Google's AlphaGo programme, have proven capable of achieving prescribed goals, such as winning games, but optimizing the preset parameters (hyperparameters) that control their decision-making can be difficult. Running multiple algorithms simultaneously on the DGX-2 systems allowed Ramanathan's team to identify superior optimization strategies via an ORNL-developed software package called HyperSpace in a fraction of the time it would have taken on another system.
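The article does not describe HyperSpace's internals. As we understand its published design, HyperSpace splits the hyperparameter search space into overlapping subspaces and searches each one in parallel. The minimal Python sketch below illustrates that general idea under stated assumptions, and it is not HyperSpace code. It uses plain random search in each subspace as a stand-in for HyperSpace's Bayesian optimization. The loss function `toy_objective`, its two hyperparameters `lr` and `momentum`, and all other names are hypothetical.

```python
import itertools
import random
from concurrent.futures import ThreadPoolExecutor


def split_overlapping(low, high, overlap=0.25):
    """Split [low, high] into two intervals that overlap by `overlap` of the range."""
    mid = (low + high) / 2
    pad = (high - low) * overlap / 2
    return [(low, mid + pad), (mid - pad, high)]


def toy_objective(lr, momentum):
    # Hypothetical loss surface standing in for an expensive training run;
    # its minimum lies at lr=0.3, momentum=0.7.
    return (lr - 0.3) ** 2 + (momentum - 0.7) ** 2


def search_subspace(bounds, trials=200, seed=0):
    """Random search inside one subspace (a stand-in for Bayesian optimization)."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        point = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        loss = toy_objective(*point)
        if best is None or loss < best[0]:
            best = (loss, point)
    return best


def parallel_subspace_search(space, trials=200):
    # D dimensions x 2 overlapping intervals each -> 2**D subspaces.
    per_dim = [split_overlapping(lo, hi) for lo, hi in space]
    subspaces = list(itertools.product(*per_dim))
    # Each subspace gets its own independent search, run concurrently.
    with ThreadPoolExecutor(max_workers=len(subspaces)) as pool:
        results = list(pool.map(search_subspace, subspaces,
                                [trials] * len(subspaces), range(len(subspaces))))
    return min(results)  # best (loss, point) across all subspaces


if __name__ == "__main__":
    loss, (lr, momentum) = parallel_subspace_search([(0.0, 1.0), (0.0, 1.0)])
    print(f"best loss {loss:.4f} at lr={lr:.3f}, momentum={momentum:.3f}")
```

Threads keep the sketch self-contained, but they only run concurrently here because of Python's global interpreter lock. On a machine like the DGX-2, each subspace search would more plausibly get its own process and GPU, which is where the wall-clock savings would come from. The overlap between subspaces reduces the risk that a good region sitting on a boundary is missed.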

"We couldn't have done this without a DGX-2 because the problem space that we were exploring was so large and sample inefficient", Arvind Ramanathan stated. "Because these GPUs can essentially be used in a unified way, we can do things that are much more difficult to do on other systems, especially in terms of moving data and doing analysis."

Though ORNL is known for conducting leadership-scale science on its massively parallel supercomputers, there are instances when an innovative smaller machine can be useful. Refining algorithms on the DGX-2 can improve researchers' confidence that their AI software is ready to be deployed at scale later on. Additionally, workloads that may be poorly suited to run on a supercomputer - jobs that don't scale or jobs that need to run for extended periods of time, for example - could be carried out on a DGX-2 appliance.

The DGX-2s also have something to offer traditional modelling and simulation. Researchers can run simulations side-by-side with AI to extend simulations further than they would otherwise go, using AI-recognized patterns in the data to "steer" the simulation correctly. A project supported by ORNL's Laboratory Directed Research and Development programme is dedicated to a molecular dynamics framework called Molecules that can execute AI-informed simulation.

"Traditionally, running AI side-by-side with simulation would be too expensive", Arvind Ramanathan stated, "but state-of-the-art systems like Summit and the DGX-2 enable this in such a way that we can think of this arrangement as a fused workflow in some sense."

Currently, CADES staff are working to integrate the appliances into the data centre's shared environment so researchers can submit jobs as easily as any other CADES resource. The two DGX-2 systems have been connected by a dedicated EDR InfiniBand network to combine the systems' capabilities.

"The idea is that researchers will be able to schedule up to 32 GPUs at one time to run in parallel", stated CADES team lead Brian Zachary.

HyperSpace software development is part of the CANcer Distributed Learning Environment project, a cancer research effort supported by the Exascale Computing Project.

A movie-like visualization of the protein folding process for the Fs-peptide (a 21-amino-acid chain), showing how it may fold into its functional 3D structure, guided by deep learning. Data produced by simulation and interpreted by a deep learning algorithm on the DGX-2 system helped accurately extend this simulation further in time than would otherwise be possible. Movie credit: Arvind Ramanathan, Heng Ma, and Debsindhu Bhowmik.
Source: Oak Ridge National Laboratory - ORNL
