Primeur weekly 2018-10-08

Special

Where did the first 500 million euro invested by the European Horizon 2020 programme go? ...

Focus

World's first ARM-based supercomputer Isambard is ready for science ...

Exascale supercomputing

New European project ESCAPE-2 on exascale computing for numerical weather prediction gets under way ...

Berkeley Lab, Oak Ridge, and NVIDIA team breaks exaop barrier with deep learning application ...

Coming soon to exascale computing: Software for chemistry of catalysis ...

Quantum computing

ORNL researchers advance quantum computing, science through six DOE awards ...

Berkeley Lab to build an advanced quantum computing testbed ...

Berkeley Lab to push quantum information frontiers with new programmes in computing, physics, materials, and chemistry ...

Berkeley Quantum to accelerate innovation in quantum information science ...

Quantum software company Zapata Computing adds Clark Golestani to Board ...

Defects promise quantum communication through standard optical fiber ...

Focus on Europe

Atos and the University of Reims launch ROMEO, one of the most powerful supercomputers in the world, under the sponsorship of Cedric Villani ...

Special Edition of Open e-IRG Workshop under the Austrian EU Presidency will focus on relationship between Open Science, FAIR data and EOSC ...

Goethe University to develop green supercomputer for science ...

Calling on HPC experts and enthusiasts to propose tutorials and workshops for ISC 2019 ...

ISC 2019 calls for research paper submission by December 12, 2018 ...

Middleware

USC ISI to pilot Cyberinfrastructure Center of Excellence for National Science Foundation ...

Hardware

Tintri co-founder Mark Gritter joins Tintri by DDN as CTO to lead analytics and server virtualization vision ...

DDN simplifies the AI data centre with NVIDIA ...

New research could lead to more energy-efficient computing ...

Applications

New simulation sheds light on spiraling supermassive black holes ...

DNA unzipped, turned around, and rezipped ...

Dark Energy Survey releases first year value-added data products ...

A quantum leap toward expanding the search for dark matter ...

HP-CONCORD paves the way for scalable machine learning in HPC ...

In disaster's wake, novel computing techniques support emergency responders ...

Transition metal dichalcogenides could increase computer speed, memory by a million times ...

A new brain-inspired architecture could improve how computers handle data and advance AI ...

Rochester Institute of Technology leads multi-university collaboration to simulate neutron star mergers ...

The Cloud

Oracle rolls out Autonomous NoSQL Database service ...

Quanta Cloud Technology showcases AI portfolio options at GTC Europe ...

ZeroStack delivers GPU-as-a-Service via NVIDIA hardware ...

HP-CONCORD paves the way for scalable machine learning in HPC

A human brain parcellation derived from HP-CONCORD's results, using a graph clustering algorithm applied to a set of data from the Human Connectome Project.

1 Oct 2018 Berkeley - Some of the most challenging problems in data-driven science involve understanding the interactions between thousands or even millions of variables: how a disease may be caused by a subset of the 20,000 human genes, or how agricultural production may be improved by a combination of microbial species among the millions in the environment. The problem is to discover the most significant relationships among all of these variables - genes that actively work together - while separating out the accidental relationships - genes that occasionally appear together - and confounding effects - two genes that only interact through a common third gene.

A powerful machine learning algorithm called CONCORD can identify these relationships, but until recently it could only be applied to modest-sized data sets. Researchers from Lawrence Berkeley National Laboratory and their collaborators have changed that, unleashing the full power of the Department of Energy's supercomputers on these problems through a high-performance computing version of the algorithm called HP-CONCORD. Using supercomputers at Berkeley Lab's National Energy Research Scientific Computing Center (NERSC), they demonstrated this parallel algorithm on an enormous set of data from the Human Connectome Project, computing estimates for about 4 billion parameters, and on an even larger demonstration problem with over 800 billion parameters. A paper introducing HP-CONCORD was presented at the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) in April 2018.

CONCORD was developed by Sang-Yun Oh, assistant professor in the Department of Statistics and Applied Probability at the University of California, Santa Barbara, as part of his dissertation work at Stanford. Sang-Yun Oh was a postdoc at Berkeley Lab when Penporn Koanantakool, then a graduate student researcher, began work on HP-CONCORD as part of her own dissertation at UC Berkeley. CONCORD is an example of a graphical model estimator, a class of machine learning methods that are easier to explain and interpret than some of the competing methods that act more like black boxes. To handle very large data sets, Penporn Koanantakool brought in her perspective on how to make parallel algorithms run across thousands of computational nodes by reducing the amount of communication.

"The most expensive thing you do on any computer is move data around, so you want to minimize data movement between a processor and its own memory and between multiple processors on a parallel machine", stated Kathy Yelick, Associate Lab Director for Computing Sciences at Berkeley Lab and Penporn Koanantakool's thesis advisor. "Reducing data movement tends to save both time and energy."

Penporn Koanantakool, who now works at Google Brain, developed HP-CONCORD and the underlying communication-avoiding algorithms for parallelizing some of the most challenging "all-to-all" style computations.

"When computing the forces between all pairs of particles, or multiplying two matrices, there is a pattern of taking all combinations of things, which involves a lot of communication on a parallel machine", she explained. Within HP-CONCORD she looked at the problem of multiplying a huge sparse matrix - mostly zeros - with a smaller dense one, which has the added complexity of dividing the nonzeros and computational work evenly across the processors. Her work, which includes extensive experiments on NERSC supercomputers, demonstrates that with HP-CONCORD, the communication is minimal. Her algorithm proved to be more than 100 times faster than the standard approach when running on 1,536 cores.
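The load-balancing concern in that sparse-times-dense product can be illustrated with a small sketch. This is not HP-CONCORD's actual distribution scheme, which the article does not detail; it is a toy, single-process Python example that assumes a simple greedy assignment of sparse rows to workers by nonzero count. All function names and the sample data are hypothetical.

```python
from typing import Dict, List

# A sparse row maps column index -> nonzero value (zeros are simply absent).
SparseRow = Dict[int, float]


def partition_by_nonzeros(rows: List[SparseRow], num_workers: int) -> List[List[int]]:
    """Greedily give each row to the worker that currently holds the fewest nonzeros."""
    loads = [0] * num_workers
    parts: List[List[int]] = [[] for _ in range(num_workers)]
    # Placing the heaviest rows first keeps the final loads close together.
    for i in sorted(range(len(rows)), key=lambda r: len(rows[r]), reverse=True):
        w = loads.index(min(loads))
        parts[w].append(i)
        loads[w] += len(rows[i])
    return parts


def multiply_block(rows: List[SparseRow], row_ids: List[int],
                   dense: List[List[float]]) -> Dict[int, List[float]]:
    """Multiply the selected sparse rows by a dense matrix, touching only nonzeros."""
    n_cols = len(dense[0])
    out: Dict[int, List[float]] = {}
    for i in row_ids:
        acc = [0.0] * n_cols
        for k, v in rows[i].items():
            for j in range(n_cols):
                acc[j] += v * dense[k][j]
        out[i] = acc
    return out


def sparse_dense_multiply(rows: List[SparseRow], dense: List[List[float]],
                          num_workers: int) -> List[List[float]]:
    """Compute rows @ dense block by block; each loop pass stands in for one worker."""
    result: List[List[float]] = [[] for _ in rows]
    for row_ids in partition_by_nonzeros(rows, num_workers):
        for i, values in multiply_block(rows, row_ids, dense).items():
            result[i] = values
    return result


# Hypothetical 6x4 sparse matrix with uneven rows, and a 4x2 dense matrix.
sparse_rows: List[SparseRow] = [
    {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0},
    {1: 5.0},
    {},
    {3: 6.0},
    {0: 7.0, 2: 8.0},
    {2: 9.0},
]
dense_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]

parts = partition_by_nonzeros(sparse_rows, 2)
loads = [sum(len(sparse_rows[i]) for i in part) for part in parts]
product = sparse_dense_multiply(sparse_rows, dense_matrix, 2)
print(loads)
print(product)
```

Because the work per row is proportional to its number of nonzeros, balancing nonzeros rather than row counts is what keeps the workers equally busy. On a real parallel machine the partition would also determine which parts of the dense matrix each processor needs, which is where communication costs enter.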

In their 2018 AISTATS paper, the HP-CONCORD team used functional magnetic resonance imaging (fMRI) data to estimate the underlying conditional dependency structure of the brain and then used the resulting estimate to automatically identify functional regions of the brain.

"We constructed a huge brain functional connectivity graph with HP-CONCORD. Then, using this graph, we can draw a map of functional regions in the brain, which is something neuroscientists care about", stated Aydin Buluc, a scientist at Berkeley Lab and a co-author on the paper. "The fMRI data we used was not big enough to push HP-CONCORD's limits; however, the datasets will only get bigger."

Many other science areas will benefit from HP-CONCORD, Aydin Buluc emphasized, such as trying to figure out whether a trait of a plant is correlated with factors like soil composition, the amount of sunlight it absorbs and its genetic makeup - "all different kinds of objects and variables".

In statistical terms, HP-CONCORD estimates the most significant parameters in the inverse covariance matrix. Capturing these parameters results in a sparse estimate, which is shown to have good statistical properties when the number of data points is small relative to the number of features, as is most likely the case in many high dimensional datasets.
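A small worked example may help show why the inverse covariance matrix is the object of interest. The sketch below is not the CONCORD estimator itself; it inverts a known 3x3 covariance matrix exactly, under a hypothetical toy model in which genes A and B are each driven by a common gene C plus independent noise. The variable names and numbers are illustrative assumptions.

```python
from fractions import Fraction
from typing import List

Matrix = List[List[Fraction]]


def invert(matrix: Matrix) -> Matrix:
    """Invert a small nonsingular matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(matrix)
    # Augment with the identity: [A | I] is reduced to [I | A^-1].
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in a row with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical toy model: C has variance 1; A = C + noise and B = C + noise,
# with independent unit-variance noise. Variable order is (A, B, C).
names = ["A", "B", "C"]
covariance: Matrix = [[Fraction(v) for v in row]
                      for row in ([2, 1, 1], [1, 2, 1], [1, 1, 1])]
precision = invert(covariance)

# Nonzero off-diagonal entries of the inverse covariance define the
# conditional dependency graph.
edges = [(names[i], names[j])
         for i in range(3) for j in range(i + 1, 3)
         if precision[i][j] != 0]

print(covariance[0][1])  # A and B are correlated ...
print(precision[0][1])   # ... but conditionally independent given C
print(edges)
```

Here A and B have nonzero covariance, yet the corresponding entry of the inverse covariance matrix is exactly zero, so the only edges are A-C and B-C. This is the confounding case of two genes that interact only through a common third gene. With real data the covariance is estimated from a limited number of samples, so a sparsity-inducing estimator is needed to recover such zeros rather than a direct inversion.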

"Inverse covariance estimates have many practical uses, including reconstructing gene regulatory networks in biology, capturing volatility structure in finance, estimating temperature-to-environmental-proxy relationship in environmental sciences. HP-CONCORD solutions can be used for hypothesis generation in exploratory data analysis to guide further experimental study", stated Sang-Yun Oh. "Also, HP-CONCORD estimates can be used as plug-in estimates when relative magnitudes of associations are needed for some downstream analysis."

Other co-authors of the paper include Alnur Ali at Carnegie Mellon University and Ariful Azad, Dmitriy Morozov and Leonid Oliker from the Computational Research Division at Berkeley Lab.

HP-CONCORD and the underlying sparse-dense matrix routines are publicly available on Bitbucket. These are also provided as a ready-to-use software module on NERSC systems.

Source: National Energy Research Scientific Computing Center - NERSC
