Back to Table of contents

Primeur weekly 2018-03-12

Exascale supercomputing

Researchers find algorithm for large-scale brain simulations ...

Quantum computing

Google to present preview of Bristlecone, a new quantum processor ...

Seeing is believing - precision atom qubits achieve major quantum computing milestone ...

Scaling silicon quantum photonics technology ...

Focus on Europe

INFOCOMP 2018 to issue Call for Papers ...

e-Infrastructures for excellent science in Southeast Europe and Eastern Mediterranean to be held in Sofia on 15-16 May, 2018 ...

HSE University opens joint laboratory with Samsung Research ...

ISC 2018 keynote spotlights computing challenges of Large Hadron Collider ...

Middleware

Bright Computing announces support for OpenHPC project ...

Hardware

NEC supplies latest-generation supercomputer to Technische Universität Kaiserslautern ...

Asperitas and EcoRacks receive a grant for first super energy efficient Immersed Computing data centre project ...

Mellanox surpasses one million 100Gb/s ports with LinkX optical transceivers and cables ...

Mellanox introduces next generation Ethernet network operating system - Mellanox Onyx ...

Upgrade to UK environmental science supercomputer will make it twice as capable ...

Supermicro opens path to 100G networking with new 25G Ethernet server and storage solutions ...

Applications

University of Toronto's expert to explain heroic calculation using new supercomputer to shed light on how oceans behave ...

SDSC welcomes Halicioglu Data Science Institute staff ...

SDSC simulations reveal how a heart drug molecular switch is turned on and off ...

Mayo Clinic's clinical trial matching project sees higher enrollment in breast cancer trials through use of artificial intelligence ...

TACC and Lamont Observatory of Columbia University host one of the largest Earth sciences data collections in the country ...

Simulation and experiment help TU Dresden researchers study next-generation semiconductors ...

The final frontier's final frontier ...

Scientists accurately model the action of aerosols on clouds ...

Fast, high capacity fiber transmission gets real for data centers ...

Teaching computers to guide science: Machine learning method sees forests and trees ...

DEDALE: mathematical tools to help navigate the Big Data maze ...

The Cloud

SDSC's Health CI Division now meets NIST CUI compliance requirements ...

Teaching computers to guide science: Machine learning method sees forests and trees

James (Ben) Brown of Berkeley Lab has come up with a novel machine learning method that enables scientists to derive insights from systems of previously intractable complexity in record time. Credit: Berkeley Lab.

6 Mar 2018 Berkeley - While it may be the era of supercomputers and 'Big Data', without smart methods to mine all that data, it's only so much digital detritus. Now researchers at the Department of Energy's Lawrence Berkeley National Laboratory and UC Berkeley have come up with a novel machine learning method that enables scientists to derive insights from systems of previously intractable complexity in record time.

In a paper published recently in the Proceedings of the National Academy of Sciences (PNAS), the researchers describe a technique called "iterative Random Forests", which they say could have a transformative effect on any area of science or engineering with complex systems, including biology, precision medicine, materials science, environmental science, and manufacturing, to name a few.

"Take a human cell, for example. There are 10^170 possible molecular interactions in a single cell. That creates considerable computing challenges in searching for relationships", stated Ben Brown, head of Berkeley Lab's Molecular Ecosystems Biology Department. "Our method enables the identification of interactions of high order at the same computational cost as main effects - even when those interactions are local with weak marginal effects."

Ben Brown and Bin Yu of UC Berkeley are lead senior authors of "Iterative Random Forests to Discover Predictive and Stable High-Order Interactions". The co-first authors are Sumanta Basu, formerly a joint postdoc of Ben Brown and Bin Yu and now an assistant professor at Cornell University, and Karl Kumbier, a Ph.D. student of Bin Yu in the UC Berkeley Statistics Department. The paper is the culmination of three years of work that the authors believe will transform the way science is done. "With our method we can gain radically richer information than we've ever been able to gain from a learning machine", Ben Brown stated.

The needs of machine learning in science are different from those of industry, where machine learning has been used for things like playing chess, making self-driving cars, and predicting the stock market.

"The machine learning developed by industry is great if you want to do high-frequency trading on the stock market", Ben Brown stated. "You don't care why you're able to predict the stock will go up or down. You just want to know that you can make the predictions."

But in science, questions surrounding why a process behaves in certain ways are critical. Understanding "why" allows scientists to model or even engineer processes to improve or attain a desired outcome. As a result, machine learning for science needs to peer inside the black box and understand why and how computers reached the conclusions they reached. A long-term goal is to use this kind of information to model or engineer systems to obtain desired outcomes.

In highly complex systems - whether it's a single cell, the human body, or even an entire ecosystem - there are a large number of variables interacting in nonlinear ways. That makes it difficult if not impossible to build a model that can determine cause and effect. "Unfortunately, in biology, you come across interactions of order 30, 40, 60 all the time", Ben Brown stated. "It's completely intractable with traditional approaches to statistical learning."
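To give a rough sense of why exhaustive search breaks down at these orders, the short sketch below counts how many distinct feature subsets of a given size exist. The feature count of 1000 is a hypothetical figure chosen for illustration, not a number from the paper, and the function name is likewise an assumption.

```python
from math import comb


def candidate_interactions(num_features: int, order: int) -> int:
    """Count the distinct feature subsets of size `order`.

    A brute-force search for interactions of that order would have
    to examine every one of these subsets.
    """
    return comb(num_features, order)


# Hypothetical system with 1000 measured variables (illustrative only).
NUM_FEATURES = 1000

for order in (2, 3, 30):
    count = candidate_interactions(NUM_FEATURES, order)
    print(f"order {order:>2}: {count:.3e} candidate subsets")
```

Pairwise interactions stay manageable, but the count grows so quickly with the order that enumerating every subset of size 30 is out of reach for any realistic computer.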

The method developed by the team led by Ben Brown and Bin Yu, iterative Random Forests (iRF), builds on an algorithm called random forests, a popular and effective predictive modelling tool, translating the internal states of the black box learner into a human-interpretable form. Their approach allows researchers to search for complex interactions by decoupling the order, or size, of interactions from the computational cost of identification.

"There is no difference in the computational cost of detecting an interaction of order 30 versus an interaction of order two", Ben Brown stated. "And that's a sea change."
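One way such a decoupling can arise is sketched below as a toy illustration. It is not the authors' implementation: it assumes that each decision path in a fitted forest has already been reduced to the set of features it splits on, and it simply intersects randomly chosen paths, keeping whatever feature sets survive. The function name, parameters, and synthetic paths are all hypothetical. The amount of work is set by the number of trials and the intersection depth, not by the size of the interaction that is recovered.

```python
import random


def random_intersections(paths, depth=3, trials=200, seed=0):
    """Intersect `depth` randomly chosen paths, `trials` times.

    `paths` is a list of sets, each holding the features used on one
    decision path. Returns a dict mapping each surviving feature set
    (with at least two members) to the number of trials that produced it.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        # Copy the first path so the input sets are never mutated.
        survivors = set(rng.choice(paths))
        for _ in range(depth - 1):
            survivors &= rng.choice(paths)
        if len(survivors) >= 2:
            key = frozenset(survivors)
            counts[key] = counts.get(key, 0) + 1
    return counts


# Synthetic paths (illustrative only): every path uses features a, b and c
# together, plus one path-specific noise feature.
paths = [{"a", "b", "c", f"noise_{i}"} for i in range(20)]

counts = random_intersections(paths)
best = max(counts, key=counts.get)
print(sorted(best), counts[best])
```

In this synthetic setting the noise features rarely survive repeated intersection, so the shared set of a, b and c dominates the counts. The same loop would run in essentially the same time if the shared set had 30 members instead of three.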

In the PNAS paper, the scientists demonstrated their method on two genomics problems, the role of gene enhancers in the fruit fly embryo and alternative splicing in a human-derived cell line. In both cases, using iRF confirmed previous findings while also uncovering previously unidentified higher-order interactions for follow-up study.

Ben Brown said they're now using their method for designing phased array laser systems and optimizing sustainable agriculture systems.

"We believe this is a different paradigm for doing science", stated Bin Yu, a professor in the departments of Statistics and Electrical Engineering & Computer Science at UC Berkeley. "We do prediction, but we introduce stability on top of prediction in iRF to more reliably learn the underlying structure in the predictors."

"This enables us to learn how to engineer systems for goal-oriented optimization and more accurately targeted simulations and follow-up experiments", Ben Brown added.

In a PNAS commentary on the technique, Danielle Denisko and Michael Hoffman of the University of Toronto wrote: "iRF holds much promise as a new and effective way of detecting interactions in a variety of settings, and its use will help us ensure no branch or leaf is ever left unturned."

The research was supported by grants from DOE's Small Business Technology Transfer (STTR) programme, the Laboratory Directed Research and Development (LDRD) programme, the National Human Genome Research Institute, the Army Research Office, the Office of Naval Research, and the National Science Foundation.

Source: DOE/Lawrence Berkeley National Laboratory
