Primeur weekly 2016-11-14

Exascale supercomputing

Berkeley Lab to lead AMR co-design centre for DOE's Exascale Computing Project ...

Exascale Computing Project announces $48 million to establish Exascale co-design centres ...

US Exascale Computing Project awards $34 million for software development ...

Quantum computing

Breakthrough in the quantum transfer of information between matter and light ...

Focus on Europe

European Commission reveals its forthcoming call for energy efficient, high performance processors ...

World-leading HPC centres partner to form accelerated computing institute ...

Atos Bull to boost Dutch research at SURFsara with first Bull sequana supercomputer installed ...

Middleware

Allinea tools yield a 50% speed up for genome applications at the Earlham Institute ...

NERSC's 'Shifter' scales up to more than 9,000 Cori KNL processors ...

DDN's Big Data storage provides Aalto University ample capacity and fast access to vital research data ...

DDN unveils industry's fastest multi-level security Lustre solution ...

DDN delivers new burst buffer appliance and updates block and file appliances, completing total product line refresh ...

Atos Bull tackles storage bottlenecks for High Performance Computing ...

Cycle Computing debuts the newest version of its groundbreaking CycleCloud ...

Hardware

University of Toronto selects CoolIT Systems to liquid cool signal processor for CHIME project ...

CoolIT Systems optimizes Trade and Match solution with custom closed-loop liquid cooling ...

SDSC to host high-speed, large data transfer experiment at SC16 Show ...

Cray XC40 "Theta" supercomputer accepted at Argonne National Laboratory ...

Cray launches next-generation supercomputer: the Cray XC50 ...

Cray reports third quarter 2016 financial results ...

Mellanox drives Virtual Reality to new levels with breakthrough performance ...

Mellanox announces 200Gb/s HDR InfiniBand solutions enabling record levels of performance and scalability ...

Computers made of genetic material? ...

CoolIT Systems to showcase best-in-class HPC liquid cooling offering at SC16 ...

Applications

BoschDoc, AHCODA-DB and OpenML winners of the Data Prize 2016 ...

Blue Waters simulates largest membrane channel made of DNA origami ...

Cray joins iEnergy, the oil and gas industry's foremost community for exploration and production ...

Large-scale computer simulations reveal biological growth processes ...

NASA science and technology advancements demonstrated at Supercomputing Conference ...

Unlocking big genetic datasets ...

Accelerating cancer research with deep learning ...

System opens up high-performance programming to non-experts ...

Studying structure to understand function within 'material families' ...

Chury is much younger than previously thought ...

TOP500

Global supercomputing capacity creeps up as Petascale systems blanket Top 100 ...

InfiniBand chosen by nearly 4x more end-users versus proprietary offerings in 2016 as shown on the TOP500 supercomputers list ...

The Cloud

SURFnet selects eight Cloud providers for Dutch education and research ...

System opens up high-performance programming to non-experts

7 Nov 2016 Cambridge - Dynamic programming is a technique that can yield relatively efficient solutions to computational problems in economics, genomic analysis, and other fields. But adapting it to computer chips with multiple "cores", or processing units, requires a level of programming expertise that few economists and biologists have.

Researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and Stony Brook University aim to change that with a new system that allows users to describe what they want their programmes to do in very general terms. It then automatically produces versions of those programmes that are optimized to run on multi-core chips. It also guarantees that the new versions will yield exactly the same results as the single-core versions, only much faster.

In experiments, the researchers used the system to "parallelize" several algorithms that used dynamic programming, splitting them up so that they would run on multi-core chips. The resulting programmes were between three and 11 times as fast as those produced by earlier techniques for automatic parallelization, and they were generally as efficient as those that were hand-parallelized by computer scientists.

The researchers presented their new system at the Association for Computing Machinery's conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH).

Dynamic programming offers exponential speedups on a certain class of problems because it stores and reuses the results of computations, rather than recomputing them every time they're required.
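To make the trade-off concrete, here is a minimal sketch - an illustration for this article, not code from the paper - contrasting naive recursion with a memoized version; Fibonacci stands in for the economics and genomics recurrences mentioned above:

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

// Naive recursion recomputes the same subproblems exponentially often.
std::uint64_t fib_naive(int n) {
    return n < 2 ? n : fib_naive(n - 1) + fib_naive(n - 2);
}

// Dynamic programming: store each intermediate result and reuse it,
// trading extra memory for an exponential reduction in recomputation.
std::uint64_t fib_memo(int n, std::unordered_map<int, std::uint64_t>& memo) {
    if (n < 2) return n;
    auto it = memo.find(n);
    if (it != memo.end()) return it->second;   // reuse a stored result
    std::uint64_t v = fib_memo(n - 1, memo) + fib_memo(n - 2, memo);
    memo[n] = v;                               // store it for later reuse
    return v;
}

int main() {
    std::unordered_map<int, std::uint64_t> memo;
    std::cout << fib_memo(60, memo) << '\n';   // returns instantly; the naive
                                               // version would take vastly longer
}
```

The price of the speedup is the memo table itself, which is exactly the memory cost discussed next.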

"But you need more memory, because you store the results of intermediate computations", stated Shachar Itzhaky, first author on the new paper and a postdoc in the group of Armando Solar-Lezama, an associate professor of electrical engineering and computer science at MIT. "When you come to implement it, you realize that you don't get as much speedup as you thought you would, because the memory is slow. When you store and fetch, of course, it's still faster than redoing the computation, but it's not as fast as it could have been."

Computer scientists avoid this problem by reordering computations so that those requiring a particular stored value are executed in sequence, minimizing the number of times that the value has to be recalled from memory. That's relatively easy to do with a single-core computer, but with multi-core computers, when multiple cores share data stored at multiple locations, memory management becomes much more complex. A hand-optimized, parallel version of a dynamic-programming algorithm is typically 10 times as long as the single-core version, and the individual lines of code are more complex, to boot.
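As a toy illustration of that reordering - again an illustration, not code generated by Bellmania - consider an edit-distance-style table in which each cell depends on its upper and left neighbours. Filling it row by row means the left neighbour was computed on the immediately preceding iteration and is still in a register or cache, so stored values are rarely refetched from slow memory:

```cpp
#include <algorithm>
#include <vector>

// Toy DP table fill: cell (i, j) depends on its upper and left neighbours.
// Row-major order reuses the left neighbour the moment it is produced,
// minimizing how often stored values must be recalled from main memory.
// Assumes t.size() == rows * cols and row 0 / column 0 are pre-filled.
void fill_table(std::vector<int>& t, int rows, int cols) {
    for (int i = 1; i < rows; ++i)
        for (int j = 1; j < cols; ++j)
            t[i * cols + j] = std::min(t[(i - 1) * cols + j],      // upper
                                       t[i * cols + (j - 1)]) + 1; // left
}
```

On a single core, choosing such an order is the whole game; the difficulty described here is preserving this locality once the table is split across cores.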

The CSAIL researchers' new system - dubbed Bellmania, after Richard Bellman, the applied mathematician who pioneered dynamic programming - adopts a parallelization strategy called recursive divide-and-conquer. Suppose that the task of a parallel algorithm is to perform a sequence of computations on a grid of numbers, known as a matrix. Its first task might be to divide the grid into four parts, each to be processed separately.

But then it might divide each of those four parts into four parts, and each of those into another four parts, and so on. Because this approach - recursion - involves breaking a problem into smaller subproblems, it naturally lends itself to parallelization.
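Here is a minimal sketch of that four-way recursive split, using plain std::async in place of the scheduling machinery the MIT group works with; the 64-cell cutoff and the per-cell operation are made-up placeholders, and the region size is assumed to be a power of two:

```cpp
#include <future>
#include <vector>

struct Region { int row, col, size; };  // a square block of an n x n matrix

// Base case: a tile small enough to process directly, in cache.
void process_leaf(std::vector<double>& m, int n, Region r) {
    for (int i = r.row; i < r.row + r.size; ++i)
        for (int j = r.col; j < r.col + r.size; ++j)
            m[i * n + j] += 1.0;               // placeholder per-cell work
}

// Recursive divide-and-conquer: split the region into four quadrants and
// recurse; disjoint quadrants can be processed by different cores.
void process(std::vector<double>& m, int n, Region r) {
    if (r.size <= 64) { process_leaf(m, n, r); return; }
    int h = r.size / 2;                        // assumes size is a power of two
    auto top_left  = std::async(std::launch::async, process,
                                std::ref(m), n, Region{r.row, r.col, h});
    auto top_right = std::async(std::launch::async, process,
                                std::ref(m), n, Region{r.row, r.col + h, h});
    process(m, n, Region{r.row + h, r.col, h});      // this thread takes the
    process(m, n, Region{r.row + h, r.col + h, h});  // bottom two quadrants
    top_left.get();
    top_right.get();
}
```

The sketch treats the four quadrants as independent. Real dynamic-programming recurrences impose dependencies between them - a cell may need values computed in a neighbouring quadrant - so some quadrants must wait for others; deciding which pieces can safely run concurrently is precisely the scheduling problem Bellmania automates.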

Joining Shachar Itzhaky on the new paper are Armando Solar-Lezama; Charles Leiserson, the Edwin Sibley Webster Professor of Electrical Engineering and Computer Science; Rohit Singh and Kuat Yessenov, who were both MIT graduate students in electrical engineering and computer science when the work was done; Yongquan Lu, an MIT undergraduate who participated in the project through MIT's Undergraduate Research Opportunities Programme; and Rezaul Chowdhury, an assistant professor of computer science at Stony Brook, who was formerly a research affiliate in Charles Leiserson's group.

Charles Leiserson's group specializes in divide-and-conquer parallelization techniques; Armando Solar-Lezama's specializes in programme synthesis, or automatically generating code from high-level specifications. With Bellmania, the user simply has to describe the first step of the process - the division of the matrix and the procedures to be applied to the resulting segments. Bellmania then determines how to continue subdividing the problem so as to use memory efficiently.

At each level of recursion - with each successively smaller subdivision of the matrix - a programme generated by Bellmania will typically perform some operation on some segment of the matrix and farm the rest out to subroutines, which can run in parallel. Each of those subroutines, in turn, will perform some operation on some segment of the data and farm the rest out to further subroutines, and so on.

Bellmania determines how much data should be processed at each level and which subroutines should handle the rest. "The goal is to arrange the memory accesses such that when you read a cell of the matrix, you do as much computation as you can with it, so that you will not have to read it again later", Shachar Itzhaky stated.

Finding the optimal division of tasks requires canvassing a wide range of possibilities. Armando Solar-Lezama's group has developed a suite of tools to make that type of search more efficient; even so, Bellmania takes about 15 minutes to parallelize a typical dynamic-programming algorithm. That's still much faster than a human programmer could perform the same task, however. And the result is guaranteed to be correct; hand-optimized code is so complex that it's easy for errors to creep in.

Source: Massachusetts Institute of Technology
