Primeur weekly 2017-02-06

Focus

Photon and Neutron Community ready to act as a go-between for the e-Infrastructures and user communities ...

Bridging socio-cultural distance in science through technical community-engaging mechanisms ...

Exascale supercomputing

How to improve data management in the supercomputers of the future ...

Crowd computing

Your computer can help scientists search for new childhood cancer treatments ...

Quantum computing

Quantum phase transition observed for the first time ...

Quantum matter: Shaken, but not stirred ...

First ever blueprint unveiled to construct a large scale quantum computer ...

Focus on Europe

PRACE opens Tier-1 for Tier-0 service ...

Middleware

New version of Univa Unisight 4.1 provides comprehensive tool to support IT purchasing decisions ...

Czech TV speeds broadcast and production delivery with DDN's fully integrated MEDIAScaler platform ...

Optimized compiler yields more-efficient parallel programmes ...

Hardware

Three magnetic states for each hole: researchers investigate the potential of metal grids for electronic components ...

Making the switch to polarization diversity ...

SDSC's 'Comet' supercomputer surpasses '10,000 users' milestone ...

New Cheyenne supercomputer triples scientific capability with greater efficiency ...

GBP 3.2 million for Midlands-based high performance computing centre ...

Applications

Machine learning accurately predicts metallic defects ...

Jupiter Medical Center implements revolutionary Watson for Oncology to help oncologists make data-driven cancer treatment decisions ...

University of Delaware's Anderson Janotti receives NSF Career Award to model defects in complex materials ...

Supercomputing and experiment combine for first look at magnetism of real nanoparticle ...

Researchers flip script for Li-Ion electrolytes to simulate better batteries ...

Huawei and SURFsara join forces for ICT innovation in Smart Healthcare and Smart Energy ...

The shape of melting in two dimensions: University of Michigan team uses Titan to explore fundamental phase transitions ...

Nature Geoscience highlights CALIOPE's ability to "provide decision makers with the information they need to take preventive action" on air quality ...

Magnetic recording with light and no heat on garnet ...

Breaking the jargon barrier ...

Carnegie Mellon Artificial Intelligence beats top poker pros ...

Preventing blood clots with a new metric for heart function: Simulations on Stampede supercomputer reveal better way of predicting future clots in the left ventricle ...

Berkeley Lab resources used to model superluminous supernova in 2D for first time ...

The Cloud

Utilities regulators see value in the Cloud and Cloud technology investments as critical to utilities' success ...

Optimized compiler yields more-efficient parallel programmes

31 Jan 2017 Cambridge - Compilers are programmes that convert computer code written in high-level languages intelligible to humans into low-level instructions executable by machines. But there's more than one way to implement a given computation, and modern compilers extensively analyze the code they process, trying to deduce the implementations that will maximize the efficiency of the resulting software.

Code explicitly written to take advantage of parallel computing, however, usually loses the benefit of compilers' optimization strategies. That's because managing parallel execution requires a lot of extra code, and existing compilers add it before the optimizations occur. The optimizers aren't sure how to interpret the new code, so they don't try to improve its performance.

At the Association for Computing Machinery's Symposium on Principles and Practice of Parallel Programming, researchers from MIT's Computer Science and Artificial Intelligence Laboratory presented a new variation on a popular open-source compiler that optimizes before adding the code necessary for parallel execution.

As a consequence, said Charles E. Leiserson, the Edwin Sibley Webster Professor in Electrical Engineering and Computer Science at MIT and a coauthor on the new paper, the compiler "now optimizes parallel code better than any commercial or open-source compiler, and it also compiles where some of these other compilers don't".

That improvement comes purely from optimization strategies that were already part of the compiler the researchers modified, which was designed to compile conventional, serial programmes. The researchers' approach should also make it much more straightforward to add optimizations specifically tailored to parallel programmes. And that will be crucial as computer chips add more and more "cores", or parallel processing units, in the years ahead.

The idea of optimizing before adding the extra code required by parallel processing has been around for decades. But "compiler developers were skeptical that this could be done", Charles E. Leiserson said.

"Everybody said it was going to be too hard, that you'd have to change the whole compiler. And these guys", he stated, referring to Tao B. Schardl, a postdoc in Charles E. Leiserson's group, and William S. Moses, an undergraduate double major in electrical engineering and computer science and physics, "basically showed that conventional wisdom to be flat-out wrong. The big surprise was that this didn't require rewriting the 80-plus compiler passes that do either analysis or optimization. T.B. and Billy did it by modifying 6,000 lines of a 4-million-line code base."

Tao B. Schardl, who earned his PhD in electrical engineering and computer science (EECS) from MIT, with Charles E. Leiserson as his advisor, before rejoining Charles E. Leiserson's group as a postdoc, and William S. Moses, who will graduate next spring after only three years, with a master's in EECS to boot, share authorship on the paper with Charles E. Leiserson.

A typical compiler has three components: the front end, which is tailored to a specific programming language; the back end, which is tailored to a specific chip design; and what computer scientists oxymoronically call the middle end, which uses an "intermediate representation", compatible with many different front and back ends, to describe computations. In a standard, serial compiler, optimization happens in the middle end.

The researchers' chief innovation is an intermediate representation that employs a so-called fork-join model of parallelism: at various points, a programme may fork, or branch out into operations that can be performed in parallel; later, the branches join back together, and the programme executes serially until the next fork.

In the current version of the compiler, the front end is tailored to a fork-join language called Cilk, pronounced "silk" but spelled with a C because it extends the C programming language. Cilk was a particularly congenial choice because it was developed by Charles E. Leiserson's group - although its commercial implementation is now owned and maintained by Intel. But the researchers might just as well have built a front end tailored to the popular OpenMP or any other fork-join language.

Cilk adds just two commands to C: "spawn", which initiates a fork, and "sync", which initiates a join. That makes things easy for programmers writing in Cilk but a lot harder for Cilk's developers.
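For illustration, the kind of routine the researchers use as their running example looks roughly like this in Cilk. The keyword spellings below (cilk_spawn, cilk_sync) and the cilk/cilk.h header follow the Intel Cilk Plus implementation mentioned above; this is a minimal sketch, not the paper's exact code.

    #include <cilk/cilk.h>            /* Intel Cilk Plus spellings of spawn and sync */

    long fib(long n) {
        if (n < 2)
            return n;
        long x, y;
        x = cilk_spawn fib(n - 1);    /* fork: this call may run on another core */
        y = fib(n - 2);               /* the parent keeps working in the meantime */
        cilk_sync;                    /* join: wait for the spawned call to finish */
        return x + y;
    }

Delete the two keywords and the header, and what remains is an ordinary serial C function: the same computation, minus the parallelism.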

With Cilk, as with other fork-join languages, the responsibility of dividing computations among cores falls to a management programme called a runtime. A programme written in Cilk, however, must explicitly tell the runtime when to check on the progress of computations and rebalance cores' assignments. To spare programmers from having to track all those runtime invocations themselves, Cilk, like other fork-join languages, leaves them to the compiler.

All previous compilers for fork-join languages are adaptations of serial compilers and add the runtime invocations in the front end, before translating a programme into an intermediate representation, and thus before optimization. In their paper, the researchers give an example of what that entails. Seven concise lines of Cilk code, which compute a specified term in the Fibonacci series, require the compiler to add another 17 lines of runtime invocations. The middle end, designed for serial code, has no idea what to make of those extra 17 lines and throws up its hands.
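What those extra lines amount to can be suggested schematically. The sketch below does not reproduce the actual Cilk runtime ABI - the real invocations, with names such as __cilkrts_enter_frame and __cilkrts_detach in the Intel Cilk Plus runtime, take a stack-frame descriptor and involve continuation capture - it only hints, with deliberately hypothetical stub helpers, at the kind of bookkeeping the front end wraps around each spawn.

    /* Hypothetical stand-ins for the runtime's bookkeeping calls;
       the real ones are more numerous and take real arguments. */
    static void rt_enter_frame(void) { /* register this frame with the scheduler */ }
    static void rt_detach(void)      { /* let other cores steal the continuation */ }
    static void rt_sync(void)        { /* wait for any stolen work to complete */ }
    static void rt_leave_frame(void) { /* unregister the frame before returning */ }

    long fib(long n) {
        rt_enter_frame();
        if (n < 2) {
            rt_leave_frame();
            return n;
        }
        long x, y;
        rt_detach();                  /* surrounds what was "x = spawn fib(n - 1);" */
        x = fib(n - 1);
        y = fib(n - 2);
        rt_sync();                    /* replaces what was "sync;" */
        long result = x + y;
        rt_leave_frame();
        return result;
    }

To a middle end built for serial code, these calls are just opaque functions with unknown side effects, so it leaves the code around them untouched.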

The only alternative to adding the runtime invocations in the front end, however, seemed to be rewriting all the middle-end optimization algorithms to accommodate the fork-join model. And to many - including Charles E. Leiserson, when his group was designing its first Cilk compilers - that seemed too daunting.

Tao B. Schardl and William S. Moses's chief insight was that injecting just a little bit of serialism into the fork-join model would make it much more intelligible to existing compilers' optimization algorithms. Where Cilk adds two basic commands to C, the MIT researchers' intermediate representation adds three to a compiler's middle end: detach, reattach, and sync.

The detach command is essentially the equivalent of Cilk's spawn command. But reattach commands specify the order in which the results of parallel tasks must be recombined. That simple adjustment makes fork-join code look enough like serial code that many of a serial compiler's optimization algorithms will work on it without modification, while the rest need only minor alterations.
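As a rough source-level schematic - the real constructs live in the compiler's middle end, not in C source - detach, reattach, and sync would bracket the Fibonacci example above something like this; the placement follows the description given here, not the compiler's actual intermediate form.

    long fib(long n) {
        if (n < 2)
            return n;
        long x, y;
        /* detach: the block that computes x may run in parallel
           with everything after the reattach point */
        x = fib(n - 1);
        /* reattach: end of the detached block; its result rejoins
           the surrounding serial flow here */
        y = fib(n - 2);
        /* sync: wait until every detached block in this function
           has reattached before x is used */
        return x + y;
    }

Because the annotated programme can always be read top to bottom as one valid serial execution, a serial optimization pass that treats the annotations conservatively still sees correct code, which helps explain why so few passes needed rewriting.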

Indeed, of the new code that Tao B. Schardl and William S. Moses wrote, more than half was the addition of runtime invocations, which existing fork-join compilers add in the front end, anyway. Another 900 lines were required just to define the new commands, detach, reattach, and sync. Only about 2,000 lines of code were actual modifications of analysis and optimization algorithms.

To test their system, the researchers built two different versions of the popular open-source compiler LLVM. In one, they left the middle end alone but modified the front end to add Cilk runtime invocations; in the other, they left the front end alone but implemented their fork-join intermediate representation in the middle end, adding the runtime invocations only after optimization.

Then they compiled 20 Cilk programmes on both. For 17 of the 20 programmes, the compiler using the new intermediate representation yielded more efficient software, with gains of 10 to 25 percent for a third of them. On the programmes where the new compiler yielded less efficient software, the falloff was less than 2 percent.

Source: Massachusetts Institute of Technology - MIT
