Back to Table of contents

Primeur weekly 2013-06-24

Special

Human Brain Project to seek support in neuromorphic computing and non-volatile memory approach ...

Deploying new and more energy-efficient combustion technologies with exascale power ...

Parallelism, hybrid architectures, fault tolerance and power major challenges for extreme computing ...

The Cloud

Dell launches secure and flexible Cloud solution for U.S. Governments ...

Eurotech launches new release of Everyware Cloud to simplify device management in the Internet of Things ...

Thermax Ltd chooses IBM PureSystems and SmartCloud over Cisco and Dell ...

Cloud computing user privacy in serious need of reform, scholars say ...

VTI brings Internet of Things (IOT) and Cloud computing to test and measurement ...

Desktop Grids

Seeking testers for BOINC on Android ...

SETIspirit Windows GUI for SETI@home released ...

Using IBM's crowdsourced supercomputer, Harvard rates solar energy potential of 2.3 million new compounds ...

EuroFlash

projectiondesign ships ProNet.precision, camera-assisted warp and blend software ...

Remote Cluster Administration offers a unique solution to the HPC skills gap ...

New Cluster Installation Further Strengthens Regional HPC Infrastructure ...

Altair Engineering announces 8th UK Altair Technology Conference; to be held at the Heritage Motor Centre, Gaydon, Warwickshire ...

GENOA, MCQ-Composites to join Altair Partner Alliance Composites line-up ...

Altair broadens relationship with Siemens PLM Software to enhance data exchange for its CAE software users ...

Neuroscience to benefit from hybrid supercomputer memory ...

ISC'13 caps 28th Conference with new attendance, awards and more ...

USFlash

CANARIE upgrades 100G research & education network with Ciena ...

Linguists, computer scientists use supercomputers to improve natural language processing ...

UC San Diego launches new research computing programme ...

Which qubit my dear? New method to distinguish between neighbouring quantum bits ...

Making memories: Practical quantum computing moves closer to reality ...

Intel introducing new Lustre solution during Lustre event, addressing new Lustre markets ...

NetApp unveils clustered data ONTAP innovations that pave the way for software-defined storage ...

HP expands Converged Storage portfolio ...

UC San Diego researchers get access to Open Science Grid ...

HP extends support for OpenVMS through year 2020 ...

IBM expands support for Linux on Power Systems servers ...

Parallelism, hybrid architectures, fault tolerance and power major challenges for extreme computing


20 Jun 2013 Leipzig - Jack Dongarra from the University of Tennessee and Oak Ridge National Laboratory highlighted the critical issues for algorithm and software design in the exascale session on June 20 at ISC'13 in Leipzig. Today's machines are too complicated, so developers have to build intelligence into the software to adapt it to the hardware, Jack Dongarra explained. At present, the reproducibility of results cannot be guaranteed. The experts understand the issues but some of their colleagues have a hard time dealing with this.

Jack Dongarra described the use of synchronization-reducing algorithms and communication-reducing algorithms as well as mixed precision methods. Developers need to work with fault-resilient algorithms and implement algorithms that can recover from failures.
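To illustrate what a mixed precision method can look like, here is a minimal sketch of iterative refinement: the linear system is solved in single precision, while the residual is computed and the solution is accumulated in double precision. The function names (`to_single`, `lu_solve_single`, `residual`, `refine`) are hypothetical and are not taken from the talk or from any library. The sketch assumes a small, nonsingular, well-conditioned matrix. For brevity it repeats the elimination for every correction, whereas a real implementation would factor once and reuse the factors.

```python
import struct


def to_single(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_solve_single(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting,
    rounding every intermediate result to single precision."""
    n = len(A)
    # Augmented matrix [A | b], stored in single precision.
    M = [[to_single(v) for v in row] + [to_single(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = to_single(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = to_single(M[i][j] - to_single(f * M[k][j]))
    # Back substitution, also in single precision.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for j in range(i + 1, n):
            s = to_single(s - to_single(M[i][j] * x[j]))
        x[i] = to_single(s / M[i][i])
    return x


def residual(A, x, b):
    """Compute r = b - A x in double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]


def refine(A, b, iterations=5):
    """Mixed precision iterative refinement: cheap single-precision solves,
    double-precision residuals and solution updates."""
    x = lu_solve_single(A, b)
    for _ in range(iterations):
        r = residual(A, x, b)       # double precision
        d = lu_solve_single(A, r)   # single-precision correction
        x = [xi + di for xi, di in zip(x, d)]
    return x
```

The design point is that most of the arithmetic runs in the cheaper precision, and only the residual and the update need the higher one.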

The strategies used are checkpointing and restart, diskless checkpointing, and algorithm based fault tolerance (ABFT). Before the factorization starts, a checksum is taken and algorithm based fault tolerance is used, explained Jack Dongarra.

For the dense factorization, the C matrix contains a checksum. The overhead quoted is for one failure; to protect against multiple failures, you have to multiply it by the number of failures you want to protect against.
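The checksum idea can be sketched with the classic checksum scheme for matrix multiplication. This is a simplification chosen for illustration and not the factorization code discussed in the talk. The helper names are hypothetical. The sketch assumes a single corrupted data entry; as noted above, protecting against more failures needs more checksums.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def with_column_checksum(A):
    """Append a row that holds the sum of each column of A."""
    return [row[:] for row in A] + [[sum(col) for col in zip(*A)]]


def with_row_checksum(B):
    """Append to each row of B the sum of that row."""
    return [row[:] + [sum(row)] for row in B]


def locate_error(Cf, tol=1e-9):
    """Return (i, j) of a single corrupted data entry in the full-checksum
    matrix Cf, or None if the checksums do not single out one entry."""
    n, m = len(Cf) - 1, len(Cf[0]) - 1
    bad_rows = [i for i in range(n) if abs(sum(Cf[i][:m]) - Cf[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(Cf[i][j] for i in range(n)) - Cf[n][j]) > tol]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    return None


def repair(Cf, i, j):
    """Rebuild entry (i, j) from the row checksum and the other row entries."""
    m = len(Cf[0]) - 1
    Cf[i][j] = Cf[i][m] - sum(Cf[i][k] for k in range(m) if k != j)


# The product of the two encoded matrices carries both checksums.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
Cf = matmul(with_column_checksum(A), with_row_checksum(B))
```

Because the checksums are carried through the computation itself, a lost or corrupted entry can be rebuilt from the surviving data without rolling back to an earlier state.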

The Fault Tolerant Linear Algebra software package extends the ScaLAPACK codebase. It will be released next September and will support checkpoint on failure.

Jack Dongarra thinks developers should be implementing ABFT for dense factorization with minimal middleware support. They also have to enable ABFT recovery on existing MPI implementations. Checkpoint on failure (CoF) is a hybrid of rollback recovery and ABFT.

The requirements on MPI are that it returns control to the application after a failure and that the application can terminate after taking the checkpoint.

ABFT with checkpoint on failure runs on today's unmodified MPI systems. There are three execution runs.
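The interplay of checkpoint on failure and ABFT might be pictured with the toy simulation below. It does not use MPI. The "ranks" are dictionary entries, and both function names are hypothetical. It assumes one extra rank that holds the element-wise sum of all data blocks, and it only models two steps: survivors saving their state when a failure is detected, and a restarted job rebuilding the lost block from the checksum.

```python
def checkpoint_on_failure(blocks, failed_rank):
    """A rank dies. The survivors regain control, save their state,
    and the job terminates. Returns the checkpoint store."""
    return {rank: list(data) for rank, data in blocks.items() if rank != failed_rank}


def restart_with_abft(checkpoint, ranks, checksum_rank):
    """A new job reloads the survivors' checkpoints and rebuilds the lost
    block from the relation: checksum = element-wise sum of all data blocks."""
    blocks = {rank: list(data) for rank, data in checkpoint.items()}
    lost = [r for r in ranks if r not in blocks]
    if len(lost) > 1:
        raise ValueError("a single checksum protects against one lost rank only")
    for r in lost:
        # Surviving data blocks (everything except the lost rank and the checksum).
        others = [blocks[q] for q in ranks if q != r and q != checksum_rank]
        if r == checksum_rank:
            # The checksum itself was lost: recompute it from the data blocks.
            blocks[r] = [sum(col) for col in zip(*others)]
        else:
            # A data block was lost: subtract the survivors from the checksum.
            blocks[r] = [c - sum(b[i] for b in others)
                         for i, c in enumerate(blocks[checksum_rank])]
    return blocks


# Three data ranks plus one checksum rank (illustrative values).
ranks = [0, 1, 2, "sum"]
blocks = {0: [1, 2], 1: [3, 4], 2: [5, 6], "sum": [9, 12]}
saved = checkpoint_on_failure(blocks, failed_rank=1)
recovered = restart_with_abft(saved, ranks, checksum_rank="sum")
```

Unlike periodic checkpointing, nothing is written during failure-free operation in this picture; the checkpoint is taken only once a failure has been detected.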

The tolerance is about 3%.

There is a new generation of dense linear algebra (DLA) software. The software and algorithms follow the evolution of the hardware over time, according to Jack Dongarra. We had LINPACK in the seventies, LAPACK in the eighties, and ScaLAPACK in the nineties. Today, there are new many-core friendly algorithms, known as PLASMA, and hybrid algorithms, known as MAGMA.

Jack Dongarra introduced the parallel linear algebra software for multicore/hybrid architectures to the audience. The parallel runtime scheduler and execution control is PaRSEC. It executes a dataflow representation of a programme. The scheduler provides automatic load balancing between cores. It harnesses the power.

Another tool is runtime DAG scheduling. Every process has the symbolic DAG representation, with remote data transfers running in the background.

The task affinity in PaRSEC is the following: within each node, task scheduling on hardware resources is decided dynamically.
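The idea of deciding at run time which task to execute next can be sketched with a small dependency-driven scheduler. This is not the PaRSEC API. The `schedule` function and the task names are hypothetical, and the tasks run one at a time here, whereas a real runtime would hand ready tasks to whichever core or accelerator is free. The example DAG follows the dependency pattern of a tiled Cholesky factorization on a 3x3 grid of tiles.

```python
from collections import deque


def schedule(deps):
    """Return an execution order for a task DAG.

    `deps` maps each task name to the tasks it depends on. A task becomes
    ready as soon as all of its dependencies have completed.
    """
    remaining = {task: len(set(d)) for task, d in deps.items()}
    successors = {task: [] for task in deps}
    for task, d in deps.items():
        for parent in set(d):
            successors[parent].append(task)

    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)  # stand-in for actually running the task
        for child in successors[task]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)  # all inputs available: release the task

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


# Illustrative DAG: tiled Cholesky on a 3x3 grid of tiles.
cholesky_dag = {
    "POTRF(0)": [],
    "TRSM(1,0)": ["POTRF(0)"],
    "TRSM(2,0)": ["POTRF(0)"],
    "SYRK(1,0)": ["TRSM(1,0)"],
    "GEMM(2,1,0)": ["TRSM(1,0)", "TRSM(2,0)"],
    "SYRK(2,0)": ["TRSM(2,0)"],
    "POTRF(1)": ["SYRK(1,0)"],
    "TRSM(2,1)": ["POTRF(1)", "GEMM(2,1,0)"],
    "SYRK(2,1)": ["SYRK(2,0)", "TRSM(2,1)"],
    "POTRF(2)": ["SYRK(2,1)"],
}
order = schedule(cholesky_dag)
```

Several tasks are often ready at the same moment, for example the two `TRSM` tasks after `POTRF(0)`. That is the freedom a dynamic scheduler uses to balance load between cores.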

Jack Dongarra also talked about hybrid clusters of accelerators. The Keeneland system has three GPU accelerators per node but has a severe computation/bandwidth provision imbalance, with 75% of ideal scaling and 60% of GEMM peak.

The energy used depends on the number of cores. Energy efficiency improves by up to 62% when using high-performance tuned scheduling, explained Jack Dongarra.

Leslie Versweyveld
