Back to Table of contents

Primeur weekly 2015-11-09

Special

HARNESS explored principles to integrate heterogeneous resources into Cloud platform ...

Focus

Combining the benefits of both GPU and CPU in heterogeneous computing ...

Exascale supercomputing

Towards future supercomputing: EU project Exa2Green improves energy efficiency in high performance computing ...

DEEP project unveils next-generation HPC platform ...

Focus on Europe

Launch of BioExcel - Centre of Excellence for Biomolecular Research ...

Information security community for e-infrastructures crystallises at WISE workshop ...

ALCF helps tackle the Large Hadron Collider's Big Data challenge ...

Middleware

Bright Computing to release updates to popular management software at SC15 ...

Altair partners with South Africa's Centre for High Performance Computing ...

Cray, AMPLab, NERSC collaboration targets Spark performance on HPC platforms ...

Hardware

Singapore scientists among the first to benefit from Infinera Cloud Xpress with 100 GbE for data centre interconnect ...

Supermicro world record performance benchmarks for SYS-1028GR-TR with Intel Xeon Phi coprocessors announced at Fall 2015 STAC Summit ...

IBM Teams with Mellanox to help maximize performance of Power Systems LC line servers for Cloud and cluster deployments ...

LSU deploys new IBM supercomputer "Delta" to advance Big Data research in Louisiana ...

Applications

Nomadic computing speeds up Big Data analytics ...

Clemson researchers and IT scientists team up to tackle Big Data ...

Calcium-48's 'neutron skin' thinner than previously thought ...

Oklahoma University collaborating in NSF South Big Data Regional Innovation Hub ...

Columbia to lead Northeast Big Data Innovation Hub ...

University of Miami gets closer to helping find a cure for gastrointestinal cancer thanks to DDN storage ...

The Cloud

Cornell leads new National Science Foundation federated Cloud project ...

Bright Computing reveals plans for Cloud Expo Frankfurt ...

UberCloud delivers CAE Applications as a Service ...

IBM plans to acquire The Weather Company's product and technology businesses; extends power of Watson to the Internet of Things ...

Oracle updates Oracle Cloud Infrastructure services ...

Cray, AMPLab, NERSC collaboration targets Spark performance on HPC platforms

[Chart: portions of the BDAS stack (in red) that will be explored as part of the Cray/AMPLab/NERSC collaboration.]

4 Nov 2015 Berkeley - As data-centric workloads become increasingly common in scientific and industrial applications, a pressing concern is how to design large-scale data analytics stacks that simplify analysis of the resulting data. A new collaboration between Cray, researchers at UC Berkeley's AMPLab and Berkeley Lab's National Energy Research Scientific Computing Center (NERSC) is working to address this issue.

For decades, the need to build and study increasingly detailed models of physical phenomena has driven advances in high performance computing (HPC). It has also produced an exponential increase in data, from simulations as well as real-world experiments. This has fundamental implications for HPC systems design: it demands improved algorithmic methods, the ability to exploit deeper memory/storage hierarchies, and efficient methods for data interchange and representation in a scientific workflow. The modern HPC platform must be equally capable of handling both traditional HPC workloads and the emerging class of data-centric workloads and analytics motifs.

In the commercial sector, these challenges have fueled the development of frameworks such as Hadoop and Spark and a rapidly growing body of open-source software for common data analysis and machine learning problems. These technologies are typically designed for and implemented in distributed data centres consisting of a large number of commodity processing nodes, with an emphasis on scalability, fault tolerance and productivity. In contrast, HPC environments are focused primarily on no-compromise performance of carefully optimized codes at extreme scale.

Given this scenario, how can we derive the greatest value from adapting productivity-oriented analytics tools such as Spark to HPC environments? And how can a framework like Spark better exploit supercomputing technologies such as advanced interconnects and memory hierarchies to improve performance at scale, without losing its productivity benefits?

To address these questions, researchers from Cray, AMPLab and NERSC are actively examining research and performance issues in running Spark on HPC systems such as NERSC's Edison (Cray XC30) and Cori (Cray XC40). Since linear algebra algorithms underlie many of NERSC's scientific data analysis problems, the collaboration will involve developing novel randomized linear algebra algorithms; implementing them within the AMPLab stack and on Edison and Cori; and applying them to some of NERSC's most pressing data-analysis challenges, including problems in bioimaging, neuroscience and climate science.
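The article does not specify which randomized linear algebra algorithms the collaboration will develop, but the general idea behind this family of methods can be illustrated with a standard randomized SVD: sketch a large matrix with a random test matrix, orthonormalize the sample, and compute an exact SVD of the much smaller projected matrix. The sketch below (plain numpy rather than Spark, purely for illustration; the function name and parameters are this example's own, not the project's) shows the technique on a synthetic low-rank matrix:

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, seed=0):
    """Approximate truncated SVD of A via a Gaussian sketch.

    Follows the classic sketch-then-solve recipe: sample the range
    of A with a random matrix, orthonormalize the sample, then take
    the exact SVD of the small projected matrix.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = rank + oversample
    # Sample the range of A with a Gaussian test matrix (n x k).
    Y = A @ rng.standard_normal((n, k))
    # Orthonormal basis for the sampled range.
    Q, _ = np.linalg.qr(Y)
    # Project A onto that basis; B is only (k x n).
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    # Lift the left singular vectors back to the original space.
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

# A 1000 x 200 matrix of exact rank 5 is recovered almost exactly
# from a 15-column sketch.
rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 200))
U, s, Vt = randomized_svd(A, rank=5)
err = np.linalg.norm(A - U @ (s[:, None] * Vt)) / np.linalg.norm(A)
print(err)
```

The appeal for data-parallel frameworks like Spark is that the expensive step, the matrix-sketch product, is a simple pass over the data that distributes naturally, while the small projected problem is solved locally.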

"Analytics workloads will be an increasingly important workload on our supercomputers and we are thrilled to support and participate in this key collaboration", stated Ryan Waite, senior vice president of products at Cray. "As Cray's supercomputing platforms enable researchers and scientists to model reality ever more accurately using high-fidelity simulations, we have long seen the need for scalable, performant analytic tools to interpret the resulting data. The Berkeley Data Analytics Stack (BDAS) and Spark, in particular, are emerging as a de facto foundation of such a toolset because of their combined focus on productivity and scalable performance."

Drawing strength from NERSC's expertise in scientific data applications, the collaboration combines grand challenge analytical problems from NERSC, pioneering research into big data platforms and scalable randomized linear algebra methods from AMPLab and Cray's long-standing expertise in scalable supercomputing systems. "We are looking forward to understanding and improving the systems-level behavior and performance of Spark when it is applied to challenging real-world analytics problems on some of Cray's biggest platforms to date", stated Venkat Krishnamurthy of the Analytics Products group at Cray, who is leading Cray's involvement in this initiative.

"The AMPLab has been a great success in terms of infrastructure development, but we are continually on the lookout for new use cases to stress-test our framework", stated Michael Mahoney, a faculty member in the University of California, Berkeley Department of Statistics and AMPLab and lead principal investigator on the project. "Spark is very good for certain data analysis computations, but typical Spark use cases haven't stressed many of the sophisticated linear algebra computations that underlie popular machine learning algorithms. This has historically been the domain of scientific computing. We aim to bridge that gap, to the benefit of both areas."

"There is currently a lot of momentum behind Spark in the commercial world, and we would like to explore how the scientific community can benefit from the resulting big data analytics capabilities", stated Prabhat, Data and Analytics Services Group Lead at NERSC. "Spark offers a highly productive interface for data scientists; the question in my mind is really regarding Spark's performance and scalability. Historically, the HPC community has set a high bar for computing performance, and we are hopeful that this collaboration will lead the way in bridging the gap between big data analytics for commercial and high-performance scientific applications."
Source: National Energy Research Scientific Computing Center - NERSC
