Primeur weekly 2013-10-14

Special

Big Data analytics: a complex story of disruptive hype countered with converging technologies ...

Big Data forcing enterprises to look into the direction of HPC solutions ...

River no longer too deep between HPC and data intensive computing ...

HPC is HPC, and enterprise is enterprise, and never the twain shall meet? Can Big Data be the catalyst? ...

High Performance Data Analysis ecosystem to grow to more than $2 billion by 2016 ...

Focus

2013: Another year on the road to Exascale - An Interview with Thomas Sterling and Satoshi Matsuoka - Part III ...

The Cloud

Contrail project partners to release version R1.3 of the Contrail software ...

Fujitsu begins global packaged sales of "SPATIOWL" location data Cloud service ...

Dynamically managing network bandwidth in a Cloud ...

EuroFlash

The Transinsight Award for Semantic Intelligence goes to the "Wishart" team from the University of Alberta, Canada ...

Final Call for the HPCAC-ISC 2014 Student Cluster Competition Submission ...

CGG slashes development time with Allinea DDT ...

Bull launches GCOS7 V12 for its large mainframes ...

Bull launches Bull Optimal Database Booster, to optimize the performance of Oracle databases running on its Escala servers ...

Adept project: investigating energy efficiency in parallel technologies ...

DataDirect Networks' scalable, high-performance storage powers Wellcome Trust Sanger Institute's worldwide research efforts to reduce global health burden ...

The Human Brain Project has begun ...

Intel and Janssen Pharmaceutica to collaborate with imec and 5 Flemish universities to open ExaScience Life Lab ...

PRACE to showcase the principles of HPC at the European Union Contest for Young Scientists (EUCYS) ...

The 2013 Nobel Prize in Chemistry goes to the development of multiscale models ...

USFlash

Jack Dongarra receives high honour for supercomputing accomplishments ...

Cray enhances coprocessor and accelerator programming with support for OpenACC 2.0 ...

NCSA joins OpenSFS ...

Juniper Networks enables the discovery of new data insights in IBM Research Accelerated Discovery Lab ...

Louisiana State University researchers awarded nearly $1 million for Big Data research ...

Winchester Systems introduces FlashDisk RAID arrays with iSCSI 10Gb ...

River no longer too deep between HPC and data intensive computing


26 Sep 2013 Heidelberg - At the ISC'13 Big Data Conference, Dr. Flavio Villanustre, vice president of Infrastructure & Products for the HPCC Systems Group and vice president of Information Security at LexisNexis, offered his perspective on the merging of HPC and data intensive computing. In the late nineties, LexisNexis built its own HPC platform, called HPCC, which provides back-end data manipulation, the heavy lifting of batch-oriented processing, and the real-time delivery models. In 2011, HPCC was launched as an open source project to build broader adoption and a community around it.

In order to put things into perspective, Dr. Villanustre provided the audience with a brief history of high performance and data intensive computing.

Initially, HPC efforts were driven by the need for specialized hardware to tackle complex problems, but demand quickly picked up and made HPC commercially viable, with Cray as a strong player in the market, the speaker explained. The goal was to build the largest and fastest computers in the world. Typical simulation workloads in those days involved large numbers of floating-point operations on a more or less constant-sized data set, requiring only a single thread or a few threads of execution.

The birth of Beowulf introduced the notion of distributed execution in HPC, the speaker continued. Commodity hardware, a commodity OS, a distributed architecture, and virtually limitless scalability made supercomputing affordable to the masses. The parallelism burden thus moved into the tools and became the programmer's problem, Dr. Villanustre explained.

But then the growth of digital data and the Internet generated data volumes never seen before. Search engines and data services companies pushed traditional data stores to their limits. Since the value derives from aggregating massive amounts of data, distributed data stores and data intensive computing turned data locality into a key performance factor. As a result, distributed execution paradigms quickly gained interest, the speaker told the audience.

So, in this context, what are the key design principles for distributed data intensive computing? Dr. Villanustre asked. Data may be huge, but its volume is greatly reduced as processing proceeds. Disk activity can sometimes be sequenced to minimize the number of disk seeks. Many, though not all, data search and retrieval problems are embarrassingly parallel. And commodity hardware has become so affordable that agile, exploratory analysis of large data sets is now both feasible and desirable, according to the speaker.
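To make the first two of those principles concrete, consider the minimal Python sketch below. It is not material from the talk - HPCC Systems is actually programmed in its own ECL language - and the partitions, records, and search term are hypothetical. It shows an embarrassingly parallel search step running independently on each data partition, after which the surviving records are only a small fraction of the input volume.

    from multiprocessing import Pool

    def search_partition(partition):
        # Filter one partition locally; no communication with other workers.
        return [record for record in partition if "smith" in record.lower()]

    if __name__ == "__main__":
        # Hypothetical input: four partitions of name records, standing in
        # for data already distributed across the nodes of a cluster.
        partitions = [
            ["John Smith", "Ada Lovelace"],
            ["Grace Hopper", "Jane Smith"],
            ["Alan Turing", "Tim Smithers"],
            ["Donald Knuth", "Barbara Liskov"],
        ]
        with Pool(processes=4) as pool:
            # Each worker scans only its own partition (data locality); the
            # partial results it returns are far smaller than its input.
            matches = [r for part in pool.map(search_partition, partitions)
                       for r in part]
        print(matches)  # ['John Smith', 'Jane Smith', 'Tim Smithers']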

At present we need analytics and recommendation systems, non-linear regression, multinomial classification, and complex clustering. This is where machine learning comes in. As a matter of fact, many of these algorithms cannot easily be translated into maps and folds, so iterative optimizers are a must where closed-form solutions would be highly inefficient or simply do not exist. Dr. Villanustre warned that the parallelisation of algorithms can be a daunting task, and when it comes to massive graphs, the partitioning can be really tricky.
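The point about iterative optimizers can be illustrated with a small, self-contained sketch, assuming plain NumPy and synthetic data (none of this comes from the talk): logistic regression has no closed-form solution, so the weights must be found by repeated gradient steps rather than by a single solve.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))            # 200 samples, 3 features
    true_w = np.array([1.5, -2.0, 0.5])
    y = (X @ true_w + rng.normal(scale=0.5, size=200) > 0).astype(float)

    w = np.zeros(3)
    learning_rate = 0.1
    for step in range(500):                  # the iterative part: no single
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # closed-form solve exists
        grad = X.T @ (p - y) / len(y)        # gradient of the logistic loss
        w -= learning_rate * grad

    print(w)  # points in roughly the same direction as true_w

A distributed version would compute each partition's gradient locally and sum the results across nodes on every iteration, which is exactly where the parallelisation and partitioning difficulties the speaker mentions come in.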

Over the years we have learned that floating-point operations are still very important, but so are in-place data processing and data locality. The speaker pointed out that cache coherency is critical to getting the most out of CPUs, and that even RAM can be too slow. Parallelism is paramount, but unfortunately parallel algorithms are difficult to implement and debug. Uniform memory access simplifies programming models but poses scalability challenges. Luckily, most data analytic algorithms can be expressed as a series of vector and matrix operations. However, power consumption is starting to become a significant problem, as we all can witness, according to the speaker.
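As a closing illustration of why the vector-and-matrix formulation pays off, the sketch below (again plain NumPy, not material from the talk) computes the same Gram matrix twice: once with explicit Python loops, and once as a single matrix product that hands the work to an optimized, cache-aware BLAS routine.

    import numpy as np

    X = np.random.default_rng(1).normal(size=(1000, 64))

    # Loop form: one dot product at a time, with poor cache behaviour.
    gram_loop = np.empty((64, 64))
    for i in range(64):
        for j in range(64):
            gram_loop[i, j] = np.dot(X[:, i], X[:, j])

    # Matrix form: one call into a blocked, cache-aware BLAS kernel.
    gram_blas = X.T @ X

    assert np.allclose(gram_loop, gram_blas)  # same result, far faster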