Primeur weekly 2013-09-16

Special

Sensor Cloud network - a new infrastructure for clever applications ...

The Cloud

Bright Computing nominated for the 2013 Deloitte Rising Star Award ...

Harnessing the petabyte at Rensselaer Polytechnic Institute ...

New system allows Cloud customers to detect programme-tampering ...

IBM launches Digital Marketing Network in the Cloud to help companies improve marketing performance in real-time ...

Desktop Grids

Successful BOINC:FAST event in Russia ...

TheSkyNet - T2 is born ...

EuroFlash

ParallWare v0.1b released ...

Promising hybrid computer architecture at CSCS ...

Science and Industry join hands: 1st Meeting of the PRACE Industrial Advisory Committee ...

More than 70 students from 4 continents gain HPC skills at fourth annual Summer School ...

PRACE 8th Call is open between 3 September and 15 October 2013 ...

Dassault Systemes unveils the SOLIDWORKS 2014 product portfolio ...

USFlash

US Energy Secretary Moniz dedicates Clean Energy Research Center and new supercomputer ...

Louisiana State University CCT's researchers develop Melete, among first interactive supercomputers ...

SGI achieves even higher levels of performance, scale and efficiency in its SGI ICE X and Rackable server lines by adding the new Intel Xeon processor E5-2600 v2 ...

Cray supercomputers now available with the new Intel Xeon processor E5-2600 V2 product family ...

Intel introduces highly versatile data centre processor family architected for new era of services ...

Adaptive's Moab HPC Suite to optimize new supercomputer at National Scientific User Facility in Washington ...

Indiana University and Internet2 celebrate 15-year partnership for advanced research and education networks ...

Continuing recovery in low-end HPC systems fuels 7.9% growth in the second quarter, according to IDC ...

IBM introduces NeXtScale system: High Performance Computing experience and technology move from the lab to the data centre ...

New centre to better understand human intelligence and build smarter machines ...

Don Bosco University links to National Knowledge Network ...

Cisco unveils nPower, world's most advanced network processor ...

The '50-50' chip: Memory device of the future? ...

Software may be able to take over from hardware in managing caches ...

Harnessing the petabyte at Rensselaer Polytechnic Institute

9 Sep 2013 Troy - A team of researchers at the Data Science Research Center (DSRC) at Rensselaer Polytechnic Institute is combining the reach of Cloud computing with the precision of supercomputers in a new approach to petabyte Big Data analysis.

"Advances in technology for medical imaging devices, sensors, and in powerful scientific simulations are producing data that we must be able to access and mine", stated Bulent Yener, founding director of the DSRC, a professor of computer science within the Rensselaer School of Science, and a member of the research team. "The trend is heading toward petabyte data and we need to develop algorithms and methods that can help us understand the knowledge contained within it."

The team, led by Petros Drineas, associate professor of computer science at Rensselaer, has been awarded a four-year, $1 million grant from the National Science Foundation Division of Information & Intelligent Systems to explore new strategies for mining petabyte-scale data. The project will enlist key faculty from across the Institute, including Petros Drineas and Bulent Yener; Christopher Carothers, director of the Rensselaer supercomputing centre, the Computational Center for Nanotechnology Innovations (CCNI), and professor of computer science; Mohammed Zaki, professor of computer science; and Angel Garcia, head of the Department of Physics, Applied Physics, and Astronomy and senior chaired professor in the Biocomputation and Bioinformatics Constellation.

Petros Drineas said the team proposes a novel two-stage approach to harnessing the petabyte. "This is a new paradigm in dealing with massive amounts of data", Petros Drineas stated. "In the first stage, we will use Cloud computing - which is cheap and easily accessible - to create a sketch or a statistical summary of the data. In the second stage, we feed those sketches to a more precise - but also more expensive - computational system, like those in the Rensselaer supercomputing centre, to mine the data for information."
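
The article does not spell out what these sketches look like in code, but the division of labour can be illustrated with a minimal, hypothetical Python sketch: a cheap first pass reduces each chunk of a large dataset to a small summary (the kind of work suited to inexpensive Cloud nodes), and a second pass combines those summaries into exact global statistics without revisiting the raw data (the kind of work handed to the supercomputer). The chunking scheme, the choice of statistics, and all names below are illustrative assumptions, not the team's actual algorithms.

    import numpy as np

    def stage1_sketch(chunk):
        # Cheap pass (Cloud-style): reduce a data chunk to a fixed-size
        # summary. These particular statistics are an assumption made
        # for illustration only.
        return {
            "n": chunk.shape[0],
            "sum": chunk.sum(axis=0),
            "sq_sum": (chunk ** 2).sum(axis=0),
            "max": chunk.max(axis=0),
        }

    def stage2_analyse(sketches):
        # Precise pass (supercomputer-style): combine the per-chunk
        # summaries into exact global statistics without rereading
        # the raw data.
        n = sum(s["n"] for s in sketches)
        total = sum(s["sum"] for s in sketches)
        sq_total = sum(s["sq_sum"] for s in sketches)
        mean = total / n
        std = np.sqrt(sq_total / n - mean ** 2)
        maximum = np.maximum.reduce([s["max"] for s in sketches])
        return {"mean": mean, "std": std, "max": maximum}

    # Usage: stream a dataset too large to hold in memory, chunk by chunk.
    rng = np.random.default_rng(0)
    sketches = [stage1_sketch(rng.normal(size=(100_000, 8))) for _ in range(10)]
    print(stage2_analyse(sketches))

The point of the split is that stage one touches every byte exactly once and ships only a handful of numbers per chunk downstream to the expensive machine.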

The problem, according to Bulent Yener, is that data on the petabyte scale is so large that scientists do not yet have the means to extract knowledge from the bounty.

"Scientifically, it is difficult to manage a petabyte of data", stated Bulent Yener. "It's an enormous amount of data. If, for example, you wanted to transfer a petabyte of data from California to New York, you would need to hire an entire fleet of trucks to carry the disks. What we are trying to do is establish methods for mining and for extracting knowledge from this much data."

Although petabyte data is still uncommon and not easily obtained - for this particular research project, Angel Garcia will generate and provide a petabyte-scale simulation of atomic-level movements - it is a visible frontier, and standard approaches to data analysis will be too costly, too time-consuming, and not sufficiently powerful to do the job given current computing power.

"Having a supercomputer process a petabyte of data is not a feasible model, but cloud computing cannot do the job alone either", Bulent Yener stated. "In this way, we do some pre-processing with the Cloud, and then we do more precise computing with CCNI. So it is finding this balance between how much you are going to execute, and how accurately you can execute it."

The work will include developing the techniques for both the pre-processing and the precision processing, such as sampling, rank reduction, and search techniques. In one simple example, Bulent Yener said the Cloud may calculate summary statistics for the data - such as the mean and maximum - which could be used to reduce the data into a "sketch" that could be further analysed by a supercomputer.
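
Rank reduction is one place where the sketching idea can be made concrete. A standard technique from randomized linear algebra is to multiply a tall data matrix by a small random matrix, so that the dominant row-space structure survives in a sketch a tiny fraction of the original size. The Gaussian projection and the matrix sizes below are assumptions chosen for illustration, not the project's published method.

    import numpy as np

    rng = np.random.default_rng(1)

    # Stand-in data: a tall matrix that is exactly rank k. The real data
    # would be vastly larger; these sizes are assumptions for illustration.
    n, d, k = 100_000, 50, 5
    A = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))

    # Stage one (cheap): a random projection compresses the 100,000 rows
    # down to a (k + 10)-row sketch that preserves the dominant structure.
    S = rng.normal(size=(k + 10, n)) / np.sqrt(n)
    sketch = S @ A

    # Stage two (precise): recover the top right singular subspace from
    # the sketch alone, and compare with the full, expensive computation.
    _, _, Vt_sketch = np.linalg.svd(sketch, full_matrices=False)
    _, _, Vt_full = np.linalg.svd(A, full_matrices=False)

    # Principal-angle check: overlaps near 1.0 mean the k-dimensional
    # subspaces found by the two routes nearly coincide.
    overlap = np.linalg.svd(Vt_sketch[:k] @ Vt_full[:k].T, compute_uv=False)
    print("smallest subspace overlap:", overlap.min())

With real data the matrix is only approximately low rank and the projection would be applied in a streaming pass, but the economics are the same: the expensive decomposition runs on fifteen rows instead of a hundred thousand.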

Balancing the two stages is critical, said Petros Drineas. "How do you execute these two stages? There are some steps, some algorithms, some techniques that we will be developing", Petros Drineas stated. "The steps in Cloud computing will all be directed to pre-processing, and the steps in supercomputing will all be directed to more exact, expensive, and precise calculations to mine the data."

Established in 2010, the DSRC is focused on fostering research and development to address today's most pressing data-centric and data-intensive research challenges, utilizing the unique resources available at Rensselaer. Recently, the DSRC welcomed General Dynamics Advanced Information Systems and General Electric as its first two corporate members.

Big Data, broad data, high performance computing, data analytics, and Web science are creating a significant transformation globally in the way we make connections, make discoveries, make decisions, make products, and ultimately make progress. The DSRC is a component of Rensselaer's university-wide effort to maximize the capabilities of these tools and technologies for the purpose of expediting scientific discovery and innovation, developing the next generation of these digital enablers, and preparing our students to succeed and lead in this new data-driven world.

Source: Rensselaer Polytechnic Institute
