Primeur weekly 2015-11-09

Special

HARNESS explored principles to integrate heterogeneous resources into Cloud platform ...

Focus

Combining the benefits of both GPU and CPU in heterogeneous computing ...

Exascale supercomputing

Towards future supercomputing: EU project Exa2Green improves energy efficiency in high performance computing ...

DEEP project unveils next-generation HPC platform ...

Focus on Europe

Launch of BioExcel - Centre of Excellence for Biomolecular Research ...

Information security community for e-infrastructures crystallises at WISE workshop ...

ALCF helps tackle the Large Hadron Collider's Big Data challenge ...

Middleware

Bright Computing to release updates to popular management software at SC15 ...

Altair partners with South Africa's Centre for High Performance Computing ...

Cray, AMPLab, NERSC collaboration targets Spark performance on HPC platforms ...

Hardware

Singapore scientists among the first to benefit from Infinera Cloud Xpress with 100 GbE for data centre interconnect ...

Supermicro world record performance benchmarks for SYS-1028GR-TR with Intel Xeon Phi coprocessors announced at Fall 2015 STAC Summit ...

IBM Teams with Mellanox to help maximize performance of Power Systems LC line servers for Cloud and cluster deployments ...

LSU deploys new IBM supercomputer "Delta" to advance Big Data research in Louisiana ...

Applications

Nomadic computing speeds up Big Data analytics ...

Clemson researchers and IT scientists team up to tackle Big Data ...

Calcium-48's 'neutron skin' thinner than previously thought ...

Oklahoma University collaborating in NSF South Big Data Regional Innovation Hub ...

Columbia to lead Northeast Big Data Innovation Hub ...

University of Miami gets closer to helping find a cure for gastrointestinal cancer thanks to DDN storage ...

The Cloud

Cornell leads new National Science Foundation federated Cloud project ...

Bright Computing reveals plans for Cloud Expo Frankfurt ...

UberCloud delivers CAE Applications as a Service ...

IBM plans to acquire The Weather Company's product and technology businesses; extends power of Watson to the Internet of Things ...

Oracle updates Oracle Cloud Infrastructure services ...

Clemson researchers and IT scientists team up to tackle Big Data

Dr. Alex Feltus, an associate professor in genetics and biochemistry at Clemson, discusses his research at the Palmetto Cluster, a supercomputer owned and operated by Clemson University. Credit: Jim Melvin / Clemson University

29 Oct 2015, Clemson - While researchers at Clemson University have recently announced an array of breakthroughs in agricultural and life sciences, the data sets they now use to facilitate these achievements are like a mountain compared to the molehill of what was available just a few years ago. But as the amount of "Big Data" being generated and shared throughout the scientific community continues to grow exponentially, new questions have arisen: Where can all this data be stored and shared cost-effectively? How can it be transferred most efficiently across advanced data networks? And how will researchers interact with the data and the global computing infrastructure?

A team of trail-blazing scientists and information technologists at Clemson is working hard to answer these questions by studying ways to simplify collaboration and improve efficiency.

"I use genomic data sets to find gene interactions in various crop species", stated Alex Feltus, an associate professor in genetics and biochemistry at Clemson. "My goal is to advance crop development cycles to make crops grow fast enough to meet demand in the face of new economic realities imposed by climate change. In the process of doing this, I've also become a Big Data scientist who has to transfer data across networks and process it very quickly using supercomputers like the Palmetto Cluster at Clemson. And I recently found myself - especially in just the past couple of years - bumping up against some pretty serious bottlenecks that have slowed down my ability to do my best possible work."

Big Data, defined as data sets too large and complex for traditional computers to handle, is being mined in new and innovative ways to computationally analyze patterns, trends and associations within the field of genomics and a wide range of other disciplines. But significant delays in Big Data transfer can cause scientists to give up on a project before they even start.

"There are many available technologies in place today that can solve the Big Data transfer problem", stated Kuang-Ching "KC" Wang, associate professor in electrical and computer engineering and also networking chief technology officer at Clemson. "It's an exciting time for genomics researchers to vastly transform their workflows by leveraging advanced networking and computing technologies. But to get all these technologies working together in the right way requires complex engineering. And that's why we are encouraging genomics researchers to collaborate with their local IT resources, which include IT engineers and computer scientists. This kind of cross-discipline collaboration is reflecting the national research trends."

In their recently published paper titled "The Widening Gulf between Genomics Data Generation and Consumption: A Practical Guide to Big Data Transfer Technology", Alex Feltus, Kuang-Ching Wang and six other co-authors at Clemson, the University of Utah and the National Center for Biotechnology Information discussed the careful planning and engineering required to move and manage Big Data at the speeds needed for high-throughput science. If properly executed, sophisticated data transfer networks, such as Internet2's Advanced Layer 2 Service, combined with advanced applications and software, can improve transfer efficiency by orders of magnitude.
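To get a feel for why "orders of magnitude" matters in practice, a back-of-envelope calculation helps. The figures below (a 10 TB data set, link speeds, and efficiency fractions) are illustrative assumptions, not numbers from the paper:

```python
# Back-of-envelope transfer-time estimates (illustrative figures only).
def transfer_hours(data_bytes: float, link_bits_per_sec: float,
                   efficiency: float = 1.0) -> float:
    """Hours to move data_bytes over a link rated at link_bits_per_sec,
    assuming only the given fraction of the raw rate is achieved."""
    return data_bytes * 8 / (link_bits_per_sec * efficiency) / 3600

ten_tb = 10 * 10**12  # a hypothetical 10 TB genomics data set

# A shared 1 Gb/s campus uplink achieving 30% of its rated speed...
slow = transfer_hours(ten_tb, 1e9, efficiency=0.3)
# ...versus a dedicated 100 Gb/s research path at 80% efficiency.
fast = transfer_hours(ten_tb, 100e9, efficiency=0.8)

print(f"1 Gb/s shared:  {slow:.1f} hours")   # roughly 74 hours
print(f"100 Gb/s tuned: {fast:.2f} hours")   # under 20 minutes
```

Under these assumptions the same data set moves in minutes instead of days, which is the difference between a transfer a researcher will attempt and one that stalls a project before it starts.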

"Universities and other research organisations can spend a lot of money building supercomputers and really fast networks", Alex Feltus stated. "But with research computing systems, there's a gulf between the 'technology people' and the 'research people'. We're trying to bring these two groups of experts together and learn to speak a common dialect. The goal of our paper is to expose some of this information technology to the research scientists so that they can better see the big picture."

It won't be long before the information generated by high-throughput DNA sequencing is measured in exabytes. One exabyte equals one quintillion bytes, or one billion gigabytes; a byte is the unit computers use to represent a letter, number or symbol.
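The unit arithmetic in the previous paragraph can be checked directly (decimal SI prefixes assumed, as in the article's figures):

```python
# Decimal (SI) storage units, matching the article's definitions.
BYTE = 1
GIGABYTE = 10**9 * BYTE   # one billion bytes
EXABYTE = 10**18 * BYTE   # one quintillion bytes

# One exabyte is one quintillion bytes...
assert EXABYTE == 1_000_000_000_000_000_000
# ...which is one billion gigabytes.
assert EXABYTE // GIGABYTE == 10**9

print(f"1 EB = {EXABYTE:,} bytes = {EXABYTE // GIGABYTE:,} GB")
```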

In simpler terms, that's a mountain of information so immense it makes Everest look like a molehill.

"The technology landscape is really changing now", Kuang-Ching Wang stated. "New technologies are coming up so fast, even IT experts are struggling to keep up. So to make these new and ever-evolving resources available quickly to a wider range of different communities, IT staffs are more and more working directly with domain science researchers as opposed to remaining in the background waiting to be called upon when needed. Meanwhile, scientists are finding that the IT staffs that are the most open-minded and willing to brainstorm are becoming an invaluable part of the research process."

The National Science Foundation and other high-profile organizations have made Big Data a high priority and they are encouraging scientists to explore the issues surrounding it in depth. In August 2014, Alex Feltus, Kuang-Ching Wang and five cohorts received a $1.485 million NSF grant to advance research on next-generation data analysis and sharing. Also in August 2014, Alex Feltus and Walt Ligon at Clemson received a $300,000 NSF grant with Louisiana State and Indiana universities to study collaborative research for computational science. And in September 2012, Kuang-Ching Wang and James Bottum of Clemson received a $991,000 NSF grant to roll out a high-speed, next-generation campus network to advance cyberinfrastructure.

"NSF is increasingly showing support for these kinds of research collaborations for many of the different problem domains", Kuang-Ching Wang stated. "The sponsoring organisations are saying that we should really combine technology people and domain research people and that's what we're doing here at Clemson."

Alex Feltus, for one, is sold on the concept. He says that working with participants in Kuang-Ching Wang's CC-NIE grant has already uncovered a slew of new research opportunities.

"During my career, I've been studying a handful of organisms", Alex Feltus stated. "But because I now have much better access to the data, I'm finding ways to study a lot more of them. I see fantastic opportunities opening up before my eyes. When you are able to give scientists tools that they've never had before, it will inevitably lead to discoveries that will change the world in ways that were once unimaginable."
Source: Clemson University
