Primeur weekly 2014-03-24

Special

Crowd computing takes a closer look at the social life of molecules ...

Exascale supercomputing

Dongarra calls for renewed focus on research into high-end math ...

The Cloud

HP expands testing for mobile and Cloud-based application delivery ...

Kamal Osman Jamjoom Group LLC transforms performance management system with Oracle HCM Cloud ...

EuroFlash

HPC performance insight improves usage at Cardiff University ...

Early-Bird registration opens for International Supercomputing Conference ...

Europe's most powerful supercomputer cleared for users ...

Bright Computing expands product line to manage HPC, OpenStack, and Apache Hadoop clusters ...

World-renowned computational biophysicist, Klaus Schulten to deliver ISC'14 keynote ...

Innovative computer under scrutiny ...

Plans for world class research centre in the United Kingdom ...

2nd International Workshop on OpenCL full technical programme is now available ...

Excelian offers a fully managed compute Grid solution with pay-as-you-go infrastructure costs ...

Prêt-à-fabriquer: Real-time simulation of textiles ...

Follow the ant trail for drug design ...

CFAED presents the new microchip 'Tomahawk 2' at the DATE'14 in Dresden ...

USFlash

Cray to install China's first Cray XC30 supercomputer at Hong Kong Sanatorium and Hospital ...

Dot Hill storage products drill down to support oil & gas applications ...

The New York Genome Center and IBM Watson Group announce collaboration to advance genomic medicine ...

SDSC's Gordon supercomputer assists in whole-genome sequencing analysis under collaboration with Janssen ...

HP delivers powerful system with faster analytics engine for SAP HANA environments ...

Southern Methodist University announces name for new supercomputer: ManeFrame ...

Oregon physicists use geometry to understand 'jamming' process ...

NIST chips help BICEP2 telescope find direct evidence of origin of the universe ...

University of Idaho researchers gain new advantage with one of the nation's most powerful computers ...

Study finds forest corridors help plants disperse their seeds ...

Computer analyzes massive clinical databases to properly categorize asthma patients ...

Start-up focuses on reliable, efficient cooling for computer servers ...

SDSC's Gordon supercomputer assists in whole-genome sequencing analysis under collaboration with Janssen


24 Mar 2014 San Diego - A recent whole-genome sequencing (WGS) analysis project supported by the San Diego Supercomputer Center (SDSC) at the University of California, San Diego has demonstrated that "flash" memory technology can be applied effectively to rapidly process the large data sets that are pervasive in human genomics research. Janssen Research and Development LLC, in collaboration with SDSC and the Scripps Translational Science Institute (STSI), recently launched a project to conduct whole-genome sequencing of 438 patients with rheumatoid arthritis to better understand the disease and to explore genetic factors in patient response to a biologic therapy discovered, developed, and currently marketed by Janssen in the United States.

The analysis began with 50 terabytes of "read" data generated by DNA sequencers from samples originally obtained from each of the study participants. These source data were fed into a 14-step processing "pipeline" using open source software tools. Key components of the analysis were mapping the DNA read sequences from each patient against a reference genome and variant calling to identify the differences between the two.

The read mapping and variant calling were done by Kristopher Standish, a UC San Diego graduate student working under Nicholas Schork, formerly with STSI and now with the J. Craig Venter Institute. SDSC provided high-performance computing and storage resources, as well as expertise to set up and optimize the computational pipeline.

"The need to conduct analysis of 438 full human genomes in a relatively short timeframe necessitated a thorough understanding not only of the computational workload, but of the memory, storage, and input/output requirements", stated Wayne Pfeiffer, an SDSC Distinguished Scientist and the Center's lead researcher in the collaboration. "The emergence of 'Big Data' challenges such as those in human genomics has brought to the fore situations where computer analyses are more likely memory- and I/O (input/output)-bound than compute-bound, meaning that while the actual computer processors may have plenty of capacity, the ability to store and/or move around large amounts of data becomes the limiting factor in throughput."

In the case of the Janssen collaboration, the "sort" step of the read mapping stage was particularly challenging: it required a relatively small number of processor cores but rapid access to several terabytes of data, more than could be kept in the supercomputer's high-performance main memory. The conventional approach of storing data on hard disk drives during the sort step resulted in a severely I/O-bound situation, dramatically limiting throughput.

"The solution was to take advantage of Gordon's flash memory, which provides much higher speed than conventional disk drives for the random access I/O operations of the sort step", stated Wayne Pfeiffer. "Several terabytes of flash were aggregated into what we call 'BigFlash' nodes, which significantly reduced the I/O bottleneck in this step and contributed to helping researchers meet the project's timelines."

"The bulk of the analysis was completed in six weeks - including learning time on Gordon - using more than 300,000 core hours of computer time", stated Glenn K. Lockwood, a user services consultant at SDSC. "That analysis would have taken more than four years of 24/7 compute time on an 8-core workstation."
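The workstation comparison in the quote can be checked with a short calculation. The following is a minimal sketch, assuming perfect scaling (an 8-core workstation delivers 8 core hours per wall-clock hour) and a 365-day year; the function and constant names are illustrative and do not come from SDSC. Because the project used "more than" 300,000 core hours, the result is a lower bound.

```python
# Figures quoted in the article.
CORE_HOURS = 300_000        # "more than 300,000 core hours", so a lower bound
WORKSTATION_CORES = 8       # the 8-core workstation in the comparison

# Assumption: round-the-clock operation over a 365-day year.
HOURS_PER_YEAR = 24 * 365


def workstation_years(core_hours, cores):
    """Wall-clock years needed to deliver `core_hours` on `cores` cores,
    assuming every core is busy 24/7 and the work scales perfectly."""
    wall_clock_hours = core_hours / cores
    return wall_clock_hours / HOURS_PER_YEAR


years = workstation_years(CORE_HOURS, WORKSTATION_CORES)
print(f"{years:.2f} years")  # prints "4.28 years"
```

Under these assumptions the result is about 4.28 years, consistent with the "more than four years" quoted above.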

The collaboration also demonstrated the need for large-scale, high-performance computing resources when analyzing hundreds of human genomes in constrained timeframes. With 340 teraflops of computing power, 64 terabytes of main memory, and 300 terabytes of flash memory, Gordon ranked among the 50 fastest supercomputers in the world when it debuted in late 2011, according to the TOP500 list.

According to Lockwood, at the project's peak throughput the WGS pipeline was using 350 terabytes of storage on SDSC's high-performance storage system and 5,000 processor cores, representing 30 percent of the system's capacity.
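The peak figures can be put in context with a rough calculation based only on numbers quoted in this article. The sketch below is illustrative: it treats the "30 percent" and "six weeks" figures as exact, which they are not (both are rounded, and the six weeks include learning time), and the function names are hypothetical.

```python
# Figures quoted in the article (all rounded in the source).
PEAK_CORES = 5_000          # cores in use at peak throughput
PEAK_FRACTION = 0.30        # share of system capacity at peak
CORE_HOURS = 300_000        # "more than 300,000 core hours"
PROJECT_WEEKS = 6           # "completed in six weeks", incl. learning time


def implied_total_cores(cores_used, fraction):
    """System size implied by a core count and its share of capacity."""
    return cores_used / fraction


def average_cores(core_hours, weeks):
    """Average number of cores busy if the core hours were spread
    evenly over the whole period (an idealisation)."""
    wall_clock_hours = weeks * 7 * 24
    return core_hours / wall_clock_hours


total = implied_total_cores(PEAK_CORES, PEAK_FRACTION)
average = average_cores(CORE_HOURS, PROJECT_WEEKS)
print(f"implied system size: about {total:,.0f} cores")
print(f"average usage over six weeks: about {average:,.0f} cores")
```

On these rounded inputs, the implied system size is roughly 16,700 cores, and average usage over the six weeks works out to roughly 300 cores. That is far below the 5,000-core peak, which suggests that demand was bursty rather than steady.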

"The Janssen collaboration validated our vision for the Gordon system", stated Michael Norman, SDSC's director and principal investigator for the Gordon project. "We saw that emerging Big Data challenges such as human genomics would dictate new supercomputer architectures where memory and IOPS (I/O operations per second) would be more important than raw computing power, so we designed the system accordingly."

The Gordon supercomputer and other SDSC computational and storage systems are available to industrial collaborators on a space-available basis for conducting research and development. Interested parties should contact Ron Hawkins, director of industry relations.
Source: San Diego Supercomputer Center - SDSC