Primeur weekly 2011-11-28

The Cloud

Shipping company signs $150 million-plus data centre services agreement with HP ...

Chinese genomics giant BGI releases latest bioinformatics software and datasets ...

New EMC Atmos Cloud Delivery Platform enhancements simplify delivery of Cloud Storage-as-a-Service - fully functional in minutes ...

UniCredit Global Information Systems S.p.A. Global Markets & Treasury selects Platform Computing to deliver a private Cloud environment ...

Adaptive Computing joins forces with SGI to deliver robust HPC and Cloud solutions ...

Fujitsu simplifies and unifies Cloud management with new ServerView Resource Orchestrator V3 ...

EMC OnDemand brings Captiva, Document Sciences and Documentum to the hybrid Cloud ...

Desktop Grids

CERN has 2020 vision for LHC upgrade ...

Mersenne@home launched as new BOINC project ...

New Stanford software takes Folding@home's biological research to supercomputers ...

EuroFlash

projectiondesign launches 2nd Generation LED projector, the FL35 wqxga ...

PRACE awards 91 million processor core hours on the DECI call to 35 projects, new call open ...

Numascale, IBM and the University of Oslo together in European PRACE project for emerging technology ...

EMI 1 Update 10 now available ...

Fujitsu launches Hybrid Cloud Services for Microsoft Windows Azure customers ...

The impending revolution of low-power quantum computers ...

USFlash

K computer research results awarded ACM Gordon Bell Prize ...

Cray enters the integrated storage market ...

Adaptive Computing rocks the enterprise with new Moab HPC Suite - Enterprise Edition ...

3 DOE labs now connected with ultra-high speed network ...

Rankings released for supercomputers doing big data ...

NVIDIA Tesla GPUs again power world's greenest Petaflop supercomputer ...

Researchers get $7.5 million grant to combat nerve agents ...

K computer no. 1 in four benchmarks at HPC Challenge Awards ...

Whamcloud and Fujitsu to collaborate on Lustre development ...

SGI customers accelerate research efforts with new AMD Opteron 6200 series processors ...

HP to transform server market with single platform for mission-critical computing ...

EMC expands "no compromise" storage for High Performance Computing at SC11 ...

Chinese genomics giant BGI releases latest bioinformatics software and datasets

12 Nov 2011 Shenzhen - BGI, a large genomic organisation, has introduced several bioinformatics analysis pipelines and software, including assembly and binning tools, genetic variation software, as well as two Cloud-based green solutions for genomic-based research. In addition, GigaScience, an upcoming research journal published by BGI, has launched its new, freely accessible, large-scale database: GigaDB. The launch of GigaDB is heralded by the release of numerous large datasets of different types and from a variety of organisms. GigaDB is unique because it is directly affiliated with a journal and all of its datasets are assigned a Digital Object Identifier (DOI), which allows these data to be directly cited in future publications.

On the first day of the "6th International Conference of Genomics" (ICG-6) hosted by BGI, the researchers reported the availability of and information on updated and newly available bioinformatics applications, pipelines, and tools. These include the Short Oligonucleotide Analysis Package (the SOAP series) and Cloud-based software (Hecate 2, Gaea 2, GAMA, GSNP and Adam) for Next-Gen data analysis, among others.

According to BGI's researchers, the updated SOAP series includes SOAP3, a GPU-accelerated short read alignment tool; SOAPindel, an indel finder; SOAPfusion, a gene fusion detector; SOAPsplice, a splice-junction detector; SOAPdenovo-Trans, a de novo transcriptome assembler; and Metacluster 4.0, a binning tool for metagenomics data. The SOAP toolkit is freely available at http://soap.genomics.org.cn .

Dr. Zhiyu Peng, Vice President of Research & Cooperation Division at BGI, gave a detailed introduction to SOAPsplice and SOAPfusion, two RNA-Seq data-based analytic tools designed specifically to detect splice junctions and gene fusions, respectively. Tests on SOAPsplice, using both simulated and real datasets, revealed its high sensitivity and high specificity. These qualities become more obvious under conditions of low sequencing depth. Analyses using SOAPfusion showed that it has the highest sensitivity and lowest false discovery rate of all currently published gene-fusion detection tools.

In regard to these new tools, Dr. Peng stated that the "emergence of the RNA-Seq technology provides unprecedented opportunities and accelerates the speed in the detection of fusion genes and splice junction sites. In particular, the gene fusion discovery performed by SOAPfusion provides an accurate and specific way which will greatly accelerate the study of genomic alterations in cancer as well as the therapeutic cancer studies."

SOAPdenovo-Trans is an assembler designed to handle alternative splicing and differing expression levels among transcripts for de novo transcriptome assembly using short RNA-Seq reads. Discussing this assembler, Dr. Yin Long Xie, Senior Bioinformatician of BGI, stated: "We evaluated SOAPdenovo-Trans on samples of mouse and rice as the animal and plant models, and the results showed this assembler could provide a more accurate, complete and faster way to construct the transcript sets."

Another area that requires extensive next-gen data analysis is metagenome studies. Metagenomic data creates difficulties for researchers due to a fundamental computational problem - how to group together sequence reads from similar species - which is known as binning. At the release conference, Prof. Sim-Ming Yiu from the University of Hong Kong gave a presentation on some existing solutions and on Metacluster 4.0, the latest software tool for solving this binning problem. According to Prof. Yiu, this tool is able to handle 100 species at varying abundance ratios.

With the rapid development of high-throughput sequencing technology over the past ten years, genomic studies have gradually become a standard approach in a wide range of research areas. Given that such research creates huge amounts of data, Cloud computing is becoming a favorable solution for large-scale bioinformatic analysis, in terms of resource utilization, flexibility, and efficiency, as well as time and cost savings for massive data generation and computation.

Many IT companies and large genomic organizations have been gradually shifting their analytical methods to Cloud-based green (more energy-efficient) solutions for processing the enormous amounts of biological data. "With the co-operation with BGI, we have made many achievements in software development on green Cloud computing", stated Dr. Mian Lu from the Hong Kong University of Science and Technology. "A data processing pipeline has been re-implemented on GPU platform, and we have improved its efficiency: which could take only 6 hours to finish processing the data which needed 90 hours before."

One important way in which Cloud computing provides green solutions is through the greatly shortened computation times achieved by software developed for specialized hardware. GSNP and GAMA are two discovery tools for genetic variation implemented on the GPU platform. GSNP is used to detect single-nucleotide polymorphisms, and GAMA is a software tool used to estimate allelic frequencies. Compared with its predecessor SOAPsnp, GSNP achieves higher performance through improved sparse representation for base information and the massive data parallelism on the GPU. Dr. Lu noted that, "Within about 2 hours, a former three days process on human genome, can be done using GSNP." The original version of GAMA could take up to a year or more to compute the allele frequencies for a group of 1,000 individuals; however, Dr. Lu noted that the new version of "GAMA can generate the result in two days."
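To put the quoted run times side by side, the short calculation below converts them to hours and derives the implied speed-up factors. It is only back-of-the-envelope arithmetic: "three days" is taken as 72 hours and "a year" as 365 days, whereas the quotes say "about 2 hours" and "up to a year or more", so the factors are rough. The labels and the `speedup` helper are names chosen here for illustration.

```python
# Run times quoted in the article, as (before, after) in hours.
# Assumptions: 3 days = 72 h, 1 year = 365 days; the quotes are approximate.
quoted_hours = {
    "GPU pipeline": (90.0, 6.0),           # "90 hours before" vs "only 6 hours"
    "GSNP": (3 * 24.0, 2.0),               # "three days" vs "about 2 hours"
    "GAMA": (365 * 24.0, 2 * 24.0),        # "up to a year or more" vs "two days"
}


def speedup(before_hours, after_hours):
    """Return how many times faster the new run is than the old one."""
    return before_hours / after_hours


speedups = {name: speedup(before, after)
            for name, (before, after) in quoted_hours.items()}

for name, factor in speedups.items():
    print(f"{name}: roughly {factor:.1f}x faster")
```

Under these assumptions the factors come out at 15x, 36x and about 180x respectively; since the original GAMA figure is open-ended, its factor is best read as approximate.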

Dr. Lu also talked about another tool called Adam, which was "developed by exploiting hardware features, which could sort and remove duplicate from massive data. Its performance has been improved by three times, handling 150GB data with a node of 25GB memory". For further information about the new software and pipelines, visit http://jil.genomics.org.cn .
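The article does not describe how Adam works internally. As a generic illustration of how data larger than memory can be sorted and de-duplicated, the sketch below uses a textbook external merge sort: sort fixed-size chunks in memory, spill each to disk, then merge the sorted runs while dropping repeats. This is an assumption chosen for illustration, not Adam's algorithm; `sort_unique`, `chunk_size` and the helper names are hypothetical, and records are assumed to be strings without newlines.

```python
import heapq
import os
import tempfile
from itertools import groupby


def _write_sorted_run(records, directory):
    """Sort one in-memory chunk, drop its duplicates, and spill it to a file."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w") as handle:
        for record in sorted(set(records)):
            handle.write(record + "\n")
    return path


def _read_run(path):
    """Stream one sorted run back from disk, one record at a time."""
    with open(path) as handle:
        for line in handle:
            yield line.rstrip("\n")


def sort_unique(records, chunk_size):
    """Yield distinct records in sorted order, buffering at most chunk_size."""
    with tempfile.TemporaryDirectory() as directory:
        runs, chunk = [], []
        for record in records:
            chunk.append(record)
            if len(chunk) >= chunk_size:
                runs.append(_write_sorted_run(chunk, directory))
                chunk = []
        if chunk:
            runs.append(_write_sorted_run(chunk, directory))
        # k-way merge of the sorted runs; equal records arrive adjacent,
        # so groupby collapses duplicates that span different runs.
        merged = heapq.merge(*(_read_run(path) for path in runs))
        for record, _repeats in groupby(merged):
            yield record


reads = ["ACGT", "TTGA", "ACGT", "GGCC", "TTGA", "AAAC"]
print(list(sort_unique(reads, chunk_size=2)))
# prints ['AAAC', 'ACGT', 'GGCC', 'TTGA']
```

The design point is that memory use is bounded by the chunk size rather than by the total data volume, which is the general idea behind processing 150GB of data on a node with 25GB of memory.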

In addition to their announcements on new software developments for specialized hardware, the BGI Bioinformatics Department also revealed their updated "flexible computing" solutions for de novo assembly and resequencing analyses: Hecate 2 and Gaea 2. Their original versions, Hecate 1 and Gaea 1, had been released in July of this year and had drawn significant attention worldwide from many biological researchers and news reporters.

Hecate 2 has greater scalability than the original version, especially in terms of cost and time. "Hecate 2 adopts more sophisticated models for solving massive scale constraint optimization problems in de novo assembly in a fine-grained manner, which enables data from different sequencing platform to be assembled simultaneously and leads a dramatic improvement of the assembly quality in terms of accuracy, length and coverage", stated Evan Xiang, R&D Director at the Flexible Computing Center of BGI.

Evan Xiang also commented on Gaea 2, saying that its processing speed increases linearly with cluster size, and added that "the performance of Gaea 2 could surpass current available alignment software by aggregating their advanced functionalities into a unified Cloud based solution."

GigaDB hosts publicly available, large-scale datasets and also provides every dataset with a unique DOI. A DOI enables researchers to specifically reference these datasets in independent publications where these data are used. GigaDB is associated with the journal GigaScience, an upcoming research journal published by BGI and BioMed Central.

The launch of GigaDB is accompanied by the release of seventeen large datasets on top of those already hosted, such as the genome of the recent deadly outbreak strain of E. coli O104. These datasets now span much of the tree of life, with data hosted from plants, animals (vertebrate and invertebrate) and microbes. The plant data includes whole-genome data from the foxtail millet, the potato, the Chinese cabbage, the domestic cucumber, the pigeonpea, and sweet and grain sorghums. The animal data includes whole-genome data from three species of ants, a roundworm (Ascaris suum), the naked mole rat, the domestic sheep, domestic and wild silkworms, the Tibetan antelope, and three different datasets (whole genome, transcriptome, and methylome) from a single Asian man.

These data are all freely accessible and will be of great use for analyses being done in a wide range of life-science fields. The DOI issued to each dataset allows researchers to directly cite the data itself - as a separate entity from the data analysis papers. This is a major step in promoting extremely rapid data release. As data can now be cited directly, data producers can now be properly acknowledged and recognized for their work and no longer need to wait to release the data until a more extensive analysis paper has been written, reviewed, revised, and published.

Additionally, DOIs make these data permanently accessible, easy to find and use, and available to replicate previous work. Five of these newly released GigaDB datasets illustrate the future of early data release: they are made available with a DOI, allowing the data producers to receive citable credit, for rapid use by the community before the analysis papers are published. The analysis paper for the sorghum genome has recently been accepted in Genome Biology and is expected to be published later this month, demonstrating a new gold standard of placing a dataset citation in the references where it can be easily tracked.

GigaDB is available at http://GigaDB.org
Source: BGI Shenzhen
