Stephen Ficklin, a computational biologist in the WSU Department of Horticulture, and colleagues will build the networks with $895,000 from the National Science Foundation. Their project will help test the Scientific Data Analysis at Scale project, or SciDAS, a $2.9 million NSF-funded effort to improve the United States' cyber infrastructure and help scientists make better use of it.
"Improving our cyber infrastructure helps make the U.S. more competitive in research," Ficklin said. "It keeps us in the forefront of data science."
The WSU team will collaborate with Clemson University and RENCI, the Renaissance Computing Institute, which is a collaboration between the University of North Carolina-Chapel Hill, North Carolina State University and Duke University.
The researchers will take data from the National Center for Biotechnology Information, a public repository of genomic information, for as many species as possible and create networks that show how each gene interacts with every other one.
"In the end, we will create the most complete repository of gene coexpression networks that exists anywhere," Ficklin said.
By following network connections, he said, scientists could discover genes that benefit agriculture, medicine, animal science and other fields.
"As a WSU researcher, I hope to help plant breeders," he said. "I want to build networks like these as tools for breeders to find traits they're interested in. They could use biomarkers to screen their seedlings in weeks instead of months or years."
"Plant breeders can look for genes known to be associated with good or bad traits and use them to make traditional crosses," he added.
With tens of thousands of genes in every organism, the computing power required to create these gene-expression networks is vast, well beyond the capability of a single supercomputer. A single test case by Ficklin and collaborators required 1,200 processors and four weeks, with the work divided among 70,000 computation jobs.
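To see why the workload grows so quickly, consider a minimal sketch of one common way to build a coexpression network: compute a correlation for every pair of genes and keep the strongly correlated pairs as connections. This is not necessarily the method Ficklin's team uses. The gene names, expression values and 0.9 cutoff below are hypothetical, and Python is used only for illustration.

```python
from itertools import combinations
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene_a, gene_b, r) for every gene pair with |r| >= threshold."""
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, round(r, 3)))
    return edges


def pair_count(n_genes):
    """Number of distinct gene pairs that must be compared."""
    return comb(n_genes, 2)


# Hypothetical toy data: four genes measured across five samples.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],
}

edges = coexpression_edges(toy_expression)
for gene_a, gene_b, r in edges:
    print(gene_a, gene_b, r)

# Assumed genome size of 30,000 genes, chosen only to illustrate the scale.
print(pair_count(30_000))
```

Because every gene is compared with every other one, the number of comparisons grows roughly with the square of the number of genes. Under the assumed figure of 30,000 genes, a single organism already calls for about 450 million pairwise comparisons.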
SciDAS gives researchers an easier way to spread their computational needs across existing large-scale resources, such as the Open Science Grid or CloudLab, growing or shrinking their demands as needed.
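One way to picture this kind of spreading is to cut a long list of pairwise comparisons into independent jobs that can run wherever capacity is available. The sketch below is a generic illustration in Python, not SciDAS code. The function name and the assumed 30,000-gene workload are hypothetical; only the figure of 70,000 jobs comes from the test case.

```python
def split_into_jobs(total_items, n_jobs):
    """Yield (start, stop) index ranges covering total_items as evenly as possible."""
    base, extra = divmod(total_items, n_jobs)
    start = 0
    for job in range(n_jobs):
        # The first `extra` jobs take one additional item so nothing is left over.
        size = base + (1 if job < extra else 0)
        yield (start, start + size)
        start += size


# Hypothetical workload: all distinct pairs among an assumed 30,000 genes.
total_pairs = 30_000 * 29_999 // 2

# 70,000 jobs, matching the count reported for the test case.
jobs = list(split_into_jobs(total_pairs, 70_000))

print(len(jobs), jobs[0], jobs[-1])
```

Each range can be handed to a separate worker, and because the ranges do not overlap, jobs can be added to or removed from a resource without redoing finished work.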
The final network will be kept, as part of SciDAS, on three petabytes (3 million gigabytes) of storage at WSU, RENCI and Clemson University.