One of the National Science Foundation's (NSF) priority goals is to improve the nation's capacity in data science by investing in infrastructure development, building multi-institutional partnerships to increase the number of U.S. data scientists, and making data more useful and easier to use.
As part of that effort, NSF has provided $31 million in new funding to support 17 innovative projects under the Data Infrastructure Building Blocks (DIBBs) programme. The programme is now in its second year, and its 2014 awards support research in 22 states. The funded work touches on computer science, information technology and nearly every field of science supported by NSF.
"Developed through extensive community input and vetting, NSF has an ambitious vision and strategy for advancing scientific discovery through data", stated Irene Qualters, division director for Advanced Cyberinfrastructure at NSF. "This vision requires a collaborative national data infrastructure that is aligned to research priorities and that is efficient, highly interoperable and anticipates emerging data policies."
This year's data cyberinfrastructure awards build capacity and capability across the nation and across research communities, and they complement previous awards.
"Each project tests a critical component in a future data ecosystem in conjunction with a research community of users", Irene Qualters stated. "This assures that solutions will be applied and use-inspired."
NSF sees these building blocks as digital components that can be joined together to develop the foundations for a robust data infrastructure. The building blocks encompass hardware, software and networking tools, as well as the communities and people who manage data and who are the practitioners of data science.
Of the 17 awards, two support early implementations of more mature research projects; the remaining 15 support pilot demonstrations. Each award is a partnership between researchers in computer science and researchers in other science domains.
One of the two early implementation grants will support a research team led by Geoffrey Fox, a professor of computer science and informatics at Indiana University. Fox's team plans to create middleware and analytics libraries that allow data science to work at large scale on high-performance computing systems, also known as supercomputers.
Fox and his interdisciplinary team plan to test their platform with several different applications, including those used in geospatial information systems (GIS), biomedicine, epidemiology and remote sensing.
"Our innovative architecture integrates key features of open source Cloud computing software with supercomputing technology", Geoffrey Fox stated. "And our outreach involves 'data analytics as a service' with training and curricula set up in a Massive Open Online Course or MOOC."
Other institutions collaborating on the project include Arizona State University, Emory University, Rutgers University, the University of Kansas, the University of Utah and Virginia Tech.
The other early implementation project is led by Ken Koedinger, professor of human-computer interaction and psychology at Carnegie Mellon University. Whereas Fox's team focuses on problems in sensing and the life sciences, Koedinger's team concentrates on developing infrastructure that will drive innovation in education.
The team will develop a distributed data infrastructure called LearnSphere that will make more educational data accessible to course developers, while also motivating more researchers and companies to share their data with the greater learning sciences community. LearnSphere will include a graphical user interface, a library of analytical methods and a wide variety of educational data gathered from such sources as interactive tutoring systems, educational games and MOOCs.
"We've seen the power that data has to improve performance in many fields, from medicine to movie recommendations", Ken Koedinger stated. "Educational data holds the same potential to guide the development of courses that enhance learning while also generating even more data to give us a deeper understanding of the learning process."
Other institutions collaborating on this project include MIT, Stanford University and the University of Memphis.
The DIBBs programme awarded each early implementation project $5 million over 5 years.
The second group of awards supports pilot demonstrations. These build upon the advanced cyberinfrastructure capabilities of existing research communities to address specific challenges in science and engineering research, and extend those data capabilities to meet broad community needs. The awards provide $1.5 million over 3 years.
Among the projects supported by DIBBs awards are cyberinfrastructure to visualize geochronological data, such as uranium dating of corals, at the College of Charleston; data capture and curation for materials science research at the University of Illinois Urbana-Champaign; and management of data from the Laser Interferometer Gravitational-wave Observatory (LIGO) at Syracuse University.
The DIBBs programme is part of a co-ordinated strategy within NSF to advance data-driven cyberinfrastructure. It complements other major efforts, including the DataONE project, the Research Data Alliance and Wrangler, a groundbreaking data analysis and management system for the national open science community.