Clowder was designed to address the preservation, sharing, navigation, and reuse of the large and diverse collections of data now essential to scientific discovery. These data navigation needs are also important in the growing number of research areas where data and tools must span multiple domains. "Clowder was built from common needs across different research communities", stated Kenton McHenry, Co-PI. To support these needs effectively, new methods are required that reduce the effort researchers must spend to find and use data, support community-accepted data practices, and bring together the breadth of standards, tools, and resources a community relies on.
There is also a need to make this data more accessible and usable. "Oftentimes, data can be difficult to curate and/or share, so now we're able to capture data that would otherwise be lost", stated McHenry. Metadata plays a key role in making data usable, but gathering it can be a manual and tedious process. By using machine learning-based tools to analyze data and extract metadata, Clowder automates a portion of that process and makes curation far more accessible.
Additionally, Clowder as a science gateway makes it easier for users to access advanced HPC/Cloud resources and analysis tools. With Clowder's auto-curation feature, researchers can upload data through a Dropbox-like interface and trigger complex analysis tools that run in the background.
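The upload-triggered analysis described above can be sketched in miniature. This is an illustrative toy, not Clowder's actual extractor API (real Clowder extractors are built on the pyclowder framework and a message bus); the function names and the in-memory catalog here are assumptions made for the example:

```python
# Toy sketch of Clowder-style auto-curation: an upload event triggers an
# "extractor" that derives metadata from the file and attaches it to a catalog.
# Hypothetical names throughout -- not the real Clowder/pyclowder API.
import hashlib
import json


def extract_metadata(filename: str, data: bytes) -> dict:
    """Toy extractor: automatically derive basic metadata from file bytes."""
    return {
        "filename": filename,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def on_upload(filename: str, data: bytes, catalog: dict) -> None:
    """Simulate the upload trigger: run the extractor, store its metadata."""
    catalog[filename] = extract_metadata(filename, data)


catalog: dict = {}
on_upload("sample.txt", b"hello clowder", catalog)
print(json.dumps(catalog["sample.txt"], indent=2))
```

In the real system the researcher only performs the upload; the event dispatch, the extractor processes, and the metadata storage all happen server-side, which is what makes the curation feel "Dropbox-like" to the user.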
Dr. Praveen Kumar will lead an effort to work with nine Critical Zone Observatories (CZOs) across the United States - where researchers study the region of the environment from the top of the plant canopy to the bedrock beneath, known as the critical zone - to help organize their data and demonstrate the applicability of the system. "We want to use this system to do scientific investigation using this cross-observatory data, and the purpose is to make sure that the systems put in place are designed to support valuable science investigation rather than being arbitrarily stacked together", Dr. Kumar stated. "The scientific investigation may create requirements for the organization of the system, the architecture of the system."
Clowder has had and will continue to have a major impact on materials and semiconductor research, since it is an integral part of the 4CeeD system. Materials scientists and semiconductor fabrication researchers can use 4CeeD to capture, curate, coordinate, correlate, and distribute data from scientific instruments, such as microscopes, to private Cloud infrastructure. The Cloud-based infrastructure conducts this work in a trusted and real-time manner, using a modified Clowder system for instrument data management. 4CeeD is funded by the National Science Foundation and led by Dr. Klara Nahrstedt, Director of the Coordinated Science Laboratory and the Ralph and Catherine Fisher Professor of Computer Science.
"Discovering new materials can take decades, in part due to the time it takes to conduct research, thanks to the loss of knowledge that occurs when vital information is tossed out or is inaccessible", Dr. Nahrstedt stated. "4CeeD enables researchers to capture, curate, analyze, and correlate instrument data during experiments in real-time, search for experimental data with specific instrument parameters and receive insights into their own work, a task that would not be possible without the power of the Clowder system."
Clowder is essential to enabling 4CeeD in two ways. First, the data management system, maintained by Engineering IT Shared Services, provides 4CeeD with reliable Cloud compute and data services, which current and future scientists need to advance their work. Second, Clowder will help provide 4CeeD access to a larger user base, particularly researchers in the Materials Research Lab and the Micro and Nano Technology Lab, while offering advanced data management techniques to students, who are the next generation of scientists working with complex data sets.
Clowder aims to continue working towards sustainability in order to become a true open-source project that is decentralized and robust. Until this NSF award, Clowder had never been funded in its own right; it has instead been built up across numerous projects by meeting a common need to effectively share and analyze data. This grant is a step toward sustainable software and an investment in making Clowder sustainable across organizations.
"All of us are looking forward to building this roadmap for future software needs from the research community by bringing together these partners", stated Kenton McHenry.
"Over the past six years, Clowder has benefitted from contributions from developers and stakeholders across many projects of different size and scope. This new effort will allow the Clowder community to continue to grow beyond individual projects to develop a flexible framework for data management across many scientific disciplines", stated Luigi Marini, Software Architect for Clowder.
The project enhances Clowder's core systems for the benefit of a larger group of users. It increases interoperability with community-accepted resources and tools, hardens the core software, and distributes core software development, while continuing to expand usage. Governance mechanisms and a business model are being established to make Clowder sustainable, with an appropriate governance structure to ensure that the software remains available, supportable, and usable. The effort engages a number of stakeholders, drawing data from diverse but converging scientific domains already using the Clowder framework, to address broad interoperability and cross-domain data sharing. Overall, the effort will transition the grassroots Clowder user community and Clowder's other stakeholders (such as current and potential developers) into a larger organized community, with a sustainable software resource supporting convergent research data needs.