"NCCS previously provided long-term storage on tape media, which made data difficult for users to discover and integrate into compute workflows," said Bennett Samowich, NCCS CSS Lead. "By holding curated data products that are archived at other locations, CSS allows fast access to datasets in support of scientific research."
"In particular, AI and machine learning workflows can take advantage of the availability of these datasets," Samowich noted. "CSS is the best location for final data products that can become input to other projects."
CSS-hosted datasets span a range of types and sources, including:
The path "/css" provides access to all of the CSS data, which is stored using IBM's Spectrum Scale General Parallel File System (GPFS). High-speed internal InfiniBand and 40 Gigabit Ethernet networks connect CSS to the NCCS compute and Data Services environments.
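As a rough illustration of what browsing that path could look like from a script, the Python sketch below lists the directories found directly under a root that defaults to "/css". The function name `list_collections` is hypothetical, and the article does not describe how data is organized beneath "/css", so the sketch assumes nothing about the layout and only reads directory names.

```python
from pathlib import Path


def list_collections(root="/css"):
    """Return the sorted names of the directories directly under `root`.

    `root` defaults to the CSS path named in the article. The layout below
    that path is not documented here, so this only lists whatever is present.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        # On a machine without the CSS file system, return an empty list
        # instead of raising an error.
        return []
    # Read-only operation: iterate over entries and keep only directories.
    return sorted(entry.name for entry in root_path.iterdir() if entry.is_dir())


if __name__ == "__main__":
    for name in list_collections():
        print(name)
```

Because the function only reads directory entries, it fits the read-only access that users have to CSS.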
While CSS is available to users as read-only, its 15-petabyte capacity, which will grow to 30 petabytes later this summer, will allow NCCS users to make their own data available to other users. In addition to Earth science data, NCCS is open to hosting users' astrophysics datasets in the future. NCCS encourages users to share their data by requesting CSS access and moving their final data products to CSS. Data must fall under an active Data Management Plan; contact the NCCS User Services Group for help setting up a plan.
In addition to CSS, "NCCS will continue to provide local storage on both Discover and ADAPT for intermediate datasets," Samowich said. "This local storage will continue to support workflows that require fast I/O during computation as well as analysis of research results."