Under the current system, users can save data to both personal and project directories, a habit that has often made it hard for other team members to find specific project data. For example, if a user is assigned to multiple projects and saves data from all of them into a personal directory, other team members may have difficulty identifying which data belongs to which project. So, starting in January 2020, all data storage will be restricted to project areas.
"By only allowing writing to project directories, we eliminate confusion as to what file belongs to which project", stated Mitchell Griffith, an archival storage software developer for HPSS at the OLCF, a US Department of Energy (DOE) Office of Science User Facility at DOE's Oak Ridge National Laboratory. "If a user writes a file to project C's directory, then that project owns the file."
The new hierarchy is based on the project directory layout used by the OLCF's Spider storage system, which makes the two systems more compatible and will eventually allow project teams with sensitive, proprietary, or export-controlled data to archive that data in HPSS. Today, those project teams do not have access to HPSS, partly because of the current layout structure.
Each project will have three writable directories:
This project-based structure also helps avoid potential disorganisation when users leave a project as they move to other institutions, graduate, or start their own research projects.
"This is to prevent ambiguity in file ownership. Projects last longer than people, and we want to ensure some basic metadata about what is written to HPSS", Mitchell Griffith stated. "By doing this, we can associate a project's description with the file and have a better understanding of what is written to HPSS."
Users with data currently stored in their personal directories (at /home/$USER) will be encouraged to start transferring it to the correct project areas; Mitchell Griffith said the HPSS team expects the data migration to be fully completed in 12 to 18 months.
To easily transfer their data, Mitchell Griffith advises users to employ the "mv" (move) command instead of the "cp" (copy) command. The mv command is a quick metadata operation, whereas cp will copy data in HPSS, which can be a lengthy process.
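The difference between the two commands can be illustrated with a small local analogy. The Python sketch below does not talk to HPSS at all; it uses a temporary directory on an ordinary POSIX filesystem, and the directory and file names (`home/user`, `proj/users/user`, `results.dat`) are hypothetical stand-ins. It only shows the general principle: a move within one filesystem rewrites a directory entry, while a copy creates a second file and writes every byte again.

```python
import os
import shutil
import tempfile


def move_vs_copy(payload: bytes = b"x" * 1024) -> dict:
    """Compare a rename (metadata only) with a copy (data rewritten).

    Local-filesystem analogy only; nothing here contacts HPSS.
    """
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins for a personal area and a project area.
        home = os.path.join(root, "home", "user")
        project = os.path.join(root, "proj", "users", "user")
        os.makedirs(home)
        os.makedirs(project)

        src = os.path.join(home, "results.dat")
        with open(src, "wb") as f:
            f.write(payload)
        original_inode = os.stat(src).st_ino

        # Copy: a second file is created and the data is written again.
        copied = os.path.join(project, "results_copy.dat")
        shutil.copy2(src, copied)
        copy_is_new_file = os.stat(copied).st_ino != original_inode

        # Move within one filesystem: only the directory entry changes,
        # so the file keeps its inode and no data is rewritten.
        moved = os.path.join(project, "results.dat")
        os.rename(src, moved)
        move_keeps_inode = os.stat(moved).st_ino == original_inode

        return {
            "copy_is_new_file": copy_is_new_file,
            "move_keeps_inode": move_keeps_inode,
            "source_gone_after_move": not os.path.exists(src),
        }
```

Calling `move_vs_copy()` returns three flags; on a typical POSIX filesystem all of them should be true, reflecting that the copy produced a new file while the move kept the original one under a new name. How HPSS implements the two operations internally is not covered by this sketch.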
Once a user's home directory is empty, it will become an unwritable managed directory, and links to the projects the user can access will be created automatically.
Meanwhile, the HPSS team will rename the current /proj/PROJECTID directories to /hpss/prod/PROJECTID/proj-shared, then add a link from the old path to the new area.
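The rename-plus-link step described above can be pictured with a similar local analogy. The Python sketch below is not the HPSS team's tooling; it assumes the link behaves like a POSIX symbolic link, runs entirely inside a temporary directory, and uses a hypothetical project ID (`ABC123`) and file name (`run.log`). The helper `relocate_with_link` is likewise invented for illustration.

```python
import os
import tempfile


def relocate_with_link(root: str, project_id: str) -> tuple:
    """Rename <root>/proj/<id> to <root>/hpss/prod/<id>/proj-shared and
    leave a symbolic link at the old path (assumed link type)."""
    old_path = os.path.join(root, "proj", project_id)
    new_path = os.path.join(root, "hpss", "prod", project_id, "proj-shared")

    # Create the new parent directories, then rename the project directory.
    # The rename is a metadata operation; file contents are not copied.
    os.makedirs(os.path.dirname(new_path), exist_ok=True)
    os.rename(old_path, new_path)

    # Leave a link behind so the old path keeps resolving.
    os.symlink(new_path, old_path, target_is_directory=True)
    return old_path, new_path


def demo() -> tuple:
    """Build a toy layout, relocate it, and read a file via the old path."""
    with tempfile.TemporaryDirectory() as root:
        old_dir = os.path.join(root, "proj", "ABC123")  # hypothetical project ID
        os.makedirs(old_dir)
        with open(os.path.join(old_dir, "run.log"), "w") as f:
            f.write("archived output")

        old_path, new_path = relocate_with_link(root, "ABC123")

        # Reading through the old path follows the link to the new location.
        with open(os.path.join(old_path, "run.log")) as f:
            content = f.read()
        return content, os.path.islink(old_path), os.path.isdir(new_path)
```

In this toy layout, `demo()` reads the file through the old path after the directory has been renamed, which is the behaviour the link is meant to preserve for existing workflows.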
"There will be an HPSS downtime when this is done, but this means technically the old project data is in a different location", Mitchell Griffith stated. "However, the link should allow the users to access the data like they have always accessed the data. We are doing metadata operations for the restructure instead of copying data."