"As more scientific research efforts at all scales are becoming ever-more reliant on the generation, management and dissemination of digital data, the availability of resources like Corral has become an essential component of research cyberinfrastructure", stated Chris Jordan, manager of Data Management and Collections at TACC. "We generate and analyze petabytes of data at HPC centres like TACC, and this makes Corral an incredibly valuable resource for the research community in Texas."
With Corral 3, TACC expects to host approximately 800 data collections, including more shared data sets, and Chris Jordan said he is seeing the size of collections grow. Where five-terabyte collections for a single research lab are currently the most common allocations on Corral, he predicted there will be more large-scale collaborative projects with many users and tens of terabytes of data. Examples include CyVerse and Galaxy, which together store over a petabyte of data on Corral; the Center for Space Research, with 200 terabytes; and DesignSafe, with more than 50 terabytes.
Corral 3 is the third in a line of systems, dating back to 2009, that have been available to researchers across the country. The system continues to assist academic researchers, serving as the primary storage and data management resource designed and optimized to support large-scale data collections and a collaborative research environment.
Corral 3 is largely funded by the University of Texas Research Cyberinfrastructure (UTRC). UTRC is a University of Texas system-wide initiative to provide new capabilities that advance current and future research efforts across all UT institutions. The system is installed at both UT Austin and UT Arlington to enable multi-site data replication. The new system is similar to Corral 2 in its basic technology and user interfaces. "Researchers shouldn't see a difference in their environment and the accessibility of their data", Chris Jordan stated.
Some important specifications have changed, however.
TACC has updated all of the components. Three-terabyte hard drives have been replaced with eight-terabyte hard drives, which provide more capacity in a smaller space and power footprint, and newer servers and software increase performance over the previous generation. Dell, DataDirect Networks and IBM are the technology partners. The total capacity of the system is more than twice that of Corral 2, and the peak network performance will also double.
"DDN delivers the world's highest performing storage solutions for Big Data, high performance computing, enterprise and Cloud workloads at any scale; and after a decade of successful collaboration, we are deeply honoured to continue as a trusted partner of TACC", stated Alex Bouzari, chief executive officer, chairman and co-founder, DataDirect Networks (DDN). "The addition of DDN's advanced SFA storage system with its massive scalability, unrivaled performance and low latency interconnect, provides world-class researchers with a powerful tool to enable collaborative research that leads to groundbreaking scientific discoveries with TACC's Corral 3."
According to Chris Jordan, the main challenge that researchers face is data movement and management. Users rarely use Corral alone; they use it along with Stampede, Wrangler, Lonestar, visualizations, or web applications, so getting the data to the right location at the right time is difficult.
To address this challenge, TACC is using the introduction of Corral 3 to move toward a unified data infrastructure, Chris Jordan said. "Instead of a Corral login node, we'll provide data login nodes, so users have a more unified view of their data. They'll be able to see what's on all of the different systems and directly copy data from one system to another rather than having to use file transfer utilities."
Storing and managing data at this scale requires powerful tools. Many researchers on Corral use the iRODS data management system to handle the complexities involved. Along with the new hardware for Corral 3, TACC is also updating the iRODS tools and associated software to provide improved web and REST interfaces through which users can work with their data.
Chris Jordan stated: "We'll have an object storage component; there will be many newer web technologies that people can use to access their data. That should provide a much easier way for them to see what's on the system, and to manipulate and download data to be able to share it with others."
"The main things we'll be working on over the lifetime of Corral 3 will be the software and service infrastructure, and the ways that people access and manage data - there is still a lot of work to be done in that area. We'll continue to add new services to help people better manage their data", Chris Jordan concluded.