Because of its open source licensing, ability to reduce I/O constraints, and scalability, Lustre has been adopted widely by high-performance computing (HPC) users worldwide. But as the needs of HPC users evolve, so too must Lustre.
To that end, the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility, played a significant role in an ORNL event to share knowledge and discuss the future development of the parallel file system. The International Workshop on the Lustre Ecosystem: Challenges and Opportunities, which took place March 3 and 4 in Annapolis, Maryland, brought together Lustre users from academia, industry, and government to explore improvements in the parallel file system's performance and flexibility. OLCF staff members gave talks and technical presentations on both days of the workshop, sharing knowledge related to managing and optimizing the Lustre environment that could benefit other users.
The event was organized by the US Department of Defense (DOD)-HPC Research Program at ORNL, a collaboration between DOD and ORNL. The program has interests and competencies in extreme-scale HPC, particularly advanced architectures, metrics, benchmarks, system evaluations, programming environments, fully distributed data centers, and parallel file systems. Neena Imam, Mike Brim, and Sarp Oral of ORNL's Computing and Computational Sciences Directorate were the workshop co-chairs.
"Historically, the OLCF has been a leader in deploying the largest known Lustre production file system," said Mike Brim, a research associate in ORNL's Computer Science and Mathematics Division. "Because of this, we oftentimes run into problems before anyone else. This workshop gave us an opportunity to share the challenges we've overcome and make our solutions available to a wider audience who may be following the same path."
The first day of the program featured a keynote presentation by Eric Barton, lead architect of the High Performance Data Division at Intel and a long-time proponent of Lustre. On day two, presentations covered technical topics, including burst buffer systems, dynamic file striping, and monitoring toolkits for Lustre.
Jason Hill, the OLCF's HPC Operations storage team leader and tutorial chair for the workshop, led sessions covering networking and the OLCF's efforts to minimize the effects of file system hardware and software failures.
"Lustre has a lot of flexibility in the way you can configure it," Hill said. "That's one of its great powers, but it's also one of its downfalls. You either have to be an expert in all the areas of the ecosystem that you create or obtain that support from a vendor. The hope is that other members of the Lustre community can benefit from our experience."
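That configurability shows up in something as basic as file striping, which Lustre exposes through its standard `lfs` utility. The commands below are an illustrative sketch only; the directory paths and the specific stripe settings are hypothetical choices, not OLCF recommendations.

```shell
# Stripe new files in this (hypothetical) directory across 8 object storage
# targets (OSTs) with a 4 MiB stripe size -- a common shape for large,
# sequential checkpoint writes.
lfs setstripe -c 8 -S 4M /lustre/project/checkpoints

# Small, randomly accessed files often do better confined to a single OST.
lfs setstripe -c 1 /lustre/project/small-files

# Inspect the resulting layout.
lfs getstripe /lustre/project/checkpoints
```

Getting such settings right for each workload is exactly the kind of expertise Hill describes: a layout tuned for one application's access pattern can hurt another's.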
A major focus of the workshop was adapting Lustre to efficiently handle diverse, non-scientific workloads, such as those produced by big data applications. ORNL is currently spearheading this initiative.
"Lustre was designed with scientific simulation in mind, which means it's good at sequential read and write I/O workloads," said Sarp Oral, file and storage systems team lead for the OLCF Technology Integration Group. "Big data workloads are different, requiring lots of small data reads and randomized access. Lustre is not well suited for these read-heavy, random I/O workloads today. Much of the discussion focused on what could be done to improve Lustre's capabilities in this area."
The first step in diversifying Lustre's I/O workload capabilities is to create tools that measure how the parallel file system currently handles big data workloads, Mike Brim said. "After we've characterized those workloads, we can start talking about what changes are necessary to make Lustre a more general purpose, high-performance parallel file system."
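The characterization Brim describes can start from something as simple as labeling an I/O trace by its access pattern. The sketch below is purely illustrative and assumes a list of byte offsets as input; the function name, trace format, and threshold are inventions for this example, not an OLCF tool.

```python
def classify_trace(offsets, block_size=4096, seq_threshold=0.8):
    """Label an I/O trace 'sequential' or 'random' by the fraction of
    accesses that land immediately after the previous one."""
    if len(offsets) < 2:
        return "sequential"
    sequential = sum(
        1 for prev, cur in zip(offsets, offsets[1:])
        if cur == prev + block_size
    )
    ratio = sequential / (len(offsets) - 1)
    return "sequential" if ratio >= seq_threshold else "random"

# A simulation-style checkpoint writes one contiguous stream of blocks ...
print(classify_trace([i * 4096 for i in range(1000)]))            # sequential
# ... while an analytics-style scan hops around the file.
print(classify_trace([4096 * i for i in (7, 2, 90, 14, 3, 55)]))  # random
```

Real characterization tools would of course track far more (request sizes, read/write mix, metadata operations), but the sequential-versus-random distinction is the axis Oral and Brim highlight.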
Enhanced workload capability could help expand Lustre's user base, historically a niche market, to include organizations and businesses in a growing number of sectors that are leveraging data mining and analytics tools. Increased capability could also benefit long-time Lustre adherents. For example, a more robust Lustre could give computational scientists improved data analysis capabilities, such as real-time data visualization.
"If we can improve the productivity of analysis workloads on Lustre, we can improve the productivity of scientists by giving them insights more quickly," Brim said.