Los Alamos National Laboratory (LANL) is one of the world's premier supercomputing and scientific research institutions. Its mission is to solve national security challenges through scientific excellence. To support and enhance its constantly evolving environment for scientific simulations and technical computing architectures, LANL sought a high-performance, open, scalable, and reliable site-wide Lustre file system that represented the best overall value.
LANL selected Aeon Computing's high-performance, open Lustre Scalable Unit to meet the compute-intensive demands of several computing clusters, delivering two separate file systems. Each file system provides 14 Petabytes of storage capacity and up to 160 GB/s of I/O performance, using Lustre on OpenZFS.
Aeon Computing's deployment represents the largest known ZFS-based Lustre file system that does not rely on hardware-based or proprietary RAID storage technology.
Using Aeon Computing's Lustre storage, LANL provides a large, reliable, open-standards-based performance-tier storage resource to its various HPC platforms, with shared access across its wide-ranging supercomputing environment.
Aeon Computing's Lustre file system, built from its Lustre Scalable Unit, delivers 14 Petabytes at up to 160 Gigabytes per second over single-rail FDR14 InfiniBand. Each Lustre Scalable Unit consists of two Lustre object storage server (OSS) nodes and 120 enterprise 12G SAS disk drives of 6 Terabytes each, using OpenZFS with raidz2 data parity protection. Multipath and high-availability failover connectivity provide additional resiliency by eliminating single points of failure. Combined, the two 14 Petabyte file systems deployed at LANL use 5,020 6-Terabyte disk drives.
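The raidz2 protection described above trades some raw capacity for parity: each raidz2 vdev gives up two drives' worth of space so that it can survive two simultaneous drive failures. The text does not say how the 120 drives in a Scalable Unit are grouped into vdevs, so the minimal sketch below assumes a hypothetical vdev width; the function name `su_capacity_tb` is likewise illustrative and not part of any Aeon or OpenZFS tooling.

```python
# Hypothetical sketch: capacity of one Lustre Scalable Unit under raidz2.
# Drive count (120) and size (6 TB) come from the text; the vdev width passed
# in is an ASSUMPTION, since the actual pool layout is not stated.

DRIVES_PER_SU = 120
DRIVE_TB = 6
RAIDZ2_PARITY = 2  # raidz2 reserves two parity drives' worth of space per vdev


def su_capacity_tb(vdev_width: int) -> tuple[int, int]:
    """Return (raw_tb, data_tb) for one SU split into equal raidz2 vdevs.

    data_tb ignores ZFS metadata, allocation padding, and slop space, so it
    is an upper bound on usable space before any compression.
    """
    if vdev_width <= RAIDZ2_PARITY or DRIVES_PER_SU % vdev_width != 0:
        raise ValueError("vdev width must exceed parity and divide 120 evenly")
    vdevs = DRIVES_PER_SU // vdev_width
    raw_tb = DRIVES_PER_SU * DRIVE_TB
    data_tb = vdevs * (vdev_width - RAIDZ2_PARITY) * DRIVE_TB
    return raw_tb, data_tb


raw, data = su_capacity_tb(vdev_width=10)  # hypothetical 8 data + 2 parity
print(f"raw: {raw} TB, after raidz2 parity: {data} TB")
```

With the assumed 10-drive vdevs, a Scalable Unit holds 720 TB raw and at most 576 TB after parity; wider vdevs (for example 12 drives) lose proportionally less to parity at the cost of longer rebuilds.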
Aeon Computing's Lustre file system can handle a wide range of compute-driven storage and data I/O workloads, from small jobs to jobs spanning many thousands of processor cores in parallel.
Aeon Computing, an HPC and Lustre file system storage vendor, has been awarded a contract by Los Alamos National Security, LLC, the operator of Los Alamos National Laboratory (LANL), to provide two Lustre file systems that enhance LANL's technical supercomputing capabilities in support of its national security mission. Each of the two Lustre file systems provides 14 Petabytes of data storage capacity and up to 160 Gigabytes per second of parallel access performance. These next-generation systems push the limits of Lustre storage performance.
The two 14 Petabyte Lustre file systems will serve the intense data I/O workloads of both LANL's facility-wide open research computing and its security-focused computing missions. Each file system connects to the high-speed computing fabric with 2.35 Terabits per second of fabric bandwidth over FDR14 InfiniBand. Both file systems use OpenZFS and high-availability failover for data integrity and redundancy. Each file system contains 40 Lustre OSS nodes, each capable of 4 Gigabytes per second of sustained throughput. The systems are built on end-to-end enterprise-grade technology, including LSI/Avago 12G SAS (Serial Attached SCSI), Mellanox FDR14 InfiniBand, HGST 12G enterprise SAS disk drives, SanDisk 12G SAS SSDs, and Intel server technologies.
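The per-file-system figures in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the text plus one assumption: that every OSS node sits in a two-node Lustre Scalable Unit holding 120 drives of 6 Terabytes each, as described for the Scalable Unit. All units are decimal (1 GB = 10^9 bytes, 1 Tb = 10^12 bits).

```python
# Back-of-the-envelope check of the per-file-system figures quoted above.
# ASSUMPTION: each OSS node belongs to a two-node Scalable Unit of 120 drives.

OSS_PER_FS = 40
OSS_GBPS = 4            # sustained GB/s per OSS node (from the text)
OSS_PER_SU = 2
DRIVES_PER_SU = 120
DRIVE_TB = 6
FABRIC_TBITPS = 2.35    # quoted fabric bandwidth per file system, Tb/s

aggregate_gbps = OSS_PER_FS * OSS_GBPS          # GB/s across all OSS nodes
aggregate_tbitps = aggregate_gbps * 8 / 1000    # GB/s -> Tb/s
sus_per_fs = OSS_PER_FS // OSS_PER_SU           # Scalable Units per file system
drives_per_fs = sus_per_fs * DRIVES_PER_SU
raw_pb_per_fs = drives_per_fs * DRIVE_TB / 1000  # TB -> PB, before parity

print(f"aggregate: {aggregate_gbps} GB/s = {aggregate_tbitps:.2f} Tb/s")
print(f"fabric headroom: {FABRIC_TBITPS / aggregate_tbitps:.2f}x")
print(f"{sus_per_fs} SUs, {drives_per_fs} drives, {raw_pb_per_fs:.1f} PB raw")
```

Under these assumptions, 40 OSS nodes at 4 GB/s give the quoted 160 GB/s (about 1.28 Tb/s), leaving the 2.35 Tb/s fabric roughly 1.8x headroom, and 20 Scalable Units per file system hold about 14.4 Petabytes of raw disk, in line with the quoted 14 Petabytes. Two such file systems account for 4,800 of the 5,020 drives cited; the text does not break down the remainder.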
The file systems are integrated into LANL's site-wide monitoring infrastructure without the need for cumbersome or closed vendor APIs. "We were targeting an open solution that would utilize our Tri-Lab Operating System TOSS with Lustre, and provide a great performance to cost ratio," stated Kyle Lamb, Infrastructure Team Lead in the High Performance Computing Division at Los Alamos National Laboratory. "Utilizing commodity hardware and OpenZFS for RAID provides a cost-effective high performance solution with the added benefit of compression to increase available usable capacity. This allows us to provide the high density performance required for our existing clusters as well as our future Commodity Technology Systems."
Jeff Johnson, co-founder of Aeon Computing, stated: "We were able to architect a Lustre file system to meet LANL's needs that was affordable and employed open standards in hardware and software. We were able to deliver a solution that met and exceeded LANL's rigorous demands of multi-system HPC data IO and provide a system that was truly open."
The Aeon Lustre Scalable Unit is a fully redundant 12U system containing 120 enterprise SAS disk drives and two Lustre OSS nodes, with hot-swappable components including the OSS nodes themselves. The Lustre Scalable Unit features 12G SAS storage technology and supports single- and dual-rail QDR, FDR, and EDR InfiniBand, as well as Intel Omni-Path and 10/40/100 Gigabit Ethernet. The Aeon Computing Lustre Scalable Unit is available for sale and can be used in a wide range of Lustre file system designs.