5 Apr 2016 Berkeley - Quincey Koziol, who has spent more than two decades helping to develop and refine the Hierarchical Data Format (HDF) I/O library used by thousands of organisations across government, academia and industry to manage large scientific data files, recently left the HDF Group to join NERSC as principal data architect in the Data and Analytics Services group.
As director of core software and high performance computing at the HDF Group, Quincey Koziol spent the last 11 years developing the HDF5 I/O middleware package and overseeing the group's HPC development efforts. In his new position, Quincey Koziol will help lead NERSC's data management efforts, including investigating object storage technologies and participating in defining the storage subsystem for the NERSC-9 system. He will also continue to provide technical leadership for the HDF5 project. In fact, he sees his migration to NERSC as a win-win for all involved.
"I've been working with Berkeley Lab for 10 years or so through my role at the HDF Group and have collaborated with a number of researchers here, including John Shalf, Prabhat and Surendra Byna", Quincey Koziol stated. "When this position opened up, it looked like a way to continue doing interesting things for data management and science. Coming to NERSC also gives me a larger platform and more access to thousands of users who are interested in high-performance I/O."
"This bodes well for the future of HDF5 as well", added Prabhat, who leads the Data and Analytics Services group at NERSC. "Quincey is one of the leading experts in the data management space, and his move here will help NERSC and Berkeley Lab create and execute new capabilities in this space. And because we think HDF5 is such a critical technology in the scientific world, we want to foster closer working relationships with the HDF Group and the rest of the DOE Labs and HPC centres around this technology."
HDF was originally developed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, where Quincey Koziol worked for 15 years before leaving to co-found the non-profit HDF Group. It was at NCSA that he first became involved with HDF. In 1987, NCSA set out to create an architecture-independent software library and file format, called Hierarchical Data Format, to address the need to move scientific data among the many different computing platforms in use at NCSA at that time. It was developed as an open source product and distributed free of charge under a University of Illinois license.
"HDF is really three things", stated Mike Folk, president and co-founder of the HDF Group, which was spun out of the University of Illinois to advance and support HDF technologies and ensure long-term access to HDF data. "It is a very, very flexible format for storing data; it is software that makes it possible to do I/O really fast and compress data effectively, which allows high end applications with lots of I/O to run very effectively in an HPC environment; and it is a data model that provides a way for people to describe any kind of data meaningfully. HDF5 files also include the metadata necessary for efficient data sharing, processing, visualization and archiving."
During the early 1990s, organisations in government, academia, and industry adopted HDF for an increasing number of applications demanding high performance and quality, and it went through a number of iterations and enhancements as a result. In the mid-1990s, however, with growing pressure on DOE labs to boost their computational capabilities, it became apparent that neither HDF nor any other format was going to scale to the next generation of computers. With support from the DOE labs, HDF was re-architected to create a version that could more effectively scale up to the more powerful and massively parallel systems and the much larger data files they were expected to generate.
The result was HDF5, a completely new format and I/O library designed to organize, store, discover, access, analyze, share and preserve diverse, complex data in continuously evolving heterogeneous computing and storage environments. HDF5 supports all types of digital data, regardless of origin or size, from remote sensing data collected by satellites and computational results from nuclear testing models to high-resolution MRI brain scans.
"HDF5 would not exist were it not for Quincey's leadership over the past 18 years", Mike Folk stated, "and much of that success has been enabled by the contributions and support of our colleagues in the DOE labs. In this period Quincey has also built a strong group of engineers at The HDF Group who are ready to take on his mantle going forward, and of course to continue working closely with Quincey. As Principal Data Architect at NERSC, Quincey will be expanding his attention to a broader range of technologies, at the same time keeping ties with the HDF team that he has built and mentored so well."
Looking ahead, a number of joint projects between Berkeley Lab Computing Sciences and the HDF Group are already under way, Prabhat noted, such as the ExaHDF5 project and Proactive Data Containers, and several new collaborations are in the works.
"HDF5 has been and will be a critical technology for scientists and HPC centres in the exascale world and for experimental data for the DOE, and we are vested in its continued success", Prabhat stated. "We have signed a letter of collaboration between NERSC and the HDF Group that ensures Quincey will continue providing technical leadership and architectural guidance on HDF5, which bodes well for HDF5 and the HDF Group."