This is the largest number of CPUs ever used concurrently over this duration (about 54 hours) for a single high-energy physics experiment. This unprecedented amount of computing enabled scientists to carry out some of the most complicated techniques used in neutrino physics, allowing them to dig deeper into the seldom-seen interactions of neutrinos. This Cori allocation was more than 400 times the amount of Fermilab computing allocated to the NOvA experiment and 50 times the total computing capacity at Fermilab allocated for all of its rare-physics experiments. A continuation of the analysis was performed on NERSC's Cori and Edison supercomputers one week later. In total, nearly 35 million core-hours were consumed by NOvA in the 54-hour period.
"The special thing about NERSC is that it enabled NOvA to do the science at a new level of precision, a much finer resolution with greater statistical accuracy within a finite amount of time," said Andrew Norman, NOvA physicist at Fermilab. "It facilitated doing analysis of real data coming off the detector at a rate 50 times faster than that achieved in the past. The first round of analysis was done within 16 hours. Experimenters were able to see what was coming out of the data, and in less than six hours everyone was looking at it. Without these types of resources, we, as a collaboration, could not have turned around results as quickly and understood what we were seeing."
The experiment presented the latest finding from the recently collected data at the Neutrino 2018 conference in Germany on June 4.
"The speed with which NERSC allowed our analysis team to run sophisticated and intense calculations needed to produce our final results has been a game-changer," said Fermilab scientist Peter Shanahan, NOvA co-spokesperson. "It accelerated our time-to-results on the last step in our analysis from weeks to days, and that has already had a huge impact on what we were able to show at Neutrino 2018."
In addition to the state-of-the-art NERSC facility, NOvA relied on work done within the SciDAC HEP Data Analytics on HPC (high-performance computers) project and the Fermilab HEPCloud facility. Both efforts are led by Fermilab scientific computing staff, and both worked with researchers at NERSC to support the analysis behind NOvA's antineutrino oscillation evidence.
The current standard practice for Fermilab experimenters is to perform similar analyses using less complex calculations through a combination of traditional high-throughput computing and the distributed computing provided by Open Science Grid, a national partnership between laboratories and universities for data-intensive research. These are substantial resources, but they use a different model: both deliver a large amount of computing spread over a long period of time. Some of these resources are also offered only at low priority, so their use may be preempted by higher-priority demands. But for complex, time-sensitive analyses such as NOvA's, researchers need the faster processing enabled by modern high-performance computing techniques.
SciDAC-4 is a DOE Office of Science program that funds collaboration between experts in mathematics, physics and computer science to solve difficult problems. The HEP on HPC project was funded specifically to explore computational analysis techniques for doing large-scale data analysis on DOE-owned supercomputers. Running the NOvA analysis at NERSC, the mission supercomputing facility for the DOE Office of Science, was a task perfectly suited for this project. Fermilab's Jim Kowalkowski is the principal investigator for HEP on HPC, which also has collaborators from DOE's Argonne National Laboratory, Berkeley Lab, the University of Cincinnati and Colorado State University.
"This analysis forms a kind of baseline. We're just ramping up, just starting to exploit the other capabilities of NERSC at an unprecedented scale," Kowalkowski said.
The project's goal for its first year is to take compute-heavy analysis jobs like NOvA's and enable them to run on supercomputers. That means not just running the analysis, but also changing how calculations are done and learning how to revamp the tools that manipulate the data, all in an effort to improve the techniques used for these analyses and to leverage the full computational power and unique capabilities of modern high-performance computing facilities. In addition, the project seeks to use all available computing cores at once to shorten the time to results.
The Fermilab HEPCloud facility provides cost-effective access to compute resources by optimizing usage across all available resource types and elastically expanding the resource pool on short notice by, for example, renting temporary resources on commercial clouds or using high-performance computers. HEPCloud enables NOvA and physicists from other experiments to use these compute resources in a transparent way.
For this analysis, "NOvA experimenters didn't have to change much in terms of business as usual," said Burt Holzman, HEPCloud principal investigator. "With HEPCloud, we simply expanded our local on-site-at-Fermilab facilities to include Cori and Edison at NERSC."