Supercomputers live fast and retire young. That's because technology advances quickly and each generation of computer processors is significantly faster, and cheaper to operate, than the one before it. Expectations were high for the successor to Ranger, a system called Stampede built by TACC that proved to be 20 times more powerful than Ranger while using only half the electricity.
"We knew, as we were designing Stampede, that we had to inherit a huge amount of workload from the systems that were going offline", stated Dan Stanzione, executive director of TACC and the principal investigator of the Stampede project. "And at the same time, you could see that architectural changes were coming, and we had to move the community forward as well. That was going to be a huge challenge."
The challenge was and still is to match the breakneck speed of change in computer hardware and architectures. With Ranger, one fundamental architectural change was going to four, four-core processors on a computer node. "It was clear that this trend was going to continue", Dan Stanzione stated.
This trend toward "manycore" processors, as they are known, would force changes to the programming models that researchers use to develop application software for high-tech hardware. Since scientific software changes its structure much more slowly than hardware, sometimes over the course of years, it was critical to get researchers started down the road to manycore.
"We needed to take on this enormous responsibility of all of the old workload that was out there for all of the systems that were retiring, but at the same time start encouraging people to modernize and go towards what we thought systems were going to look like in the future", Dan Stanzione stated. "It was an exciting time."
Designing the Stampede supercomputer required foresight and awareness of the risks in planning a multi-million-dollar computing project that would run seven years into the future. Stanzione and the team at TACC wrote the proposal in 2010 based on hardware - the Intel Xeon E5 (Sandy Bridge) processor and Intel Xeon Phi co-processor, as well as the Dell servers - that was being developed but didn't yet exist. TACC deployed Stampede on schedule in 2013 and consistently met or exceeded its proposed goal of providing 10 petaflops of computing power to the open science community. An upgrade in 2016 added Knights Landing processors - standalone processors released by Intel that year - and 1.5 petaflops to the system. What's more, TACC operated a world-class facility to support, educate, and train users in making full use of Stampede.
"One of the things that I'm proud of is that we've been able to execute both on time and on budget. We delivered the exact system we had forecast", Dan Stanzione stated.
NSF awarded the University of Texas at Austin $51.5 million for TACC to deploy and support the Stampede supercomputer, which included a hardware upgrade in 2016. During its five years of operations, Stampede completed more than eight million successful computing jobs and clocked over three billion core hours of computation.
More than 11,000 researchers used Stampede directly on over 3,000 projects in the open science community. And tens of thousands of additional researchers accessed Stampede through scientific gateways such as the Galaxy Community Hub and DesignSafe.
Through its five-year life cycle Stampede remained the most powerful, comprehensive system allocated to users of the eXtreme Science and Engineering Discovery Environment (XSEDE). And for three and a half years, the TOP500 organisation ranked Stampede in the world's top 10 most powerful computing systems. It remained in the top 20 in 2017, its last year of operations.
Many thousands of people received computer science education as a result of the Stampede supercomputer project. More than a hundred live training events with webcasts were conducted at TACC and at conferences such as the Supercomputing Conference and the International Supercomputing Conference. TACC's Integrative Computational Education and Research Traineeship Research Experiences for Undergraduates (ICERT REU) programme, the Code @ TACC summer STEM high school camp, and the Verizon Innovative Learning Summer Entrepreneurship Experience together instructed hundreds of underserved students in hands-on coding. And 292.8 thousand unique users took advantage of the Stampede Virtual Workshop, a collaboration of the Cornell Center for Advanced Computing (CAC), the Arizona State University Fulton High Performance Computing Initiative (HPCI), and the Texas Advanced Computing Center. Clemson University, the Ohio State University, the University of Texas at El Paso, Indiana University, the University of Colorado at Boulder, and the Institute for Computational Engineering and Sciences - also at the University of Texas at Austin - were additional academic partners that helped make the Stampede project possible.
Of the over 3,000 projects that relied on resources of the Stampede supercomputer, the discovery of gravitational waves by the NSF-funded Laser Interferometer Gravitational-Wave Observatory (LIGO) in 2015 is one of the most important. Gravitational waves are ripples in the fabric of space-time created by black hole collisions.
The waves briefly changed the distance from the Earth to the Sun by the size of one atom, an effect so tiny that until recently it eluded scientists' efforts to extract the signal from the background noise. Like light and electricity, gravitational waves give scientists a new medium to study and understand the universe.
LIGO researchers worked with TACC experts to improve their software. Then the researchers used about seven million core hours on Stampede to help analyze the first detected gravitational waves. "The collaboration with TACC computing experts and the computing cycles provided by Stampede both supported the first direct detection of gravitational waves 100 years after Albert Einstein first predicted their existence in his theory of General Relativity", stated Stuart Anderson, a research manager for LIGO based at Caltech.
"Stampede was an excellent tool for improving our understanding of the universe we live in, from the smallest scale of sub-atomic particles to detecting gravitational waves that have traveled a billion light-years to the earth, and a lot of exciting science and engineering in between", Stuart Anderson stated.
The collaboration between LIGO project technical staff and TACC technical staff boosted the LIGO code performance three-fold. "I think we underestimated how much performance you could squeeze out on modern processors by really taking a careful look at the code", stated Peter Couvares, staff scientist at LIGO.
"Stampede was our target platform. In fact, we did so much benchmarking on Stampede that the Stampede CPU remains what we call our Advanced LIGO reference CPU. TACC was responsible for helping us get that initial 'byte' out of the apple, but we just kept going. I don't know if it would have happened the way it did if we hadn't gotten started with TACC the way we did. It really helped start a culture and a project of optimization that's continuing to bear fruit", Peter Couvares stated.
Getting the track and intensity right for a hurricane can make the difference between life and death for those near the coast. Penn State University researcher Fuqing Zhang has produced some of the most accurately predictive hurricane intensity forecasts in the world. Fuqing Zhang used over 22 million core hours on Stampede to clarify the finer details of the miscellaneous forces inside hurricanes. The methods Fuqing Zhang developed in a 2015 study of 100 tropical storms between 2008 and 2012 reduced errors in forecast intensity by 25 to 28 percent for lead times of 2 to 4 days, compared to the corresponding official forecasts issued by the National Hurricane Center.
The research enabled by Stampede, said Fuqing Zhang, is helping make the world a better place. "We developed techniques that enable better prediction and understanding of severe weather, including hurricanes and severe weather which certainly have profound significance to society. Much of the research outcome has been adopted or implemented by the research community, federal agencies, and operational weather prediction agencies that will eventually benefit the general public", Fuqing Zhang stated.
Strange things make our physical reality, according to University of California Santa Barbara physicist Robert Sugar. His group, the Multiple Instruction, Multiple Data Lattice Computation Group (MILC), has used over 26 million core hours on Stampede to probe the forces that hold together quarks, the building blocks of matter. "These studies can only be carried out by very large-scale numerical simulations", Robert Sugar stated.
Quantum chromodynamics (QCD), the fundamental theory of the strong force of nature, describes how quarks are bound together to form neutrons, protons, and other composite particles. Ultimately, Robert Sugar and colleagues are trying to understand the standard model, our current set of theories of the fundamental forces of nature, of which QCD is one component. Among their goals are to determine some of the fundamental parameters of the standard model and to search for physical phenomena that require a theory beyond the standard model to explain. They're doing this by combining experiments, such as those at CERN's Large Hadron Collider, with computation to find more accurate numerical values of fundamental constants in the standard model.
"We have, using Stampede and other computers, calculated some of these quantities to a fraction of a percent - the most accurate calculations that have been done to date. But we continue to probe. We haven't calculated all of the things that we want yet", Robert Sugar stated.
"The NSF investment in machines like Stampede is tremendously important to science", Robert Sugar stated. "And of the NSF machines, if you put aside Blue Waters, Stampede since it's come online has always been the most powerful machine in the NSF's XSEDE programme."
The Stampede supercomputer helped scientists understand and predict tornados - whether or not they form and where they will go, according to the Center for Analysis and Prediction of Storms (CAPS) at the University of Oklahoma (OU). "The Stampede supercomputer is our most critical tool", stated CAPS director Ming Xue, a professor of meteorology at OU. Ming Xue's group has used 41 million core hours on Stampede to simulate tornados.
"The majority of our research has been done on Stampede ever since it came into existence", Ming Xue stated. Understanding tornados requires very high resolution tornado simulations, with grid spacing on the order of 50 meters when studying an area 100 by 100 kilometers square. "These runs can only be done on supercomputers like Stampede", Ming Xue stated.
Ming Xue's research resulted in significant progress toward understanding a tornado's formation dynamics, or 'tornadogenesis'. "We have published a series of papers understanding the tornado dynamics and processes in leading journals", stated Ming Xue. Stampede simulations allowed Ming Xue to discover the role that horizontal rotation, or vorticity, plays in a tornado's formation. "We discovered a new, very important process for producing vorticity, a new source of vorticity that contributes to the spin-up of tornados", Ming Xue stated.
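The resolution figures quoted above, a grid spacing on the order of 50 meters over an area of 100 by 100 kilometers, give a rough sense of why such runs need a supercomputer. The short sketch below works through that arithmetic for the horizontal grid only. The function names are illustrative, and the calculation assumes a uniform square grid; it says nothing about the vertical levels, time steps, or the actual model configuration used by CAPS, all of which multiply the cost further.

```python
def cells_per_side(domain_km: float, spacing_m: float) -> int:
    """Number of grid cells along one side of a square domain."""
    return round(domain_km * 1000.0 / spacing_m)


def horizontal_cells(domain_km: float, spacing_m: float) -> int:
    """Total number of cells in one horizontal layer of the grid."""
    side = cells_per_side(domain_km, spacing_m)
    return side * side


if __name__ == "__main__":
    # Figures from the article: 100 km domain, 50 m grid spacing.
    print(cells_per_side(100, 50))    # 2000 cells per side
    print(horizontal_cells(100, 50))  # 4000000 cells per horizontal layer
```

Under these assumptions a single horizontal layer already holds four million cells, and halving the grid spacing would quadruple that count.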
What's more, CAPS collaborates with the National Oceanic and Atmospheric Administration (NOAA) using Stampede to produce real-time forecasts of severe storms that include natural hazards like strong wind, hail, and tornados. CAPS works with the National Storm Prediction Center and the National Severe Storms Laboratory to produce the most advanced real-time ensemble weather forecasts for possible implementation by the National Weather Service.
"Stampede was a national supercomputing resource that enabled a lot of cutting-edge research and science advances. We just hope that there will be even bigger and more powerful resources to be available, such as Stampede2. In terms of improving weather forecasting, that will certainly benefit the entire society", Ming Xue stated.
DNA strands use electrostatic attraction or repulsion to fold together or come apart. This property enables cells to store genetic information, replicate and repair that information, and regulate how that information is expressed. Computational physicist Aleksei Aksimentiev of the University of Illinois at Urbana-Champaign used 24 million core hours on Stampede to elucidate the mysteries of DNA-DNA interactions.
"When Stampede first came online, it was a game changer. Simulations that used to take more than a month could be done in less than a week. It was much more than just speeding up the calculations: for the first time, the simulation research and development cycle matched that of experimental biophysics, allowing for iterative approach to the solving the same scientific problem", Aleksei Aksimentiev stated.
"Among many successful projects on Stampede, perhaps the most rewarding are the most difficult ones", Aleksei Aksimentiev said. He explained that one of the most puzzling biophysical processes is the assembly of DNA - a highly charged polymer - into compact structures in the cell's nucleus, which seems to defy the very basic law of electrostatics: repulsion of same-sign charges.
"Using Stampede, we set out to figure out the molecular mechanism of the same sign charge attraction just to find out that our modeling approach was not accurate enough to account to existing experimental data. After a painful and lengthy recalibration effort we arrived with the most advanced molecular force field to describe DNA-DNA interactions, which allows us to accurately evaluate forces within compact DNA structures, elucidating the role of salt ions in turning the repulsive DNA-DNA interactions into attractive ones", Aleksei Aksimentiev stated.
The story did not end there. "Our improved methodology of molecular simulations led us to a discovery of DNA-DNA attraction governed by a pattern of epigenetic markers that prescribes which parts of the DNA code is translated into proteins and which part is not and, ultimately, define the cell type. The same methodology is now being used to design of DNA nanostructures for delivering drugs into cancerous cells and building synthetic mimics of biological molecular motors", Aleksei Aksimentiev stated.
The universe formed the first stars relatively shortly after the Big Bang. Stampede simulations helped University of Texas at Austin astronomer Volker Bromm find answers to how the first stars formed and how they shaped the transition to the stars we see today. "Stampede has enabled my group to simulate how the first stars and galaxies transformed the early universe from its simple initial state into one of ever-increasing complexity", Volker Bromm stated.
Volker Bromm used Stampede to connect the theory of the initial conditions created by the Big Bang with observations from the most powerful telescopes such as the Hubble Space Telescope. "Stampede has allowed us to evolve the equations describing the universe forward to the point where we can directly probe its state with observations. Specific examples are our simulations of how the first supernova explosions occurred, and how supermassive black holes were first created. Our predictions will soon be tested with NASA's James Webb Space Telescope (JWST), to be launched at the end of 2018", Volker Bromm stated.
"The Stampede supercomputer has been a true discovery machine. For astronomy, it has served as an ideal partner to our most powerful telescopes. Whereas the latter allow us to look backward in time, by detecting light that was emitted in the distant past, Stampede has acted as a 'forward time machine', allowing astrophysicists to evolve the laws of nature onward in time. This confluence of time machines, supercomputers and telescopes, gives us the power to reconstruct and understand the entire history of the universe", Volker Bromm stated.
Researchers used Stampede as a computational microscope. Biochemist Michael Feig of Michigan State University used 28 million core hours on Stampede to simulate complex interactions inside of cells and help make new discoveries.
"We built a model of a bacterial cytoplasm for the first time and discovered how proteins may behave differently in such environments", Michael Feig stated. "For example, in (non-cell) experiments, proteins are typically very stable. But in the very dense crowded cellular environment, there are many opportunities for all kinds of interactions that can lead to destabilization. There is also new insight into how proteins and small molecules (metabolites, drugs) move much slower in the cell, again due to the many 'distractions'", Michael Feig stated.
Michael Feig also used Stampede to make a discovery about RNA polymerase and its role in transcription, where the enzyme copies DNA to make RNA and eventually proteins. "Using Stampede simulations, we were able to explain in detail how that enzyme can make RNA copies from DNA during transcription with very little errors but relatively high speed", Michael Feig stated. "Stampede is a great resource that is well-managed and provides reliable, very-high end supercomputing resources without which we could not have done our research during the last years."
Modelling the physics of slow-moving fluids has many real-world applications, such as geophysical flows from mantle convection, oil flow through rocks, and blood flow in capillaries. Professor George Biros of the Institute for Computational Engineering and Sciences at the University of Texas at Austin used the Stampede supercomputer to create highly efficient simulation tools for these 'Stokes flows'. George Biros is a two-time winner of the Association for Computing Machinery's Gordon Bell Prize in computation.
"We're interested in problems that are not tractable with commodity computing technologies. We're interested in problems in geophysics, in health care, in medical imaging, in data analysis, and in computational science that require technologies that are provided by TACC, like Stampede and Maverick, basically high performance computing, because the turnaround times for using standard services are not good enough for us", George Biros stated.
George Biros was one of the first Stampede users to take advantage of its Xeon Phi co-processors.
"TACC helped us succeed by providing the best libraries to link to, the best ways to compile and the best way to optimize our code by having TACC computational scientists work with us, especially because we were one of the first users of the Intel Xeon Phi technology. We had active advice in which TACC staff helped us with code performance optimization on the new Knights Landing Phi processor. The accessibility to the resources and the very quick turn-around time to all our requests for help was essential to achieving these technological breakthroughs", George Biros stated.
Whole Mantle Convection
Scientists took a solid step towards better understanding the dynamics in Earth's deep interior, a main driver for earthquakes and volcanic eruptions. The research was conducted on Stampede by the science team of Omar Ghattas of the Institute for Computational Engineering and Sciences (ICES) at the University of Texas at Austin. Omar Ghattas co-authored a study using Stampede that won the 2015 Gordon Bell Prize. It modelled the flow of rock thousands of kilometers deep in the mantle of the whole Earth.
The Stampede supercomputer at TACC supported the science computation of this research, including development of the algorithms and numeric solvers as well as the visualizations that show the science results. The team scaled their work up to 1.6 million cores on the IBM Sequoia supercomputer at Lawrence Livermore National Laboratory.
Study lead author Johann Rudi of ICES developed and tested the solvers and performed the science runs of the simulations on Stampede. "Mainly, my research was done on the Stampede supercomputer at TACC. One part of the work is developing algorithms. That was wholly supported by TACC machines. And also, the help that I got from TACC experts was very valuable to me", Johann Rudi stated.
Designing the Future of Supercomputing
Stampede ushered in the era of Many Integrated Core, or MIC, processors. While Intel's general purpose Sandy Bridge processor powered the base system, the main innovative component of Stampede came from its adoption of Intel's Xeon Phi processors, a harbinger of where chip design was headed. "It was a preview of what future architectures were going to look like five and ten years out", Dan Stanzione stated.
The Xeon Sandy Bridge moved the PCI Express controller directly onto the Sandy Bridge chip, which improved its ability to handle data-driven science problems, according to Bill Barth, director of High Performance Computing at TACC. "It enabled there to be extra bandwidth between the chips, because it freed up some other resources", Bill Barth stated. "Users saw an improvement if they had to move data back and forth between the memories that are associated with each of the two chips that are on the Stampede node, which is increasingly the case", added Bill Barth.
The heterogeneous architecture of the Sandy Bridge processor and Xeon Phi co-processor deployed on Stampede at the scale of 6,400 nodes was something new, which changed how some scientists coded for supercomputers. "Stampede gave people the first large-scale opportunity where they had to have a lot of vectorization and some sort of Message Passing Interface (MPI) plus something", Bill Barth stated. "The community is moving to some sort of threading model like OpenMP for the 'plus something' part of MPI. And if they had the personnel resources to start on that kind of work, Stampede gave them that opportunity."
The changes users made to their codes so they would run well on the Xeon Phi helped prepare them for the technologies emerging today. "The combination of MPI and OpenMP has really become the dominant programming model", Dan Stanzione stated. "We feel like we went with the right programming model to get people moving in the right direction."
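The "MPI plus something" model described above splits work at two levels: across processes on different nodes (the MPI part) and across threads sharing memory within a node (the OpenMP part). The sketch below is only a toy illustration of that two-level decomposition in plain Python, not real MPI or OpenMP code. The "ranks" are simulated with an ordinary loop, the "threads" use the standard library's thread pool, and every function name is hypothetical. Production codes on Stampede would typically be written in C, C++, or Fortran against the actual MPI and OpenMP interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


def split(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous (start, stop) blocks
    whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_parts)
    bounds = []
    start = 0
    for part in range(n_parts):
        stop = start + base + (1 if part < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def rank_partial_sum(data, start, stop, n_threads):
    """Work owned by one simulated 'rank': share its block among threads
    (the 'plus something' level), then combine the thread results."""
    local = data[start:stop]
    chunks = split(len(local), n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = pool.map(
            lambda b: sum(x * x for x in local[b[0]:b[1]]), chunks
        )
        return sum(partials)


def hybrid_sum_of_squares(data, n_ranks, n_threads):
    """Two-level sum of squares. The outer loop stands in for MPI ranks,
    which in a real code would be separate processes, and the final sum
    stands in for an MPI reduction."""
    rank_blocks = split(len(data), n_ranks)
    rank_results = [
        rank_partial_sum(data, start, stop, n_threads)
        for start, stop in rank_blocks
    ]
    return sum(rank_results)


if __name__ == "__main__":
    values = list(range(1000))
    print(hybrid_sum_of_squares(values, n_ranks=4, n_threads=3))
```

The point of the split is that each level can be tuned on its own: the number of ranks follows the number of nodes, while the number of threads per rank follows the number of cores sharing memory on each node.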
In late 2016, Stampede added 508 new nodes with the second-generation Intel Xeon Phi processor, called Knights Landing (KNL), which uses the MIC as the primary processor rather than placing it on a separate card, as the first-generation Knights Corner did.
"It's an evolution with a lot of things we've learned from Knights Corner to improve performance", Dan Stanzione stated. "We think - especially with our scalable large runs on Stampede with the big community-used codes - an enormous amount of the workload is going to run on the Knights Landing as its primary and only processor at this point. It's turned out not only to be a fast way to do things, but a very cost-effective way."
The Knights Landing upgrade to Stampede provided a bridge to its replacement, Stampede2, built and deployed by TACC this year with support from a $30 million award from NSF. When complete in the fall, Stampede2 will provide the open science community with an 18-petaflop KNL-based computing system, roughly twice as powerful as the original Stampede, using only half as much power.
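Taken together, those two figures imply a sizeable gain in computing power per unit of electricity. The small calculation below is a back-of-the-envelope sketch, assuming the quoted numbers (18 petaflops versus the original 10 petaflops) and reading "half as much power" literally as a ratio of 0.5. The function name is illustrative, and the actual measured efficiency of either system is not given in this article.

```python
def perf_per_watt_gain(new_pflops: float, old_pflops: float,
                       power_ratio: float) -> float:
    """Relative gain in performance per watt, given the two performance
    figures and the new system's power draw as a fraction of the old."""
    speedup = new_pflops / old_pflops
    return speedup / power_ratio


if __name__ == "__main__":
    # Assumed inputs: 18 PF (Stampede2), 10 PF (original Stampede),
    # and a power ratio of 0.5 for "half as much power".
    gain = perf_per_watt_gain(18.0, 10.0, 0.5)
    print(f"{gain:.1f}x performance per watt")
```

On those assumptions, Stampede2 would deliver about 3.6 times as much computation per watt as the original system.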
"The one thing I'd want people to remember about the Stampede supercomputer is that it brought a lot of new technologies to the fore and was a remarkable technical wonder in and of itself", Dan Stanzione stated. "The reason we build and deploy these systems is for them to be instruments of science. If you don't deliver the science for people doing research in advancing technology and advancing competitiveness, and making people's lives better, then you haven't delivered. You've just built a really big, interesting toy. I think the most important thing about Stampede was not that it was big and cutting-edge, but that it delivered for a huge number of people over its lifespan and kept the nation moving forward in innovation."