"Before accessing Mira, TermoFluids had been used on production simulations up to around 5,000 CPU cores and scalability tests up to 10,000 CPU cores", Ricard Borrell, the research manager behind TermoFluids, commented. "On Mira we have increased this figure by an order of magnitude and have now run the code up to 131,072 CPU cores. Not only did this include the most time-consuming part of the simulations, i.e. the time-integration, but other aspects that can become critical overheads such as the pre-processing, the simulation set-up, and IO operations for checkpointing as well."
On Mira, the code could be run on much larger problems, with up to billions of unknowns. This required changing the type of some integer variables, however, so that their values no longer fell out of range. With this order-of-magnitude leap in problem size and in the number of parallel processors being used, Ricard Borrell encountered new problems in the code that appeared only at the larger scale. Because the issues could not be reproduced at a smaller scale to track down the bugs, the team turned to Allinea DDT.
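To illustrate why integer types matter at this scale, here is a minimal sketch in Python. It is not taken from TermoFluids: the helper names and the figure of three billion unknowns are assumptions chosen only for illustration. It shows what happens when a count of that size is stored in a signed 32-bit integer compared with a 64-bit one.

```python
import ctypes

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1  # 2,147,483,647: largest signed 32-bit value


def fits_in_int32(n: int) -> bool:
    """Return True if n can be stored in a signed 32-bit integer."""
    return INT32_MIN <= n <= INT32_MAX


def as_int32(n: int) -> int:
    """Value n takes when stored in a signed 32-bit integer (wraps around)."""
    return ctypes.c_int32(n).value


def as_int64(n: int) -> int:
    """Value n takes when stored in a signed 64-bit integer."""
    return ctypes.c_int64(n).value


# Hypothetical problem size, chosen only to exceed the 32-bit range.
unknowns = 3_000_000_000

print(fits_in_int32(unknowns))  # False: the count is out of range
print(as_int32(unknowns))       # wraps to a negative number
print(as_int64(unknowns))       # preserved with a 64-bit type
```

A wrapped, negative index of this kind typically surfaces only once the problem is large enough to cross the 32-bit limit, which is one reason such bugs cannot be reproduced at a smaller scale.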
"Debuggers are essential tools for our users as they scale their application on Mira, and there have been several instances where users have leaned on a debugger to find issues as they have scaled on the system", Kalyan Kumaran, Manager, Performance Engineering at ALCF, explained. "Allinea scaled their debugger to perform well on leadership class systems like ours. This helped us to choose this tool as we were looking for a debugger that would scale to the entire Mira system." He added that as most ALCF users access the systems remotely, a remote connection client such as Allinea's is important for ease of use.
"High-performance computing resources like Mira give developers like us the power to go further", added Ricard Borrell. "Extracting that full performance is critical if you want to handle more complex problems, finer resolutions and achieve new frontiers - and you'll need Allinea DDT to do it."
More information on how this was achieved is available in the case study.