A second question that comes to mind is why exascale computing should be needed to reach the emission goals when designing new combustion technologies.
According to Jacqueline Chen, current design methodologies are largely phenomenological. A significant increase in computational capability will dramatically reduce the design cycle for new combustion technologies and new fuels. The co-design centre is focusing on direct numerical simulation methodologies, which provide the scientific basis for novel fuels at realistic pressures.
The goal of combustion exascale co-design is to consider all aspects of the combustion simulation process, from formulation and basic algorithms to programming environments and the hardware characteristics needed to enable combustion simulations on exascale architectures.
Jacqueline Chen told the audience that combustion is a surrogate for a much broader range of multiphysics computational science areas. The ExaCT partners will interact with the applied mathematics community on mathematical issues related to exascale. The petascale codes provide a starting point for the co-design process.
The simulation tools used for co-design include S3D, a compressible formulation, and the Low Mach Number (LMC) model, which exploits the separation of scales between the acoustic wave speed and the fluid motion. Beyond these, other tools are required, including a second-order projection formulation, detailed kinetics and transport, and block-structured adaptive mesh refinement.
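The separation of scales that low Mach number models exploit is conventionally expressed by decomposing the pressure; the sketch below follows the standard textbook form, which may differ in detail from the exact LMC formulation. The pressure splits into a spatially uniform thermodynamic part and a small dynamic perturbation, which filters out acoustic waves and relaxes the timestep constraint from the acoustic speed to the fluid speed:

```latex
p(\mathbf{x},t) = p_0(t) + \pi(\mathbf{x},t),
\qquad \frac{\pi}{p_0} = O(M^2),
\qquad M = \frac{|\mathbf{u}|}{c},
```
```latex
\Delta t_{\text{compressible}} \lesssim \frac{\Delta x}{|\mathbf{u}| + c}
\quad\text{vs.}\quad
\Delta t_{\text{LMC}} \lesssim \frac{\Delta x}{|\mathbf{u}|}.
```

For low Mach number flames ($M \ll 1$), the second constraint allows timesteps larger by roughly a factor of $1/M$.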
The expectation is that exascale will require a new code base, stated Jacqueline Chen, built on high-fidelity physics. The new code will support both compressible and low Mach number formulations and provide support for embedded UQ and in situ analytics.
For instance, the simulation has to capture the relevant turbulence, pressure, and temperature conditions. The target problem involves a 5 cm3 domain resolved on a 5 micron grid, and simulating 6 ms of physical time at 5 ns timesteps amounts to 1.2e6 steps.
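A quick back-of-the-envelope check of those figures, assuming "5 cm3" refers to the domain volume and the 5 micron spacing is uniform (both assumptions, not stated in the talk):

```python
# Rough sizing of the target simulation.
# Assumptions (not from the talk): 5 cm^3 is the total domain volume,
# and the 5 micron grid spacing is uniform in all three directions.
domain_volume_m3 = 5e-6           # 5 cm^3 expressed in m^3
dx = 5e-6                         # 5 micron grid spacing in m
cells = domain_volume_m3 / dx**3  # number of grid cells

physical_time = 6e-3              # 6 ms of simulated time
dt = 5e-9                         # 5 ns per timestep
steps = physical_time / dt        # number of timesteps

print(f"{cells:.1e} cells, {steps:.1e} steps")  # 4.0e+10 cells, 1.2e+06 steps
```

The step count reproduces the 1.2e6 figure from the talk; the cell count (tens of billions) illustrates why the raw data volume per snapshot is far beyond what I/O systems can persist every step.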
Jacqueline Chen warned that the petaflop workflow model will not scale. Performing the simulation is not enough; the researchers need to analyze the results. The I/O bandwidth constraint makes it infeasible to save all the raw simulation data to persistent storage, so the researchers will have to integrate simulation and analysis.
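The idea of integrating simulation and analysis can be sketched as follows. This is a hypothetical toy, not ExaCT's actual workflow: a trivial stand-in solver runs the time loop, and a reduced statistic is computed in situ so that only small summaries, not raw fields, need to be persisted:

```python
# Toy sketch of in situ analysis (illustrative only, not the ExaCT workflow):
# reduced statistics are computed inside the time loop, so only small
# summaries rather than full raw snapshots must be written to storage.
import numpy as np

def advance(field, dt):
    """Stand-in for one solver step (here: trivial exponential decay)."""
    return field * (1.0 - dt)

def simulate_in_situ(steps=100, dt=1e-3, analyze_every=10):
    field = np.random.default_rng(0).random((64, 64))
    summaries = []                    # small in situ reductions, not raw data
    for n in range(steps):
        field = advance(field, dt)
        if n % analyze_every == 0:    # analysis co-located with the solve
            summaries.append(float(field.mean()))
    return summaries

summaries = simulate_in_situ()
print(len(summaries))  # 10 summaries persisted instead of 100 full snapshots
```

The design point is the placement of the analysis: it runs on the data while it is still in memory, sidestepping the I/O bandwidth constraint entirely.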
Proxy applications are being used alongside proxy machines, including a solver proxy for uniform-grid compressible flow as well as Low Mach Number and AMR tools.
For the co-design methodology, measurement alone is not sufficient. The researchers require an analytic performance model to validate the performance against hardware simulators and measurements and to confirm key predictions.
The performance modelling tool chain automatically predicts the performance of many input codes and software optimizations.
Byfl is implemented as a language- and architecture-independent mid-stage compiler pass, providing answers to some of the vendors' initial questions, such as: what is the memory bandwidth per flop?
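The bandwidth-per-flop question is often framed with a simple roofline-style analytic model, of the kind the toolchain described above could feed. A minimal sketch with illustrative numbers (the peak rates below are hypothetical, not measurements from the talk):

```python
# Minimal roofline-style analytic model (illustrative numbers only):
# the predicted runtime is bounded by whichever resource saturates first,
# compute throughput or memory bandwidth.
def roofline_time(flops, bytes_moved, peak_flops=1e12, bandwidth=1e11):
    compute_time = flops / peak_flops      # seconds if compute-bound
    memory_time = bytes_moved / bandwidth  # seconds if bandwidth-bound
    return max(compute_time, memory_time)

# A kernel at 0.5 flop/byte on a machine whose balance point is
# peak_flops / bandwidth = 10 flop/byte is firmly memory-bound:
t = roofline_time(flops=1e9, bytes_moved=2e9)
print(t)  # 0.02 s: memory time (2e9/1e11) dominates compute time (0.001 s)
```

Counting flops and bytes per kernel, which is what a compiler-pass instrumenter provides, is exactly the input such a model needs to predict where an application sits relative to a future machine's balance point.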
Jacqueline Chen confronted the audience with the following co-design questions: what is the instruction mix for the computational throughput? How many registers are needed to capture scalar variables and avoid cache spills? And how sensitive is the application to memory bandwidth?
Even though transcendental and division operations might be low in count, they can dominate the CPU time, warned Jacqueline Chen. "Neither software optimizations alone nor hardware optimizations will get us to exascale; we have to apply both", insisted the speaker.
The previous analysis assumes ideal network behaviour; the researchers have to use SST/macro to model contention.
A domain-specific language is a language of reduced expressiveness targeted at developers in a specific, focused problem domain.
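"Reduced expressiveness" can be illustrated with a toy embedded stencil DSL. This is purely an illustration of the concept, not the DSL developed within ExaCT: the user states only what the stencil is, and the framework retains the freedom to decide how to apply it:

```python
# Toy embedded DSL for 1D stencils (illustrative only; not the ExaCT DSL).
# The user declares *what* the stencil computes; the framework owns *how*
# it is executed, which is the freedom a DSL buys the compiler/runtime.
import numpy as np

def stencil(weights):
    """Build a periodic 1D stencil operator from {offset: weight} pairs."""
    def apply(u):
        out = np.zeros_like(u)
        for offset, w in weights.items():
            out += w * np.roll(u, -offset)  # u shifted so index i sees u[i+offset]
        return out
    return apply

# A second-difference (discrete Laplacian) stencil, stated declaratively:
laplacian = stencil({-1: 1.0, 0: -2.0, 1: 1.0})
u = np.array([0.0, 1.0, 4.0, 9.0, 16.0])   # u[i] = i^2
print(laplacian(u))                         # interior entries are all 2.0
```

Because the declaration carries the full semantics, the same description could be retargeted to vectorized, tiled, or accelerator back-ends without touching user code.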
The researchers are exposing data locality and independence and expressing the parallelism in S3D.
The programming model is based on logical regions. Tasks are coded in a familiar sequential style, and the Legion runtime uses region information to automatically extract parallelism and to map tasks that use the same data together, benefiting from locality, according to Jacqueline Chen.
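The core idea, that declared region accesses let a runtime find parallelism in sequentially written tasks, can be sketched in a few lines. This is a drastically simplified illustration of dependence analysis over regions, not the actual Legion runtime; all task and region names are hypothetical:

```python
# Simplified sketch of region-based dependence analysis (illustrative only;
# not the actual Legion runtime). Each task declares the logical regions it
# reads and writes; two tasks conflict iff one writes a region the other
# touches, so non-conflicting tasks may safely run concurrently.
def conflicts(task_a, task_b):
    reads_a, writes_a = task_a
    reads_b, writes_b = task_b
    return bool(writes_a & (reads_b | writes_b) or
                writes_b & (reads_a | writes_a))

# Hypothetical tasks as (reads, writes) over named regions,
# listed in sequential program order:
tasks = {
    "chem_left":  ({"state_L"}, {"rates_L"}),
    "chem_right": ({"state_R"}, {"rates_R"}),
    "update_L":   ({"rates_L"}, {"state_L"}),
}

print(conflicts(tasks["chem_left"], tasks["chem_right"]))  # False: disjoint regions, can run in parallel
print(conflicts(tasks["chem_left"], tasks["update_L"]))    # True: rates_L is written then read
```

The point is that the programmer never writes any parallel control flow: the sequential order plus the region declarations give the runtime everything it needs to both extract the parallelism and co-locate tasks that share data.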
The widening gap between compute power and available I/O rates will make it infeasible to save all the necessary data for post-processing, Jacqueline Chen feared.