Simulation in the life sciences spans a large spectrum, from the atom through the small molecule to the macromolecule, and further to the cell, tissue, and organ, explained Rossen Apostolov. The time scales of molecular simulations are increasing accordingly.
The ScalaLife project is a joint EU FP7-funded project between HPC centres and institutes in Germany, Sweden, Spain and the United Kingdom. The consortium is working on hierarchical parallelization, as the speaker explained.
To this end, hybrid architectures and stream computing are becoming the norm for scaling the code. This includes hybrid MPI/OpenMP, hybrid GPU/CPU, and FPGA accelerators.
The researchers are scaling the problem with ensemble computing, Rossen Apostolov told the audience. He mentioned GROMACS, a project which started in 1995 in Groningen, the Netherlands, although most of the core developers are now in Sweden. The GROMACS team has developed a highly tuned code for molecular dynamics.
Hybrid MPI/OpenMP is being used in GROMACS, the speaker showed. Using Ewald summation, the electrostatics are separated into short-range and long-range parts. The short-range part decays fast and is computed directly, while the long-range part is solved in reciprocal space, just as for crystals. This corresponds to the sum over the infinite periodic copies of the system, Rossen Apostolov explained.
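The short-range/long-range separation described above can be sketched with the standard Ewald splitting identity, 1/r = erfc(βr)/r + erf(βr)/r. The snippet below is only an illustration of that decomposition (the function name and the choice of β are ours, not GROMACS code):

```python
import math

def coulomb_split(r, beta):
    """Split the 1/r Coulomb term into a fast-decaying short-range part
    and a smooth long-range part, as in Ewald summation.
    beta is the splitting parameter (illustrative value below)."""
    short_range = math.erfc(beta * r) / r   # decays fast with distance
    long_range = math.erf(beta * r) / r     # smooth part, solved in reciprocal space
    return short_range, long_range

# The two parts always sum back to the full 1/r interaction:
s, l = coulomb_split(2.5, beta=1.0)
print(abs((s + l) - 1 / 2.5) < 1e-12)  # True
```

At r = 2.5 with β = 1 the short-range term is already tiny, which is why it can be cut off at a finite radius while the smooth remainder is handled by the reciprocal-space sum.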
PME requires all-to-all communication, but PP does not, the speaker continued. Typically one quarter of the CPUs handle the all-to-all PME part, while the rest compute the particle-particle (PP) interactions.
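The one-quarter PME / three-quarters PP split can be sketched as a simple rank partitioning. This is an illustrative toy, not the actual GROMACS decomposition logic, and the function name and default fraction are our assumptions:

```python
def assign_roles(n_ranks, pme_fraction=0.25):
    """Assign a fraction of the ranks to the all-to-all PME work;
    the remaining ranks do particle-particle (PP) interactions.
    The 1/4 default mirrors the typical split mentioned in the talk;
    real codes tune this balance."""
    n_pme = max(1, int(n_ranks * pme_fraction))
    return {rank: ("PME" if rank < n_pme else "PP") for rank in range(n_ranks)}

roles = assign_roles(16)
print(sum(1 for r in roles.values() if r == "PME"))  # 4 ranks do PME, 12 do PP
```

Keeping the expensive all-to-all traffic inside a small dedicated group is what lets the much larger PP group scale with mostly local communication.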
The hybrid CPU/GPU setup uses MPI between domains, combining hybrid parallelization with GPUs, and shows good basic scaling, the speaker explained. FPGAs with GROMACS are a successful formula with regard to performance, time to market, prototyping cost, reliability of execution, and long-term maintenance.
Rossen Apostolov showed some rough initial results: 300 M/s on a 4-core CPU with hyperthreading; 960 M/s on a 4-core CPU with hyperthreading and SSE; and 3125 M/s on a C2075 GPU.
The DALTON project started in 1982. In this project, the master/worker scheme is being turned into a scheme with multiple masters, each driving its own team of workers, Rossen Apostolov explained.
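The restructuring from one master/worker pool into multiple master/team-of-worker groups can be sketched as follows. This is a hypothetical illustration of the idea (names and the rank layout are assumptions, not DALTON code):

```python
def build_teams(n_procs, n_teams):
    """Partition a flat pool of processes into teams, each with its
    own master (here simply the lowest rank in the team) and workers.
    Each team can then run an independent sub-task in parallel."""
    per_team = n_procs // n_teams
    teams = []
    for t in range(n_teams):
        ranks = list(range(t * per_team, (t + 1) * per_team))
        teams.append({"master": ranks[0], "workers": ranks[1:]})
    return teams

for team in build_teams(12, 3):
    print(team)
```

With one global master, all workers serialize on a single coordinator; splitting into teams removes that bottleneck, since each master only coordinates its own workers.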
For scaling the problem, ensemble computing is on the rise. For example, fault tolerance with checkpointing and heartbeat monitoring is applied in the Copernicus project.
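The checkpointing-plus-heartbeat idea can be sketched in a few lines. This toy model is our own illustration, not Copernicus code; the class and function names are assumptions:

```python
import time

class Worker:
    """Toy model of heartbeat-monitored ensemble workers: each worker
    reports a heartbeat along with its latest checkpointed step, so a
    silent worker can be restarted from its last checkpoint."""
    def __init__(self, name):
        self.name = name
        self.last_heartbeat = time.monotonic()
        self.checkpoint_step = 0

    def heartbeat(self, step):
        self.last_heartbeat = time.monotonic()
        self.checkpoint_step = step  # state assumed saved at this step

def find_stalled(workers, timeout):
    """Return workers whose heartbeat is older than the timeout."""
    now = time.monotonic()
    return [w for w in workers if now - w.last_heartbeat > timeout]

a, b = Worker("sim-a"), Worker("sim-b")
a.heartbeat(step=100)
b.last_heartbeat -= 60                   # simulate a worker gone silent
stalled = find_stalled([a, b], timeout=30)
print([w.name for w in stalled])         # ['sim-b']
```

A stalled simulation is then resubmitted from its last checkpoint, so one failed node costs only the work since that checkpoint rather than the whole ensemble run.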
With regard to performance, the most important metric is the real time to result. This is tunable, since more cores per simulation deliver the results sooner, as the speaker showed.
The DISCRETE package has been developed at BSC/IRB in Barcelona. It implements the discrete MD method, in which particles move ballistically until a collision event occurs.
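The event-driven character of discrete MD can be illustrated with a minimal 1D example: instead of integrating in small time steps, the simulation jumps directly to the next collision. This is our own sketch of the general technique, not the DISCRETE implementation:

```python
def time_to_collision(x1, v1, x2, v2):
    """Time until particle 1 (at x1 < x2) meets particle 2,
    or None if they are moving apart. Between such events,
    discrete MD lets particles move ballistically."""
    rel_v = v1 - v2
    if rel_v <= 0:
        return None
    return (x2 - x1) / rel_v

def advance(particles, dt):
    """Move all particles in straight lines for a time dt."""
    return [(x + v * dt, v) for x, v in particles]

# Two equal-mass particles: a 1D elastic collision swaps their velocities.
p = [(0.0, 1.0), (4.0, -1.0)]            # (position, velocity)
dt = time_to_collision(*p[0], *p[1])     # 2.0: jump straight to the event
p = advance(p, dt)
p = [(p[0][0], p[1][1]), (p[1][0], p[0][1])]  # apply the collision
print(dt, p)  # 2.0 [(2.0, -1.0), (2.0, 1.0)]
```

Because nothing happens between events, the step size is set by the physics rather than by a fixed integration time step, which is what makes the method fast for suitable potentials.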
Rossen Apostolov concluded by describing future needs in simulation: the researchers need smart job dispatchers and auto-tuning mechanisms. Given today's variety of software and hardware, such dispatchers and auto-tuners are very much in demand.