RAJA is a software library of C++ abstractions for portable, parallel loop execution. These abstractions insulate the application from the details of the back-end programming model. Developers port a RAJA application to a new back-end by supplying execution policies as template parameters; the policies are typically collected in header files. Each execution policy consists of statements that express how loops should be executed and how loop indices should map to the back-end indices. This allows the kernel body to remain unchanged while porting to a new back-end. To ensure that the abstraction layer does not introduce overhead, the RAJA Performance Suite is used to compare the performance of loop-based HPC kernels implemented in RAJA against the same kernels written directly in the underlying back-end programming model.
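The idea of selecting loop execution through a template parameter can be illustrated with a minimal, self-contained sketch. It deliberately does not use the RAJA API: the names `seq_policy`, `blocked_policy`, `forall_impl`, `forall`, `kernel_policy`, and `axpy` are hypothetical and exist only for this illustration, and both policies run serially on the host. What the sketch is meant to show is that the loop body is written once and only the policy type changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical policy tags for illustration only; these are NOT RAJA types.
struct seq_policy {};

template <std::size_t BlockSize>
struct blocked_policy {
  static_assert(BlockSize > 0, "block size must be positive");
};

// Plain sequential traversal of [begin, end).
template <typename Body>
void forall_impl(seq_policy, std::size_t begin, std::size_t end, Body&& body) {
  for (std::size_t i = begin; i < end; ++i) body(i);
}

// Traversal in fixed-size blocks, standing in for a back-end that maps
// loop indices onto groups of work items.
template <std::size_t BlockSize, typename Body>
void forall_impl(blocked_policy<BlockSize>, std::size_t begin, std::size_t end,
                 Body&& body) {
  for (std::size_t block = begin; block < end; block += BlockSize) {
    const std::size_t stop = std::min(block + BlockSize, end);
    for (std::size_t i = block; i < stop; ++i) body(i);
  }
}

// Front end: the policy is a template parameter, the body is untouched.
template <typename Policy, typename Body>
void forall(std::size_t begin, std::size_t end, Body&& body) {
  forall_impl(Policy{}, begin, end, std::forward<Body>(body));
}

// In an application this alias would live in a header and would be the
// only line edited when moving to another back-end (assumption of this sketch).
using kernel_policy = seq_policy;

// Kernel written once; the policy decides how the loop is traversed.
template <typename Policy>
std::vector<double> axpy(double a, const std::vector<double>& x,
                         std::vector<double> y) {
  forall<Policy>(0, x.size(), [&](std::size_t i) { y[i] += a * x[i]; });
  return y;
}
```

Calling `axpy<seq_policy>(...)` and `axpy<blocked_policy<2>>(...)` on the same inputs gives the same result, which is the property that lets a kernel body survive a change of back-end.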
Previous releases of SW4 were OpenMP implementations for multi-threaded CPU execution. Recent releases use RAJA, with execution policies built from OpenMP and CUDA statements to target CPUs and NVIDIA GPUs, respectively. The RAJA SYCL and OpenMP-Target back-ends will be available for execution on Aurora, and the existing execution policies will be implemented for these back-ends.
The porting effort began with the SW4lite proxy application, which served as a development vehicle for the port and allowed the developers to identify and resolve issues quickly. The RAJA-SYCL back-end execution policies have been implemented in the SW4lite proxy application for early testing and experimentation.
RAJA has been enabled on Intel devices by using oneAPI and several extensions in the DPC++ compiler. Intel's unnamed kernel lambdas are critical for portability libraries to support general kernel execution. The Unified Shared Memory extension allows abstraction libraries to decouple loop execution from memory management. Intel's Extended Atomics and Global ID access have enabled support for the RAJA reduction object.
The developers have also made extensive use of features of the SYCL programming model itself. Principal among these is the use of SYCL nd_ranges to support fine-grained control over loop execution. The nd_ranges provide the flexibility a library requires to handle both simple and complex loop executions. Through nd_ranges, the RAJA-SYCL back-end can launch simple one-dimensional SYCL kernels or complex three-dimensional kernels with explicit work-group sizes.
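The index arithmetic behind a one-dimensional nd_range launch can be sketched in plain C++. In SYCL, the global range of an nd_range must be a multiple of the work-group size, and a work item's global index equals its group index times the work-group size plus its local index. One common way for a library to handle an arbitrary loop length is to round the global range up and guard the body. The sketch below emulates that serially on the host. It is not the RAJA-SYCL implementation and it calls no SYCL API; `padded_global_size` and `launch_1d` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest multiple of work_group_size that is >= n.
// Precondition: work_group_size > 0.
std::size_t padded_global_size(std::size_t n, std::size_t work_group_size) {
  return ((n + work_group_size - 1) / work_group_size) * work_group_size;
}

// Serial emulation of a 1D nd_range launch over n loop iterations.
// Returns the number of emulated work items, including the padded ones
// that skip the body.
template <typename Body>
std::size_t launch_1d(std::size_t n, std::size_t work_group_size, Body&& body) {
  const std::size_t global = padded_global_size(n, work_group_size);
  const std::size_t num_groups = global / work_group_size;
  std::size_t work_items = 0;
  for (std::size_t group = 0; group < num_groups; ++group) {
    for (std::size_t local = 0; local < work_group_size; ++local) {
      // Same relation a SYCL work item sees between its group, local,
      // and global indices.
      const std::size_t global_id = group * work_group_size + local;
      ++work_items;
      // Guard: padded work items past the end of the loop do nothing.
      if (global_id < n) body(global_id);
    }
  }
  return work_items;
}
```

For a loop of 10 iterations and a work-group size of 4, the padded global range is 12, so three work groups are launched and the last two work items fall outside the loop bounds and are masked by the guard.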