Many-core devices, with their many small, power-efficient processing units, provide massive threading and SIMD processing. The value of this unprecedented hardware parallelism is widely acknowledged by industry, but adoption has been slow, partly because an open standard was lacking. Nevertheless, device-specific and general-purpose APIs, such as CUDA, OpenACC, and OpenCL, give users ways to port their codes to devices.
OpenMP is a well-known open standard for shared-memory multiprocessing. Recently, the OpenMP language committee extended the standard to support heterogeneous, non-shared-memory computing. OpenMP now provides the ability to run code on both the host and a device in a "work sharing" manner within a single program. Execution starts on a host processor; sections of code enclosed by OpenMP target directives are launched for execution on a device, optionally allowing the host to execute in parallel with the device. The host controls the allocation of device memory, the transfer of data, the queuing of target executions, and the management of their completion.
Significantly, OpenMP now provides a single parallel model covering threading, worksharing, device targeting, teams, and SIMD execution. This single paradigm offers a portable platform for development and a highly composable platform for integrating heterogeneous executions within a single program.
The complete article can be downloaded at http://primeurmagazine.com/repository/PrimeurMagazine-AE-PR-08-15-11.pdf