Tiago Cogumbreiro

O Irrepupável

Sunday, January 03, 2010

OpenCL review

OpenCL is an open standard developed by the Khronos Group and pushed by Apple and AMD. It supports both data parallelism and task parallelism. It is meant to target a wide range of devices; for example, there is an embedded profile for mobile devices. Mac OS X 10.6 supports OpenCL, and AMD provides an implementation for Linux and Windows on the x86 and x86-64 architectures.

OpenCL may be touted as being to multicores what OpenGL is to GPUs. In fact, OpenCL is a broader technology than OpenGL, since it targets not only multicore CPUs but GPUs as well. The two technologies are also tightly coupled: for example, an OpenCL operation and an OpenGL operation can share a buffer.

In OpenCL there is a concept of kernel that is akin to SIL places. A kernel is a function, written in a dialect of C, that serves as the entry point of a program running on a device. Kernels communicate via shared memory (buffers). Launching a kernel over an index space creates one work-item per point of that space, each executing the same kernel on its own index; this is how a kernel becomes the target of data parallelization. A kernel can also be enqueued as a single task, and independent tasks may run concurrently, which is the target of task parallelization. Work-items are executed on a device (e.g. a CPU or a GPU) that consists of one or more compute units.
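To make the data-parallel model concrete, here is a minimal sketch in plain C that emulates it sequentially. It does not use the OpenCL API at all: the names scale_work_item and run_scale are made up for illustration, and the loop stands in for what a device would execute in parallel. The gid parameter plays the role of the global index a real kernel would obtain from get_global_id(0).

```c
#include <stddef.h>

/* Stand-in for a kernel body: one call handles exactly one work-item,
   identified by its global index gid. (Hypothetical name, not OpenCL API.) */
static void scale_work_item(size_t gid, const float *in, float *out, float factor)
{
    out[gid] = in[gid] * factor;
}

/* Sequential emulation of a one-dimensional index space with global_size
   work-items. A real device is free to run these calls in parallel and in
   any order, so a kernel body must not rely on the iteration order. */
void run_scale(size_t global_size, const float *in, float *out, float factor)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        scale_work_item(gid, in, out, factor);
}
```

Calling run_scale(4, in, out, 2.0f) on a four-element array doubles every element. The in and out arrays correspond to the buffers through which kernels share data.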

Technically, kernels are supplied as C strings that must be compiled every time the program loads, although the OpenCL runtime may cache the compiled result.
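As a rough illustration of what "kernels are C strings" looks like in host code, the sketch below keeps a kernel's source in a string and, when an OpenCL SDK is available, hands it to the runtime for compilation. The kernel vec_add, the helpers vec_add_source and build_vec_add, and the USE_OPENCL build flag are all assumptions made for this example. The OpenCL calls themselves (clCreateProgramWithSource, clBuildProgram, clCreateKernel) are part of the standard host API.

```c
#include <stddef.h>
#include <string.h>

/* OpenCL C source for an illustrative kernel, held as an ordinary C string. */
static const char VEC_ADD_SRC[] =
    "__kernel void vec_add(__global const float *a,\n"
    "                      __global const float *b,\n"
    "                      __global float *out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = a[i] + b[i];\n"
    "}\n";

/* Returns the kernel source text (hypothetical helper). */
const char *vec_add_source(void)
{
    return VEC_ADD_SRC;
}

#ifdef USE_OPENCL /* hypothetical flag: define only when an OpenCL SDK is installed */
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

/* Compiles the source string for one device at run time.
   Returns NULL on failure; on success stores the program in *prog_out. */
cl_kernel build_vec_add(cl_context ctx, cl_device_id dev, cl_program *prog_out)
{
    cl_int err;
    const char *src = VEC_ADD_SRC;
    size_t len = strlen(src);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, &len, &err);
    if (err != CL_SUCCESS)
        return NULL;

    /* This is the step that happens every time the application loads. */
    err = clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    if (err != CL_SUCCESS) {
        clReleaseProgram(prog);
        return NULL;
    }

    cl_kernel kernel = clCreateKernel(prog, "vec_add", &err);
    if (err != CL_SUCCESS) {
        clReleaseProgram(prog);
        return NULL;
    }
    *prog_out = prog;
    return kernel;
}
#endif
```

Without USE_OPENCL the file compiles as plain C and only exposes the source string. With it, the caller is assumed to have already created a context and picked a device.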

The site HPC Wire published a very insightful article by someone with HPC experience (OpenMP in particular), “Compilers and More: OpenCL Promises and Potential”, from which I would like to quote two paragraphs:

So, are OpenCL programs going to be performance portable or not? Sadly, not. [...] An optimized kernel for one device may or may not perform well on another, but is unlikely to be optimal for that second device.

The intent is apparently that OpenCL will support an ecosystem of tools, middleware and applications, not to be the portable parallel abstraction. Even though the kernels are not performance portable, even if you have to tune your kernels for each device, the ability to write your kernels for different devices in the same language is a great leap forward from where we are today.
