One important OpenCL app for me is OpenCL-Z, since it shows information about the available implementations in a nice GUI, similar to CUDA-Z or GPU-Z. CAL-Z, anybody?
Because of some binary compatibility instability, it is very hard to supply a single binary that is future-proof (see previous post).
I have solved this with an OpenCL wrapper that covers both calling conventions by defining two function-pointer types for every function. This is possible because OpenCL-Z loads the library at run time (dlopen/LoadLibrary) to get the function pointers, bypassing the static .lib import library.
Once the mess is sorted out I will use an array of pointers for every function.
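To make the "two function types per function" idea concrete, here is a minimal sketch in C. It is not the actual OpenCL-Z code: the names (`raw_fn`, `wrap_get_platform_ids`, `call_get_platform_ids`, `WRAP_NOT_LOADED`) are hypothetical, the OpenCL types are replaced by plain C stand-ins so the snippet compiles without the CL headers, and a stub stands in for the pointer that dlsym or GetProcAddress would return. The calling-convention keywords only matter on 32-bit Windows, so they expand to nothing elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Calling-convention keywords only exist (and only matter) on 32-bit Windows. */
#if defined(_WIN32) && !defined(_WIN64)
#  define WRAP_STDCALL __stdcall
#  define WRAP_CDECL   __cdecl
#else
#  define WRAP_STDCALL
#  define WRAP_CDECL
#endif

#define WRAP_NOT_LOADED (-1000)   /* hypothetical error code, not an OpenCL one */

/* Generic pointer type for whatever the dynamic loader returned. */
typedef void (*raw_fn)(void);

/* Two function types for the same entry point (modelled on clGetPlatformIDs,
   with int32_t/uint32_t/void* standing in for cl_int/cl_uint/cl_platform_id). */
typedef int32_t (WRAP_STDCALL *get_platform_ids_stdcall_fn)(uint32_t, void **, uint32_t *);
typedef int32_t (WRAP_CDECL   *get_platform_ids_cdecl_fn)(uint32_t, void **, uint32_t *);

typedef enum { CONV_STDCALL, CONV_CDECL } wrap_conv;

typedef struct {
    wrap_conv conv;   /* which convention this library build uses */
    raw_fn    raw;    /* pointer as returned by the loader, NULL if missing */
} wrapped_get_platform_ids;

/* Remember the raw pointer together with the convention it must be called with. */
static wrapped_get_platform_ids wrap_get_platform_ids(raw_fn raw, wrap_conv conv)
{
    wrapped_get_platform_ids w;
    w.conv = conv;
    w.raw = raw;
    return w;
}

/* Cast to the matching function type right before the call. */
static int32_t call_get_platform_ids(const wrapped_get_platform_ids *w,
                                     uint32_t n, void **platforms, uint32_t *count)
{
    assert(w != NULL);
    if (w->raw == NULL)
        return WRAP_NOT_LOADED;
    if (w->conv == CONV_STDCALL)
        return ((get_platform_ids_stdcall_fn)w->raw)(n, platforms, count);
    return ((get_platform_ids_cdecl_fn)w->raw)(n, platforms, count);
}

/* Stand-in for a real entry point, so the wrapper can be tried without a driver. */
static int32_t WRAP_CDECL stub_get_platform_ids(uint32_t n, void **platforms, uint32_t *count)
{
    (void)n; (void)platforms;
    if (count != NULL)
        *count = 2;   /* pretend two platforms are installed */
    return 0;
}
```

In a real loader the `raw` pointer would come from dlsym or GetProcAddress, and the convention would be chosen per library; how OpenCL-Z decides that is not shown here.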
Currently the library search in OpenCL-Z is hardcoded to find only the AMD and Nvidia implementations (although the Nvidia one is looked up in the standard location, which is also where the OpenCL ICD for Windows will eventually be installed):
1) Nvidia implementation:
a)$WINDIR/system32/opencl.dll (which works on Win x64 for both 32-bit and 64-bit with the 190.89 DLLs)
b)/usr/lib/opencl.so (which works on both 64-bit and 32-bit Linux with the 190.89 .so)
2) AMD implementation:
a)$ATISTREAMSDKROOT/lib/{x86,x86_64}/opencl.dll
b)$ATISTREAMSDKROOT/lib/{x86,x86_64}/opencl.so
I will also add support for listing extra implementation locations in a text file.
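The search list above can be sketched as a small C routine that builds the candidate paths. This is only an illustration under my own assumptions: `join_path`, `collect_candidates`, `collect_from_environment` and `PATH_CAP` are made-up names, and the platform and bitness are passed in as flags so the logic can be tried on any machine. Only the environment variables and relative paths come from the list.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PATH_CAP 512

/* Join root and relative path into out. Returns 1 on success, 0 if the
   root is missing or empty, or if the result does not fit. */
static int join_path(char *out, size_t cap, const char *root, const char *rel)
{
    int n;
    if (root == NULL || strlen(root) == 0)
        return 0;
    n = snprintf(out, cap, "%s/%s", root, rel);
    return n > 0 && (size_t)n < cap;
}

/* Fill paths[] with the hardcoded candidates, in search order.
   Returns how many entries were written (at most max). */
static size_t collect_candidates(char paths[][PATH_CAP], size_t max,
                                 const char *windir, const char *amd_root,
                                 int is_windows, int is_64bit)
{
    const char *arch = is_64bit ? "x86_64" : "x86";
    const char *ext = is_windows ? "dll" : "so";
    char rel[64];
    size_t n = 0;

    assert(paths != NULL);

    /* 1) Nvidia implementation, in the standard system location. */
    if (n < max) {
        int ok = is_windows
            ? join_path(paths[n], PATH_CAP, windir, "system32/opencl.dll")
            : join_path(paths[n], PATH_CAP, "/usr/lib", "opencl.so");
        if (ok)
            n++;
    }

    /* 2) AMD implementation, under the Stream SDK root. */
    snprintf(rel, sizeof rel, "lib/%s/opencl.%s", arch, ext);
    if (n < max && join_path(paths[n], PATH_CAP, amd_root, rel))
        n++;

    return n;
}

/* Convenience wrapper that reads the two roots from the environment. */
static size_t collect_from_environment(char paths[][PATH_CAP], size_t max,
                                       int is_windows, int is_64bit)
{
    return collect_candidates(paths, max, getenv("WINDIR"),
                              getenv("ATISTREAMSDKROOT"), is_windows, is_64bit);
}
```

Locations read from a text file could simply be appended to the same array after the hardcoded entries.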
I have been able to do this before finishing the full wrapper, since OpenCL-Z uses only a few functions, 5 or so. Well, for checking device binary support I need, say, 10 more, but that is still far fewer than the 60-odd or more that full OpenCL needs.
I also had to fix support for more than one platform, for platforms exposing more than one device, and for changing the displayed device information in real time.
Currently I have one bug: platforms for which no device is returned crash the initialization (I will try to fix it before I post). This can happen, for example, on a Win7 machine with ATI and Nvidia cards both running, when I disable one card (for example by disabling the screen attached to it).
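One possible guard for the empty-platform case, as a hedged sketch rather than the actual fix: ask for the device count first and treat an error or a count of zero as "no devices" instead of going on to allocate and index the array. The function-pointer type mirrors the shape of clGetDeviceIDs with plain C stand-ins for the OpenCL types; `query_devices` and the two stubs are hypothetical names used only for illustration. In the OpenCL headers CL_SUCCESS is 0 and CL_DEVICE_NOT_FOUND is -1, which is what the stubs imitate.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_SUCCESS         0
#define SKETCH_DEVICE_TYPE_ALL ((uint64_t)0xFFFFFFFFu)

/* Same shape as clGetDeviceIDs, with stand-in types. */
typedef int32_t (*get_device_ids_fn)(void *platform, uint64_t type,
                                     uint32_t num_entries, void **devices,
                                     uint32_t *num_devices);

/* Returns a malloc'd device array (caller frees) and sets *count, or
   returns NULL with *count == 0 when the platform has no usable devices. */
static void **query_devices(get_device_ids_fn get_ids, void *platform, uint32_t *count)
{
    void **devices;
    uint32_t n = 0;

    assert(get_ids != NULL && count != NULL);
    *count = 0;

    /* First call: count only. An error or zero devices means "skip this platform". */
    if (get_ids(platform, SKETCH_DEVICE_TYPE_ALL, 0, NULL, &n) != SKETCH_SUCCESS || n == 0)
        return NULL;

    devices = malloc(n * sizeof *devices);
    if (devices == NULL)
        return NULL;

    /* Second call: fetch the handles. */
    if (get_ids(platform, SKETCH_DEVICE_TYPE_ALL, n, devices, NULL) != SKETCH_SUCCESS) {
        free(devices);
        return NULL;
    }
    *count = n;
    return devices;
}

/* Stub: a platform whose only card has been disabled. */
static int32_t stub_no_devices(void *p, uint64_t t, uint32_t n, void **d, uint32_t *c)
{
    (void)p; (void)t; (void)n; (void)d;
    if (c != NULL)
        *c = 0;
    return -1;
}

/* Stub: a platform with two devices. */
static int32_t stub_two_devices(void *p, uint64_t t, uint32_t n, void **d, uint32_t *c)
{
    static int dev_a, dev_b;
    (void)p; (void)t;
    if (c != NULL)
        *c = 2;
    if (d != NULL && n >= 2) {
        d[0] = &dev_a;
        d[1] = &dev_b;
    }
    return 0;
}
```

The GUI would then show such a platform with an empty device list instead of crashing during initialization.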
I have also included logos for the AMD Stream platform and the Intel platform.
Well, it seems I have to add the Apple and S3 implementations to the mix, and why not an IBM one for the Cell as well.
I have added a few missing key feature checks (they currently make it possible to tell the two existing implementations apart): OpenGL interop, image support, and whether the OpenCL implementation can return device binaries and build programs from those binaries.
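For the OpenGL interop check, one common approach (an assumption on my part, not necessarily what OpenCL-Z does) is to look for an extension name such as `cl_khr_gl_sharing` in the space-separated string that clGetDeviceInfo returns for CL_DEVICE_EXTENSIONS. A plain substring search can give false positives, so the sketch below matches whole tokens only. The helper name `has_extension` is hypothetical, and it assumes the searched name contains no spaces.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if name appears as a whole space-separated token in extensions. */
static int has_extension(const char *extensions, const char *name)
{
    const char *p;
    size_t len;

    assert(name != NULL);
    if (extensions == NULL)
        return 0;
    len = strlen(name);
    if (len == 0)
        return 0;

    p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning or right after a space... */
        int starts = (p == extensions) || (p[-1] == ' ');
        /* ...and end at the end of the string or right before a space. */
        int ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends)
            return 1;
        p += len;   /* keep looking after this partial match */
    }
    return 0;
}
```

Image support needs no string parsing, since CL_DEVICE_IMAGE_SUPPORT is a boolean device query; the binaries check would go through clGetProgramInfo with CL_PROGRAM_BINARIES and clCreateProgramWithBinary.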
I have got it working on Linux with minor tweaks to the code and the Code::Blocks project file, thanks to the wxWidgets library.
Future work is to move it to my OpenCL wrapper, port it to Snow Leopard and, as in CUDA-Z, add two key performance metrics:
* Peak GFLOPS for integer, SP and DP floating point (using MAD kernels for int, int24, float and double)
* Device bandwidth, plus device-to-host and host-to-device bandwidth (with both direct and mapped access, and both pageable and pinned memory)
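The arithmetic behind these two metrics is simple and can be sketched in C on the host side. The kernels and timing are left out, and the function names `peak_gops` and `bandwidth_gb_per_s` are made up for this example. A MAD is counted as two operations (one multiply plus one add), and a gigabyte is taken as 10^9 bytes, which is an assumption about the reporting convention.

```c
#include <assert.h>

/* Peak throughput in giga-operations per second for a MAD kernel.
   Each MAD counts as two operations (multiply + add). */
static double peak_gops(double work_items, double mads_per_item, double seconds)
{
    assert(work_items >= 0.0 && mads_per_item >= 0.0);
    if (seconds <= 0.0)
        return 0.0;   /* no valid timing, report nothing */
    return (2.0 * work_items * mads_per_item) / (seconds * 1e9);
}

/* Transfer bandwidth in GB/s (1 GB = 1e9 bytes). */
static double bandwidth_gb_per_s(double bytes, double seconds)
{
    assert(bytes >= 0.0);
    if (seconds <= 0.0)
        return 0.0;
    return bytes / (seconds * 1e9);
}
```

For example, 1,000,000 work items doing 500 MADs each in 0.5 s come out at 2 GFLOPS, and copying 256 MB (256e6 bytes) in 0.5 s comes out at 0.512 GB/s.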
Some other key performance metrics worth investigating for the program are:
* Atomics performance in both local and global memory (now that Fermi is coming), including their use in append/consume buffers
* Cache test: minikernels for studying how much potential the cache has (using, for example, SpMV kernels)
* OpenGL/OpenCL interop
* Test of HD video GPU decoding->OpenGL->OpenCL->OpenGL using, say, constant spread filters, etc.
* Simultaneous kernel execution and device transfers (this is supported in CUDA and apparently also in CAL and Brook+, but in OpenCL, anybody?)
* Simultaneous device-to-host and host-to-device transfers (to check use of the Fermi dual DMA engines; does the 5xxx have them too?)
* Simultaneous kernel execution (also to check the Fermi and 5xxx implementations, using, say, two different kernels with equal load that each use half of the compute units)
and other metrics for OpenCL extensions such as global data share and SAD, plus info about the maximum number of threads in flight.
Also checking memory, as a memtest for device memory, by porting existing CUDA programs, or estimating the bit error rate?
Also demos, as in GPU Caps Viewer, etc.
Both the three
Screenshots (Linux and Windows, showing 3 devices):
Friday, 23 October 2009