GPU computing: stay up to date with OpenCL, DirectCompute, CUDA, CAL and OpenGL information


Friday, 23 October 2009

Improved OpenCL-Z!

Posted on 14:15 by Unknown
One important OpenCL app for me is OpenCL-Z, since it presents information about the available implementations in a nice GUI, similar to CUDA-Z or GPU-Z. A CAL-Z, anybody?

Because of some binary compatibility instability (see previous post), it is very hard to supply a single binary that will keep working with future implementations.

I have solved this with a built-in OpenCL wrapper that handles both calling conventions by defining two function-pointer types for every function. This is possible because OpenCL-Z loads the OpenCL library at runtime (dlopen on Linux, LoadLibrary on Windows) and fetches the function pointers itself, bypassing the static .lib import library.
Once the mess is sorted out, I will use an array of pointers for every function.
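A minimal sketch of that idea, assuming a thin `DynLib` class (a hypothetical name, not OpenCL-Z's actual code) over dlopen/dlsym and LoadLibrary/GetProcAddress. The two calling conventions are assumed to be `__stdcall` and `__cdecl`, which only differ on 32-bit Windows; elsewhere both macros expand to nothing. The `cl_*` typedefs are local stand-ins mirroring the Khronos `cl.h` so the block compiles on its own.

```cpp
#include <cstdint>
#include <string>
#if defined(_WIN32)
#include <windows.h>
#else
#include <dlfcn.h>
#endif

// Local stand-ins for the cl.h types used below.
typedef int32_t cl_int;
typedef uint32_t cl_uint;
typedef struct _cl_platform_id* cl_platform_id;

// Calling conventions only matter on 32-bit Windows.
#if defined(_WIN32) && !defined(_WIN64)
#define CALL_STD __stdcall
#define CALL_C __cdecl
#else
#define CALL_STD
#define CALL_C
#endif

// Two pointer types for the same entry point, one per calling convention.
typedef cl_int (CALL_STD* PFN_clGetPlatformIDs_std)(cl_uint, cl_platform_id*, cl_uint*);
typedef cl_int (CALL_C* PFN_clGetPlatformIDs_cdecl)(cl_uint, cl_platform_id*, cl_uint*);

// Hypothetical RAII wrapper: no import library (.lib) is needed at link time.
class DynLib {
 public:
  explicit DynLib(const std::string& path) {
#if defined(_WIN32)
    handle_ = reinterpret_cast<void*>(LoadLibraryA(path.c_str()));
#else
    handle_ = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
#endif
  }
  ~DynLib() {
    if (!handle_) return;
#if defined(_WIN32)
    FreeLibrary(reinterpret_cast<HMODULE>(handle_));
#else
    dlclose(handle_);
#endif
  }
  DynLib(const DynLib&) = delete;
  DynLib& operator=(const DynLib&) = delete;

  bool ok() const { return handle_ != nullptr; }

  // Returns nullptr if the library failed to load or lacks the symbol.
  void* symbol(const char* name) const {
    if (!handle_) return nullptr;
#if defined(_WIN32)
    return reinterpret_cast<void*>(
        GetProcAddress(reinterpret_cast<HMODULE>(handle_), name));
#else
    return dlsym(handle_, name);
#endif
  }

 private:
  void* handle_ = nullptr;
};

// Hypothetical helper: which convention an implementation uses must be known
// or configured per library; dlsym/GetProcAddress cannot detect it.
bool query_platform_count(const DynLib& lib, bool uses_stdcall, cl_uint* count) {
  void* sym = lib.symbol("clGetPlatformIDs");
  if (!sym) return false;
  cl_int err;
  if (uses_stdcall)
    err = reinterpret_cast<PFN_clGetPlatformIDs_std>(sym)(0, nullptr, count);
  else
    err = reinterpret_cast<PFN_clGetPlatformIDs_cdecl>(sym)(0, nullptr, count);
  return err == 0;  // 0 is CL_SUCCESS
}
```

Moving to "an array of pointers for every function" would replace the per-function typedef pairs with a table indexed by function and convention; the loading logic stays the same.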

Currently, OpenCL-Z's library search is hard-coded to find only the AMD and NVIDIA implementations (although the NVIDIA one is searched for in the standard location, which is also where the OpenCL ICD for Windows is eventually expected to go):

1) NVIDIA implementation:
a) $WINDIR/system32/opencl.dll (works on Windows x64 for both 32-bit and 64-bit builds with the 190.89 DLLs)
b) /usr/lib/opencl.so (works on both 64-bit and 32-bit Linux with the 190.89 .so)
2) AMD implementation:
a) $ATISTREAMSDKROOT/lib/{x86,x86_64}/opencl.dll
b) $ATISTREAMSDKROOT/lib/{x86,x86_64}/opencl.so
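The search order above can be sketched as a function that builds the candidate paths. This is an illustration, not OpenCL-Z's code: `candidate_paths` is a hypothetical name, the file names are taken verbatim from the list, and the environment values are passed in as parameters so the logic can be tested.

```cpp
#include <string>
#include <vector>

// Builds the candidate library paths in the order listed above.
// windir / ati_sdk_root may be null when the variable is not set.
std::vector<std::string> candidate_paths(bool windows, bool is64bit,
                                         const char* windir,
                                         const char* ati_sdk_root) {
  std::vector<std::string> out;
  // 1) NVIDIA: the standard location (also where the ICD is expected).
  if (windows) {
    if (windir) out.push_back(std::string(windir) + "/system32/opencl.dll");
  } else {
    out.push_back("/usr/lib/opencl.so");
  }
  // 2) AMD: under the Stream SDK root, per architecture.
  if (ati_sdk_root) {
    const std::string arch = is64bit ? "x86_64" : "x86";
    const std::string file = windows ? "opencl.dll" : "opencl.so";
    out.push_back(std::string(ati_sdk_root) + "/lib/" + arch + "/" + file);
  }
  return out;
}
```

In the program itself, the last two arguments would come from `std::getenv("WINDIR")` and `std::getenv("ATISTREAMSDKROOT")`.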

I will also add support for listing additional implementation locations in a text file.
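The file format is not decided in the post, so the following is only a sketch under an assumed, hypothetical format: one library path per line, blank lines and lines starting with `#` ignored, surrounding whitespace (including a Windows `\r`) trimmed.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parses the assumed "one path per line" format into a list of paths.
std::vector<std::string> parse_extra_locations(const std::string& text) {
  std::vector<std::string> paths;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    const auto first = line.find_first_not_of(" \t\r");
    if (first == std::string::npos) continue;  // blank line
    const auto last = line.find_last_not_of(" \t\r");
    const std::string path = line.substr(first, last - first + 1);
    if (path[0] == '#') continue;  // comment line
    paths.push_back(path);
  }
  return paths;
}
```

The parsed entries would simply be appended to the hard-coded candidates before trying to load each one.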

I was able to do this before finishing the full wrapper because OpenCL-Z only uses five or so functions. Checking device binary support needs, say, ten more, but that is still far fewer than the 60 or more functions in the OpenCL API.

I also had to fix support for more than one platform, for platforms with more than one device, and for updating device information in real time.
There is currently one bug: a platform for which no device is returned crashes initialization (I will try to fix it before I post). This can happen, for example, on a Windows 7 machine with both ATI and NVIDIA cards running when I disable one card (for example, by disabling the screen attached to it).
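One way to avoid that crash is to treat "no devices" as a normal result and skip the platform. In the OpenCL API, `clGetDeviceIDs` returns `CL_DEVICE_NOT_FOUND` (-1) when a platform has no matching devices. The sketch below assumes a hypothetical `DeviceCountQuery` callback standing in for that call, so the filtering logic can be shown without an OpenCL installation; the constants mirror the values in `cl.h`.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

typedef int32_t cl_int;
typedef uint32_t cl_uint;
const cl_int CL_SUCCESS = 0;
const cl_int CL_DEVICE_NOT_FOUND = -1;  // same value as in cl.h

// Stand-in for clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, nullptr, &n):
// returns an error code and writes the number of devices.
using DeviceCountQuery = std::function<cl_int(cl_uint platform_index, cl_uint* num_devices)>;

// Returns the indices of platforms exposing at least one device. Platforms
// reporting CL_DEVICE_NOT_FOUND, zero devices, or any other error are skipped
// instead of being passed on to code that assumes a non-empty device list.
std::vector<cl_uint> usable_platforms(cl_uint num_platforms,
                                      const DeviceCountQuery& query) {
  std::vector<cl_uint> usable;
  for (cl_uint p = 0; p < num_platforms; ++p) {
    cl_uint n = 0;
    const cl_int err = query(p, &n);
    if (err != CL_SUCCESS || n == 0) continue;  // includes CL_DEVICE_NOT_FOUND
    usable.push_back(p);
  }
  return usable;
}
```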

I have also included logos for the AMD Stream platform and the Intel platform.
It seems I will have to add Apple and S3 implementations to the mix, and why not an IBM one for the Cell as well.

I have added a few missing key feature checks, since they currently allow telling the two existing implementations apart: OpenGL interop, image support, and whether the OpenCL implementation can return device binaries and build programs from those binaries.
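Image support can be read directly as the `CL_DEVICE_IMAGE_SUPPORT` device info, and binary support exercised with `clGetProgramInfo(CL_PROGRAM_BINARIES)` followed by `clCreateProgramWithBinary`. OpenGL interop, however, is advertised in the `CL_DEVICE_EXTENSIONS` string, so it needs a parser. A minimal sketch, assuming the helper names `has_extension` and `gl_interop_supported` (hypothetical, not from OpenCL-Z):

```cpp
#include <sstream>
#include <string>

// Exact-token search in a space-separated extension list. A plain substring
// search could match a longer name that merely starts with the one wanted.
bool has_extension(const std::string& extensions, const std::string& name) {
  std::istringstream in(extensions);
  std::string token;
  while (in >> token) {
    if (token == name) return true;
  }
  return false;
}

// GL sharing is exposed as cl_khr_gl_sharing, or cl_APPLE_gl_sharing on Apple.
bool gl_interop_supported(const std::string& extensions) {
  return has_extension(extensions, "cl_khr_gl_sharing") ||
         has_extension(extensions, "cl_APPLE_gl_sharing");
}
```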

Thanks to the wxWidgets library, I got it working on Linux with only minor tweaks to the code and the Code::Blocks project file.

Future work is to move it to my OpenCL wrapper, port it to Snow Leopard, and add two key performance metrics, as CUDA-Z does:

* Peak GFLOPS for integer, single-precision and double-precision floating point (using MAD kernels for int, int24, float and double)
* Device bandwidth, plus device-to-host and host-to-device bandwidth (with both direct and mapped access, and with both pageable and pinned memory)
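Both metrics reduce to simple arithmetic once a kernel or transfer has been timed. The sketch below assumes the usual conventions (not stated in the post): one MAD (a*b+c) counts as 2 floating-point operations, 1 GB = 1e9 bytes, and the elapsed time comes from event profiling or a host timer. The function names are hypothetical.

```cpp
#include <cstdint>

// Measured throughput of a MAD kernel: every work item performs
// mads_per_item MADs, each counted as 2 flops.
double mad_gflops(uint64_t work_items, uint64_t mads_per_item, double seconds) {
  return 2.0 * static_cast<double>(work_items) *
         static_cast<double>(mads_per_item) / (seconds * 1e9);
}

// Transfer bandwidth in GB/s. For a device-to-device copy kernel, bytes is
// usually counted as read + write, i.e. twice the buffer size.
double bandwidth_gbs(uint64_t bytes, double seconds) {
  return static_cast<double>(bytes) / (seconds * 1e9);
}

// Theoretical single-precision peak: processing elements x clock x 2 flops per MAD.
double peak_gflops(uint32_t processing_elements, double clock_ghz) {
  return processing_elements * clock_ghz * 2.0;
}
```

Comparing `mad_gflops` against `peak_gflops` shows how close a MAD kernel gets to the hardware limit; for example, 1600 processing elements at 0.85 GHz give a theoretical 2720 GFLOPS.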

Some other key performance metrics worth investigating for the program are:

* Atomics performance in both local and global memory (now that Fermi is coming), including their use in append/consume buffers
* Cache tests: mini-kernels for studying how much potential the cache has (for example, using SpMV kernels)
* OpenGL/OpenCL interop
* HD video GPU decoding -> OpenGL -> OpenCL -> OpenGL, using, say, constant spread filters, etc.
* Simultaneous kernel execution and device transfers (supported in CUDA, and apparently also in CAL and Brook+, but in OpenCL, anybody?)
* Simultaneous device-to-host and host-to-device transfers (to check use of Fermi's dual DMA engines; does the 5xxx have them too?)
* Simultaneous kernel execution (also to check the Fermi and 5xxx implementations, say with two different kernels that each use half of the compute units with equal load)

Other candidates are metrics for OpenCL extensions such as the global data share and SAD, and information about the maximum number of threads in flight.
Also: checking device memory in the style of memtest, either by porting existing CUDA programs or by estimating the bit error rate?
Also demos, as in GPU Caps Viewer, etc.
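The bit-error-rate idea can be sketched on the host side: write a known pattern, copy it to the device and back (not shown), and count flipped bits. This assumes a simple address-dependent XOR pattern chosen for illustration; real memtest tools use richer patterns such as moving inversions. All names are hypothetical.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Address-dependent test pattern so that stuck or aliased addresses show up.
std::vector<uint32_t> make_pattern(size_t n, uint32_t seed) {
  std::vector<uint32_t> v(n);
  for (size_t i = 0; i < n; ++i) v[i] = static_cast<uint32_t>(i) ^ seed;
  return v;
}

// Counts bits that differ between the written and the read-back buffers.
uint64_t count_bit_errors(const std::vector<uint32_t>& expected,
                          const std::vector<uint32_t>& actual) {
  uint64_t errors = 0;
  const size_t n = expected.size() < actual.size() ? expected.size() : actual.size();
  for (size_t i = 0; i < n; ++i)
    errors += std::bitset<32>(expected[i] ^ actual[i]).count();
  return errors;
}

// Bit error rate = flipped bits / total bits checked.
double bit_error_rate(uint64_t errors, size_t words) {
  return words ? static_cast<double>(errors) / (words * 32.0) : 0.0;
}
```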

Screenshots (Linux and Windows, showing three devices):