GPU computing
Stay up to date with OpenCL, DirectCompute, CUDA, CAL and OpenGL information


Thursday, 25 February 2010

3 new tools!

Posted on 07:39 by Unknown
3 New GPU tools!

Swan: A simple tool for porting CUDA kernels to OpenCL

What is it?

Swan is a small tool that aids the reversible conversion of existing CUDA codebases to OpenCL. It does several useful things:
  • Translates CUDA kernel source-code to OpenCL.
  • Provides a common API that abstracts both CUDA and OpenCL runtimes.
  • Preserves the convenience of the CUDA <<< grid, block >>> kernel launch syntax by generating C source-code for kernel entry-point functions.
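Much of the kernel-side translation comes down to mapping CUDA built-ins onto their OpenCL C counterparts. The sketch below is deliberately naive and assumes that plain text substitution is good enough for illustration; it is not how Swan is implemented, and `cuda_to_opencl` is a hypothetical name. A real tool also has to add address-space qualifiers such as `__global` to pointer parameters, which this sketch does not do.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, naive CUDA -> OpenCL C translation by text substitution.
// Only a handful of well-known built-ins are mapped; a real translator
// parses the source and also rewrites pointer address spaces.
inline std::string cuda_to_opencl(std::string src) {
    static const std::vector<std::pair<std::string, std::string>> table = {
        {"__global__", "__kernel"},
        {"__shared__", "__local"},
        {"__syncthreads()", "barrier(CLK_LOCAL_MEM_FENCE)"},
        {"threadIdx.x", "get_local_id(0)"},
        {"threadIdx.y", "get_local_id(1)"},
        {"blockIdx.x", "get_group_id(0)"},
        {"blockIdx.y", "get_group_id(1)"},
        {"blockDim.x", "get_local_size(0)"},
        {"blockDim.y", "get_local_size(1)"},
        {"gridDim.x", "get_num_groups(0)"},
        {"gridDim.y", "get_num_groups(1)"},
    };
    for (const auto& entry : table) {
        std::string::size_type pos = 0;
        while ((pos = src.find(entry.first, pos)) != std::string::npos) {
            src.replace(pos, entry.first.size(), entry.second);
            pos += entry.second.size();  // continue after the replacement
        }
    }
    return src;
}
```

For example, `blockIdx.x * blockDim.x + threadIdx.x` becomes `get_group_id(0) * get_local_size(0) + get_local_id(0)`, the usual way to compute a global index in each API.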

Why might you want it?

Possible uses include:
  • Evaluating OpenCL performance of an existing CUDA code.
  • Maintaining a dual-target OpenCL and CUDA code.
  • Reducing dependence on NVCC when compiling host code.
  • Supporting multiple CUDA compute capabilities in a single binary.

Limitations

It's not a drop-in replacement for nvcc: host code must have all kernel invocations and CUDA API calls rewritten.
Swan does not support a few things. In particular:
  • CUDA C++ templating in kernel code.
  • OpenCL Images/Samplers (analogous to Textures).
  • Multiple device management in a single process.
  • Compiling kernels for the CPU.
  • CUDA device-emulation mode.
Furthermore, it's a work in progress. It works for our code, but there are no promises it will work for yours.
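As noted above, every kernel launch in the host code has to be rewritten. The core of that rewrite is the launch geometry: CUDA's `<<< grid, block >>>` counts the grid in blocks, whereas OpenCL's `clEnqueueNDRangeKernel` takes a global work size counted in work-items plus a local work size. A minimal sketch, using hypothetical `Dim3` and `NDRange` types that are not part of Swan's API:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CUDA's dim3 (all three fields must be given here).
struct Dim3 {
    std::size_t x, y, z;
};

// Hypothetical holder for the two size arrays OpenCL expects.
struct NDRange {
    std::size_t global[3];  // total work-items per dimension
    std::size_t local[3];   // work-items per work-group
};

// CUDA counts the grid in blocks; OpenCL's global size counts work-items,
// so each global dimension is grid * block and the local size is the block.
inline NDRange to_ndrange(Dim3 grid, Dim3 block) {
    return NDRange{
        {grid.x * block.x, grid.y * block.y, grid.z * block.z},
        {block.x, block.y, block.z}};
}
```

The resulting `global` and `local` arrays correspond to the `global_work_size` and `local_work_size` arguments of `clEnqueueNDRangeKernel`; a launch of 64 blocks of 256 threads becomes a global size of 16384 with a local size of 256.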

Cloo 0.6.2
A new version of Cloo is out.
It introduces a tracking mechanism for kernel arguments (sampler or memory objects) which prevents them from being collected by the GC when the user application no longer references them. This behaviour has been backported to the existing Set*Argument methods since it is safer. You can override auto-tracking using the newly added overloads.
A critical bug affecting image read operations was fixed, along with some other minor glitches.
As for breaking changes: rename any ComputeImage.PixelSize to ElementSize and you're good to go.

Clootils has been improved, too. You can now take advantage of some bells and whistles that control the program building behavior.

CAL++ v0.8 release announcement
C++ to IL generator/compiler with C++ bindings for CAL
http://sourceforge.net/projects/calpp/


The CAL++ library has just been released. The project homepage is at http://sourceforge.net/projects/calpp/ .

The project consists of two main components. The first is a set of C++ bindings for CAL (it's really much easier to develop new CAL applications using the bindings); the second is a C++ to IL generator/compiler.

The C++ generator/compiler has a syntax very similar to OpenCL (with a few necessary exceptions). It also supports all devices that can run CAL kernels (finally, an OpenCL-like language for the 3xxx series).

It has some advantages over the OpenCL compiler. To name a few:

- It's much closer to CAL: it lets you write code that is almost as good as (or as good as) handwritten IL. Look at the matrix multiplication example: it produces almost the same ISA as prunedtree's original code (it differs only where I've made changes).

- The advantages of C++ itself. I really wouldn't want to use the double-double (or quad-float) technique without C++.

- Powerful control over loop unrolling and code selection (at IL compilation time). The C++ language acts like a preprocessor.

- It has LDS support for the 4xxx series, doubles, etc. And if something is missing, it can be added really easily.
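Two of the advantages listed above can be illustrated on the CPU in plain C++. The sketch below is hypothetical and does not use the CAL++ API. The first part shows why operator overloading makes the double-double technique bearable: a single `+` expands into Knuth's error-free TwoSum followed by a renormalization step (a simplified variant, not a full double-double library). The second part shows a C++ loop acting as a generation-time preprocessor that emits unrolled, IL-like text. The `dd` type, `emit_unrolled_mad`, and the register layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Part 1: double-double arithmetic. A value is the unevaluated sum hi + lo
// of two doubles; operator overloading hides the multi-instruction sequence,
// so user code can still read "c = a + b".
struct dd {
    double hi, lo;
};

// Knuth's TwoSum: s + err == a + b exactly (in IEEE double arithmetic;
// aggressive flags such as -ffast-math would break this).
inline dd two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;
    double err = (a - (s - bb)) + (b - bb);
    return {s, err};
}

// Simplified double-double addition: exact sum of the high parts, fold in
// the low parts, then renormalize so that hi carries the leading bits.
inline dd operator+(dd a, dd b) {
    dd s = two_sum(a.hi, b.hi);
    double lo = s.lo + a.lo + b.lo;
    double hi = s.hi + lo;
    return {hi, lo - (hi - s.hi)};
}

// Part 2: C++ as a preprocessor. This loop runs while the IL text is being
// generated, so the output is n straight-line instructions with no loop left.
// The register numbering and IL-like syntax are illustrative only.
inline std::string emit_unrolled_mad(int n) {
    std::string il;
    for (int i = 0; i < n; ++i) {
        il += "mad r0, r" + std::to_string(2 * i + 1) + ", r" +
              std::to_string(2 * i + 2) + ", r0\n";
    }
    return il;
}
```

For example, `dd{1.0, 0.0} + dd{1e-20, 0.0}` keeps the `1e-20` in the low part, where a plain `double` sum would round it away, and `emit_unrolled_mad(2)` yields two `mad` lines that use registers r1 to r4.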

But, as always, there are some pitfalls to this approach:

- It isn't OpenCL. Having a standard is always useful.

- Only partial support for structs (it can be improved a lot, but it will never be as good as in OpenCL).

- CAL++ is much closer to IL, so more knowledge of IL is required to achieve its full potential (hmm, I think this is also the case with OpenCL).

- Optimization is performed only by the CAL IL compiler (which isn't that good).

Some examples are included with the library. I think the fastest matrix multiplication example might be a small gem here.

I hope CAL++ will be useful to someone.

Doesn't compile under Windows MSVC 2008!
Use 0.8a for GCC 4.4!
Q&A:
1. Have you tested on Windows?

No. But apart from C++ compiler problems it should work (there is nothing platform-specific in the code).

2. Have you also added 24-bit integer instructions? They are useful, for example, for computing the thread ID quickly.

CAL++ converts code to IL, so 24-bit operations would need to be available in CAL IL. Unfortunately, that isn't the case.

I'm wondering: how hard would it be to also add GDS?

Using anything that isn't available in IL is really hard (or close to impossible).

When CAL supported ISA assembler compilation (3xxx family), you could generate ISA assembly. I would call that really, really hard, as you need to be aware of many architectural limits (and that information simply isn't available).

But for the 4xxx and 5xxx families, using ISA requires writing your own driver stack (as CAL no longer supports asm), which I think is simply impossible at the moment.

"It cannot be compiled at the time as it depends on some CAL Vector/Matrix classes which aren't available for public use." Is this AMD NDA code, or your own code?

It's my own code, but it's far from ready. For the vector quantization example, it can easily be replaced by Image2D with simple functions to fill in the data.

Are you using any magic in it, or could I code some wrappers?

The Matrix/Vector code uses a little bit of magic. Any vector/matrix expression (like vec_a = 3*vec_b + vec_c + log(vec_d)) is converted into a proper kernel (a trick that uses templates for delayed execution) and executed on the GPU. It saves a lot of time writing custom kernels.
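The "templates for delayed execution" trick is presumably the expression-template idiom. The CPU-only C++ sketch below shows that idiom under this assumption: `+`, `*` and `log` only build a small expression tree, and the work happens when the tree is assigned to a `Vec`, in a single fused loop. In CAL++ the tree would instead be turned into a GPU kernel; `Vec`, `Add`, `Scale` and `Log` are hypothetical names, not the library's classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CRTP base: lets the operators below accept any expression node.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Lazy elementwise nodes: nothing is computed until operator[] is called.
template <class L, class R>
struct Add : Expr<Add<L, R>> {
    const L& lhs;
    const R& rhs;
    Add(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

template <class E>
struct Scale : Expr<Scale<E>> {
    double k;
    const E& e;
    Scale(double factor, const E& expr) : k(factor), e(expr) {}
    double operator[](std::size_t i) const { return k * e[i]; }
};

template <class E>
struct Log : Expr<Log<E>> {
    const E& e;
    explicit Log(const E& expr) : e(expr) {}
    double operator[](std::size_t i) const { return std::log(e[i]); }
};

// Leaf node that owns the data; assignment is where the work happens.
struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }

    // Evaluates the whole tree in one pass (assumes matching lengths).
    template <class E>
    Vec& operator=(const Expr<E>& expr) {
        const E& x = expr.self();
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = x[i];
        return *this;
    }
};

template <class L, class R>
Add<L, R> operator+(const Expr<L>& l, const Expr<R>& r) { return Add<L, R>(l.self(), r.self()); }

template <class E>
Scale<E> operator*(double k, const Expr<E>& e) { return Scale<E>(k, e.self()); }

template <class E>
Log<E> log(const Expr<E>& e) { return Log<E>(e.self()); }
```

With this in place, `vec_a = 3 * vec_b + vec_c + log(vec_d);` compiles to one loop over the elements instead of three temporary vectors. Because the nodes hold references to their operands, the expression must be consumed in the same statement (not saved in an `auto` variable), which is the usual caveat with this idiom.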

From the TODO list:
1. Add UAV support, logical operations, more double math functions, and as_typen conversions

2. Add an il_asm function (usage example: il_asm("mov %1,%2", v1, v2); would generate "mov r1,r2")

3. Add documentation and more examples


4. Easier-to-use local CAL arrays, and more user-friendly IL creation functions
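TODO item 2 amounts to a small substitution pass: each `%N` in the template is replaced with the IL register holding the N-th operand. A hypothetical C++ sketch, assuming the operands' registers are already known (in the TODO's example, `v1` and `v2` live in `r1` and `r2`); `expand_il_asm` is an illustrative name, not part of CAL++.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: replace %1, %2, ... with the registers of the 1st, 2nd, ...
// operand. Operand registers are passed in directly as strings here.
inline std::string expand_il_asm(const std::string& fmt,
                                 const std::vector<std::string>& regs) {
    std::string out;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] == '%' && i + 1 < fmt.size() &&
            std::isdigit(static_cast<unsigned char>(fmt[i + 1]))) {
            // Parse the (possibly multi-digit) 1-based operand index.
            std::size_t j = i + 1, idx = 0;
            while (j < fmt.size() && std::isdigit(static_cast<unsigned char>(fmt[j]))) {
                idx = idx * 10 + static_cast<std::size_t>(fmt[j] - '0');
                ++j;
            }
            if (idx == 0 || idx > regs.size())
                throw std::out_of_range("il_asm operand index out of range");
            out += regs[idx - 1];
            i = j - 1;  // the loop increment moves past the last digit
        } else {
            out += fmt[i];
        }
    }
    return out;
}
```

Called as `expand_il_asm("mov %1,%2", {"r1", "r2"})`, this reproduces the `mov r1,r2` from the TODO entry; an operand number with no matching register raises `std::out_of_range`.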