Swan: A simple tool for porting CUDA kernels to OpenCL
Note that it's a work in progress. It works for our code, but there is no promise that it will for yours.
What is it?
Swan is a small tool that aids the reversible conversion of existing CUDA codebases to OpenCL. It does several useful things:
- Translates CUDA kernel source-code to OpenCL.
- Provides a common API that abstracts both CUDA and OpenCL runtimes.
- Preserves the convenience of the CUDA <<< grid, block >>> kernel launch syntax by generating C source-code for kernel entry-point functions.
Why might you want it?
Possible uses include:
- Evaluating OpenCL performance of an existing CUDA code.
- Maintaining a dual-target OpenCL and CUDA code.
- Reducing dependence on NVCC when compiling host code.
- Supporting multiple CUDA compute capabilities in a single binary.
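To give a feel for what maintaining a dual-target kernel involves, here is a minimal sketch of the textual mapping between common CUDA built-ins and their OpenCL counterparts. It is not Swan's implementation: the function names `replace_all` and `cuda_to_opencl` are hypothetical, only the x dimension is covered, and a real translator must also handle things this sketch ignores (for example, adding address-space qualifiers such as `__global` to pointer arguments).

```cpp
#include <string>
#include <utility>
#include <vector>

// Replace every occurrence of `from` in `text` with `to`.
static std::string replace_all(std::string text, const std::string& from,
                               const std::string& to) {
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // skip past the text we just inserted
    }
    return text;
}

// Hypothetical helper: naive, purely textual CUDA -> OpenCL rewrite.
// The pairs below are the standard equivalents for the x dimension.
std::string cuda_to_opencl(std::string src) {
    static const std::vector<std::pair<std::string, std::string>> table = {
        {"__global__", "__kernel"},
        {"__shared__", "__local"},
        {"__syncthreads()", "barrier(CLK_LOCAL_MEM_FENCE)"},
        {"threadIdx.x", "get_local_id(0)"},
        {"blockIdx.x", "get_group_id(0)"},
        {"blockDim.x", "get_local_size(0)"},
        {"gridDim.x", "get_num_groups(0)"},
    };
    for (const auto& entry : table) {
        src = replace_all(src, entry.first, entry.second);
    }
    return src;
}
```

Feeding it the usual CUDA index computation, `blockIdx.x * blockDim.x + threadIdx.x`, yields the OpenCL form built from `get_group_id(0)`, `get_local_size(0)` and `get_local_id(0)`.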
Limitations
It's not a drop-in replacement for nvcc. Host code needs to have all kernel invocations and CUDA API calls rewritten.
Swan does not support a few things. In particular:
- CUDA C++ templating in kernel code.
- OpenCL Images/Samplers (analogous to Textures).
- Multiple device management in a single process.
- Compiling kernels for the CPU.
- CUDA device-emulation mode.
Cloo 0.6.2
A new version of Cloo is out.
It introduces a tracking mechanism for kernel arguments (sampler or memory objects) that prevents them from being claimed by the GC when the user application doesn't refer to them in later code. This behaviour has been backported to the existing Set*Argument methods, since it is safer. You can override auto-tracking using the newly added overloads.
A critical bug affecting image read operations, together with some other minor glitches, was fixed.
As for breaking changes, rename any ComputeImage.PixelSize to ElementSize and you're good to go.
Clootils have been improved, too. Now you can take advantage of some bells and whistles that control the program building behavior.
CAL++ v. 0.8 release
Announcement
C++ to IL generator/compiler with C++ bindings for CAL
http://sourceforge.net/projects/calpp/
The CAL++ library has just been released. The project homepage is located at http://sourceforge.net/projects/calpp/.
The project consists of two main components. One is a C++ binding for CAL (it's really much easier to develop new CAL applications using bindings) and the second is a C++ to IL generator/compiler.
The C++ generator/compiler has a syntax very similar to OpenCL (with a few necessary exceptions). It also supports all devices that can run CAL kernels (finally, an OpenCL-like language for the 3xxx).
It has some advantages over the OpenCL compiler. To name a few:
- It's much closer to CAL: it allows you to write code that is almost as good (or as good) as handwritten IL. Look at the matrix multiplication example: it has almost the same ISA as prunedtree's original code (it differs only where I've made some changes).
- The advantage of using C++. I really wouldn't like to use the double-double (or quad-float) technique without C++.
- Powerful control over loop unrolling and code selection (at IL compilation time). The C++ language acts like a preprocessor.
- It has LDS support for the 4xxx, doubles, etc. And if something is missing, it can be added really easily.
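The point about loop unrolling and code selection can be illustrated with a small sketch. It does not use the CAL++ API: `emit_dot` is a hypothetical function, and the register names and instruction text are made-up pseudo-IL rather than real CAL IL. The idea it shows is only that an ordinary C++ loop and an ordinary C++ `if` run while the kernel text is being generated, so the generated code contains neither.

```cpp
#include <sstream>
#include <string>

// Hypothetical emitter producing pseudo-IL text for a dot-product
// accumulation. `steps` is the unroll factor; `fused` selects between
// a single multiply-add and a separate multiply followed by an add.
std::string emit_dot(int steps, bool fused) {
    std::ostringstream il;
    for (int i = 0; i < steps; ++i) {   // runs at generation time
        if (fused) {                    // code selection at generation time
            il << "mad acc, a" << i << ", b" << i << ", acc\n";
        } else {
            il << "mul tmp, a" << i << ", b" << i << "\n";
            il << "add acc, acc, tmp\n";
        }
    }
    return il.str();
}
```

Calling `emit_dot(4, true)` would produce four straight-line `mad` lines. Changing the unroll factor or the instruction choice is a matter of changing host-side arguments, which is the sense in which C++ acts like a preprocessor here.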
But as always, there are some pitfalls to this approach:
- It isn't OpenCL. Having a standard is always useful.
- Only partial support for structs (it can be much improved, but it will never be as good as OpenCL).
- CAL++ is much closer to IL, and some more knowledge of IL is required to achieve its full potential (hmmm, I think this is also the case with OpenCL).
- Optimization is only performed by the CAL IL compiler (which isn't that good).
Some examples are included with the library. I think the fastest matrix multiplication might be a small gem here.
I hope that CAL++ will be useful to someone.
Doesn't compile under Windows MSVC 2008!
Use 0.8a for GCC 4.4!
Q&A:
1. Have you tested on Windows?
No. But apart from C++ compiler problems it should work (there is nothing platform-specific in the code).
2. Also, have you added 24-bit integer instructions? They are useful for getting the thread id fast, for example.
CAL++ converts code to IL, so 24-bit operations need to be available in CAL IL. Unfortunately, that isn't the case.
I'm wondering how hard it would be to also add GDS?
Using anything that isn't available in IL is really hard (or close to impossible).
When CAL supported ISA assembler compilation (3xxx family), you could generate ISA ASM. I would call it really, really hard, as you need to be aware of many architecture limits (and that information simply isn't available).
But for the 4xxx and 5xxx families, using ISA requires writing your own driver stack (as CAL doesn't support asm any more). I think it's simply impossible at the moment.
"It cannot be compiled at the time as it depends on some CAL Vector/Matrix classes which aren't available for public use." Is this AMD NDA code, or is it your own code?
It's my own code, but it's far from being ready. For the vector quantization example it can easily be replaced by Image2D with simple functions to fill data.
Are you using any magic in it? Or can I code some wrappers?
The Matrix/Vector code uses a little bit of magic. Any vector/matrix expression (like vec_a = 3*vec_b + vec_c + log(vec_d)) is converted to a proper kernel (a trick that uses templates for delayed execution) and executed on the GPU. It saves a lot of time otherwise spent writing custom kernels.
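The "templates for delayed execution" trick is usually known as expression templates. Since the Matrix/Vector classes are not public, the following is only a sketch of the general technique, not the author's code: every name in it (`sketch::Vec`, `Scalar`, `BinOp`, `Log`, `assign`) is hypothetical, and instead of launching anything on a GPU it just builds the per-element kernel body as a string.

```cpp
#include <sstream>
#include <string>
#include <utility>

namespace sketch {

// CRTP base: marks a type as an expression node.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Leaf node naming a device buffer.
struct Vec : Expr<Vec> {
    std::string name;
    explicit Vec(std::string n) : name(std::move(n)) {}
    std::string code() const { return name + "[i]"; }
};

// Leaf node for a scalar constant.
struct Scalar : Expr<Scalar> {
    double value;
    explicit Scalar(double v) : value(v) {}
    std::string code() const { std::ostringstream s; s << value; return s.str(); }
};

// Binary operation: stores its operands instead of computing anything.
template <class L, class R>
struct BinOp : Expr<BinOp<L, R>> {
    L lhs; R rhs; char op;
    BinOp(L l, R r, char o) : lhs(std::move(l)), rhs(std::move(r)), op(o) {}
    std::string code() const {
        return "(" + lhs.code() + " " + op + " " + rhs.code() + ")";
    }
};

// Unary log node.
template <class E>
struct Log : Expr<Log<E>> {
    E arg;
    explicit Log(E a) : arg(std::move(a)) {}
    std::string code() const { return "log(" + arg.code() + ")"; }
};

template <class L, class R>
BinOp<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return BinOp<L, R>(l.self(), r.self(), '+');
}

template <class R>
BinOp<Scalar, R> operator*(double k, const Expr<R>& r) {
    return BinOp<Scalar, R>(Scalar(k), r.self(), '*');
}

template <class E>
Log<E> log(const Expr<E>& e) { return Log<E>(e.self()); }

// Evaluation is delayed until assignment: only here is the whole
// expression tree turned into one fused kernel statement.
template <class E>
std::string assign(const Vec& dst, const Expr<E>& e) {
    return dst.code() + " = " + e.self().code() + ";";
}

}  // namespace sketch
```

With vectors named a, b, c and d, `sketch::assign(a, 3 * b + c + sketch::log(d))` returns the single statement `a[i] = (((3 * b[i]) + c[i]) + log(d[i]));`. No temporaries are created for the intermediate sums, which is why one expression can map onto one kernel.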
From TODO:
1. Add UAV support, logical operations, more double math functions, and as_typen conversion.
2. Add an il_asm function (usage example: il_asm("mov %1,%2", v1, v2); would generate "mov r1,r2").
3. Add documentation and more examples.
4. Make local CAL arrays easier to use, and make the IL creation functions more user-friendly.
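The il_asm item from the TODO list is still only planned, so no real implementation exists to show. Purely as an illustration of the substitution it describes, here is a sketch under the assumption that each variable knows the name of the register it lives in. `Reg` and `il_asm_impl` are hypothetical, and the numbering of placeholders from %1 follows the usage example in the list.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for a CAL++ variable bound to an IL register.
struct Reg {
    std::string name;
};

// Replace each %N in `fmt` with the name of the N-th register (1-based).
// Placeholders that refer to a missing argument are copied unchanged.
std::string il_asm_impl(const std::string& fmt, const std::vector<Reg>& args) {
    std::string out;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] == '%' && i + 1 < fmt.size() &&
            std::isdigit(static_cast<unsigned char>(fmt[i + 1]))) {
            std::size_t n = 0;
            std::size_t j = i + 1;
            while (j < fmt.size() &&
                   std::isdigit(static_cast<unsigned char>(fmt[j]))) {
                n = n * 10 + static_cast<std::size_t>(fmt[j] - '0');
                ++j;
            }
            if (n >= 1 && n <= args.size()) {
                out += args[n - 1].name;
                i = j - 1;  // continue after the last digit
                continue;
            }
        }
        out += fmt[i];
    }
    return out;
}

// Variadic front end matching the call shape in the TODO item.
template <class... Regs>
std::string il_asm(const std::string& fmt, const Regs&... regs) {
    return il_asm_impl(fmt, std::vector<Reg>{regs...});
}
```

With `Reg v1{"r1"}` and `Reg v2{"r2"}`, the call `il_asm("mov %1,%2", v1, v2)` produces the string "mov r1,r2", matching the example in the list.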