If you read my previous post, downloaded the wrapper and read the readme.txt it contains, you know I have provided a wrapper for getting AMD IL code out of programs that use the CAL API (I'm specifically interested in OpenCL programs running on AMD GPUs).
Well, now the next step is to support building kernels from this code!
This is well supported at the OpenCL API level, but unfortunately AMD's implementation doesn't support it currently (2.0beta4).
For a complete overview of OpenCL on AMD GPUs, stay tuned for a forthcoming post.
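For reference, the API-level support mentioned above is presumably clCreateProgramWithBinary from the OpenCL specification, which creates a program from a precompiled blob instead of source text. Below is a minimal sketch of that path, not the wrapper's actual code. The names load_blob, program_from_blob and the USE_OPENCL switch are hypothetical and only for illustration, and whether a given implementation accepts AMD IL or device assembly as a "binary" is exactly the open question here.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef USE_OPENCL   /* hypothetical switch: define it only if an OpenCL SDK is installed */
#include <CL/cl.h>
#endif

/* Read a whole file (e.g. dumped AMD IL or device assembly) into memory.
   Returns a malloc'ed buffer and stores its size in *len.
   Returns NULL (and *len == 0) if the file is missing, empty or unreadable. */
unsigned char *load_blob(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long size;

    *len = 0;
    if (!f)
        return NULL;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) > 0 &&
        fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)size);
        if (buf && fread(buf, 1, (size_t)size, f) != (size_t)size) {
            free(buf);              /* short read: give up */
            buf = NULL;
        }
        if (buf)
            *len = (size_t)size;
    }
    fclose(f);
    return buf;
}

#ifdef USE_OPENCL
/* Create and build a program for one device from a blob stored on disk.
   Returns NULL if the file cannot be read or the runtime rejects the blob. */
cl_program program_from_blob(cl_context ctx, cl_device_id dev, const char *path)
{
    size_t len;
    cl_int status, err;
    cl_program prog;
    const unsigned char *blobs[1];
    unsigned char *blob = load_blob(path, &len);

    if (!blob)
        return NULL;
    blobs[0] = blob;
    prog = clCreateProgramWithBinary(ctx, 1, &dev, &len, blobs, &status, &err);
    if (err == CL_SUCCESS &&
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL) != CL_SUCCESS) {
        clReleaseProgram(prog);     /* blob was loaded but could not be built */
        prog = NULL;
    }
    if (err != CL_SUCCESS)
        prog = NULL;
    free(blob);
    return prog;
}
#endif
```

With an OpenCL SDK installed, compiling with -DUSE_OPENCL and linking against the OpenCL library enables program_from_blob; without it, only the file loader is built.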
Fortunately I provide a solution based on an OpenCL wrapper plus .
This has at least two advantages:
1) It should eliminate OpenCL kernel compile time, since we can inject AMD IL code or device assembly code directly.
2) We can change the code of OpenCL kernels.
Two cases come to mind:
2.a) Making slight modifications to the AMD IL code to enable functionality that the OpenCL implementation does not support yet but that is supported at the CAL level, i.e. moving faster than AMD :-) For example, we can add double precision to OpenCL kernels with some extra work (a post is coming soon explaining how to convert a VectorAdd from SPFP to DPFP), and perhaps use SAD, MAD24, MUL24, etc. once we have a new AMD IL document for 5xxx hardware.
2.b) We can optimize, tune, or otherwise rework the AMD IL code, so we can hopefully reduce register pressure, reorder code, etc.
See how the power of getting down to the assembly level can push matrix multiplication to nearly 1 TFLOP on AMD 4xxx boards.
Code: Get it! (coming soon)
Friday, 23 October 2009
Building OpenCL kernels from AMD IL code or device assembly code!
Posted on 09:25 by Unknown