These are the things I would like to work on if I have to do an MSc:
1. Write a PTX to AMD IL 2.0 converter: use libptx from exoto or the PTX parser of Ocelot, since Barra works on cubin and I don't know GPGPU-Sim well enough.
Then, on top of that, build an AMD IL code generator; now that the 5xxx specs are out, this is good stuff.
Also add PTX 2.0 with the Fermi instructions (ballot, etc.), and use the bitinsert, SAD, etc. instructions of the AMD 5xxx.
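A minimal sketch of the table-driven core such a converter could start from, written in Python for brevity. The opcode table is my own assumption and covers only simple 32-bit scalar ops; `OPCODE_MAP`, `NEEDS_EXPANSION` and `translate` are hypothetical names, and the shader header, declarations, literal constants and component masks that real AMD IL needs are left out.

```python
import re
from typing import Dict, List

# Hypothetical 1:1 opcode table (PTX opcode -> AMD IL opcode); simple
# 32-bit scalar ops only.
OPCODE_MAP: Dict[str, str] = {
    "add.s32": "iadd", "add.u32": "iadd", "mul.lo.s32": "imul",
    "mov.b32": "mov", "mov.u32": "mov",
    "and.b32": "iand", "or.b32": "ior", "xor.b32": "ixor",
    "shl.b32": "ishl", "shr.s32": "ishr", "shr.u32": "ushr",
    "add.f32": "add", "mul.f32": "mul",
}

# Base opcodes with no 1:1 AMD equivalent: PTX sad is a scalar |a-b|+c
# while AMD's SAD is byte-wise, PTX bfi takes position/length while AMD's
# BFI is a mask select, and vote.ballot needs a warp/wavefront-wide lowering.
NEEDS_EXPANSION = {"sad", "bfi", "vote"}

# opcode, whitespace, comma-separated operands, terminating semicolon
INSTR_RE = re.compile(r"^\s*([a-z0-9.]+)\s+(.*?)\s*;\s*$")


def translate(ptx_lines: List[str]) -> List[str]:
    """Translate straight-line PTX instructions into AMD IL-style text."""
    regs: Dict[str, str] = {}  # PTX virtual register -> IL register

    def reg(operand: str) -> str:
        if not operand.startswith("%"):
            # Immediates would need AMD IL literal declarations; passed through here.
            return operand
        if operand not in regs:
            regs[operand] = f"r{len(regs)}"  # allocate in first-use order
        return regs[operand]

    out = []
    for line in ptx_lines:
        m = INSTR_RE.match(line)
        if m is None:
            raise ValueError(f"cannot parse: {line!r}")
        opcode, operands = m.group(1), m.group(2)
        if opcode.split(".")[0] in NEEDS_EXPANSION:
            raise NotImplementedError(f"{opcode} needs a hand-written expansion")
        if opcode not in OPCODE_MAP:
            raise NotImplementedError(f"no mapping for {opcode}")
        args = [reg(a.strip()) for a in operands.split(",")]
        out.append(f"{OPCODE_MAP[opcode]} {', '.join(args)}")
    return out


print(translate(["add.s32 %r3, %r1, %r2;", "mul.lo.s32 %r4, %r3, %r3;"]))
# prints ['iadd r0, r1, r2', 'imul r3, r0, r0']
```

Rejecting `sad`, `bfi` and `vote` instead of guessing a mapping is deliberate: each of them needs its own hand-written lowering on the AMD side.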
If you want to execute the code:
Support for the PTX v1.5 generated by OpenCL is still missing, but the CUDA backend perhaps supports it, since the cuSurf errors show that OpenCL uses the CUDA runtime to some extent.
Use a cudart or nvcuda library wrapper and forward everything to AMD's OpenCL implementation; or, better, trace how AMD's OpenCL uses CAL, so that the special CAL functions OpenCL relies on can be used too, and write a CAL wrapper. That's the best option.
Decode cubin with decuda.
PhysX, OptiX, etc.
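If the wrapper route is taken, one concrete piece is translating a CUDA launch into an OpenCL NDRange. A sketch, assuming a 256-work-item group limit (what I'd expect on HD 5xxx devices; a real wrapper should query `CL_DEVICE_MAX_WORK_GROUP_SIZE` instead). The call table just lists the usual OpenCL counterparts of common cudart calls for reference; `cuda_launch_to_ndrange` is a hypothetical helper.

```python
from typing import Dict, Tuple

Dim3 = Tuple[int, int, int]

# Usual OpenCL counterparts of common CUDA runtime calls (reference only).
CUDART_TO_OPENCL: Dict[str, str] = {
    "cudaMalloc": "clCreateBuffer",
    "cudaMemcpy (HostToDevice)": "clEnqueueWriteBuffer",
    "cudaMemcpy (DeviceToHost)": "clEnqueueReadBuffer",
    "cudaFree": "clReleaseMemObject",
    "kernel<<<grid, block>>>(args)": "clSetKernelArg + clEnqueueNDRangeKernel",
    "cudaThreadSynchronize": "clFinish",
}


def cuda_launch_to_ndrange(grid: Dim3, block: Dim3,
                           max_work_group_size: int = 256) -> Tuple[Dim3, Dim3]:
    """Map <<<grid, block>>> to (global_work_size, local_work_size).

    OpenCL's global size counts work-items, not groups, so each dimension is
    gridDim * blockDim and the local size is blockDim. With this mapping
    blockIdx.x * blockDim.x + threadIdx.x equals get_global_id(0)
    (assuming no global work offset).
    """
    threads_per_block = block[0] * block[1] * block[2]
    if threads_per_block > max_work_group_size:
        raise ValueError(
            f"block of {threads_per_block} threads exceeds the device limit "
            f"of {max_work_group_size}; the kernel must be re-blocked")
    global_size = (grid[0] * block[0], grid[1] * block[1], grid[2] * block[2])
    local_size = (block[0], block[1], block[2])
    return global_size, local_size


print(cuda_launch_to_ndrange((4, 2, 1), (64, 4, 1)))  # ((256, 8, 1), (64, 4, 1))
```

Oversized blocks are rejected rather than silently split, because splitting a block changes which threads share `__shared__` memory and `__syncthreads()`.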
2. Include asm support in OpenCL for AMD and NVIDIA, mapped to PTX and AMD IL, by intercepting the binary returned by OpenCL's built-in get-binary query (clGetProgramInfo with CL_PROGRAM_BINARIES).
Include some magic instruction and use an asm("...") built-in function as the magic marker.
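A minimal sketch of the marker idea, assuming the asm("...") statements are pulled out before the source reaches the OpenCL compiler, replaced by calls to numbered marker functions (the `__magic_asm_N` names are hypothetical), and that each call survives, on one line, into the PTX returned by the compiler. Binding asm operands to the registers around the call is the hard part and is not handled here.

```python
import re
from typing import List, Tuple

MARKER = "__magic_asm"  # hypothetical marker-function prefix

# asm("...") statement with a C string literal body (escaped quotes allowed)
ASM_RE = re.compile(r'asm\("((?:[^"\\]|\\.)*)"\)\s*;')


def extract_asm(source: str) -> Tuple[str, List[str]]:
    """Replace each asm("...") with a call to a numbered marker function.

    Returns the rewritten source and the captured asm bodies, indexed by N.
    A real tool would also emit a prototype for each marker function.
    """
    bodies: List[str] = []

    def repl(m) -> str:
        bodies.append(m.group(1))
        return f"{MARKER}_{len(bodies) - 1}();"

    return ASM_RE.sub(repl, source), bodies


def splice(ptx: str, bodies: List[str]) -> str:
    """Replace every PTX line mentioning MARKER_N with asm body N, keeping indentation."""
    line_re = re.compile(rf"^([ \t]*).*\b{re.escape(MARKER)}_(\d+)\b.*$", re.M)
    return line_re.sub(lambda m: m.group(1) + bodies[int(m.group(2))], ptx)


src, bodies = extract_asm('uint x = a;\nasm("add.cc.u32 %r1, %r2, %r3;");\n')
print(splice("\tmov.u32 %r2, 1;\n\tcall.uni __magic_asm_0;\n\tret;", bodies))
```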
Then post-merge with the PTX, which seems to be SSA, or do liveness analysis over a CFG
and proper register allocation.
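A sketch of the liveness plus register allocation step, on a toy CFG whose blocks hold (defined register, used registers) pairs; the representation and names are assumptions. Liveness is the standard backward dataflow fixpoint, live_in(B) = use(B) ∪ (live_out(B) − def(B)) and live_out(B) = ∪ live_in(S) over the successors S of B. Allocation is a plain greedy coloring of the interference graph with spilling, not a full Chaitin/Briggs allocator, and nothing here relies on SSA form.

```python
from typing import Dict, List, Optional, Set, Tuple

Instr = Tuple[Optional[str], List[str]]  # (register defined or None, registers used)
Blocks = Dict[str, List[Instr]]


def liveness(blocks: Blocks, succs: Dict[str, List[str]]):
    """Iterate live_in/live_out per basic block to a fixpoint."""
    use: Dict[str, Set[str]] = {}
    defs: Dict[str, Set[str]] = {}
    for b, instrs in blocks.items():
        u: Set[str] = set()
        d: Set[str] = set()
        for dst, srcs in instrs:
            u |= {s for s in srcs if s not in d}  # read before any write in b
            if dst is not None:
                d.add(dst)
        use[b], defs[b] = u, d

    live_in: Dict[str, Set[str]] = {b: set() for b in blocks}
    live_out: Dict[str, Set[str]] = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            out = set().union(*(live_in[s] for s in succs.get(b, [])))
            inn = use[b] | (out - defs[b])
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out


def interference(blocks: Blocks, live_out: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Registers interfere if one is defined while the other is live."""
    graph: Dict[str, Set[str]] = {}
    for instrs in blocks.values():
        for dst, srcs in instrs:
            for r in ([dst] if dst is not None else []) + srcs:
                graph.setdefault(r, set())
    for b, instrs in blocks.items():
        live = set(live_out[b])
        for dst, srcs in reversed(instrs):  # walk the block backwards
            if dst is not None:
                for r in live - {dst}:
                    graph[dst].add(r)
                    graph[r].add(dst)
                live.discard(dst)
            live |= set(srcs)
    return graph


def allocate(graph: Dict[str, Set[str]], k: int):
    """Greedy coloring with k physical registers; uncolorable ones are spilled."""
    assignment: Dict[str, int] = {}
    spilled: List[str] = []
    for r in sorted(graph, key=lambda r: (-len(graph[r]), r)):  # most constrained first
        taken = {assignment[n] for n in graph[r] if n in assignment}
        free = [c for c in range(k) if c not in taken]
        if free:
            assignment[r] = free[0]
        else:
            spilled.append(r)
    return assignment, spilled


# Toy CFG: entry defines a, b, c; loop updates a from a and c; exit reads a.
blocks: Blocks = {
    "entry": [("a", []), ("b", []), ("c", ["a", "b"])],
    "loop": [("a", ["a", "c"]), (None, ["a"])],  # second instr: branch on a
    "exit": [(None, ["a"])],
}
succs = {"entry": ["loop"], "loop": ["loop", "exit"]}
live_in, live_out = liveness(blocks, succs)
graph = interference(blocks, live_out)
print(allocate(graph, 2))  # ({'a': 0, 'b': 1, 'c': 1}, [])
```

With k = 2 the toy CFG fits in two registers, because b and c are never live at the same time; with k = 1, b and c are spilled.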
Instructions can be universal, such as addc or a clock instruction for both AMD and NVIDIA, or vendor-specific, such as the SAD instruction, etc.
AMD is going to introduce these as intrinsics.
Also include them in the CUDA compiler, the way the addc guy did, e.g. a nativesadamd(), and intercept the call in the OpenCL wrapper.
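A sketch of the OpenCL-wrapper side of `nativesadamd()`, assuming the wrapper sees the program source (for example at clCreateProgramWithSource) and the device extension string from clGetDeviceInfo(CL_DEVICE_EXTENSIONS). On devices reporting `cl_amd_media_ops` the call becomes `amd_sad`; elsewhere a portable OpenCL C fallback with the same byte-wise semantics is prepended. `rewrite_source` and the two Python reference functions are hypothetical helpers; the references show why PTX `sad` cannot simply stand in for it.

```python
AMD_PRELUDE = (
    "#pragma OPENCL EXTENSION cl_amd_media_ops : enable\n"
    "#define nativesadamd(a, b, c) amd_sad((a), (b), (c))\n"
)

# Portable OpenCL C fallback: byte-wise sum of absolute differences plus
# an accumulator, the same result amd_sad gives.
FALLBACK_PRELUDE = (
    "uint nativesadamd(uint a, uint b, uint c) {\n"
    "    uchar4 x = as_uchar4(a), y = as_uchar4(b);\n"
    "    return c + abs_diff(x.s0, y.s0) + abs_diff(x.s1, y.s1)\n"
    "             + abs_diff(x.s2, y.s2) + abs_diff(x.s3, y.s3);\n"
    "}\n"
)


def rewrite_source(source: str, device_extensions: str) -> str:
    """Prepend the right definition of nativesadamd for the target device."""
    if "nativesadamd" not in source:
        return source
    if "cl_amd_media_ops" in device_extensions.split():
        return AMD_PRELUDE + source
    return FALLBACK_PRELUDE + source


def amd_sad_ref(a: int, b: int, c: int) -> int:
    """Python reference: byte-wise |a - b| summed, plus c (32-bit wrap)."""
    total = sum(abs(((a >> s) & 0xFF) - ((b >> s) & 0xFF)) for s in (0, 8, 16, 24))
    return (total + c) & 0xFFFFFFFF


def ptx_sad_ref(a: int, b: int, c: int) -> int:
    """Python reference for PTX sad.u32: c + |a - b| on whole 32-bit words."""
    return (c + (b - a if a < b else a - b)) & 0xFFFFFFFF


print(amd_sad_ref(0x01020304, 0x04030201, 0))  # 8
print(hex(ptx_sad_ref(0x01020304, 0x04030201, 0)))  # 0x300fefd
```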
3. Port and redesign matmul, FFT, sort and other *good* NVIDIA implementations so that they are efficient on ATI.
4. Try to fix the optimized CUDA codes from 3. that don't work on ATI (e.g. check implicit warp-size-32 assumptions). Also try to learn general rules of thumb for on-the-fly optimization of PTX or CUDA programs into kernels.
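A sketch of a first-pass check for the warp-size point: a grep-level lint over CUDA source that flags idioms which usually hard-code a 32-wide warp (`& 31`, `>> 5`, `% 32`, `/ 32`, 32-bit full masks). This matters because wavefronts on the larger AMD 5xxx GPUs are 64 work-items wide. The patterns and the `find_warp32_assumptions` name are my own assumptions; it only reports candidates to inspect by hand and will miss assumptions that leave no such constant behind, such as warp-synchronous code that simply omits `__syncthreads()`.

```python
import re
from typing import List, Tuple

# Idioms that frequently hard-code a warp size of 32.
PATTERNS = [
    (re.compile(r"&\s*(?:31|0x1[fF])\b"), "lane = x & 31"),
    (re.compile(r">>\s*5\b"), "warp = x >> 5"),
    (re.compile(r"%\s*32\b"), "x % 32"),
    (re.compile(r"/\s*32\b"), "x / 32"),
    (re.compile(r"\b0x[fF]{8}\b"), "full 32-lane mask"),
]


def find_warp32_assumptions(source: str) -> List[Tuple[int, str, str]]:
    """Return (line number, idiom, line) for every suspicious line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if "warpSize" in line:
            continue  # already written in terms of the warpSize built-in
        for pattern, label in PATTERNS:
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
    return hits


cuda = """int lane = threadIdx.x & 31;
int warp = threadIdx.x >> 5;
int ok = threadIdx.x % warpSize;
"""
for hit in find_warp32_assumptions(cuda):
    print(hit)
```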
Thursday, 14 January 2010