The steps needed to run the NVIDIA OCL samples on AMD drivers are the same as for running them on NVIDIA 195 (recompilation is required); see my post.
First, the device info of an R8xx:
CL_DEVICE_VENDOR: Advanced Micro Devices, Inc.
CL_DEVICE_NAME: Cypress
CL_DRIVER_VERSION: CAL 1.4.467
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 18
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 256 / 256 / 256
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
CL_DEVICE_MAX_CLOCK_FREQUENCY: 949 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_IMAGE_SUPPORT: 0
CL_DEVICE_MAX_READ_IMAGE_ARGS: 0
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 0
CL_DEVICE_IMAGE_MAX_WIDTH: 2d width 0, 2d height 0, 3d width 0, 3d height 0, 3d depth 0
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 256 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 256 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_EXTENSIONS:
CL_DEVICE_PREFERRED_VECTOR_WIDTH: char 16, short 8, int 4, long 2, float 4, double 0
Note that R7xx reports CL_DEVICE_LOCAL_MEM_TYPE: global, so local memory is emulated there.
Upcoming extensions (found in AMD binaries):
cl_AMD_gl_sharing cl_khr_fp16 cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64
It also seems that CAL GL interop is coming:
calGLDissociate calGLAssociate calResGLAssociate
Results:
31 samples (only 8 samples fail once the fixes are applied)
4 samples fail due to missing extension support:
cl_khr_byte_addressable_store, image support
so with those extensions, 27/31 would probably work.
I think almost all, if not all, of the remaining 4 samples have WARPSIZE defines and fail, I think, because NVIDIA
makes strong assumptions that threads execute simultaneously in groups of 32 (the warp size), while ATI has wavefronts of size 64.
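One way to keep such a size out of the kernel source is to pass it in as a build define chosen on the host. The sketch below is only an illustration: the helper names and the WARP_SIZE macro are hypothetical (not taken from the samples), and the 32/64 values are just the ones quoted in this post, so other devices may need different numbers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: choose the SIMD group width from the CL_DEVICE_VENDOR
 * string. 32 (NVIDIA warp) and 64 (ATI wavefront) are the values quoted in
 * this post; 0 means "unknown vendor". */
unsigned simd_width_for_vendor(const char *vendor)
{
    if (strstr(vendor, "NVIDIA") != NULL)
        return 32u;
    if (strstr(vendor, "Advanced Micro Devices") != NULL)
        return 64u;
    return 0u;
}

/* Write "-D WARP_SIZE=<n>" into buf so the kernel does not hard-code it.
 * Returns 0 on success, -1 if the vendor is unknown or buf is too small. */
int warp_size_option(const char *vendor, char *buf, size_t len)
{
    unsigned w = simd_width_for_vendor(vendor);
    int n;

    if (w == 0u)
        return -1;
    n = snprintf(buf, len, "-D WARP_SIZE=%u", w);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string would be appended to the options passed when building the program. This only changes the define; a kernel whose algorithm truly depends on 32 threads running in lockstep may still need to be rewritten.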
Summary of failures:
=======
First, it seems that the AMD driver's compiler doesn't know the -cl-mad-enable compiler flag (it shows up as a warning in the build log).
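If the warning is a nuisance, the flag could be dropped from the option string before the program is built. The following is a minimal sketch of that idea, not something the samples do; strip_build_option is a hypothetical helper and it assumes the options are separated by single spaces.

```c
#include <string.h>

/* Remove every whole-token occurrence of `flag` from the space-separated
 * build-option string `opts`, in place. Returns the number removed. */
int strip_build_option(char *opts, const char *flag)
{
    size_t flen = strlen(flag);
    size_t n;
    int removed = 0;
    char *p = opts;

    if (flen == 0)
        return 0;

    while ((p = strstr(p, flag)) != NULL) {
        int starts_token = (p == opts) || (p[-1] == ' ');
        int ends_token = (p[flen] == '\0') || (p[flen] == ' ');
        if (starts_token && ends_token) {
            char *rest = p + flen;
            while (*rest == ' ')          /* swallow the separator(s) after it */
                rest++;
            memmove(p, rest, strlen(rest) + 1);
            removed++;
        } else {
            p += flen;                    /* partial match, keep searching */
        }
    }

    /* Drop trailing spaces, e.g. left behind when the flag was the last token. */
    n = strlen(opts);
    while (n > 0 && opts[n - 1] == ' ')
        opts[--n] = '\0';

    return removed;
}
```

For example, calling it on "-cl-mad-enable -D LOCAL_SIZE_LIMIT=512" with the flag "-cl-mad-enable" leaves "-D LOCAL_SIZE_LIMIT=512".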
fixed
=====
oclNbody: mul24 uint
oclParticles, oclScan, oclSortingNetworks: workgroup 256
oclScan: __local int buf; -> buf[1]
oclParticles: float3 already exists; OpenGL does not work on ATI
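The "workgroup 256" fixes come down to not asking for more work-items per group than the device reports in CL_DEVICE_MAX_WORK_GROUP_SIZE (256 in the device info above). A minimal host-side sketch of that clamp follows; both function names are hypothetical and the samples themselves use fixed defines instead.

```c
#include <stddef.h>

/* Clamp a requested work-group size to the device limit
 * (the value reported as CL_DEVICE_MAX_WORK_GROUP_SIZE). */
size_t clamp_local_size(size_t requested, size_t device_max)
{
    return requested < device_max ? requested : device_max;
}

/* Round the global size up to the next multiple of the local size.
 * `local` must be non-zero. */
size_t round_up_global_size(size_t global, size_t local)
{
    size_t rem = global % local;
    return rem == 0 ? global : global + (local - rem);
}
```

The rounding matters because OpenCL 1.0 expects the global work size to be evenly divisible by the local work size. Clamping on the host is not enough for kernels that size their __local arrays from a define such as LOCAL_SIZE_LIMIT; there the define has to change as well.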
lacking extensions
=============
oclDXTCompression, oclHistogram: byte addressable
simpletexture3d, oclVolumeRender: image support
no fix
====
oclMatVecMul (only 1 kernel), radixsort: warp_size
oclQuasirandomGenerator: runs, but gives a wrong value
oclRecursiveGaussian: does not display correctly (on Linux it does)
Raw log:
Warning: invalid option: -cl-mad-enable
oclDXTCompression
Build Log:
C:\Users\oscar\AppData\Local\Temp\OCLF8D0.tmp.cl(153): error: write to < 32
bits via pointer not allowed unless cl_khr_byte_addressable_store is
enabled
*w = ((x << 11) | (y << 5) | z);
^
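I have not tried it on these samples, but a common workaround when cl_khr_byte_addressable_store is missing is to do the narrow store as a 32-bit read-modify-write. The sketch below shows the idea in plain C under several assumptions: the destination is presumably 16-bit, the layout is little-endian (even index in the low half), both function names are hypothetical, and the channel masks are an addition that the failing line does not have.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack 5/6/5-bit channels in the same order as the failing line:
 * (x << 11) | (y << 5) | z. The masks are an added safety measure. */
uint16_t pack565(unsigned x, unsigned y, unsigned z)
{
    return (uint16_t)(((x & 0x1Fu) << 11) | ((y & 0x3Fu) << 5) | (z & 0x1Fu));
}

/* Store 16-bit value number `idx` into an array of 32-bit words using only
 * 32-bit accesses. Even idx goes to the low half, odd idx to the high half. */
void store_u16_via_u32(uint32_t *words, size_t idx, uint16_t value)
{
    unsigned shift = (idx & 1u) ? 16u : 0u;
    uint32_t mask = (uint32_t)0xFFFFu << shift;
    uint32_t w = words[idx / 2];

    w = (w & ~mask) | ((uint32_t)value << shift);   /* replace one half only */
    words[idx / 2] = w;
}
```

Inside a kernel this is only safe if no two work-items write the two halves of the same word at the same time; otherwise one work-item should pack both halves and issue a single 32-bit store.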
oclHistogram
cl_khr_byte_addressable_store
oclMatVecMul
coalesced 3
assumes warp size = 32
oclNbody
Change for the mul24 issue: mul24(get_local_size(0), (uint)j)
#ifdef MAC
#define SX_SUM(i,j) sharedPos[i + mul24(get_local_size(0), (uint)j)] // i + blockDimx * j
#else
#define SX_SUM(i,j) sharedPos[i + mul24(get_local_size(0), j)] // i + blockDimx * j
#endif
oclParticles
1. Local size 512, but ATI 256
#define LOCAL_SIZE_LIMIT 1024 ->512U
in the .cl and in the .cpp
2. float3 already exists; in particles.cl change it to float_3
3. Runs OpenGL on NVIDIA (no idea how to fix it)
oclQuasirandomGenerator
No idea how to fix it
Launch QuasirandomGenerator kernel...
Read back results...
Comparing to the CPU results...
L1 norm: 1.000000e+000
TEST FAILED !!!
oclRadixSort
Build Log:
C:\Users\oscar\AppData\Local\Temp\OCL7FBA.tmp.cl(181): internal error:
array_element_type: non-array type
1. __local uint buf;
change to buf[1]
2. __local uint num..
3. Search for 512 in the work size -> 256
4. (not sure) warpsize 32 -> 64
oclReduction
passed
Comparing against Host/C++ computation...
GPU result = 0
CPU result = 2139095040
oclScan
__local int buf
workgroup 512 -> 256
simpletexture3d
C:\Users\oscar\AppData\Local\Temp\OCLF077.tmp.cl(16): error: identifier
"__read_only" is undefined
__kernel void render(__read_only image3d_t volume, sampler_t volumeSampler, __global uint *d_output, uint imageW, uint imageH, float w)
^
oclSortingNetworks
workgroup size
static const uint LOCAL_SIZE_LIMIT = 512U;
static const char *compileOptions = "-D LOCAL_SIZE_LIMIT=512";
oclVolumeRender
-----------------------------------------------------------
nbody
ifdef mac
Build Log:
Warning: invalid option: -cl-mad-enable
nbody
_SUM(threadIdxx, threadIdxy).z = acc.z;
^
C:\Users\oscar\AppData\Local\Temp\OCL8880.tmp.cl(162): error: can't find an
instance for opencl builtin function
SX_SUM(threadIdxx, threadIdxy).z = acc.z;
^
Wednesday, 4 November 2009