Nvidia OpenCL samples with AMD OpenCL drivers! ~ GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information

The steps needed to run the Nvidia OpenCL samples on AMD drivers are the same as for running them on the Nvidia 195 drivers (recompilation is required); see my post.

First, the device info of an R8xx GPU:
  CL_DEVICE_VENDOR: Advanced Micro Devices, Inc.
  CL_DEVICE_NAME: Cypress
  CL_DRIVER_VERSION: CAL 1.4.467
  CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS: 18
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
  CL_DEVICE_MAX_WORK_ITEM_SIZES: 256 / 256 / 256
  CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
  CL_DEVICE_MAX_CLOCK_FREQUENCY: 949 MHz
  CL_DEVICE_ADDRESS_BITS: 32
  CL_DEVICE_IMAGE_SUPPORT: 0
  CL_DEVICE_MAX_READ_IMAGE_ARGS: 0
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 0
  CL_DEVICE_IMAGE_MAX_WIDTH: 2d width 0, 2d height 0, 3d width 0, 3d height 0, 3d depth 0
  CL_DEVICE_MAX_MEM_ALLOC_SIZE: 256 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE: 256 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
  CL_DEVICE_LOCAL_MEM_TYPE: local
  CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
  CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_EXTENSIONS:
  CL_DEVICE_PREFERRED_VECTOR_WIDTH: char 16, short 8, int 4, long 2, float 4, double 0
Note that R7xx reports CL_DEVICE_LOCAL_MEM_TYPE: global, so local memory is emulated there.
Upcoming extensions (found in AMD binaries):
cl_AMD_gl_sharing cl_khr_fp16 cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64

It also seems that CAL GL interop is coming:
calGLDissociate calGLAssociate calResGLAssociate

Results:
31 samples tested; only 8 still fail once the fixes listed here are applied.
4 of those fail because the driver lacks the required support:
cl_khr_byte_addressable_store and image support.
So with those extensions, probably 27/31 would work.

I think most, if not all, of the remaining 4 samples have WARPSIZE defines and fail, I think, because Nvidia
makes the strong assumption that threads execute simultaneously in groups of 32 (the warp size), whereas ATI has wavefronts of size 64.
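
As a rough sketch of how the hard-coded width could be parameterised instead, the host could pick the value from the CL_DEVICE_VENDOR string and pass it to the kernel as a define. The helper names and the WARP_SIZE macro name are hypothetical (the samples use their own WARPSIZE-style defines), and changing the define alone may not be enough for kernels whose logic depends on 32 in other ways.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: guess the SIMD execution width from CL_DEVICE_VENDOR.
   32 (Nvidia warp) and 64 (ATI wavefront) are the values quoted in the post.
   Returning 1 for unknown vendors means "assume no lock-step execution". */
static unsigned simd_width_for_vendor(const char *vendor)
{
    if (strstr(vendor, "NVIDIA") != NULL) return 32;
    if (strstr(vendor, "Advanced Micro Devices") != NULL) return 64;
    return 1;
}

/* Writes a build option such as "-D WARP_SIZE=64" into out.
   Returns 0 on success, -1 if the buffer is too small. */
static int warp_size_option(const char *vendor, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "-D WARP_SIZE=%u",
                     simd_width_for_vendor(vendor));
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

The resulting string would be appended to the options passed to clBuildProgram.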

Summary of failures:
=======

First, it seems that the AMD driver's compiler doesn't know about the -cl-mad-enable compiler flag (it shows up as a warning in the build log).
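
The warning is harmless, but if a clean build log is wanted, one option is to drop the flag from the options string before building on AMD. This is only a sketch with a hypothetical helper name, not something the samples do themselves:

```c
#include <string.h>

/* Copies space-separated build options from opts to out, skipping every
   token that is exactly equal to flag.
   Returns 0 on success, -1 if out is too small. */
static int strip_build_option(const char *opts, const char *flag,
                              char *out, size_t out_size)
{
    size_t used = 0;
    size_t flag_len = strlen(flag);

    if (out_size == 0) return -1;
    out[0] = '\0';

    while (*opts != '\0') {
        while (*opts == ' ') opts++;              /* skip separators */
        size_t len = strcspn(opts, " ");          /* length of next token */
        if (len == 0) break;
        if (!(len == flag_len && strncmp(opts, flag, len) == 0)) {
            size_t need = len + (used > 0 ? 1 : 0);
            if (used + need + 1 > out_size) return -1;
            if (used > 0) out[used++] = ' ';      /* re-insert one separator */
            memcpy(out + used, opts, len);
            used += len;
            out[used] = '\0';
        }
        opts += len;
    }
    return 0;
}
```

For example, "-cl-mad-enable -D LOCAL_SIZE_LIMIT=512" becomes "-D LOCAL_SIZE_LIMIT=512".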

fixed
=====

oclNbody: mul24 needs a uint cast
oclParticles, oclScan, oclSortingNetworks: work-group size reduced to 256
oclScan: __local int buf; -> buf[1]
oclParticles: float3 already exists; OpenGL on ATI doesn't work
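
The work-group fixes all come down to not exceeding the device limit (256 on Cypress, per CL_DEVICE_MAX_WORK_GROUP_SIZE in the device info). A minimal sketch of the host-side arithmetic, with hypothetical helper names; in a real host program device_max would come from clGetDeviceInfo:

```c
#include <stddef.h>

/* Limits a requested work-group size to what the device reports. */
static size_t clamp_local_size(size_t requested, size_t device_max)
{
    return requested < device_max ? requested : device_max;
}

/* Rounds the global size up to a multiple of the local size, since the
   global work size must be divisible by the work-group size. */
static size_t round_up_global_size(size_t work_items, size_t local_size)
{
    size_t rem = work_items % local_size;
    return rem == 0 ? work_items : work_items + (local_size - rem);
}
```

Kernels that size their __local arrays from a compile-time limit (such as LOCAL_SIZE_LIMIT) still need that define changed to match.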

lacking extensions
=============
oclDXTCompression, oclHistogram: byte-addressable store
simpletexture3d, oclVolumeRender: image support
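
A host program can detect these cases up front by looking for the extension name in the CL_DEVICE_EXTENSIONS string (empty on this driver, as the device info shows). A small sketch with a hypothetical helper; it matches whole tokens so that a name that is only a prefix of another extension does not count:

```c
#include <string.h>

/* Returns 1 if name appears as a whole space-separated token in ext_list,
   0 otherwise. ext_list is the string reported for CL_DEVICE_EXTENSIONS. */
static int has_extension(const char *ext_list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = ext_list;

    if (name_len == 0) return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[name_len] == '\0') || (p[name_len] == ' ');
        if (starts_token && ends_token) return 1;
        p += name_len;   /* keep searching after this partial match */
    }
    return 0;
}
```

Image support is reported separately through CL_DEVICE_IMAGE_SUPPORT rather than through the extension string.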

no fix
====

oclMatVecMul (only 1 kernel), oclRadixSort: warp size
oclQuasirandomGenerator: runs, but produces wrong values
oclRecursiveGaussian: output doesn't look right (on Linux it does)

Raw log:
Warning: invalid option: -cl-mad-enable

oclDXTCompression
Build Log:
C:\Users\oscar\AppData\Local\Temp\OCLF8D0.tmp.cl(153): error: write to < 32
   bits via pointer not allowed unless cl_khr_byte_addressable_store is
   enabled
   *w = ((x << 11) | (y << 5) | z);
   ^
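
I have not tried it, but one possible workaround when cl_khr_byte_addressable_store is missing is to combine two 16-bit results and write them with a single 32-bit store. The sketch below only shows the bit packing in plain C; the function names are hypothetical, the masks are my addition, and the kernel would need restructuring so that one work-item produces both halves.

```c
#include <stdint.h>

/* Packs three fields into 16 bits using the same layout as the failing
   line: x in bits 11..15, y in bits 5..10, z in bits 0..4.
   The masks (not in the original line) guard against out-of-range inputs. */
static uint16_t pack_565(uint32_t x, uint32_t y, uint32_t z)
{
    return (uint16_t)(((x & 0x1Fu) << 11) | ((y & 0x3Fu) << 5) | (z & 0x1Fu));
}

/* Combines two 16-bit values into one 32-bit word so it can be written
   with a single 32-bit store. On a little-endian device, 'first' ends up
   at the lower address. */
static uint32_t pack_pair(uint16_t first, uint16_t second)
{
    return (uint32_t)first | ((uint32_t)second << 16);
}
```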

oclHistogram
needs cl_khr_byte_addressable_store

oclMatVecMul
coalesced 3
assumes warp size = 32

oclNbody
  fix: change the call to mul24(get_local_size(0), (uint)j), as in the MAC branch
#ifdef MAC
#define SX_SUM(i,j) sharedPos[i + mul24(get_local_size(0), (uint)j)] // i + blockDimx * j
#else
#define SX_SUM(i,j) sharedPos[i + mul24(get_local_size(0), j)] // i + blockDimx * j
#endif
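
The cast matters because mul24 expects both arguments to have the same type (both int or both uint), so mixing the unsigned result of get_local_size(0) with a signed j apparently leaves the AMD compiler without a matching overload. For reference, a plain C emulation of the unsigned variant and of the index the macro computes; the function names are hypothetical and the masking reflects my assumption about out-of-range inputs, which the builtin leaves implementation-defined:

```c
#include <stdint.h>

/* Emulates the unsigned mul24: multiplies the low 24 bits of each operand
   and keeps the low 32 bits of the product. */
static uint32_t mul24_u(uint32_t x, uint32_t y)
{
    uint64_t product = (uint64_t)(x & 0xFFFFFFu) * (uint64_t)(y & 0xFFFFFFu);
    return (uint32_t)product;
}

/* Index used by SX_SUM(i, j): i + blockDimx * j, with everything unsigned. */
static uint32_t sx_sum_index(uint32_t i, uint32_t local_size_x, uint32_t j)
{
    return i + mul24_u(local_size_x, j);
}
```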

oclParticles
1. Local size of 512, but ATI allows only 256
#define LOCAL_SIZE_LIMIT 1024 ->512U
in both the .cl and the .cpp
2. float3 already exists; in particles.cl change it to float_3
3. Runs OpenGL on Nvidia (no idea how to fix it)

oclQuasirandomGenerator
No idea how to fix it.
Launch QuasirandomGenerator kernel...
Read back results...
Comparing to the CPU results...
L1 norm: 1.000000e+000
TEST FAILED !!!
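
One guess at what this number means: if the sample's L1 norm is a relative error (sum of absolute differences divided by the sum of absolute reference values, which is an assumption on my part), then a value of exactly 1.0 is what you get when the GPU buffer comes back as all zeros. A sketch of that computation, with a hypothetical function name:

```c
#include <stddef.h>

/* Relative L1 error: sum(|gpu - cpu|) / sum(|cpu|).
   Falls back to the absolute sum when the reference is all zeros. */
static double relative_l1_error(const float *gpu, const float *cpu, size_t n)
{
    double sum_delta = 0.0;
    double sum_ref = 0.0;

    for (size_t i = 0; i < n; i++) {
        double d = (double)gpu[i] - (double)cpu[i];
        double r = (double)cpu[i];
        sum_delta += d < 0.0 ? -d : d;
        sum_ref += r < 0.0 ? -r : r;
    }
    return sum_ref > 0.0 ? sum_delta / sum_ref : sum_delta;
}
```

If that reading is right, the kernel may not be writing any output at all on this driver, rather than writing slightly wrong values.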

oclRadixSort
Build Log:
C:\Users\oscar\AppData\Local\Temp\OCL7FBA.tmp.cl(181): internal error:
   array_element_type: non-array type
1. __local uint buf;
change to buf[1]
2. __local uint num..
3. search for 512 in the work size -> 256
4. (not sure) warp size 32 -> 64

oclReduction
passed
Comparing against Host/C++ computation...
GPU result = 0
CPU result = 2139095040

oclScan
__local int buf -> buf[1]
work-group size 512 -> 256

simpletexture3d
C:\Users\oscar\AppData\Local\Temp\OCLF077.tmp.cl(16): error: identifier
   "__read_only" is undefined
__kernel void render(__read_only image3d_t volume, sampler_t volumeSampler, __global uint *d_output, uint imageW, uint imageH, float w)
   ^

oclSortingNetworks
work-group size: 512 -> 256 in
static const uint LOCAL_SIZE_LIMIT = 512U;
static const char *compileOptions = "-D LOCAL_SIZE_LIMIT=512";

oclVolumeRender

-----------------------------------------------------------

oclNbody
use the #ifdef MAC variant

Build Log:
Warning: invalid option: -cl-mad-enable
nbody
_SUM(threadIdxx, threadIdxy).z = acc.z;
   ^

C:\Users\oscar\AppData\Local\Temp\OCL8880.tmp.cl(162): error: can't find an
   instance for opencl builtin function
   SX_SUM(threadIdxx, threadIdxy).z = acc.z;
   ^


Wednesday, 4 November 2009
