GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information

Wednesday, 4 November 2009

Nvidia OpenCL samples with AMD OpenCL drivers!

Posted on 16:11 by Unknown
The steps needed to run the Nvidia OpenCL samples on the AMD drivers are the same as for running them on the Nvidia 195 drivers (recompilation is required); see my earlier post.

First, the device info of an R8xx:
  CL_DEVICE_VENDOR: Advanced Micro Devices, Inc.
  CL_DEVICE_NAME: Cypress
  CL_DRIVER_VERSION: CAL 1.4.467
  CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS: 18
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
  CL_DEVICE_MAX_WORK_ITEM_SIZES: 256 / 256 / 256
  CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
  CL_DEVICE_MAX_CLOCK_FREQUENCY: 949 MHz
  CL_DEVICE_ADDRESS_BITS: 32
  CL_DEVICE_IMAGE_SUPPORT: 0
  CL_DEVICE_MAX_READ_IMAGE_ARGS: 0
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 0
  CL_DEVICE_IMAGE_MAX_WIDTH: 2d width 0, 2d height 0, 3d width 0, 3d height 0, 3d depth 0
  CL_DEVICE_MAX_MEM_ALLOC_SIZE: 256 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE: 256 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
  CL_DEVICE_LOCAL_MEM_TYPE: local
  CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
  CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_EXTENSIONS:
  CL_DEVICE_PREFERRED_VECTOR_WIDTH: char 16, short 8, int 4, long 2, float 4, double 0
Note that R7xx reports LOCAL_MEM_TYPE: global, so local memory is emulated there.
Upcoming extensions (found in the AMD binaries):
cl_AMD_gl_sharing  cl_khr_fp16  cl_khr_byte_addressable_store  cl_khr_3d_image_writes cl_khr_fp64

It also seems that CAL GL interop is coming:
calGLDissociate  calGLAssociate calResGLAssociate

Results:
31 samples; only 8 still fail after applying the fixes.
4 of those fail because the driver lacks the needed extension support:
cl_khr_byte_addressable_store, image support
so with those extensions they would probably work: 27/31

I think almost all, if not all, of the remaining 4 samples have WARPSIZE defines, and I think they fail because Nvidia
makes strong assumptions that threads execute simultaneously in groups of 32 (the warp size), while ATI has wavefronts of size 64.
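
To make the warp/wavefront mismatch concrete, here is a minimal sketch in plain C. It is not taken from the Nvidia samples, and `group_of` and `lane_of` are hypothetical helper names. It only models how a local work-item id maps onto a hardware execution group of a given width; kernel code that hard-codes 32 effectively bakes these two computations in.

```c
/* Hypothetical helpers, for illustration only: they model how a local
   work-item id maps to a hardware execution group of a given width
   (32 for an Nvidia warp, 64 for an ATI wavefront). */

/* Index of the warp/wavefront that work-item `lid` belongs to. */
static unsigned group_of(unsigned lid, unsigned width)
{
    return lid / width;
}

/* Position of work-item `lid` inside its warp/wavefront. */
static unsigned lane_of(unsigned lid, unsigned width)
{
    return lid % width;
}
```

With a width of 32, work-item 40 is lane 8 of group 1; with a width of 64 it is lane 40 of group 0. Indexing and synchronization shortcuts written for 32 therefore no longer line up with what the hardware runs together.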

Summary of failures:
=======

First, it seems that the AMD driver compiler doesn't know the -cl-mad-enable compiler flag (it shows up as a warning in the build log).

fixed
=====

oclNbody: mul24 needs a uint cast
oclParticles, oclScan, oclSortingNetworks: work-group size 256
oclScan: __local int buf; -> buf[1]
oclParticles: float3 already exists; OpenGL doesn't work on ATI
lacking extensions
=============
oclDXTCompression, oclHistogram: byte addressable
simpletexture3d,oclVolumeRender: image support

no fix
====

oclMatVecMul (only 1 kernel), radixsort: warp_size
oclQuasirandomGenerator: runs but gives wrong values
oclRecursiveGaussian: doesn't display properly (on Linux it does)

Raw log:
Warning: invalid option: -cl-mad-enable

oclDXTCompression
Build Log:
C:\Users\oscar\AppData\Local\Temp\OCLF8D0.tmp.cl(153): error: write to < 32
          bits via pointer not allowed unless cl_khr_byte_addressable_store is
          enabled
      *w = ((x << 11) | (y << 5) | z);
      ^

oclHistogram
cl_khr_byte_addressable_store

oclMatVecMul
coalesced 3
assumes warp size = 32

oclNbody
  Fix for the issue: use mul24(get_local_size(0), (uint)j)
#ifdef MAC
#define SX_SUM(i,j) sharedPos[i + mul24(get_local_size(0), (uint)j)]    // i + blockDimx * j
#else
#define SX_SUM(i,j) sharedPos[i + mul24(get_local_size(0), j)]    // i + blockDimx * j
#endif

oclParticles
1. Local size of 512, and ATI allows 256
#define LOCAL_SIZE_LIMIT 1024 ->512U
in the .cl and in the .cpp
2. float3 already exists; in particles.cl change it to float_3
3. runs OpenGL on Nvidia (no idea how to fix it)

oclQuasirandomGenerator
No idea how to fix it
Launch QuasirandomGenerator kernel...
Read back results...
Comparing to the CPU results...
L1 norm: 1.000000e+000
TEST FAILED !!!

oclRadixSort
Build Log:
C:\Users\oscar\AppData\Local\Temp\OCL7FBA.tmp.cl(181): internal error:
          array_element_type: non-array type
1.       __local uint buf;
change to buf[1]
2. __local uint num..
3. search for 512 in the work size -> 256
4. (not sure) warp size 32 -> 64

oclReduction
passed
Comparing against Host/C++ computation...
GPU result = 0
CPU result = 2139095040

oclScan
__local int buf
workgroup 512->256

simpletexture3d
C:\Users\oscar\AppData\Local\Temp\OCLF077.tmp.cl(16): error: identifier
          "__read_only" is undefined
 __kernel void render(__read_only image3d_t volume, sampler_t volumeSampler, __global uint *d_output, uint imageW, uint imageH, float w)
                       ^
oclSortingNetworks
workgroup size
static const uint LOCAL_SIZE_LIMIT = 512U;
static const char  *compileOptions = "-D LOCAL_SIZE_LIMIT=512";
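
Since the limit is passed to the kernel through the compile options, the option string could be built from the device value instead of being hard-coded. A minimal sketch in plain C, assuming the limit was obtained from clGetDeviceInfo and the result is later handed to clBuildProgram; `build_compile_options` is a hypothetical helper, not part of the sample:

```c
#include <stdio.h>

/* Hypothetical helper: writes "-D LOCAL_SIZE_LIMIT=<limit>" into buf.
   Returns the number of characters written (excluding the terminator),
   or -1 if buf is too small. */
static int build_compile_options(char *buf, size_t size, unsigned limit)
{
    int n = snprintf(buf, size, "-D LOCAL_SIZE_LIMIT=%u", limit);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The host-side LOCAL_SIZE_LIMIT constant would need to follow the same value so the .cpp and the .cl stay in agreement.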

oclVolumeRender

-----------------------------------------------------------


nbody
ifdef mac

Build Log:
Warning: invalid option: -cl-mad-enable
C:\Users\oscar\AppData\Local\Temp\OCL8880.tmp.cl(162): error: can't find an
          instance for opencl builtin function
          SX_SUM(threadIdxx, threadIdxy).z = acc.z;
          ^