See the post "Optix and OpenCL SDKs with Visual Studio 2010" for getting these working in Visual Studio 2010!
Of course, all the samples work with these fixes.
General performance is very good:
GTX 275, PCIe 2.0 at x8
nbody (30000 bodies):
===============
OpenCL nbody: 22-23 fps (440-460 GFLOPS)
CUDA nbody: 24-25 fps (480-500 GFLOPS)
DirectCompute nbody: 25 fps (500 GFLOPS)
With the 190.89 OpenCL driver it was 20 fps;
now it is 22-23 fps.
Bandwidth performance
===============
Very good:
OpenCL device bandwidth (195 driver): 97 GB/s
CUDA: 104 GB/s
OpenCL device bandwidth (190 driver): 90-95 GB/s
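As a rough sketch of how a device-to-device figure like the ones above can be derived from a timed copy loop: the helper below is hypothetical (the name d2d_bandwidth_gbs is mine, not from the SDK) and it assumes every copied byte counts twice, once for the read and once for the write, with 1 GB taken as 1e9 bytes. The SDK bandwidth test may count or round differently.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the NVIDIA SDK.
 * Effective device-to-device bandwidth in GB/s (1 GB = 1e9 bytes).
 * Assumption: each copied byte is counted twice (one read + one write). */
double d2d_bandwidth_gbs(size_t bytes_per_copy, unsigned iterations,
                         double elapsed_seconds)
{
    double bytes_moved;

    if (elapsed_seconds <= 0.0) {
        return 0.0; /* avoid dividing by zero on a bad timer reading */
    }
    bytes_moved = 2.0 * (double)bytes_per_copy * (double)iterations;
    return bytes_moved / elapsed_seconds / 1e9;
}
```

For example, ten 32 MiB copies finishing in 10 ms would come out at roughly 67 GB/s with this way of counting.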
New drivers support doubles and OpenGL interop.
Enabling OpenGL interop in the samples:
=========================
These samples support it:
oclSimpleTexture3D
oclVolumeRender
oclPostprocessGL
oclSimpleGL
1. Search for the commented-out #define GL_INTEROP and uncomment it.
2. Change (Windows only):
cxGPUContext = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, NULL, NULL, &ciErrNum);
to:
#ifdef GL_INTEROP
cl_context_properties akProperties[] = {
CL_GL_CONTEXT_KHR,
(cl_context_properties)wglGetCurrentContext(),
CL_WGL_HDC_KHR,
(cl_context_properties)wglGetCurrentDC(), 0
};
// create the OpenCL context
cxGPUContext = clCreateContextFromType(akProperties, CL_DEVICE_TYPE_GPU, NULL, NULL, &ciErrNum);
#else
// create the OpenCL context
cxGPUContext = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, NULL, NULL, &ciErrNum);
#endif
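Before turning GL_INTEROP on, it can be worth checking that the device really reports cl_khr_gl_sharing. A minimal sketch, assuming you have already fetched the space-separated extension string with clGetDeviceInfo(..., CL_DEVICE_EXTENSIONS, ...); has_extension is a hypothetical helper of mine, not an SDK function, and it matches whole tokens only so that a name fused with a neighbour does not count as a hit.

```c
#include <string.h>

/* Hypothetical helper, not part of the OpenCL API or the NVIDIA SDK.
 * Returns 1 if `name` appears as a whole space-separated token in
 * `ext_list` (the CL_DEVICE_EXTENSIONS string), 0 otherwise. */
int has_extension(const char *ext_list, const char *name)
{
    size_t len;
    const char *p;

    if (ext_list == NULL || name == NULL) {
        return 0;
    }
    len = strlen(name);
    if (len == 0) {
        return 0;
    }
    p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the list start or right after a space... */
        int starts = (p == ext_list) || (p[-1] == ' ');
        /* ...and end at the list end or right before a space. */
        int ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends) {
            return 1;
        }
        p += 1; /* keep scanning after a partial match */
    }
    return 0;
}
```

With this, the interop path could be guarded by something like has_extension(extensions, "cl_khr_gl_sharing"), falling back to the plain context creation when it returns 0.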
oclPostprocessGL fix:
before initCL:
if(!bQATest)
{
// create pbo
createPBO(&pbo_source);
createPBO(&pbo_dest);
}
after:
if(!bQATest)
{
// create pbo
// createPBO(&pbo_source);
==============
It seems Texture 3D OpenGL interop performs very poorly.
sample  on    off
=====================
vol3d   14    14
post    95    75
sim     385   320/330
tex3d   290   250
Device
=====
OpenCL SW Info:
CL_PLATFORM_NAME: NVIDIA CUDA
CL_PLATFORM_VERSION: OpenCL 1.0 CUDA 3.0.1
OpenCL SDK Version: 4788711
OpenCL Device Info:
1 devices found supporting OpenCL:
---------------------------------
Device GeForce GTX 275
---------------------------------
CL_DEVICE_NAME: GeForce GTX 275
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 195.39
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 30
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 512
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1404 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 224 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 896 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_IMAGE2D_MAX_HEIGHT: 8192
CL_DEVICE_IMAGE3D_MAX_WIDTH: 2048
CL_DEVICE_IMAGE3D_MAX_HEIGHT: 2048
CL_DEVICE_IMAGE3D_MAX_DEPTH: 2048
CL_DEVICE_EXTENSIONS:
cl_khr_fp64
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store
cl_khr_gl_sharing
cl_nv_compiler_options
cl_nv_device_attribute_query
CL_NV_DEVICE_COMPUTE_CAPABILITY: 1.3
CL_NV_DEVICE_REGISTERS_PER_BLOCK: 16384
CL_NV_DEVICE_WARP_SIZE: 32
CL_NV_DEVICE_GPU_OVERLAP: CL_TRUE
CL_NV_DEVICE_KERNEL_EXEC_TIMEOUT: CL_FALSE
CL_NV_DEVICE_INTEGRATED_MEMORY: CL_FALSE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_