All about Nvidia 195.39!
It brings three things:
1. OpenCL:
ICD Model
=========
Seems production quality with OpenCL ICD included from Khronos.
Seems that implementations are added to the Windows Registry:
{HKCU|HKLM} SOFTWARE\Khronos\OpenCL\Vendors
Seems to search for:
VendorSuffix
OpenCLDriverName
But I can't find an Nvidia entry added there after installing the ICD.
It also has hardcoded:
NV
nvcuda.dll
So adding ATI could be as easy as adding:
VendorSuffix=AMD
OpenCLDriverName=opencl.dll
(search for the ATI OpenCL dll; perhaps rename it to openclamd.dll to avoid a name clash)
and then either copy it to windows\system, add its folder to PATH, or put the full path in OpenCLDriverName.
Seems that the dll has to export:
clGetExtensionFunctionAddress
clIcdDispatchGetPlatformIDsKHR
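To make the observations above concrete, here is a minimal sketch in C of how such a loader might pair a vendor suffix with a driver dll and check for the two entry points. It is only a model of what the notes describe, not the Khronos loader's code: the `VendorEntry` struct and both helper functions are hypothetical names, and nothing here touches the registry or loads a dll.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one vendor entry, using the two names seen above. */
typedef struct {
    const char *vendor_suffix;      /* e.g. "NV" or "AMD" */
    const char *opencl_driver_name; /* dll name or full path */
} VendorEntry;

/* Entry points the vendor dll apparently has to export. */
static const char *const kRequiredExports[] = {
    "clGetExtensionFunctionAddress",
    "clIcdDispatchGetPlatformIDsKHR",
};

/* Return the driver name registered for a suffix, or NULL if there is none. */
const char *driver_for_suffix(const VendorEntry *entries, size_t count,
                              const char *suffix)
{
    assert(entries != NULL && suffix != NULL);
    for (size_t i = 0; i < count; ++i)
        if (strcmp(entries[i].vendor_suffix, suffix) == 0)
            return entries[i].opencl_driver_name;
    return NULL;
}

/* Return 1 if every required entry point appears in the export list. */
int exports_are_sufficient(const char *const *exports, size_t count)
{
    size_t needed = sizeof kRequiredExports / sizeof kRequiredExports[0];
    assert(exports != NULL);
    for (size_t r = 0; r < needed; ++r) {
        int found = 0;
        for (size_t i = 0; i < count; ++i) {
            if (strcmp(exports[i], kRequiredExports[r]) == 0) {
                found = 1;
                break;
            }
        }
        if (!found)
            return 0;
    }
    return 1;
}
```

With a table holding the hardcoded NV/nvcuda.dll pair plus an AMD/openclamd.dll pair, `driver_for_suffix` returns the dll to load for each suffix, and `exports_are_sufficient` tells whether a dll's export list would satisfy the loader as guessed here.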
Driver
======
OpenCL seems to be built into nvcuda.dll.
It adds the exports:
clGetExtensionFunctionAddress
clIcdDispatchGetPlatformIDsKHR
From the binaries, new extensions:
cl_khr_fp64
cl_khr_gl_sharing
Still missing:
3D image writes (Fermi)
64-bit atomics
half
fp_rounding
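A quick way to check lists like these on your own card is to search the space-separated string that clGetDeviceInfo returns for CL_DEVICE_EXTENSIONS. The sketch below is plain C with no OpenCL calls, so the extension string is passed in by the caller. `has_extension` is a hypothetical helper name, and matching "half" to cl_khr_fp16 and "atomics 64 bits" to cl_khr_int64_base_atomics is my assumption about which extensions the notes mean.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `name` appears as a whole token in a space-separated
   extension list, 0 otherwise. A plain strstr() is not enough because
   one extension name can be a prefix of another. */
int has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return 0;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the list start or right after a space. */
        int starts = (p == ext_list) || (p[-1] == ' ');
        /* The match must end at a space or at the end of the list. */
        int ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return 1;
        ++p; /* keep searching after this partial match */
    }
    return 0;
}
```

For a string such as "cl_khr_fp64 cl_khr_gl_sharing" (a made-up sample, not a dump from this driver), the helper finds cl_khr_fp64 and cl_khr_gl_sharing and rejects cl_khr_fp16 as well as the partial name cl_khr_fp.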
2. CUDA 3.0
==========
Adds CUDA 3.0. The dll reports CUDA 3.0.1.
All we can see for now is Driver API stuff:
Still needs (?) to add writable 3D arrays
Initial Direct3D 11 interop:
cuD3D11CtxCreate
cuD3D11GetDevice
New generic CUDA/graphics interop:
cuGraphicsD3D10RegisterResource
cuGraphicsD3D11RegisterResource
cuGraphicsD3D9RegisterResource
cuGraphicsGLRegisterBuffer
cuGraphicsGLRegisterImage
cuGraphicsMapResources
cuGraphicsResourceGetMappedPointer
cuGraphicsResourceSetMapFlags
cuGraphicsSubResourceGetMappedArray
cuGraphicsUnmapResources
cuGraphicsUnregisterResource
(seems we finally get OpenGL texture interop: cuGraphicsGLRegisterImage)
New driver APIs:
cuMemcpyDtoDAsync
cuModuleGetSurfRef
cuParamSetSurfRef
Seems to be surface support (programmable ROPs?):
cuSurfRefCreate
cuSurfRefDestroy
cuSurfRefGetAddress
cuSurfRefGetArray
cuSurfRefGetFormat
cuSurfRefSetAddress
cuSurfRefSetArray
cuSurfRefSetFormat
See:
.surf: accessed via surface instructions; initializable: yes, via driver; access: R/W; sharing: per context
.tex: accessed via texture instructions; initializable: yes, via driver; access: RO; sharing: per context
My opinion:
surfaces are writable textures (actually random-access ones),
equivalent to D3D 11 RWTexture (1D, 2D, 3D),
which are random access as well, i.e. UAVs.
From Timothy Farrar:
So if one reads between the lines, .surf is effectively a high latency coherent read and writable cache, probably with format conversion, and perhaps blending. Effectively a programmable ROP. Could be how NVidia plans to take on Larrabee's programmability, opening up efficiency for all sorts of problem solving which requires coherent scatter of small scalar values (say like a z buffer, or binning algorithms). This type of thing simply is too bandwidth inefficient to be useful currently in CUDA. Unfortunately since DX11 doesn't have programmable blending or anything resembling this functionality, my guess is that .surf doesn't see hardware support for a while, perhaps until NVidia sees if it is needed to go against Larrabee. However when CUDA gets .surf, my GL/DX days are over.
Mentions fermi, sm_2_0, compute_2_0
-DCUDA_NO_SM_20_INTRINSICS
(is it new?) -DCUDA_DOUBLE_MATH_FUNCTIONS
3. OpenGL
Well, 3.2, but
it includes Cg Compiler 3.0.0.1:
NV_hull_program generated by NVIDIA Cg compiler
NV_tessellation_program generated by NVIDIA Cg compiler
New extensions
==============
Are they all for Fermi? I suspect the fp64 ones should work on GTX 200 cards, but they are not reported on a GTX 200.
GL_NV_transform_feedback3 -> multiple buffer streams, each with its own frequency
GL_NV_texture_buffer_object_rgb32 -> what?
GL_NV_shader_image_load_store
GL_NV_gpu_shader5
GL_NV_gpu_program_fp64
GL_NV_draw_indirect
GL_EXT_texture_compression_bptc
GL_EXT_tessellation_shader
GL_EXT_gpu_shader_fp64
GL_EXT_gpu_shader5
GL_NV_shader_subroutine ->
Not inlining subroutines allows true calls to subroutines
(possible recursion support without tricks like Humus's..)
DX11-class equivalents:
GL_NV_shader_subroutine <-> dynamic shader linkage
GL_NV_shader_image_load_store <-> reading and writing textures from shaders, scatter possible (UAV) <-> equivalent to AMD_random_access_target
GL_NV_gpu_shader5 <-> Nvidia Fermi assembly
GL_NV_gpu_program_fp64 <-> Nvidia double assembly
GL_NV_draw_indirect <-> D3D 11 drawIndirect
GL_EXT_texture_compression_bptc <-> new compression format (HDR one?) <-> similar to the AMD one
GL_EXT_tessellation_shader <-> tessellation shaders
GL_EXT_gpu_shader_fp64 <-> double support for GLSL shaders
GL_EXT_gpu_shader5 <-> GLSL equivalent to D3D shader model 5.0
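To test the suspicion about fp64 on a given card, one can walk the extension names the driver reports (in a GL 3.x context they come one at a time from glGetStringi(GL_EXTENSIONS, i)) and count the fp64 ones. The sketch below is plain C with no GL calls: the caller passes the names in as an array, and `count_fp64_extensions` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The two fp64 extensions from the list above. */
static const char *const kFp64Extensions[] = {
    "GL_NV_gpu_program_fp64",
    "GL_EXT_gpu_shader_fp64",
};

/* Count how many of the fp64 extensions appear among the reported names.
   `reported` stands in for the strings a driver would hand back through
   glGetStringi(GL_EXTENSIONS, i). */
size_t count_fp64_extensions(const char *const *reported, size_t count)
{
    size_t wanted = sizeof kFp64Extensions / sizeof kFp64Extensions[0];
    size_t hits = 0;
    assert(reported != NULL || count == 0);
    for (size_t w = 0; w < wanted; ++w) {
        for (size_t i = 0; i < count; ++i) {
            if (strcmp(reported[i], kFp64Extensions[w]) == 0) {
                ++hits;
                break; /* count each wanted extension at most once */
            }
        }
    }
    return hits;
}
```

A result of 0 on a GTX 200 would match what is described above: the hardware has doubles, but the driver does not advertise the fp64 extensions there.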
Thursday, 29 October 2009