GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information

Thursday, 29 October 2009

Nvidia 195

Posted on 19:55 by Unknown
Everything here is about Nvidia 195.39!

It has three things:
1. OpenCL:

ICD Model
=========
It seems production quality, and includes the OpenCL ICD from Khronos.
It seems that implementations are registered in the Windows Registry under:
{HKCU|HKLM} SOFTWARE\Khronos\OpenCL\Vendors
The loader seems to search there for:
VendorSuffix
OpenCLDriverName
But I can't find an Nvidia entry added after installing the ICD.
It also has these hardcoded:
NV
nvcuda.dll
So adding ATI could be as easy as adding:
VendorSuffix=AMD
OpenCLDriverName=opencl.dll
(search for the ATI OpenCL dll; perhaps rename it to openclamd.dll to avoid a name clash,
and also either copy it to windows\system, add it to PATH, or put the full path in OpenCLDriverName)
It seems that the dll has to export:
clGetExtensionFunctionAddress
clIcdDispatchGetPlatformIDsKHR
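
To make the guessed lookup above concrete, here is a minimal sketch in Python. It only models the behaviour described in this post (registry entries with VendorSuffix and OpenCLDriverName, plus the hardcoded NV / nvcuda.dll pair). It is not the real Khronos loader: the function name `resolve_vendors`, the dict layout standing in for the registry, and the ordering (HKCU, then HKLM, then the hardcoded fallback) are all assumptions made for illustration.

```python
# Hypothetical model of the vendor lookup guessed at in the post.
# This is NOT the real Khronos ICD loader; names and ordering are assumptions.
VENDORS_KEY = r"SOFTWARE\Khronos\OpenCL\Vendors"

# Fallback pair the post found hardcoded in the loader binary.
HARDCODED_VENDORS = [("NV", "nvcuda.dll")]


def resolve_vendors(registry):
    """Return (suffix, driver) pairs in the order a loader might try them.

    `registry` is a plain dict standing in for the Windows Registry:
    {hive: {key_path: [{value_name: data, ...}, ...]}}
    """
    found = []
    for hive in ("HKCU", "HKLM"):
        for entry in registry.get(hive, {}).get(VENDORS_KEY, []):
            suffix = entry.get("VendorSuffix")
            driver = entry.get("OpenCLDriverName")
            # Skip incomplete entries and exact duplicates.
            if suffix and driver and (suffix, driver) not in found:
                found.append((suffix, driver))
    # Hardcoded vendors come last unless they were already registered.
    for pair in HARDCODED_VENDORS:
        if pair not in found:
            found.append(pair)
    return found


# Example: the ATI entry suggested above, with the renamed dll.
fake_registry = {
    "HKLM": {
        VENDORS_KEY: [
            {"VendorSuffix": "AMD", "OpenCLDriverName": "openclamd.dll"},
        ]
    }
}
vendors = resolve_vendors(fake_registry)
print(vendors)
```

With an empty registry the sketch falls back to the hardcoded Nvidia pair only, which would match not finding an Nvidia entry after installing.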

Driver
======
OpenCL seems to have been added to nvcuda.dll, which now exports:
clGetExtensionFunctionAddress
clIcdDispatchGetPlatformIDsKHR

From the binaries:

New extensions:
cl_khr_fp64
cl_khr_gl_sharing

Still missing:
3D image writes (Fermi)
64-bit atomics
half
fp_rounding

2. CUDA 3.0
===========
The driver adds CUDA 3.0; the dll reports CUDA 3.0.1.
All we can see for now is Driver API stuff.
Does it still need to add writable 3D arrays?
Initial Direct3D 11 interop:
cuD3D11CtxCreate
cuD3D11GetDevice

New generic CUDA/graphics interop:
cuGraphicsD3D10RegisterResource
cuGraphicsD3D11RegisterResource
cuGraphicsD3D9RegisterResource
cuGraphicsGLRegisterBuffer
cuGraphicsGLRegisterImage
cuGraphicsMapResources
cuGraphicsResourceGetMappedPointer
cuGraphicsResourceSetMapFlags
cuGraphicsSubResourceGetMappedArray
cuGraphicsUnmapResources
cuGraphicsUnregisterResource
(it seems we finally get OpenGL texture interop: cuGraphicsGLRegisterImage)
New driver APIs:
cuMemcpyDtoDAsync
cuModuleGetSurfRef
cuParamSetSurfRef
This looks like surface support (programmable ROPs?):
cuSurfRefCreate
cuSurfRefDestroy
cuSurfRefGetAddress
cuSurfRefGetArray
cuSurfRefGetFormat
cuSurfRefSetAddress
cuSurfRefSetArray
cuSurfRefSetFormat
See the PTX state spaces (name, addressable, initializable, access, sharing):
.surf, via surface instructions, yes via driver, R/W, per context
.tex, via texture instructions, yes via driver, RO, per context
My opinion:
surfaces are writable textures (actually random-access ones),
equivalent to the D3D 11 RWTexture types (1D, 2D, 3D),
which are random access too, i.e. UAVs.
From Timothy Farrar:
So if one reads between the lines, .surf is effectively a high latency coherent read and writable cache, probably with format conversion, and perhaps blending. Effectively a programmable ROP. Could be how NVidia plans to take on Larrabee's programmability, opening up efficiency for all sorts of problem solving which requires coherent scatter of small scalar values (say like a z buffer, or binning algorithms). This type of thing simply is too bandwidth inefficient to be useful currently in CUDA. Unfortunately since DX11 doesn't have programmable blending or anything resembling this functionality, my guess is that .surf doesn't see hardware support for a while, perhaps until NVidia sees if it is needed to go against Larrabee. However when CUDA gets .surf, my GL/DX days are over.
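
To illustrate the "coherent scatter of small scalar values" from the quote, here is a minimal sketch of a z-buffer update in Python. It is a sequential CPU model only: the function name `scatter_depth` and the sample fragments are made up for illustration, and nothing here uses the CUDA surface API. On a GPU, many threads would hit the same pixel at once, so each read-compare-write would have to be done coherently, which is the job the quote hopes .surf could take over.

```python
def scatter_depth(zbuffer, fragments):
    """Scatter fragments into a z-buffer, keeping the nearest depth per pixel.

    zbuffer:   list of rows, each a list of floats (modified in place)
    fragments: iterable of (x, y, depth) tuples in arbitrary order
    """
    for x, y, depth in fragments:
        # Read-compare-write on one small scalar: the operation a ROP performs.
        if depth < zbuffer[y][x]:
            zbuffer[y][x] = depth
    return zbuffer


INF = float("inf")
zbuf = [[INF] * 4 for _ in range(2)]  # 4x2 buffer, cleared to "far"

# Three fragments land on pixel (1, 0); only the nearest one should survive.
frags = [(1, 0, 0.8), (1, 0, 0.3), (3, 1, 0.5), (1, 0, 0.6)]
scatter_depth(zbuf, frags)
print(zbuf[0][1], zbuf[1][3])
```

A binning algorithm has the same shape: replace the depth compare with an append or counter increment per bin.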


It has fermi, sm_2_0 and compute_2_0, plus these defines:
-DCUDA_NO_SM_20_INTRINSICS
-DCUDA_DOUBLE_MATH_FUNCTIONS (is it new?)

3. OpenGL

Still version 3.2, but it
includes Cg Compiler 3.0.0.1:
NV_hull_program generated by NVIDIA Cg compiler
NV_tessellation_program generated by NVIDIA Cg compiler

New extensions
==============

Are they all for Fermi? I suspect the fp64 ones should work on GTX 200 cards, but they are not reported on a GTX 200.

GL_NV_transform_feedback3 -> multiple buffer streams, each with its own frequency
GL_NV_texture_buffer_object_rgb32 -> what?
GL_NV_shader_image_load_store
GL_NV_gpu_shader5
GL_NV_gpu_program_fp64
GL_NV_draw_indirect
GL_EXT_texture_compression_bptc
GL_EXT_tessellation_shader
GL_EXT_gpu_shader_fp64
GL_EXT_gpu_shader5
GL_NV_shader_subroutine ->
not inlining subroutines allows true calls to subroutines
(possibly recursion support without tricks like the ones from Humus..)

DX11 class

GL_NV_shader_subroutine <-> dynamic shader linkage
GL_NV_shader_image_load_store <-> read and write to textures from shaders, possibly with scatter (UAV) <-> equivalent to AMD_random_access_target
GL_NV_gpu_shader5 <-> Nvidia Fermi assembly
GL_NV_gpu_program_fp64 <-> Nvidia double assembly
GL_NV_draw_indirect <-> D3D 11 DrawIndirect
GL_EXT_texture_compression_bptc <-> new compression format (HDR one?) <-> similar to the AMD one
GL_EXT_tessellation_shader <-> tessellation shaders
GL_EXT_gpu_shader_fp64 <-> double support for GLSL shaders
GL_EXT_gpu_shader5 <-> GLSL equivalent to D3D shader model 5.0
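
The mapping above can be turned into a quick check against a driver's extension string. This is a minimal sketch in Python: the table `D3D11_CLASS` just restates the GLSL-level rows of the list in this post, while the function name `d3d11_class_features` and the sample extension string are made up for illustration. It assumes the extensions arrive as one space-separated string, the way glGetString(GL_EXTENSIONS) returns them.

```python
# Extension -> D3D 11 class feature, restating the mapping from this post.
D3D11_CLASS = {
    "GL_NV_shader_subroutine": "dynamic shader linkage",
    "GL_NV_shader_image_load_store": "texture read/write from shaders (UAV)",
    "GL_NV_draw_indirect": "DrawIndirect",
    "GL_EXT_texture_compression_bptc": "new compression format",
    "GL_EXT_tessellation_shader": "tessellation shaders",
    "GL_EXT_gpu_shader_fp64": "double support for GLSL shaders",
    "GL_EXT_gpu_shader5": "GLSL equivalent to shader model 5.0",
}


def d3d11_class_features(extension_string):
    """Return {extension: feature} for the DX11-class extensions reported.

    extension_string: space-separated extension names (hypothetical input).
    """
    # Split into whole names so a prefix never matches a longer extension.
    reported = set(extension_string.split())
    return {ext: feat for ext, feat in D3D11_CLASS.items() if ext in reported}


# Made-up extension string, for illustration only.
sample = "GL_ARB_multitexture GL_NV_draw_indirect GL_EXT_gpu_shader5"
features = d3d11_class_features(sample)
for ext, feat in features.items():
    print(ext, "<->", feat)
```

Running this on a GTX 200 extension string would be a quick way to confirm which of these are reported there.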