GPU computing: stay up to date on OpenCL, DirectCompute, CUDA, CAL and OpenGL information

Saturday, 31 October 2009

Wishes for OpenCL 1.1 and more!

Posted on 13:35 by Unknown
Promote DirectCompute 5.0 hardware features to core:
(also posted at http://www.khronos.org/message_boards/viewtopic.php?f=41&t=2160)
*Atomics on global and local memory (the int32 base and extended atomics extensions).
-> Once those are supported, it would be good to add:
*Append/consume buffers (see the AMD material): a global queue/stack accessible with no hazards.
*Byte-addressable support.
*Half support (cl_khr_fp16).
*Require that local memory is not of type global (as on 4xxx cards, due to restrictions on writes to the LDS).
*Expanded DirectCompute 5.0 integer support (bit count, bit reverse, etc.).
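
To make the append/consume wish above concrete, here is a minimal CPU-side sketch in C11 of how such a buffer can sit on top of a single atomic counter. The type and function names (`append_buffer`, `append_buffer_append`, `append_buffer_consume`) and the fixed capacity are hypothetical, chosen for illustration only; this is not an OpenCL or DirectCompute API. It also assumes that a given pass either only appends or only consumes, never both at once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define APPEND_CAPACITY 256  /* hypothetical fixed capacity */

/* Hypothetical append/consume buffer: plain storage plus one atomic counter. */
typedef struct {
    int items[APPEND_CAPACITY];
    atomic_size_t count;  /* number of valid items */
} append_buffer;

void append_buffer_init(append_buffer *b) {
    assert(b != NULL);
    atomic_init(&b->count, 0);
}

/* Append: reserve a unique slot with an atomic increment, then write to it.
   Returns 1 on success, 0 if the buffer is full. */
int append_buffer_append(append_buffer *b, int value) {
    size_t slot = atomic_fetch_add(&b->count, 1);
    if (slot >= APPEND_CAPACITY) {
        atomic_fetch_sub(&b->count, 1);  /* give the reservation back */
        return 0;
    }
    b->items[slot] = value;
    return 1;
}

/* Consume: atomically claim the last valid slot and read it (stack order).
   Returns 1 on success, 0 if the buffer is empty. */
int append_buffer_consume(append_buffer *b, int *out) {
    size_t old = atomic_load(&b->count);
    while (old > 0) {
        /* On failure, old is refreshed with the current count and we retry. */
        if (atomic_compare_exchange_weak(&b->count, &old, old - 1)) {
            *out = b->items[old - 1];
            return 1;
        }
    }
    return 0;
}
```

Because every caller gets its slot index from the atomic counter, no two appenders ever write the same element, which is the "no hazards" property. On a GPU the counter would be a global or local atomic, which is why this wish depends on the atomics item above.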

Doubles (cl_khr_fp64) are an optional feature of compute shaders, as the 57xx cards prove, so no luck there..

Also, if they are not already required, require:

*Image support for FULL profile.
*OpenGL interop for GPU devices.

Add as extensions or promote to core, depending on whether support is AMD/Nvidia specific or multivendor:

* Multivendor:
*Add support for accessing system memory from GPU kernels:
this is currently supported on both Nvidia and AMD devices and exposed in CUDA 2.2 and up and in CAL,
as so-called pinned system memory (CUDA 2.2 on GT200 devices) and host memory export (AMD CAL).
*Implement DirectX interop (AMD ships a header).
*Query information about integer support: whether the device has native 24-bit integer multiplies (CUDA devices before Fermi, and every ALU on AMD 5xxx) or 32-bit integer multiplies (Fermi, AMD 4xxx and 5xxx (only the 5th ALU)).
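
Why the 24-bit versus 32-bit multiply query matters is easier to see with a small CPU reference. OpenCL C already offers `mul24` and `mad24` built-ins; the plain C sketch below only models the unsigned case. The helper names (`umul24_ref`, `umul32_ref`, `fits_in_24_bits`) are hypothetical and are not part of any API.

```c
#include <stdint.h>

/* Reference for an unsigned 24-bit multiply: only the low 24 bits of each
   operand take part, and the low 32 bits of the product are returned. */
uint32_t umul24_ref(uint32_t a, uint32_t b) {
    uint64_t product = (uint64_t)(a & 0xFFFFFFu) * (uint64_t)(b & 0xFFFFFFu);
    return (uint32_t)product;
}

/* Full 32-bit multiply keeping the low 32 bits of the product. */
uint32_t umul32_ref(uint32_t a, uint32_t b) {
    return a * b;  /* unsigned wraparound is well defined in C */
}

/* Returns 1 when both operands fit in 24 bits, in which case the
   24-bit path gives the same answer as the 32-bit one. */
int fits_in_24_bits(uint32_t a, uint32_t b) {
    return (a >> 24) == 0 && (b >> 24) == 0;
}
```

With a device query for the native width, a program could pick the 24-bit path for index arithmetic when `fits_in_24_bits` holds and the hardware is faster there, and the 32-bit path otherwise.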

AMD proposals (some are listed as hardware features in the 5xxx press kit, some are supported by 4xxx hardware):

*Global Data Share and wave sync support (GDS, etc.).
*Native SAD hardware support.
*Expose the registers shared per SIMD (shared registers are available in compute shaders in CAL and allow doing reductions in a fixed number of steps, say 2 or 3, versus log N).
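
For reference, the "log N" baseline that shared registers would improve on is the usual tree reduction. The sketch below is a sequential C model of it; the function name `tree_reduce_sum` is hypothetical, and the input length is assumed to be a power of two. Each pass of the outer loop stands for one synchronized step on the GPU.

```c
#include <assert.h>
#include <stddef.h>

/* In-place tree reduction over n ints (n must be a power of two).
   Returns the sum and, if steps is not NULL, stores the number of
   passes, which is log2(n). */
int tree_reduce_sum(int *data, size_t n, unsigned *steps) {
    assert(n > 0 && (n & (n - 1)) == 0);
    unsigned passes = 0;
    for (size_t stride = n / 2; stride > 0; stride /= 2) {
        /* On a GPU, these additions would run in parallel, one per work-item. */
        for (size_t i = 0; i < stride; ++i)
            data[i] += data[i + stride];
        ++passes;
    }
    if (steps != NULL)
        *steps = passes;
    return data[0];
}
```

For 8 elements this takes 3 passes and for 64 it takes 6, so a scheme that finishes in a fixed 2 or 3 steps saves synchronization as the group size grows.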


Nvidia ones:

*Improve the memory API to support the CUDA 2.2 memory improvements: expand support for creating "shared pinned buffers" (in CUDA parlance), i.e. buffers of host memory that are pinned and usable from multiple GPUs as pinned memory (using DMA),
and also shared pinned system memory.

*Expose partial simultaneous read/write support for image objects, with strict limitations: expose the current RWTexture abilities of Direct3D 11 and also those of the NV_texture_barrier OpenGL extension, which allows reading from a texture already bound to an FBO,
i.e. reading a texel before writing to it..

*Expose interop with CUDA:
Code interop: support for interchanging PTX kernel code between CUDA functions and OpenCL functions with identical name and arguments (signature), for use with clCreateProgramWithBinary..
Mem interop: ability to use memory buffers allocated from CUDA in OpenCL or vice versa..
This should allow directly supporting the proposed "shared pinned buffers".

*Fermi support. Provide new extensions supporting these features:

*Expose function pointer and stack support, which provides true function calls and recursion..
*Expose Fermi support for executing host code inside kernels.
*Expose Fermi support for allocating memory in kernels (malloc/free functions).
*Expose the C++ language in kernels (?)
*Expose expanded information about ECC support: say, ECC-protected registers and memory (local/global), and an ECC-protected path between the GPU and the GDDR chips. Also, if possible, ECC code information: error detection capability (Fermi can detect 3-bit errors and recover from 1-bit errors for every xx bits..)
*Perhaps add some exception support (assuming no full C++ support as in CUDA 3.0) for handling, or being notified of, unrecoverable errors and where they happened (in memory chips or in registers) from kernel code. If that is not possible in kernel code, at least finish the kernel and return this information to the host via some mechanism.
*Perhaps add some information about where atomics are implemented, to know whether we can expect high performance or not (say, whether they are handled in the L2/L3 caches (Fermi), or in the memory controllers or compute units (ALUs) (pre-Fermi)).
Nvidia should also implement some features that require no extension to the OpenCL API, since the API model already allows them, and let device info queries report whether they are available, along with other device info:

For example, using multiple command queues and event support on hardware that supports it:
*Concurrent memory transfer and kernel execution: CUDA 1.1 devices (G9x, GT200, Fermi) and AMD (?)
*Concurrent kernel execution: Fermi (also AMD on 5xxx).
*Concurrent H2D and D2H transfers, using Fermi's twin DMA engines.


*Predication support (I have doubts?): equivalent to CMOV, avoiding the use of branching hardware. Basically, conditional code is handled without a branch by executing both paths (?)
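
As a rough illustration of the predication idea, the C sketch below contrasts a branching select with a mask-based one, where both candidate values are already computed and a mask picks the result. The function names are hypothetical. OpenCL C already has a `select` built-in and the ternary operator; whether a given compiler turns them into predicated instructions is up to the implementation, so treat this as a model of the idea rather than of any particular hardware.

```c
#include <stdint.h>

/* Branching version: only one of the two values is used, via a jump. */
int32_t select_branch(int cond, int32_t a, int32_t b) {
    if (cond)
        return a;
    return b;
}

/* Predicated version: no branch, a mask chooses between the two values.
   Both a and b must already have been computed by the caller. */
int32_t select_predicated(int cond, int32_t a, int32_t b) {
    /* mask is all ones when cond is non-zero, all zeros otherwise */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    /* converting back to int32_t assumes the usual two's complement behaviour */
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}
```

The trade-off is that the predicated form always pays for both paths, so it tends to help only when the paths are short, as with CMOV on CPUs.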