GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information


Saturday, 3 July 2010

ATI Stream SDK roadmap

Posted on 06:40 by Unknown
I have found a roadmap for the ATI Stream SDK through the end of the year.
DISCLAIMER: It's on the Internet and I found it with some luck, so no NDA was broken.

Let's talk about it.
Currently AMD OpenCL lacks:
*working OpenGL interop, especially image interop (for example, copying a buffer to an image doesn't work when the image is an acquired OpenGL texture)
*exposed multiple-component images (other than RGBA)
*DX interop
*access to all graphics mem (currently only 128-256 MB is exposed)
*Catalyst integration

Stream SDK 2.2 adds:
*OCL 1.1 (3-component vectors are part of it, and OCL 1.1 image support includes multiple-component images (R, RG, RGB))
*DX10 interop (seemingly only that, with no DX9 or DX11 as Nvidia has)
*mem fences that no longer generate unneeded barrier ISA instructions
*append buffers (what about the GDS extension as well?)
*OCL 1.1 atomics seem to be nothing new(?), offline compilation goes from preview to final, and DPFP adds FMA, as the other operations are supported now(?)
DPFP FMA should allow peak test kernels in benchmarks to show high numbers, near 400-500 GFLOP/s.
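As a rough check on that 400-500 GFLOP/s guess, theoretical peak is just units times flops per clock times clock. The sketch below assumes figures that are not in the roadmap: a Radeon HD 5870 with 1600 stream processors at 850 MHz, grouped into 320 five-wide units that each issue one double-precision FMA per clock. `peak_gflops` and the two helper functions are names made up for illustration.

```cpp
#include <cassert>

// Theoretical peak in GFLOP/s: FMA units * 2 flops per FMA * clock in GHz.
// An FMA counts as two floating-point operations (one multiply, one add).
double peak_gflops(int fma_units, double clock_ghz) {
    assert(fma_units > 0 && clock_ghz > 0.0);
    const double flops_per_fma = 2.0;
    return fma_units * flops_per_fma * clock_ghz;
}

// ASSUMED hardware figures for a Radeon HD 5870 (not from the roadmap):
// 320 five-wide units, one double-precision FMA per unit per clock, 850 MHz.
double hd5870_dp_peak() { return peak_gflops(320, 0.85); }

// ASSUMED: all 1600 stream processors can issue a single-precision FMA/MAD.
double hd5870_sp_peak() { return peak_gflops(1600, 0.85); }
```

Under those assumptions the DPFP peak comes out around 544 GFLOP/s, so a test kernel reaching 400-500 would sit at roughly 74% to 92% of peak.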

A lot more interesting is 2.3:
*In-process compilation of OpenCL kernels means no more shipping LLVM compilers (llc, etc.), and hopefully means it will be integrated in atiocl.dll so OpenCL can ship built into Catalyst 10.12.
*Library models
*C++ template support in kernels (I hope this means you can at least specify kernel args depending on a template argument, for example to support double and float kernels with one code, similar to CUDA)
*Adds trig DPFP routines (but still no complete DPFP support, which looks bad given that Nvidia has been shipping it since October 2009 and AMD has been saying since end of 2009 that support is coming gradually)
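To show what "one code for double and float" could look like, here is a minimal sketch in plain host-side C++, not OpenCL kernel code. How AMD will actually expose templates in kernels is not known from the roadmap, so `axpy` and `axpy_demo` are hypothetical names and the syntax in a real kernel may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One templated routine serves both float and double:
// y[i] = a * x[i] + y[i] for every element.
template <typename T>
void axpy(T a, const std::vector<T>& x, std::vector<T>& y) {
    assert(x.size() == y.size());  // both vectors must have the same length
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical helper: runs one instantiation on fixed data and returns y.
template <typename T>
std::vector<T> axpy_demo() {
    std::vector<T> x = {T(1), T(2), T(3)};
    std::vector<T> y = {T(10), T(20), T(30)};
    axpy(T(2), x, y);
    return y;
}
```

Calling `axpy_demo<float>()` and `axpy_demo<double>()` instantiates the same source for both precisions, which is the effect I hope kernel templates will give without maintaining two copies of each kernel.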
The most interesting are the last three:
*FFT library: why not a BLAS lib as well? I suspect it is OCL-based, as DirectCompute has its own FFT lib.
Is it also going to be part of ACML? Currently matmul in ACML-GPU is CAL-based.
I expect it to be a binary-only library, at least, and for Win and Lin, so for Mac I hope we can somehow extract the OpenCL kernels, or create a wrapper around it and use Wine or something like that, to check that perf on Mac with AMD boards is correct.
*OpenPhysics: well, at least something to play with. I expect cloth, soft body and SPH particle support in OpenCL and/or DirectCompute. On the Bullet site there is a preliminary executable with a cloth demo, and an AMD employee talking about the state of soft body support (http://code.google.com/p/bullet/issues/detail?id=390#c3). It seems that since last week we also have DirectCompute and OpenCL code for both cloth and soft body in trunk.
Also, by September we will have DMM 2.0, which, as said at GDC, has some OpenCL love for this rigid body + fracture simulator.
*OpenDecode UVD: well, a CUVID/VDPAU-style library for AMD boards. Nvidia has put a lot of love into GPU video decoding and interop with CUDA/OpenGL, with CUVID for Win and Mac and VDPAU for Linux.
Since the 256 drivers, VDPAU has efficient OpenGL and CUDA interop. CUVID has by def efficient CUDA interop and fast OpenGL/DX interop on Windows. CUVID for Mac seems good only for feeding data to CUDA, as OpenGL interop on Mac is slow right now (and always has been).
I expect this brings fast interop to OpenCL on Win and Lin, in addition to DXVA's DX interop on Win and AMD XvBA on Linux, whose VA-API wrapper seems to provide fast OGL interop.
So Mac seems left out, but I hope the recent video acceleration API in 10.6.3 supports AMD 5xxx cards when released, and also that VC-1 support is added in addition to H.264. I think this API provides a fast path to OpenGL textures, so, as OpenCL/OpenGL interop is fast on Apple, it would also provide OpenCL interop on that platform.
Another question is whether dual stream acceleration will be exposed and supported. On Nvidia I think DXVA, CUVID and VDPAU all expose it, with a GTX 470 at least.
Also related: Catalyst 10.7 has improved support for VLC 1.1.1 DXVA decoding on AMD cards, which I presume means the fast path for sending frames between GPU and CPU now works.
Remember also that last month Nvidia released an ION driver (257.29) improving perf with DXVA on ION with PCIe x1, as Flash requires (GPU->CPU->GPU round trip).

What's left after OCL 1.1 and Stream SDK 2.3:
Well, I expect Global Data Share and shared registers extensions, 3D image writes, true complete DPFP support (cl_khr_fp64), complete BLAS and FFT libs (like CUBLAS and CUFFT in CUDA), working pinned mem, an extension for host mem accessible from the GPU, gather4 instructions for image support in OpenCL, and working concurrent kernel execution and mem transfers (i.e. concurrency >= 20% in the oclCopyCompute example from CUDA 3.1).
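
To make the ">= 20%" target concrete, here is one plausible way to put a number on copy/compute concurrency. This is an assumed metric, not necessarily the one the Nvidia sample prints: it compares the time of the copy and the kernel run back to back against the time measured when both are issued concurrently. `overlap_percent` and its parameters are hypothetical names.

```cpp
#include <cassert>

// ASSUMED metric: percentage of the serial (copy + kernel) time that is
// saved when the copy and the kernel are issued concurrently.
// All times are in milliseconds and must be measured by the caller.
double overlap_percent(double copy_ms, double kernel_ms, double concurrent_ms) {
    assert(copy_ms > 0.0 && kernel_ms > 0.0 && concurrent_ms > 0.0);
    const double serial_ms = copy_ms + kernel_ms;
    return 100.0 * (serial_ms - concurrent_ms) / serial_ms;
}
```

With this definition, a 4 ms copy and a 6 ms kernel that together finish in 8 ms give 20%, and a result near 0% means the driver is effectively serializing the transfer and the kernel.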

