GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information


Sunday, 13 December 2009

What I would want to know and get from vendors part II: Nvidia

Posted on 06:43 by Unknown
1. Access to the WGL_DX_interop OpenGL extension documentation and headers: this extension has been shipping in NVIDIA drivers since late August and is very powerful, as it provides a fast interop path between OpenGL and DirectX, so resources from one API can be seen by the other with no host interaction (I mean without the transfers through the host that I currently need). It was presented at GTC 09, but the spec was not released.
Also, can you give an expected date for when it will be available to Windows Vista/7 users? It is currently only available on XP.
You may wonder what that could add to the mix. At least two or three things:
*Access to GPU video decode (DXVA), feeding that to OpenCL with the lowest overhead.
Nvidia can tell us they have CUVID with OpenGL support, but assuming ATI supports a similar extension someday, we could have a cross-vendor code path via DXVA.
*Access to the current state of the art in efficient fluid rendering, which currently ships as a library inside the PhysX Screen Saver source code.
It accepts only a Direct3D interface, so using it from OpenGL needs some interop if it is to be done efficiently.
*Access to Direct3D 11 functionality like tessellation from OpenGL, exchanging tessellated geometry with OpenGL entirely in GPU memory.

2. Access to the NVAPI NDA SDK: this could enable a killer feature. I think that since the 195 Nvidia drivers, NVAPI can report GPU load, memory bus load and video decoding unit load (think Blu-ray GPU decode) (only GT200 and higher and 190xx).
This is used in GPU-z 0.38, so at least some developers have access to this functionality. I think GPU-z uses NVAPI.
The public NVAPI doesn't expose this.
I think this API also gives access to 3D Vision internals (see below).
I have tried to get access, but you need to be in the Nvidia Registered Developer program. I have tried to sign up many times but got no response. Presumably the program also gives access to the latest driver builds.

3. Access to the Nexus GPU debugger beta -> released
(GPGPU work would be a lot easier with a GPU debugger. Nexus was scheduled for a beta release in October. I signed up for the beta program, but the only response I got was in late October, saying we would get the beta build in two weeks.)

4. CUVENC lib headers and documentation, for GPU video encoding: alongside the CUVID library, Nvidia ships the CUVENC library in its standard drivers for accessing GPU hardware encoding, and it is used by a lot of commercial video encoders with CUDA support. In fact, all of them use this library.
The problem is that I think it is only exposed to partners; it's not public. I think that with Windows 7 we now have the Windows MFT library for accessing GPU video encoding, but I still have to test it.

5. Access to documentation about the Fermi OpenGL extensions for Direct3D 11-like features: there is some info in a GTC presentation, but still no headers or anything else for working on "it" for real.

6. Access to the 3D Vision internal APIs. That's what the Avatar game is getting, i.e. I want a way to send a frame to each eye, bypassing the Nvidia 3D driver.

more or less the same:

About Nvidia source code
========================

1. An OpenCL port of the DirectCompute Ocean demo source code? It was shown in the OpenCL tutorial at GTC09.
Since Nvidia ships the DirectCompute Ocean demo source code, I hope the Nvidia Ocean OpenCL demo will ship soon in the GPU Computing SDK.
Can someone confirm that and, in the meantime, provide us with the code?
I would love to learn the differences between DirectCompute and OpenCL from another perspective, i.e. by seeing such complex code (it has high-perf FFTs in it) side by side, as
I want to write a common wrapper around DirectCompute and/or OpenCL and/or CUDA.


2. Physics demos using GPU compute APIs, either based on the GPU-enabled Bullet code (rigid body work by Harada) and/or using PhysX fluids. Coding efficient fluid rendering, however, is
complex.
I have seen the Nvidia fluid demo (OpenGL) use this technique:

"Screen Space Fluid Rendering with Curvature Flow"
Wladimir J. van der Laan, Simon Green, Miguel Sainz
Some of the authors are Nvidia guys.

It also seems that "PhysX Screen Saver" uses it (DirectX).
The code is available at http://files.thegamecreators.com/darkphysics/ScreenSaversource.zip
but the fluid rendering functionality is a compiled DirectX-based lib:
dxFluidRenderLib.lib
dxFluidRenderer.h
As I want multi-OS support, I would love either the source code of that library, so I can modify it for OpenGL usage, or compiled OpenGL-based libraries for
Win/Lin/Mac.

3. Massimiliano Fatica of Nvidia has done a port of Linpack that uses both the CPU and the GPU, load balancing between them:
"Accelerating linpack with CUDA on heterogenous clusters"
It was said in the CUDA forums that it is distributed to universities. Can I get it?

About kernel binaries:
=======================
I think this is the most ridiculous question, but anyway: for CUDA and OpenCL we can store "compiled" kernels as PTX and launch kernels from that code.
I know that PTX is a virtual ISA, so it lets you target multiple architectures. My question is whether the PTX generated by nvcc or by the OpenCL built-in compiler
is mature enough, or whether, say, one year from now a new OpenCL built-in compiler or a new nvcc (say CUDA 4.0) could generate PTX that in turn provides better performance.
I hope PTX generated now achieves the same performance as if we compiled the kernel to PTX next year,
i.e. that all optimizations can be extracted from the PTX code.
If not, I will have to supply kernel source files and compile on the fly, at least for OpenCL.

Also, compiling CUDA 2.3 kernels we get PTX 1.4, OpenCL-generated PTX is v1.5, and in CUDA 3.0 beta (at least for the Fermi target) it seems we get PTX 2.0.
The SDK includes the v1.4 doc, but the current CUDA 3.0 SDK beta 1 provides no PTX 1.5 or PTX 2.0 info.
Can we get access to the documentation for these new PTX specs?




About CUDA 3.0:
===============
It would be good to have a module that can report the issue rate and latency of specific instructions, similar to GPUbench:
http://graphics.stanford.edu/projects/gpubench/test_instrissue.html
The problem is that there are currently some PTX instructions that aren't visible from CUDA C.
This guy, for example, exposes the native addc instruction:
__addc / __uaddc: signed and unsigned addition-with-carry. Carry flag after addition is set automatically.
http://www.mpi-inf.mpg.de/~emeliyan/cuda-compiler/
You can find a paper where he motivates this effort as a way to get some speedup in integer-heavy scientific codes;
see "Efficient Multiplication of Polynomials on Graphics Hardware".
He provides a diff against the CUDA Open64 sources (2.2, I think) and also new headers.
Can this support be added, so that we can perhaps measure the instruction issue rate of these instructions?
If not, I could manually compile the patched sources for every architecture of our benchmark (Win, Lin, Mac; x64 and x32), but I don't think I will.
I say that because some integer multiprecision libraries have a similar problem (it's impossible to access the add-with-carry op from C without having to
add assembly code).
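
To make the multiprecision problem concrete, here is a minimal C++ sketch of multi-limb addition in which the carry has to be recovered with comparisons, because portable C/C++ has no add-with-carry operator. The function name `add_limbs` and the layout (32-bit limbs, least significant first) are assumptions made for illustration and are not taken from any particular library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: adds two equal-length multi-limb unsigned integers
// (least significant limb first) and returns the final carry-out (0 or 1).
// Without a native add-with-carry, the carry is recovered by comparisons.
std::uint32_t add_limbs(const std::vector<std::uint32_t>& a,
                        const std::vector<std::uint32_t>& b,
                        std::vector<std::uint32_t>& sum) {
    assert(a.size() == b.size());
    sum.resize(a.size());
    std::uint32_t carry = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint32_t s = a[i] + b[i];            // wraps modulo 2^32
        std::uint32_t c1 = (s < a[i]) ? 1u : 0u;  // wrapped, so carry out
        std::uint32_t t = s + carry;              // add the incoming carry
        std::uint32_t c2 = (t < s) ? 1u : 0u;     // carry-in caused a wrap
        sum[i] = t;
        carry = c1 | c2;                          // at most one of them is set
    }
    return carry;
}
```

Each limb costs two additions and two comparisons here. With a hardware carry flag exposed to the language, the same loop would need roughly one add-with-carry per limb, which is the motivation for wanting addc visible from CUDA C.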
Now a mix of some previous questions:
Is it possible to access native add with carry on Nvidia GPUs in OpenCL?
I think the answer is no, and I believe that could be fixed if there were interop between OpenCL and CUDA-generated PTX code. With the addc-enabled CUDA
compiler, I would compile an addc function and call that from OpenCL.
Anyway, having the PTX 1.5 spec documentation would also help in finding out how to patch OpenCL-generated PTX code to use it.
Yeah, I know that none of this is supported by the OpenCL spec, but it's worth investigating anyway.
(I would also love to ask AMD engineers about enabling add with carry, if it exists in r8xx, via AMD IL generated code.)

I see that CUDA 3.0 has surface instructions (cusurf).
This is Fermi stuff, correct?
It seems these instructions allow "true" writable textures (I mean without having to use the CUDA 2.2 "texture from pitch linear mem" functionality),
with (x,y) addressing for writes (so equal in concept to DirectCompute's RWTexture2D?) and presumably format conversion on read/write(?).

My only objection is that I can't find 3D surfaces in the headers, but I hope 3D surfaces are similarly supported in Fermi hardware, given RWTexture3D in D3D 11, so
I can expect 3D surface functions in CUDA 3.0 with Fermi (i.e. I want "true" writable 3D textures). I want to use them for 3D stencil codes.
For GPUs without this support I can use the 3dfd code from the Nvidia GPU Computing SDK, which I think is based on:
3D finite difference computation on GPUs using CUDA
by P. Micikevicius, 2009
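
For readers unfamiliar with the term, a 3D stencil code updates every grid cell from its neighbors. Below is a minimal CPU reference sketch in C++ of the simplest case, a radius-1 (7-point) stencil. It is not the SDK's 3dfd code; the function name `stencil7`, the x-fastest memory layout and the two coefficients are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference implementation of a radius-1 (7-point) 3D stencil.
// `in` and `out` are dense nx*ny*nz grids stored x-fastest; boundary cells
// are copied unchanged so every neighbor read stays in bounds.
void stencil7(const std::vector<float>& in, std::vector<float>& out,
              std::size_t nx, std::size_t ny, std::size_t nz,
              float c_center, float c_neighbor) {
    assert(in.size() == nx * ny * nz);
    out = in;  // keeps the boundary values
    auto idx = [=](std::size_t x, std::size_t y, std::size_t z) {
        return (z * ny + y) * nx + x;
    };
    for (std::size_t z = 1; z + 1 < nz; ++z)
        for (std::size_t y = 1; y + 1 < ny; ++y)
            for (std::size_t x = 1; x + 1 < nx; ++x) {
                float neighbors = in[idx(x - 1, y, z)] + in[idx(x + 1, y, z)] +
                                  in[idx(x, y - 1, z)] + in[idx(x, y + 1, z)] +
                                  in[idx(x, y, z - 1)] + in[idx(x, y, z + 1)];
                out[idx(x, y, z)] =
                    c_center * in[idx(x, y, z)] + c_neighbor * neighbors;
            }
}
```

On a GPU with writable 3D textures, the hope is that each thread could compute one (x,y,z) output like the loop body above, reading neighbors through the texture path and writing the result by coordinate.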

About CUDA multicore:
I know that Nvidia is still working hard on it because:
1. http://llvm.org/devmtg/2009-10/Grover_PLANG.pdf
"PLANG: Translating NVIDIA PTX language to LLVM IR Machine"
2. I have seen some strings related to multicore-llvm in the CUDA 3.0 beta nvcc binary.
It seems you have switched from the idea of
"MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs"
to a hopefully better one, i.e. translating from PTX to LLVM and then using
LLVM's efficient backends for x86.
The question is whether that is going to be available soon enough.
This would let me compare perf in this mode against CUDA code ported to OpenCL and run on the AMD OpenCL CPU backend,
instead of having to write efficient CPU codes myself.
I'm thinking of sorting as a test example.
Currently the fastest on GPU seems to be:
"Designing efficient sorting algorithms for manycore GPUs"
Nadathur Satish, Mark Harris, and Michael Garland
The code is already in CUDPP 1.1 and in a CUDA SDK sample.
The problem is that the most efficient CPU one seems to be "Efficient implementation of sorting on multi-core SIMD CPU architecture",
and care has to be taken to write SIMD-enabled code.

DirectCompute questions:
=======================
As far as I know there are basically two models, 10_x and 11_0, i.e. DirectCompute 4.x and 5.0.
The problem is that CUDA has no restrictions on writing to shared mem, and the codes I plan on using presumably use atomics on global mem (CUDA GPUs except G80 support them, and some recent codes use them(?)),
so this code isn't going to translate automatically to DirectCompute 4.x.
This is no problem for Fermi and AMD 5xxx GPUs, but as I think DirectCompute 4.x takes the "greatest common divisor" of CUDA cards and ATI 4xxx,
CUDA cards are at a great disadvantage. So my question is whether Nvidia can and wants to fix this issue.
Think of something similar to what I remember reading: Nvidia enabled a D3D 10.1 feature in some driver for FarCry2(?) (related to multisample).
I mean at least allowing kernels to be compiled to the cs_5_0 target on GT200 cards, for example.
I know some things this target requires aren't available, as for example the shared mem size on GT200 cards is below the requirement, but I mean that if a kernel stays within GT200 hardware restrictions
(for example shared mem usage below 16K) and requires only features and hardware resources available on CUDA cards, this could be enabled.
This could be an NDA feature(?), for example enabling a cs_5_0_gt200 target (is that possible?).
Also a similar hack to enable doubles on GT200 via DirectCompute.

OpenCL:
======
Well, I have to be frank: I can't find any issue worth mentioning in the 195 drivers except:

1. I'm not happy with the OpenCL Volume3D demo. On Windows XP it goes nearly as fast as the CUDA one: in fact I get a sustained 60 fps in CUDA vs 40-60 fps in OpenCL
with an 8600gts. Note that the same OpenCL Volume3D demo runs at a mediocre 14 fps on a high-end desktop with a gtx 275 on Win7,
while the CUDA demo runs at 60 fps. All of this is with the recent 195.55 OpenCL drivers.
I think OpenCL on Linux doesn't suffer from this either.
So it seems the CUDA 3D texture support is good whatever the OS, but OpenCL Image support for 3D textures has perf issues on Vista/7 systems.
Can anyone confirm whether this is going to be fixed soon or is already fixed?
It doesn't seem right to blame WDDM, as CUDA seems unaffected.
I say this because I would love to write some volumetric rendering code, perhaps also with 3D Vision as an optional built-in feature, and it seems that code would suffer with an OpenCL backend.

2. I'm waiting for cl_khr_3d_image_writes.
I think this is similar in concept to RWTexture3D, correct (i.e. (x,y,z) addressing etc.)?
But I think there is going to be hardware support for it only on Fermi and higher, correct?
Assuming this is Fermi stuff, will it be available by, say, the Fermi launch drivers, is it already supported in 195.62 if we have a Fermi, or is there no
specific time?
I think this allows a high-perf implementation of 3D stencil codes on D3D 11 architectures, as the texture is written directly using coordinates and reads
get cached, and this is an advantage at least for architectures without a global cache (AMD 5xxx cards).
Of course I'm aware of the alternative technique of caching neighborhood values in shared mem and calculating the stencil from those values.

3. Could you at least say whether there is any way (a hack) to access host mem from Nvidia GPUs in the OpenCL backend (pinned system mem in CUDA parlance)?
I have no problem even if it means playing with PTX code.
I ask because I want to run kernels over big problems,
which could take long enough that a progress bar would be welcome. I know of the watchdog timer issue for kernels running for more
than x seconds, and I think Nvidia also recommends dividing the kernel to solve this issue.
Yeah, I know Nvidia doesn't recommend that.

Better still would be a roadmap for an extension supporting this feature (by the way, it's supported by the hardware on AMD cards too).
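
The "divide the kernel" workaround mentioned above can be sketched on the host side as follows. This is only an illustration in plain C++: `run_in_chunks` is a hypothetical name, and the `launch` callback stands in for whatever enqueues a kernel over a sub-range and waits for it (for example an OpenCL or CUDA launch with a global offset), which is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical host-side driver: splits `total` work items into launches of
// at most `chunk` items. `launch(offset, count)` stands in for enqueueing a
// kernel over that sub-range and waiting for it; `progress` receives the
// fraction completed after each launch. Returns the number of launches.
std::size_t run_in_chunks(
    std::size_t total, std::size_t chunk,
    const std::function<void(std::size_t, std::size_t)>& launch,
    const std::function<void(double)>& progress) {
    assert(chunk > 0);
    std::size_t launches = 0;
    for (std::size_t offset = 0; offset < total; offset += chunk) {
        std::size_t count = std::min(chunk, total - offset);
        launch(offset, count);  // one short kernel instead of one long one
        ++launches;
        progress(static_cast<double>(offset + count) /
                 static_cast<double>(total));
    }
    return launches;
}
```

Each launch stays short enough to avoid the watchdog, and the progress callback can drive a progress bar between launches. The cost is extra launch overhead and the need for a kernel that accepts an offset, which is why direct host mem access from the kernel would still be welcome.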

Mac stuff
=========

CUDA Mac:
Of course I plan to run OpenCL-OpenGL interop-enabled codes and CUDA-OpenGL ones, and of course the perf of the CUDA benchmark on the Mac will suffer, as the interop still goes through the host.
Can we expect it to be fixed sometime, say before April-May 2010?

OpenCL Mac:
Can someone confirm whether the double extension is going to be available, say in April-May 2010 (10.6.3-4?), on Nvidia GPUs (GT200 GPUs, for example), as it is with the 195 Windows and Linux drivers?

OpenGL Mac:
Sorry if I'm ignorant in this matter,
but what's the problem with the Nvidia Mac drivers still shipping as OpenGL 2.1 drivers
(yeah, with some 3.x stuff)? I remember seeing an Nvision08 presentation by an Nvidia OpenGL guy saying all the OpenGL 3.0 stuff was coming to Mac at that
time.
Nvidia ships "custom" drivers for the GTX 285 Mac edition on its drivers download page. Why can't they ship custom drivers, if not with 3.x support, at least with all the 3.0, 3.1 and 3.2 ARB equivalent extensions and possibly other Nvidia extensions?

Optix and PhysX for Mac?
Assuming we want to port some simple GPU raytracing and GPU physics code, can we have it working on Mac? Neither the Optix nor the PhysX libraries are available for
Mac currently.