GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information


Sunday, 4 July 2010

A lot of things you probably don't know.. and a worth it..

Posted on 12:06 by Unknown
*TCC support for GF100 products will be out next week. These drivers will also add support for running simultaneously with the normal graphics drivers (the ones that support OGL, DX, DXVA, etc.). I suspect the graphics and TCC drivers will have to be the same version, as both write DLLs into the Windows system directory.
I hope the INF trick still works so I can enable it on Geforce Fermi, and that it also works with Nsight. Anyway, it is not critical: the 25x drivers seem to add support for CUDA cards (even Geforces) that do not have the desktop extended onto them, so kernel execution time need not be limited by TDR. Before, this required two Nvidia cards, one of which could be left without the desktop extended. But if you used, say, an ATI card plus an Nvidia card with no desktop extended onto the Nvidia one (for example to use Nsight, which requires a card with no desktop extended), it would fail, since CUDA would not find a CUDA card.
*There is already support for Fermi on Mac OS in the Nvidia 19.5.8f03 drivers, released a month ago without any announcement: they include NVDAGF100HAL.kext.
Anyway, only OGL works with it, as neither CUDA nor OCL uses it.
I have to use the NVloader injector, which does not work with Fermi in 64-bit kernel mode. Note that a GF 275 does work in 64-bit mode with this injector.
Note: I wanted to fix this, and all I found was a cuGetExportTable call and something like MacCompatibiltyTID used by a checkcompatibility executable. Perhaps patching that would make it work.
Someone on the Nvidia forums suggested that the broken OCL could be fixed by creating an OGL context before searching for OCL devices (oclgetdevice), but this trick did not work.
*Storing ELF binaries instead of CUBINs rules out the use of decuda. Hopefully, one very interesting solution is..

*Judging from the MAGMA webinar, a big release with some major features seems to be planned for SC2010. Check the MAGMA presentation for what to expect.
*Physx 3.0 is nearing launch: the Physx Visual Debugger release notes say it includes support for it.
Note that this brings concurrent kernel support on Fermi for improved performance in physics simulations. Hopefully it also includes the wrinkle meshes feature studied by Mueller.
Note also that the GPU AI notes say it will use function pointers once CUDA supports them, so expect a new release at some point that is optimized even further for Fermi.
It will probably be announced at Siggraph, even if it launches later.
I also hope to see APEX ship for more than big AAA games, i.e. downloadable by everyone.
Lastly, I expect Optix 2.0 and the final Cg 3.0 at Siggraph. Let's also see whether OpenRL arrives in time with OpenCL support for GPUs, which would be interesting for ATI. Note also that Luxrender GPU 1.6 brings Stochastic Photon Mapping and uses OCL on ATI GPUs too.
*Nsight is also moving fast: from beta in early June to RC state now. Launching at Siggraph?
*ATI doubles on DirectCompute are broken, although the feature flag is reported as supported.
We can now test this with the June DX compiler; before, the compiler was broken for doubles inside control flow (loops, if, etc.).
Compiling mostly works, but rendering shows issues compared with Fermi, which handles it nicely.
Download my code.. (coming soon..)
*The ATI GLSL driver is somewhat broken, at least for geometry shaders it seems. I changed the Nvidia Physx fluid demo to use GLSL code instead of Cg, plus some other fixes related to point rendering, and it now seems to work, but not without instabilities that show up as noise on screen, even outside the window it draws in.
Download and test.. (coming soon..)
Also, the GLSL driver does not implement fetching integer textures with integer coordinates (texel2Dfetch( itex)).
*CUDA 3.1 ships with three interesting examples. One is oclTridiagonal, a fast tridiagonal solver, interesting for a cinematic DoF renderer like the one in Metro, using OCL/OGL.
Another is oclCopyComputeOverlap, which shows two things. First, concurrent copy and kernel execution is possible in OCL via command queues. Second, there is an issue in the 25x drivers that prevents full scaling: I think the good result is 30% faster code, and I obtain 20% on the 25x drivers, while on the 197 drivers I obtain 30%.
Note that both the ATI and Apple platforms, the latter even with Nvidia GPUs, exhibit no scaling or even negative scaling (-15%).
The good news is that the issue is fixed in the 258.19 OCL 1.1 preview drivers, which report CUDA 3.2, so I get the 30% overlap back. Note that the other 258 drivers do not fix it (they report the older CUDA 3.1 and OCL 1.0).
One more interesting thing: the dual DMA engine is supposed to work in OCL too, so the overlap would be 50%. It seems to be restricted to Tesla, but Nvidia has been less explicit about this than about the double-precision cap on Geforce.
Luckily I have a trick for you: the 197.44 driver seems to support the dual DMA engine on Geforce Fermi too!
This is an OGL 4.0 driver, so all you lose compared with the current 256 drivers is the CUDA 3.1 features. On Linux, use the OGL 4.0 driver from developer.nvidia.com and you get it too.
Note also that 197.75 etc. do not work; it only works with this one.
*So it seems the dual DMA engine is broken/disabled on Geforce Fermi for no reason other than an economic one.
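To make the single versus dual DMA engine argument concrete, here is a minimal back-of-envelope sketch in Python. It is an idealized steady-state pipeline model, not the metric that oclCopyComputeOverlap itself prints. The function names `chunk_time` and `time_saved` and the example timings are assumptions made for illustration; they do not come from the sample or from any driver.

```python
def chunk_time(h2d, kernel, d2h, copy_engines):
    """Idealized steady-state time per chunk for a long stream of equal chunks.

    h2d, kernel, d2h: time of the upload, the kernel and the download.
    copy_engines = 0: no overlap, the three stages run back to back.
    copy_engines = 1: copies overlap the kernel but not each other.
    copy_engines = 2: upload, kernel and download all overlap.
    """
    if copy_engines == 0:
        return h2d + kernel + d2h
    if copy_engines == 1:
        # One DMA engine: both copies share it, the kernel runs beside them.
        return max(h2d + d2h, kernel)
    if copy_engines == 2:
        # Two DMA engines: the slowest of the three stages sets the pace.
        return max(h2d, kernel, d2h)
    raise ValueError("copy_engines must be 0, 1 or 2")


def time_saved(h2d, kernel, d2h, copy_engines):
    """Fraction of the serial time saved by overlapping (0.0 means no gain)."""
    serial = chunk_time(h2d, kernel, d2h, 0)
    return 1.0 - chunk_time(h2d, kernel, d2h, copy_engines) / serial
```

With equal upload, kernel and download times, for example `time_saved(1.0, 1.0, 1.0, 1)`, this model saves about a third of the serial time with one copy engine and about two thirds with two. Measured gains depend on the real copy/kernel ratio and on driver overhead, so they need not match these ideal figures.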
*The CUDA simpleStream sample seems to show broken streams on Fermi, but that is only because it does not submit enough work: a simple fix.
*The matmul by Lschien is one of the fastest for CUDA, but it currently fails on Fermi because it uses cubins obtained by modifying Tesla asm via decuda/cudaasm. Thank god the problem seems related to the volatile keyword not working correctly before CUDA 3.0. The author suggests a fix, assuming volatile now works, that uses CUDA variant 6. I have tested it and it works, so it's fixed: I obtain nearly 850 Gflops on a Fermi 470 at 1650 MHz.
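As a rough sanity check on the 850 Gflops figure, a small Python sketch can compare it with the theoretical single-precision peak. The 448 CUDA cores of a GTX 470 and the 2 flops per core per clock (one fused multiply-add) are assumptions added here, not numbers from this post, and the helper names are hypothetical.

```python
GTX470_CORES = 448          # assumption: core count of a GTX 470, not stated in the post
SHADER_CLOCK_MHZ = 1650     # shader clock quoted in the post
MEASURED_GFLOPS = 850.0     # matmul result quoted in the post


def peak_gflops(cores, shader_clock_mhz, flops_per_core_per_clock=2):
    """Theoretical peak in Gflops, counting one FMA as 2 flops per clock."""
    return cores * flops_per_core_per_clock * shader_clock_mhz / 1000.0


def matmul_gflops(n, seconds):
    """Gflops of an n x n by n x n matrix multiply, counted as 2*n^3 flops."""
    return 2.0 * n ** 3 / seconds / 1e9


def efficiency(measured_gflops, peak):
    """Fraction of the theoretical peak actually reached."""
    return measured_gflops / peak
```

Under those assumptions `peak_gflops(GTX470_CORES, SHADER_CLOCK_MHZ)` is about 1478 Gflops, so 850 Gflops would be roughly 57% of peak.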
*A lot of software has been updated to CUDA 3.x, some even to 3.1 already: NPP 3.1, CULA 2.0, JACKET 1.4, OpenMM 2.0 in Zephyr SVN, Gromacs 4.5 beta, GMAC, etc.

More news:


Nvidia has also released a lot of drivers on the 256 branch. Let's see the rough differences/progression:
197.44: first OGL 4.0 driver, and the only one supporting the dual DMA engine on Fermi on non-Tesla/Quadro boards. It also has no issues with single DMA.
256: adds CUDA 3.1. Currently all of them have issues with concurrent copy and kernel execution on Fermi, at least in OCL.
257.15: Blu-ray 3D
257.19: Nsight June beta driver
257.21: WHQL (supports Nsight)
257.29: ION support, accelerated DXVA Flash with PCIe x1 devices
258.18: OCL 1.1 beta (reports CUDA 3.2!), fixes the oclCopyComputeOverlap issues (but single DMA on Fermi)
258.48: first to support Quadro Fermis
258.69: ships with 3D Vision Surround (Nvidia nTersect says YouTube 3D support is coming soon; I also hope they add windows DX 3D Vision support soon)
Some other striking news :-)
*OpenCurrent 1.1 ships with CUDA 3.0 support and multi-GPU code.
Well, I have been testing it with CUDA 3.1, because I have Ubuntu 9.10 and GCC 4.4 works fine with CUDA 3.1 (so Ubuntu 10.04 is fine too). It has some issue related to CUDA now supporting true function calls; I think I must add static to a function, as the porting guide in the CUDA 3.1 release notes says. GCC 4.4 does not work with CUDA 3.0, so I will have to check on Ubuntu 9.04 if I cannot fix it.
*An OpenMP-to-CUDA compiler is available in Cetus 1.2.
*PGI 10.6 is available, with at least integer support in kernels and VS 2010 support.

I have tested GATLAS and it is good: at least 260 Gflops on a GTX 275. I tested on Mac, so it works on at least Linux and Mac without much work. The author says that on a 5870 with Stream 2.1 some image kernels achieve 1.3 Tflops, so similar to the CAL++ matmul, but in OpenCL! I still have to test, or modify the code(?), for double precision.

Some tricks and work to do:
RAW DATA:
I know it's lame, but at least you can emulate 3D image writes in CUDA with surfaces using PTX 3D tricks (post later).
I have to post a CUVID sample for Mac.
SimpleStreams in CUDA looks bad on Fermi; the forums say to increase the work to 500.
Matmul: Chien says to add volatile and check (works!)
BSGP Fermi support: checking by mail with the author.
Test the ATI sparse matrix code on Fermi.

See Fermi benchmarks:
nvidia benchmarks in blog
openvidia benchmarks..
cula blog
jacket blog
Some papers from the HPG 2010 presentations: Billeter on scattering and McGuire on AOV.
It also seems that code for rasterization and colored stochastic shadow maps is coming soon.