GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information

Friday, 5 February 2010

A long report of the silence before the storm: AKA a month before Fermi..

Posted on 07:29 by Unknown
Sorry, this is a raw dump of my ideas:

Although we are a month away from the full storm, if we listen carefully we can already hear some thunder from that storm, known as Fermi, and from new software updates:

First, the basic reading: the graphics architecture (Nvidia GF100) and the compute architecture (Fermi)..

Also see the Deep Dive presentation, which has more performance charts than the PDF, at noticias3d.com or ..
Also, although it is not well known, there were two more Deep Dive sessions that were not much talked about: one on the developer relations program, with slide info about the demos, and one on Nexus graphics debugging (the first demo I have seen of HLSL debugging on video; a CUDA video had already been posted).
Search on the cz page..

Tesla computing driver
GFX cards:
4x slower doubles?

As you probably know, the graphics architecture reveal showed revamped geometry power: parallel rasterizers (4 of them, so 4x performance) and 16x geometry power by replicating the geometry hardware 16 times..
The geometry buffer and stream-out buffers now also use the L1/L2 caches (and atomics?), so they are much faster and more general..
This can be seen as, at least, a removal of fixed-function hardware and a generalization of the rasterizer to work in parallel..
This makes a geometry-heavy game such as Crysis about 60% faster, which is not bad, and I also expect shader power to increase by nearly 2x..
I think of GF100 as 4 GPUs in one chip, each one a GPC that has all it needs..

Right now it seems the GTX 480 and 470 have H.264 MVC support (for Blu-ray 3D; by the way, the HDMI 1.4 3D spec is open). Will it be exposed in DXVA or what? Also in CUVID, VDPAU and/or CUVENC?..
As you know, on the Mac GPU video encoding is supported by Elemental, and video decoding goes through a lousy API (QTKit) that does not expose decoded frames as OpenGL textures or OpenCL image objects..
Elemental ships its own GPU decoding in 2.2, so we have to see whether it is CUVID, uses Snow Leopard APIs, or uses shaders..

I have also seen HDMI 1.4 outputs on Fermi, which would be marvelous for sending the 3D Vision output to Sony 3D monitors (but which glasses would I use?)

Lastly, 3D Vision now has tri-SLI and quad-SLI support and supports all the new 24-inch monitors (3 or 4 right now). I have seen a 27-inch monitor from ASUS for early June, and I think panels with 3D Vision and touch support are being sampled.. But remember:
YouTube 3D Vision support, windowed support and browser integration are promised soon..

There are reports that claim

SA 2009 courses things learned:
SC 2009 courses things learned:
I3D 2010 things learned:

It would be perfect for Fraps-style grabbing of 3D Vision output.
One thing I'm sad it will not be: this would be of use for not halting the OS and also in
I hope Nvidia is working on this right now, at least for the near future this year..
I can't understand why that would not be the case..

1. Although this is not strictly Fermi related, I expect the much needed updates of OpenCL on MacOSX and DirectCompute on Windows to arrive within a month..

Direct3D SDK updates are much needed: the last update was some 5 months ago (about 1.5 months before the Windows 7 launch), which is something like prehistory in this rapidly changing world :-)
I hope for a GDC 2010 release (so 6 months later), at least with important fixes for all the known issues: double support, the CS library (FFT, scan) and other fixes reported on the XNA forums..

It would also be good if some of the samples shown at the Fermi Deep Dive session at CES were released, such as the hair demo or the tessellated water demo, as they seem to be DirectX samples.. AMD did the same with 5xxx code (search for "contributed by AMD" in the Direct3D SDK)..

Good ocean demos were also shown by Nvidia (an OpenCL port of the DirectCompute code) and by AMD in the SA 2009 OpenCL session.. it would be good to have these..

I also hope Nvidia ships more DirectCompute demos in GPU Computing SDK 3.0 final or beta 2, which I hope will be released around Fermi time..

I also hope cuprintf, released two months ago, gets integrated into the CUDA Toolkit or SDK and hopefully
ported to OpenCL for GPU printf debugging support (as said, AMD supports this on Linux on the CPU and it is coming to MSVC).. Anyway, I expect the OCL support to be somewhat restricted due to no template support, etc..
I would port it to OCL, but it is confidential stuff right now..

See more debugging later..

I also want to talk a lot more about CUDA SDK 3.0: ELF, cuda-memcheck, CUDA driver/runtime interop, etc.. but I will wait until the final PTX 2.0, 1.5 (OCL) and docs are updated..

As a checkpoint, it would be good to know how ECC and the L1/shared memory cache are configured and enabled..
I remember seeing something about ECC in the Control Panel in some Quadro 195 driver release,
but I don't know how the L1/shared memory cache is going to be configured (a parameter to nvcc? a CUDA API function? etc..)

10.6.3 is coming this month and has OpenGL 3.x support (well, 3.0 it seems). netkas claims it is not complete, as OpenGL Extensions Viewer doesn't report the required GLSL 1.50 support. I think this is because no info on GL 3.x context creation has been published, so the viewer is not creating an advanced context, but the extensions are there. Also, comparing to 10.6.2, I see two more 3.2 extensions are supported, not bad. I only hope they are two interesting ones and not DirectX helper extensions. Give me that plus uniform buffer objects and TBO from 3.1 and I would be more than happy..
So I hope these are at least supported as extensions in the Nvidia driver or the AMD 5xxx driver..
It seems netkas is reporting the software renderer's extensions..
Oh boy, if only Apple cared less about a stable platform and delivered GPU extensions as fast as they arrive on Windows and Linux, it would be perfect. I don't care about OpenGL 3.x being implemented in software; that seems as mad a situation as Microsoft caring about the DirectX reference rasterizer for running actual games (ahem, it has WARP..)
If not, at least expect 3.1 complete by summer (= 10.6.4 or 10.6.5) and perhaps 3.2 by the end of this year, so it seems 3.2 will be complete this year..
By that time I hope to also have the optional 3.2 extensions:
GL_ARB_draw_buffers_blend
GL_ARB_sample_shading
GL_ARB_texture_cube_map_array
GL_ARB_texture_gather
GL_ARB_texture_query_lod
For me, having at least
GL_ARB_sample_shading
GL_ARB_texture_cube_map_array
GL_ARB_texture_gather
would be good.
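
A quick way to track a wish list like the one above against what a driver reports is to compare whole extension names against the extension string. This is only a minimal sketch: `missing_extensions` and the `sample` string are hypothetical names and made-up data for illustration, not something taken from a real driver dump.

```python
# Hypothetical helper, not part of any OpenGL binding.
# The wish list below is the "at least" subset named in the post.
WANTED = [
    "GL_ARB_sample_shading",
    "GL_ARB_texture_cube_map_array",
    "GL_ARB_texture_gather",
]

def missing_extensions(extension_string, wanted=WANTED):
    """Return the wanted names absent from a space-separated extension string.

    Names are compared as whole tokens, so a longer name that merely starts
    with a wanted name does not count as a match.
    """
    reported = set(extension_string.split())
    return [name for name in wanted if name not in reported]

# Made-up driver report, for illustration only.
sample = "GL_ARB_texture_gather GL_ARB_draw_buffers_blend GL_EXT_texture_buffer_object"
print(missing_extensions(sample))
```

Splitting into tokens instead of doing a substring search is the usual precaution here, since one extension name can be a prefix of another.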

The news is that 10.7.0 will be shown at WWDC and, if you remember, in 2008 the seed had GT200 support, so perhaps complete 3.2 and Fermi support will arrive with the 10.7.0 WWDC seed..

Also, although it is a bit premature, it would be good if the initial 5xxx support, and the Fermi support hopefully coming this year, also added the new shader 5.0 extensions (more later).
For me that would be perfect, similar to Leopard getting in 10.5.2 a lot of the new G80 extensions supported by Nvidia (geometry shaders, transform feedback, etc..)..

OpenCL for MacOS: FFT library performance fixes, and I also expect some improvements such as double support for Nvidia on GT2xx cards and ATI image support; at least this is where I would put my effort if I were Apple.. The bad thing is still that Apple has no 5xxx support, and the AMD 4xxx cards don't have true local memory, but this could change fast if the rumors are true about a Mac Pro expected to ship this month or next with 24 hardware threads (2 x 6-core 32nm Westmere) and hopefully a 5xxx card as an option, so perhaps good..

Before leaving MacOS, I also expect CUDA updates for 3.x.
Talking about CUDA on MacOS:
you have cuda-memcheck
cuda-gdb coming soon.. will it add OpenCL at that time too?
CUDA 64-bit support (for 3.x)
efficient CUDA/OpenGL interop (not expected, but it could happen)
It would also be good if hackintosh users could use Fermi with CUDA 3.0 on MacOS,
i.e. if cuda.kext exposes access to it..

Also remember that Fermi support will not be complete in the 3.0 release, at least unless they release a beta 2 in March and delay 3.0 until June..
So expect a lot more for 3.1 and perhaps some minor things for 3.2.
If you didn't follow the GT200 introduction: 2.0 had double support and shared memory atomics, but until 2.2 we didn't have host pinned memory, a GT200 feature..
Among the things said not to be present at first are support for recursion and, I think, also virtual function calls and function pointers, but I could be proven wrong..

Of course, these hardware features are supported on their own or since the beta: 8x faster doubles, 10x faster context switching, atomics and caches on their own, and concurrent kernels and dual DMA in the beta.. these last two using

Talking about OpenCL:
I expect Nvidia 200.x drivers to add support for the DirectX sharing extensions (see GDC 2010).
cl_nv_d3d9_sharing
cl_nv_d3d10_sharing
cl_nv_d3d11_sharing
are published in the Khronos OpenCL registry.
Also, in 196.21 I see some d3d10 functions..
But that seems crazy, as AMD has its own DirectX extensions..
A common khr_dx extension would be good..
Also 3D image writes for Fermi and perhaps the half extension for all cards..

Talking about OpenGL:
By the way, looking at Nvidia's GDC 2010 plans, it seems WebGL is launching (final spec) at GDC, and I also expect some updates to OpenGL: a bunch of EXT extensions and NV/AMD extensions supporting the new D3D 11 hardware..

So far we have two undocumented extensions shipping in the 196 driver: nvx_meminfo and wgl_dx_interop..
ATI has also added GL_AMD_shader_stencil_export and
GL_AMD_seamless_cubemap_per_texture support in 10.1, but these are documented at Khronos
(it also added GLX_INTEL_swap_event).
This last one is interesting for asynchronous glutSwapBuffers and events for querying when the swap completes, without waiting for vsync or similar..

Similarly, I hope VDPAU will get efficient GLX interop, since right now it has some overhead, and perhaps these last extensions can help..




First see GPU Computing tools:

Regarding hardware debugging-> lots of news.

See:

With all these references you know:
For Windows you have Parallel Nsight (codename Nexus), which supports GFX and compute debugging, profiling and API tracing, all integrated in Visual Studio 2008?.. (at least for now it seems to support CUDA C and HLSL with DirectX 9/10)..
The problem is that there is no Windows XP support, so this platform..
Of course Direct3D 11 is upcoming, and we hope for OpenCL and GLSL, but that may come some time later..
Also, the release (beta?) is targeted for Q1 2010..

Nvidia names Pro version with Direct
On other OSes you have the Visual Profilers for CUDA/OCL on Linux/MacOS..

With that you know that cuda-gdb already has Fermi hardware support and will soon get support for MacOS and also for OpenCL.. Use it with DDD or Emacs and you have for other

Recapitulating earlier posts:
Solaris and FreeBSD support for CUDA is working PGI using Nouveau stack..
GPU Computing book and programming gems

Raytracing:
Well, you have OptiX 2.0 beta 1, which now supports GeForce, and Fermi optimizations are promised soon..
CES videos show the frame rate going from 0.23 to 0.67 for a complex demo..
Now don't pred
It's also curious how they now claim that the cache helps a lot: Nvidia claims a 3x improvement over GT200. Well, the architectural performance increase has to be discounted by the core count (240 vs 512) and clock speed differences, if any, so it seems to me there is no more than a 30% increase in performance per core per clock due to caches, which in turn agrees with
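
The per-core estimate above can be made explicit with a small calculation. This is a rough sketch under stated assumptions: the function name and the `clock_ratio` parameter are mine, the 1.08 clock ratio is a made-up value used only to show the sensitivity, and only the 240/512 core counts, the 3x claim and the 0.23/0.67 frame rates come from the text.

```python
GT200_CORES = 240   # core count quoted in the post
FERMI_CORES = 512   # core count quoted in the post

def per_core_per_clock_gain(overall_speedup, clock_ratio=1.0):
    """Divide an overall speedup by the core-count ratio and a clock ratio.

    clock_ratio is Fermi clock / GT200 clock; 1.0 means equal clocks,
    which is an assumption since the post does not give clock speeds.
    """
    core_ratio = FERMI_CORES / GT200_CORES
    return overall_speedup / (core_ratio * clock_ratio)

# Speedup implied by the CES demo frame rates (0.23 -> 0.67)
demo_speedup = 0.67 / 0.23
print(round(demo_speedup, 2))                      # 2.91, close to the 3x claim

# Per-core, per-clock gain for the claimed 3x at equal clocks
print(round(per_core_per_clock_gain(3.0), 2))      # 1.41

# Same claim with a hypothetical 8% higher Fermi clock
print(round(per_core_per_clock_gain(3.0, clock_ratio=1.08), 2))
```

At equal clocks the quoted numbers give roughly 1.4x per core, so the "no more than 30%" figure only holds if Fermi also clocks somewhat higher than GT200; the true clock ratio is not known from the post.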

CUDA multicore:
Well, this one hurts me, as it is one of the true strengths of OpenCL right now, and Nvidia seems to have abandoned it as the initial work (MCUDA) was not very good: download it and you will see a lot of restrictions (no texture support).
And how AMD is doing:
well, see the OpenGL extensions

Well, with this you can at least check the diff between what I claim and

Catalyst 10.2 RC2 reveals that AMD is going the route of exposing these extensions as EXT ones, so Fermi and AMD will interoperate. I hope the Heaven OpenGL demo with tessellation for Linux (Windows support also?) is Fermi capable because of that, but it also seems there will be no support until March/April 2010 (10.3 or 10.4), as 10.2 has not exposed it..
GL_AMD_gpu_shader5
GL_AMD_conservative_depth
GL_EXT_texture_buffer_object_rgb32
GL_EXT_gpu_shader_fp64
GL_EXT_tessellation_shader
GL_EXT_shader_subroutine
GL_EXT_gpu_shader5
These are found in that driver and, as you can see,
GL_EXT_gpu_shader5 and GL_AMD_gpu_shader5 seem similar, so
there are no interesting AMD-only extensions except
stencil shader write, GL_AMD_conservative_depth
and AMD random access target..
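
To spot pairs like GL_EXT_gpu_shader5 / GL_AMD_gpu_shader5 mechanically, one option is to group the names by what follows the vendor tag. This is only an illustrative sketch: `cross_vendor_groups` is a hypothetical helper, it assumes every name has the form `GL_<VENDOR>_<feature>`, and a shared feature name does not prove the two extensions are functionally identical.

```python
from collections import defaultdict

# Extension names as listed in the post for Catalyst 10.2 RC2.
EXTS = [
    "GL_AMD_gpu_shader5",
    "GL_AMD_conservative_depth",
    "GL_EXT_texture_buffer_object_rgb32",
    "GL_EXT_gpu_shader_fp64",
    "GL_EXT_tessellation_shader",
    "GL_EXT_shader_subroutine",
    "GL_EXT_gpu_shader5",
]

def cross_vendor_groups(names):
    """Map each feature name to the vendor tags exposing it, keeping only
    features that appear under more than one tag."""
    groups = defaultdict(set)
    for name in names:
        # "GL_AMD_gpu_shader5" -> ("GL", "AMD", "gpu_shader5")
        _api, vendor, feature = name.split("_", 2)
        groups[feature].add(vendor)
    return {feature: sorted(vendors)
            for feature, vendors in groups.items() if len(vendors) > 1}

print(cross_vendor_groups(EXTS))  # {'gpu_shader5': ['AMD', 'EXT']}
```

Run against two driver dumps (one per vendor), the same grouping would show which features are already exposed under a common EXT name.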
CAL is at build 55x now;
in March, at build 6xx, comes the final OpenCL SDK.
You can find these strings in 10.2:
Hull shader(s) were not successfully compiled before glLinkProgram() was called. Link failed.
Domain shader(s) were not successfully compiled before glLinkProgram() was called. Link failed.
gl_FragStencilRefAMD
subroutineEXT uniform

I have found some of these in the Nvidia driver, so it seems cross-vendor D3D 11 OGL extensions are
coming soon (Nvidia on launch day and ATI at GDC, or April or May, I hope)

Hopefully the Ubuntu 10.04 AMD driver (fglrx 10.4 beta), which ships in mid March, also adds OpenCL support in the driver, so no more SDK with OpenCL.so. It would be perfect if they could also ship image support, production OGL interop and byte_addressable_store, assuming local and global atomics are production quality, which I don't know. I also hope that, as VGA arbitration is supported, I can have AMD and Nvidia GPUs working simultaneously and OpenCL detecting two platforms.. A dream come true :-)

Also from GDC 10:
Nexus: NVIDIA's New Game Development Environment: NVIDIA Parallel Nsight
http://developer.nvidia.com/nsight
It seems APEX tools are coming (details were announced at GDC 09),
and for Tegra profiling, PerfHUD ES is coming..


Lastly, not related, but talking about the iPad and MacOS in general..
First, a Mac Pro is said to be coming with 5xxx, and also a 10.7 seed in June and touch iMacs..