GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information

Hi,
Before talking about the just-released AMD Stream SDK 2.0 in detail, let's look at how the AMD/ATI GPU computing stack has evolved over the last few years.
Leaving aside GPGPU programs that ran through graphics APIs with tricks, I think the first serious AMD GPGPU promise came with a SIGGRAPH 2006 paper on DPVM (Data Parallel Virtual Machine), if I remember correctly. In September of that year both CTM and the FireStream cards were announced.
The FireStream cards were merely ATI Radeon cards, perhaps with lower clocks for reliability and more RAM. At the time everything was R580-based, and prices were high.
CTM was announced as available under NDA in time for Christmas 2006. It promised assembly-level programming for AMD cards, and one highly wished-for feature was scatter from the GPU (as on the AMD GPU in the Xbox 360 from the end of 2005). Note also that it was not a virtual assembly language portable across GPU generations, so if you made the herculean effort of programming against that interface, the benefits would be lost with the R6xx generation.
At that time a 110 GFLOPS matmul was state of the art.
Note that at the same time, on 8 Nov 2006, Nvidia announced the CUDA API and the G80 CUDA hardware: C-level programming and portable PTX assembly, with scatter but also thread groups cooperating through thread syncs and fast local memory. It was released as an alpha that same year, in time for Christmas.

All in all, GPU software at the time meant OpenGL and DirectX, and so did Folding@home.
Come 2007..
This year the R600 hardware launched just before summer, and a CTM release adding support for it followed.
Still no AMD IL, I think. I believe I downloaded a PeakStream beta that also shipped CTM libraries compatible with R600 hardware. This was high-level stuff that linked against the CTM libs. It was not available for long, as PeakStream was acquired by Google. The big news came just before the end of 2007.
Just in time for Christmas came the Radeon 3xxx series, which besides adding Direct3D 10.1 support in hardware (the API itself went RTM in Feb/March/April of the following year) added new GPU computing features such as doubles.
At the end of this year AMD released its first public (no NDA) interface with a virtual assembly compatible across generations (AMD IL). It was named CAL. It came together with Brook+, which provided a CAL backend for Brook with better performance and reliability than the OpenGL one. CAL also exposed 3xxx hardware features such as doubles, scatter, global buffers, etc. There was still no software using it, though; Folding@home with Brook+ came in mid 2008. Brook+ itself still exposed no scatter, multi-GPU, pinned memory, doubles, etc.
Over the same period (during 2007) Nvidia reached a stable CUDA release (summer) and, right at the end of the year, added texture support, concurrent kernel execution and memory transfers, asynchronous kernel execution and memory transfers, and atomics on global memory, for CUDA 1.1 hardware, which was everything available at the time except the 8800 GTX/GTS.
Come 2008..
This year the 48xx hardware arrived, exposing among other things a "compute shader" mode, i.e. a more CUDA-like view of the work: local groups with local memory, and no need to think in pixel shaders. There were still many issues, as local memory had strict write rules (each thread writes only to its own area) and the 2007 CUDA features (atomics, etc.) were missing.
By the end of 2008 AMD had started shipping the CAL libraries as part of the Catalyst releases.
It added 48xx features such as a compute shader mode for CAL programs, exposing LDS (local memory), shared registers, thread IDs, etc., and also stabilized the API a lot while adding features (texture support, Vista and VC 2008 support).
This year the OpenCL spec was released, and AMD committed to having an implementation in H1 2009 and production code by the end of 2009 or the beginning of 2010.
Some issues still remained: fixing multi-GPU for the 4870 X2, and the missing Brook+ features.
That year was good for Nvidia, which in mid 2008 introduced the GT200 architecture with doubles, relaxed memory coalescing rules and a bigger register file, among other things, plus features that would only be exposed in 2009 (such as the ability to access host memory from GPU kernels).
CUDA progressed to 2.0 (summer) with GT200 support, doubles, Vista support, 3D textures, Volkov's matmul code, etc.
By the end of the year CUDA reached 2.1, with a GPU hardware debugger for Linux as an alpha, and VC 2008 support.
Come 2009..
That year AMD released its new Direct3D 11 hardware, the R8xx generation, which brought compute shaders and was the first AMD chip designed for OpenCL and DirectCompute.
the hard
AMD has ended the year with production OpenCL and DirectCompute for 5xxx and 4xxx hardware.
See more in the next post.
Also, OpenGL 3.2 plus ARB extensions bring Direct3D 10.1-level features.
So what is still lacking, and what do I expect in H1 2010?
First, I hope both DirectCompute and OpenCL stabilize, or at least the implementations from their backers do.
Apple needs to fix the performance and runtime issues that still prevent complex, high-performance code (such as their FFT lib) from running at full speed on the Apple platform. It also needs to provide the up-to-date GPU features that the vendors already offer (for Nvidia, add doubles; for AMD, image support and gl_sharing on 4xxx, and all the 5xxx stuff in the 5xxx drivers). A lot of that is 10.6.3 stuff, I hope.
Microsoft also has to ship a fixed DirectCompute compiler, for double usage and other bugs.
AMD, for its part, finally has to deliver an ICD DLL usable by both Nvidia and AMD, image support, general OpenGL interop (fixing the need to create the OpenCL context just before creating the OpenGL resources, plus OpenGL image interop), byte-addressable support, 3d_image_writes, and general usage of doubles.
Longer term, it should expose the 5xxx stuff as OpenCL extensions (AMD_IL assembly inserts in OpenCL kernels, GDS, wave sync, concurrent kernels, virtual functions).
Finally, AMD needs to bring out its OpenGL extensions for 5xxx.
I expect 10.6.3 seeds this year, some leak by the end of January, and the release in February or early March.
The Direct3D SDK should come in January or the beginning of February.
The next AMD OpenCL SDK should come by the end of Feb 2010 or mid March 2010.
As for the OpenGL 5xxx extensions, I am pessimistic and only hope for documentation before GDC 2010, so mid March 2010.
Nvidia, on the other hand, I hope will release Fermi by (mid) February with DirectCompute 5.0 and OpenCL with 3d_image_writes, all on release day, plus the OpenGL Fermi extensions.
So all in all, by about 15 March 2010 we will have everything I want, except OpenCL 1.1 and the AMD OpenCL 5xxx extensions. Those are for the end of H1 2010 or later.
Apple's OpenGL 3.2 support, and OpenCL for Fermi and 5xxx, may also arrive in H1 or H2 of that year.