March 2013 ~ GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information

Wednesday, 20 March 2013

2013: a good year for new API revisions and launches?

Posted on 11:05 by Unknown

Hi, I think this could be a year with a lot of new GPU-related specs published.
Just released:
*OpenMP 4.0 RC2, with accelerator device targets added
*OpenACC 2.0, which brings GK110 / CUDA 5.0 features to the directives world
I expect PGI to ship an OpenACC 2.0 implementation in June, which will also add support for MIC and for AMD via OpenCL (so Intel HD 5000 too in the Windows version?).
Perhaps CAPS will follow in the summer.
Regarding OpenMP 4, I already see an Intel Fortran update with SIMD features, and perhaps there will be a beta by summer for Intel CPUs (and Xeon Phi?), with production quality maybe at SC13 in both Intel 2014 compilers, plus some PGI beta.
Outside the HPC world, let's see:
*The final WebCL spec should be coming soon; let's see how fast they progress. I would like to see optional OpenCL 1.2 support with all the new SA2012 extensions for MSAA and depth sharing. What troubles me is the OpenCL extension mechanism for getting function addresses.
*WebGL 2.0 should be announced, exposing OGL ES 3.0 to the WebGL world. I think ANGLE support is lagging by a year. I would still like a version with a "desktop profile": core support, a settable version number (like 4.2 core), and optional use of extensions if wanted.
*OpenCL 2.0 at Siggraph?
*DX 12? Perhaps with a Windows Blue alpha this year at PDC (Oct-Nov 2013). I hope it is a major spec revision with new DirectCompute features (recursion, pointers, etc.).
*C++ AMP 2.0? A new C++ AMP with function calls, recursion, separate compilation and pointer support would be good.
Khronos is also expected to bring this year:
*StreamInput, for unified Leap Motion, Oculus and Kinect support, among other things
*An OpenCV-like API
Also, next year, with Maxwell expected to bring DX12 support:
*OGL 5
*CUDA 6
And for mobile platforms we expect at least OGL ES 4.0.


Nvidia GTC thoughts: ARM,roadmap,demos..

Posted on 06:24 by Unknown

The last-day announcement of full OGL 4.3 support on Tegra products on ARM is huge.
I assume Nvidia will also bring all its supported extensions, such as the bindless extensions, VBUM (pointer support in GLSL), direct state access, etc., and not only core OGL 4.3.
Related to that, Nvidia exposes some features that DX11 does not, like support for writes to image2DMS objects, which would be like RWTexture2DMS if that existed in DX11 (note that only RWTexture2D and Texture2DMS exist).
Nvidia showed it on Ubuntu, and one question is whether they will (or can) bring all that goodness to the Android world. If not, I suspect they will at least bring it through a lot of extensions to the OGL ES driver, similar to what they are doing for Tegra 4 this year.
Also note that they showed the 319 driver series. I hope this brings EGL support to the NV drivers, at least for Linux (related to the Mir effort on Ubuntu), and not only EGL with OGL ES support like AMD has currently, but the full OGL profile in EGL, and, as said, for Windows too. Note that I'm also expecting Optimus in the Linux world soon, so this could be the series.
Note that Nvidia showed a demo running Optix on ARM (their blog says it was a one-day port), CUDA 5.0 demos, and finally IslandGL. That is a port of Island, one of the Fermi DX11 demos, which is available to download from Nvidia and makes heavy use of tessellation. With that, it becomes clear why NV has renewed interest in bringing out an OGL 4.0 SDK. I hope almost all the techniques in the NV DX11 SDK get ported, stochastic transparency to name one.
With that, I recap the tooling support needed so we experience a seamless transition from the PC world to the Tegra world:
*Cg for ARM: bring Cg to the ARM world, or at least the offline GLSL 4.3 compiler that cgc has.
*OGL 4.0 SDK for ARM: as said, a new OGL SDK is in the works, and an ARM version would be good.
*OpenCL for ARM: Nvidia doesn't talk about it, but I think market pressure will force them to port it to ARM too. Note there was still no OCL 1.2 with the new extensions at the end of 2012.
*TXAA: with OGL 4.3 and AAA games becoming the norm in the Android and iOS markets, it is a matter of time until someone wants the TXAA that Tegra 5 will have exposed in OGL.
*Optix: well, it was shown, so it is ported.
*Physx and Apex: Physx is shipping for Android. It will be interesting to see if they enable GPU support once Tegra 5 ships with a CUDA GPU (I suspect yes, expecting 128/192 shaders, which G80 also had, and they supported that). More interesting will be to see if Apex (right now Windows only) gets ported, as that brings turbulence and GPU rigid bodies with fracturing, the former of which is being used a lot lately in f2p games.

*Also note that this will bring full DX11 to Windows on ARM products, which is impressive. I'm not certain whether it will be DX11.1 or not, as the GPU shipping in Kayla could be GK208, a third-generation Kepler according to the press, so that feature could be in.

Lastly, and slightly unrelated, I would like NV to release its impressive demos of the last two or three years as executables so we can test them in house, like:
GTC 2010
Lighthouse
GTC 2012
Fracture raytraced demo
Optix water demo
GTC 2013
Wave works (perfect storm demo)
Face works

Some minor things to say about the roadmap:
*Maxwell PC GPU: they said it will have unified virtual addressing, and that Tegra 6 will have a Denver CPU and a Maxwell GPU. But will Maxwell desktop GPUs have a CPU? And in that case a Denver (i.e. some form of ARM64) one? I suspect yes, since otherwise it makes little sense to talk about unified virtual addressing. But then why not say so directly for the Maxwell PC product? It also seems a bit premature if it's a Q1/Q2 2014 product.
They also forgot to mention context switching and preemption on the GPU, which I'm expecting too.
*Volta GPU: I expected it to be called Einstein, since that codename was mentioned by Dally. Could that be a later product, or is it an Echelon codename? Anyway, that would also add very efficient interconnects (perhaps 3-6X byte/s/watt improvements).

After all, it also seems a new DX is in the works, with the spec possibly published this year. The question is whether it will be on the first Maxwell products or not, i.e. on GM10x or GM11x products, as early 2014 seems a little soon. Perhaps the first Maxwell could arrive in H2 2014 after all, and that could bring a good release with:
*Dx11.2 or DX12 support and equivalent OGL support (5.x?)
*Hierarchical two level warp dispatcher
*Unified register / shared mem / L1 pool
*Scalar ALU next to vector ALUs like AMD 7xxx series..
*CPU with UVA access..
*Context switching and GPU preemption


Monday, 4 March 2013

My wishes for OCL 2.0

Posted on 14:28 by Unknown

Hi,
I think it's time to publish my OpenCL 2 requests so they may be considered for inclusion.
I'm not requesting things that will hopefully already be in it, like C++ extensions, etc.
Whether they plan to support existing GPUs or not will determine if some of these can be included. Anyway, getting a cl_khr or cl_ext extension would be good.
But first, a reminder of things that still remain to be implemented:

*Starting to see cl_ext_device_fission implemented on GPUs. That should be doable on AMD 7xxx GPUs, but on NV GK110 still not? Even better, it seems the new AMD Sea Islands should support partitioning into up to 8 sub-GPUs.
*Implementing the new OCL 1.2 extensions, like the graphics ones for MSAA and depth access.

For OpenCL 2 it would be good to have:
*Atomic counters (cl_ext_atomic_counters_32) in core. They provide an order of magnitude improvement over global atomics, at least on older D3D11 HW (Fermi, AMD 5xxx series), and are the foundation of HW-accelerated queues.
*Kernels that can send interrupts to the CPU and/or initiate host system calls. That has seemed to be coming for a while (I think even the Fermi whitepaper suggested it) but it is still not available. AMD SI supports SEND_MSG in its ISA, as Lottes suggests in his blog, so AMD should be able to do it too.
*Warp/wavefront vote functions: these functions have been in NV HW since GTX 2xx (2008). They are useful, for example, in what is currently the best dynamic memory allocator for GPUs; see "Fast Dynamic Memory Allocator for Massively Parallel Architectures", where they say:
"The used hardware must provide a voting function for an efficient implementation". So it seems an OpenCL port will need that to be exposed.
*Dynamic parallelism: that should be expected now that GK110 is shipping, and it also seems SI could support some limited form of it, as shown in an AFDS session.
*Named barriers: this has been shipping in CUDA since the Fermi days and can be used for warp specialization, as in the CUDADMA project, which can bring better memory bandwidth utilization in some apps. As shown in the HPP study, it can also bring support for "true function composability", i.e. GPU functions that use barriers can call other GPU functions that use barriers without breaking expected usage; see the HPP paper by Gaster et al.
*Cross-vendor multi-GPU support like the CUDA P2P functionality, i.e. memory from one GPU directly addressable by another GPU from a kernel without a previous copy (also present as cl_amd_bus_addressable_memory in AMD OCL).
*Exposing some common intra-warp/wavefront ops? (Like the existing NV shuffle; ones like median and min/max make more sense and could be beneficial for platforms like Xeon Phi, but not on GPUs.)
*Expose some cross-vendor multimedia ISA extensions? I.e. some common subset of cl_amd_media_ops/cl_amd_media_ops2 and the PTX SIMD instructions. This could be good together with interop with video decoders and encoders for accelerated video processing, and NV even uses them in their fast raytracing kernels.
*Finally, bring parity with the existing compute exposure in graphics APIs like OGL 4.3/D3D 11 compute shaders. As said, atomic counters were one thing, others being:
->The new gather4 instructions
->DispatchComputeIndirect: i.e. the ability to launch a kernel with the total work size fetched from GPU memory. It is more efficient for variable-work kernels that depend on work generated by a previous kernel, since we avoid a CPU round trip. Note that this could be done with the new dynamic parallelism, so perhaps it doesn't need to be exposed.
->Promote the MSAA and depth extensions into core
->Mipmap support like in CUDA 5
->Compressed texture format support
->A cross-vendor extension for bindless support (assuming it will get broad support in the coming years)
->A cross-vendor extension for sparse texture/buffer support
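
To make the vote-function, atomic-counter and indirect-dispatch wishes above a bit more concrete, here is a minimal CPU-side sketch in plain Python (not OpenCL). The lanes of a warp vote with a ballot, a single counter update reserves slots for the whole warp, each lane derives its slot from the ballot mask, and the next launch size is computed from the counter rather than from a host-side value. The 32-lane warp width, the helper names (`ballot`, `warp_aggregated_append`, `groups_for`) and the one-element list standing in for an atomic counter are all assumptions for illustration; this is not the allocator from the cited paper nor any vendor's API.

```python
WARP_SIZE = 32  # assumed NV-style warp width; AMD wavefronts are 64 lanes wide


def ballot(predicates):
    """Simulate a warp ballot: bit i of the mask is set if lane i votes true."""
    mask = 0
    for lane, pred in enumerate(predicates):
        if pred:
            mask |= 1 << lane
    return mask


def vote_any(predicates):
    """True if at least one lane votes true."""
    return ballot(predicates) != 0


def vote_all(predicates):
    """True if every lane votes true."""
    return ballot(predicates) == (1 << len(predicates)) - 1


def warp_aggregated_append(counter, queue, values, predicates):
    """Append values[lane] for every voting lane with ONE counter update.

    `counter` is a one-element list standing in for a global atomic counter
    (hypothetical stand-in, no real atomics here). Returns the slot each
    lane wrote to, or None for lanes that did not vote.
    """
    mask = ballot(predicates)
    total = bin(mask).count("1")      # population count of the ballot
    base = counter[0]                 # the "leader" reserves `total` slots
    counter[0] += total
    slots = []
    for lane in range(len(predicates)):
        if (mask >> lane) & 1:
            # Rank = number of voting lanes with a lower lane id.
            rank = bin(mask & ((1 << lane) - 1)).count("1")
            queue[base + rank] = values[lane]
            slots.append(base + rank)
        else:
            slots.append(None)
    return slots


def groups_for(counter, local_size):
    """Indirect-dispatch style launch size: workgroups needed for the queued work."""
    return (counter[0] + local_size - 1) // local_size


counter = [0]
queue = [None] * 64
values = list(range(100, 100 + WARP_SIZE))
wants_slot = [lane % 4 == 0 for lane in range(WARP_SIZE)]  # 8 of 32 lanes vote
slots = warp_aggregated_append(counter, queue, values, wants_slot)
```

The point of the sketch is the traffic pattern: eight lanes get eight distinct slots, but the shared counter is touched once per warp instead of once per lane, which is why a hardware vote is so useful for queues and allocators.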

To finish, also expose advanced control of ld/st operations, such as cache modifiers and even use of the texture path (on GK110).

Finally, it seems future GPUs could support unified register/local memory, so explicit size control for better optimization could be good. It also seems local memory could be allocated dynamically inside a kernel through an extra argument to the barrier function, for better use of it, so an extension to the barrier operator could be good. And a scalar processor is present on recent architectures, so although it may be intended for executing common scalar code in a kernel (extracted by the compiler), it could also be exposed for direct programmability.

Not coming shortly(?): for me, with atomic counters possibly bringing very fast queues, the next step is exposing all graphics functionality to kernels in OpenCL. As said above, the primary targets to expose are the rasterizer, Z-buffer and ROP functionality.
*The most interesting one for me is exposing the Z-buffer. The GPUDet paper shows a use of it.
*Exposing the rasterizer: what could that be? Perhaps, through a generalized dynamic parallelism, a function that takes a buffer of "geometry" to rasterize and some kernel that would be called over some specified grid size (8x8 tiles?) via dynamic parallelism. All in all it seems somewhat crazy.
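
One way to picture the rasterizer idea above: a "parent" step bins a triangle into 8x8 screen tiles, and a per-tile "child kernel" then tests the pixels of its own tile. The sketch below is plain Python running on the CPU, not OpenCL. The names (`edge`, `tiles_for_triangle`, `shade_tile`, `rasterize`), the bounding-box binning and the pixel-center sampling rule are assumptions chosen for illustration, not a description of how any real GPU rasterizer or proposed API works.

```python
TILE = 8  # tile edge in pixels, matching the 8x8 guess in the text


def edge(ax, ay, bx, by, px, py):
    """Edge function: sign tells on which side of the edge a->b the point p lies."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def tiles_for_triangle(tri, width, height):
    """Parent step: list the (tx, ty) tiles overlapping the triangle's bounding box."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    x0 = max(int(min(xs)) // TILE, 0)
    y0 = max(int(min(ys)) // TILE, 0)
    x1 = min(int(max(xs)) // TILE, (width - 1) // TILE)
    y1 = min(int(max(ys)) // TILE, (height - 1) // TILE)
    return [(tx, ty) for ty in range(y0, y1 + 1) for tx in range(x0, x1 + 1)]


def shade_tile(tri, tx, ty, width, height):
    """Child "kernel": return the pixels of one tile whose centers lie inside tri."""
    (ax, ay), (bx, by), (cx, cy) = tri
    covered = []
    for y in range(ty * TILE, min((ty + 1) * TILE, height)):
        for x in range(tx * TILE, min((tx + 1) * TILE, width)):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(ax, ay, bx, by, px, py)
            w1 = edge(bx, by, cx, cy, px, py)
            w2 = edge(cx, cy, ax, ay, px, py)
            # Accept either winding order: all edge values share a sign.
            inside = (w0 >= 0 and w1 >= 0 and w2 >= 0) or \
                     (w0 <= 0 and w1 <= 0 and w2 <= 0)
            if inside:
                covered.append((x, y))
    return covered


def rasterize(tri, width, height):
    """Launch one child "kernel" per tile, as dynamic parallelism might."""
    pixels = []
    for tx, ty in tiles_for_triangle(tri, width, height):
        pixels.extend(shade_tile(tri, tx, ty, width, height))
    return pixels


tri = [(0, 0), (16, 0), (0, 16)]
tiles = tiles_for_triangle(tri, 32, 32)
pixels = rasterize(tri, 32, 32)
```

Each tile is independent here, which is what would make a per-tile child launch plausible; a hardware rasterizer would of course do the binning and coverage test far more cleverly than this bounding-box loop.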

More thoughts?

