GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information


Wednesday, 23 December 2009

GPU computing on AMD: a historical perspective!

Posted on 03:22 by Unknown
Hi,
before talking in detail about the just-released AMD Stream SDK 2.0, let's see how the AMD/ATI GPU computing stack has evolved over the last few years..
Leaving aside GPGPU programs that ran through graphics APIs with tricks, I think the first serious AMD GPGPU promise came with a Siggraph 2006 paper on DPVM (Data Parallel Virtual Machine), if I remember correctly. In September of that year both CTM and the FireStream cards were announced.
The FireStream cards were merely ATI Radeons, perhaps with lower clocks for reliability and more RAM. At the time they were all R580 based, and prices were high.
CTM was announced as available under NDA in time for Christmas 2006. It promised assembly-level programming for AMD cards, and a highly wished-for feature was GPU scatter (as in the AMD GPU of the Xbox 360, from late 2005). Note also that it was not a virtual assembly language portable across GPU generations, so if you made the herculean effort of programming against that interface, the benefits would be lost with the R6xx generation.
At that time a 110 Gflops matmul was state of the art.
Note that at the same time, on 8 Nov 2006, Nvidia announced the CUDA API and the G80 CUDA hardware: C-level programming, PTX portable assembly, scatter, and also thread groups cooperating through thread syncs and fast local memory. It was released as an alpha that year, in time for Christmas.

All in all, GPU software was still OpenGL and DirectX, and so was Folding@home.
Come 2007..
This year the R600 hardware launched just before summer 2007, and CTM support for it followed.
Still no AMD IL, I think. I think I downloaded the PeakStream beta, which also had CTM libraries compatible with R600 hardware. This was high-level stuff that linked to the CTM libs. It was not available for long, as PeakStream was acquired by Google. The big news came just before the end of 2007.
Just in time for Christmas came the Radeon 3xxx series which, besides adding Direct3D 10.1 support in hardware (the APIs went RTM in Feb/March/April of the next year), added new GPU computing features such as doubles.
At the end of this year AMD released its first public (no NDA) interface with a virtual assembly compatible across generations (AMD IL). It was named CAL. It also came with Brook+, which provided a Brook backend on CAL with better performance and reliability than the OpenGL one. CAL also exposed 3xxx hardware features such as doubles, scatter, global buffers, etc. There is still software using it today: Folding@home with Brook+ came in mid 2008. Still, Brook+ exposed no scatter, multi-GPU, pinned memory, doubles, etc.
During the same period (over 2007) Nvidia reached a stable CUDA release (summer) and, right at the end of the year, added texture support, concurrent kernel execution and memory transfers, async kernel execution and memory transfers, and atomics to global memory for CUDA 1.1 hardware, which meant every card except the 8800 GTX/GTS.
Come 2008..
This year the 48xx hardware came, exposing among other things a "compute shader" mode, i.e. a more CUDA-like view of work: local groups with local memory, forgetting about pixel shaders. There were still many issues, as local memory had strict write rules (each thread writes only to its own area) and the 2007 CUDA features (atomics, etc.) were missing.
By the end of 2008 AMD had started shipping the CAL libraries as part of the Catalyst release.
It added 48xx features such as a compute shader mode for CAL programs exposing LDS (local memory), shared registers, thread IDs, etc., and also stabilized the API a lot while adding features (texture support, Vista and VC 2008 support).
This year the OpenCL spec shipped, and AMD committed to having an implementation in H1 2009 and production code by the end of 2009 or the beginning of 2010.
Some issues remained: fixing multi-GPU for the 4870 X2, and Brook+ features.
The year was good for Nvidia: in mid 2008 it introduced the GT200 architecture with doubles, relaxed memory coalescing rules and a bigger register file, among other things, plus features that would only be exposed in 2009 (such as accessing host memory from GPU kernels).
CUDA progressed to 2.0 (summer) with GT200 support, doubles, Vista support, 3D textures, Volkov's matmul code, etc.
By the end of the year CUDA reached 2.1, with a GPU hardware debugger for Linux as alpha, and VC 2008 support.
Come 2009..
This year AMD released new Direct3D 11 hardware, the R8xx, which brought compute shaders and was the first AMD chip designed for OpenCL and DirectCompute.
the hard
AMD has ended the year with production OpenCL and DirectCompute for 5xxx and 4xxx hardware.
See more in the next post.
Also, OpenGL 3.2 plus ARB extensions bring Direct3D 10.1 level support.
So what is still lacking, or what do I expect to have in H1 2010?
First, I hope both DirectCompute and OpenCL stabilize, or at least their vendors' implementations do.
Apple first needs to fix the performance and runtime issues that still prevent complex, high-performance code (such as their FFT lib) from running at full speed on the Apple platform. It also needs to provide the up-to-date features the GPU vendors offer (for Nvidia, doubles; for AMD, image support and gl_sharing on 4xxx, and all the 5xxx features in the 5xxx drivers). A lot of that is 10.6.3 stuff, I hope.
Microsoft also has to ship a fixed DirectCompute compiler for double usage and other bugs.
AMD also finally has to ship an ICD DLL usable by both Nvidia and AMD, image support, general OpenGL interop (fixing OpenCL context creation just before OpenGL creates its resources, and OpenGL image interop), byte-addressable stores, 3d_image_writes, and general usage of doubles.
Longer term, it should expose 5xxx features as OpenCL extensions (AMD IL assembly inserted in OpenCL kernels, GDS, wave sync, concurrent kernels, virtual functions).
Finally, AMD needs to bring out its OpenGL extensions for 5xxx.
I expect 10.6.3 seeds this year, some leak by the end of January, and a release in February or early March.
The Direct3D SDK in January or the beginning of February.
The next AMD OpenCL SDK by the end of February or mid March 2010.
As for the OpenGL 5xxx extensions I am pessimistic, and hope for docs before GDC 2010, so mid March 2010.
Nvidia, on the other hand, I hope will release Fermi by (mid) February with DirectCompute 5.0 and OpenCL with 3d_image_writes, all on release day. Also the OpenGL Fermi extensions.
So all in all, by 15 March 2010 or so we will have all I want, except OpenCL 1.1 and the AMD OpenCL 5xxx extensions. Those are for the end of H1 2010 or later.
Also, perhaps in H1 or H2 of that year, Apple will get OpenGL 3.2 support and OpenCL for Fermi and 5xxx.

Thursday, 17 December 2009

Catalyst 9.12: hotfix (III)

Posted on 13:10 by Unknown
Do you want:
CrossFire enabled with Eyefinity?
OpenCL support in 9.12?
OpenCL support for the 5970?
etc..
Then you need the hotfix:
http://support.amd.com/us/kbarticles/Pages/ATICatalyst912Hotfix.aspx

A deeper study TOMORROW.
Highlights of the ATI Catalyst™ 9.12 hotfix release include:

Support for the OpenCL™ GPU component of the ATI Stream SDK v2.0

* The ATI Catalyst™ 9.12 hotfix release provides full support for GPU acceleration of OpenCL when used in conjunction with the ATI Stream SDK v2.0. This feature is supported on the ATI Radeon™ HD 5970 Series, ATI Radeon™ HD 5800 Series, ATI Radeon™ HD 5700 Series, ATI Radeon™ HD 4000 Series and the AMD FireStream™ 9200 Series of products. For more information about ATI Stream technology and OpenCL, please visit http://www.amd.com/stream.


Support for ATI CrossFireX™ on Eyefinity configurations

* The ATI Catalyst™ 9.12 hotfix release provides ATI CrossFireX™ support on Eyefinity configurations, allowing users to take advantage of their additional GPUs for increased gaming performance when driving high resolution Eyefinity display groups
* Supported on the ATI Radeon™ HD 5970 Series, ATI Radeon™ HD 5800 Series, and ATI Radeon™ HD 5700 Series


Support for DisplayPort audio

* The ATI Catalyst™ 9.12 hotfix release adds support for DisplayPort audio for DisplayPort panels


Fixes for the following issues:

* Resident Evil 5 – Performance drop during the opening cinematic
* Call of Duty Modern Warfare 2 – Performance drop observed with a supported ATI CrossFireX™ configuration when using the thermal scope
* Wheelman – Game freezes while loading the game menu
* Flashing in various OpenGL titles – City of Heroes, Enemy Territory: Quake Wars, Riddick
* Heaven benchmark (DirectX 9 mode) – Grass flickers and white borders observed around the edges
* ATI Radeon™ HD 5970 Series, ATI Radeon™ HD 5800 Series, and ATI Radeon™ HD 5700 Series – issue with HDMI: DTS-HD and Dolby True-HD not working properly with certain receivers.
* The desktop mouse cursor becomes enlarged at random when using Windows 7

http://forum.beyond3d.com/showthread.php?p=1369975#post1369975
http://www.rage3d.com/board/showthread.php?p=1336111668

Catalyst 9.12 Linux and Windows links and release notes (II)

Posted on 11:38 by Unknown
First, they officially add:
1. ATI Catalyst™ support for DirectCompute 10.1
2. This release of ATI Catalyst™ provides support for OpenGL 3.2 extension support on the ATI Radeon™ HD 5xxx through 2xxx series.

Note some errors: transform_feedback2 and GL_ARB_fragment_coord_conventions aren't exposed per se, but the functionality seems to be built in..
Also, the ARB extensions providing D3D 10.1 support aren't mentioned..

Linux
https://a248.e.akamai.net/f/674/9206/0/www2.ati.com/drivers/linux/catalyst_912_linux.pdf
https://a248.e.akamai.net/f/674/9206/0/www2.ati.com/drivers/linux/ati-driver-installer-9-12-x86.x86_64.run
What's new:
1. Support for New Linux Operating Systems
2. ATI Catalyst Control Center Linux Edition: Displays pages: user interface enhancements
So no OpenGL 3.2?

Bugs:
CrossFire may fail to be enabled on some Ubuntu 9.10 configurations with ATI Radeon HD 5900 Series adapter

From release notes:
Note: Although listed in the documentation,
ADL_Adapter_ClockInfo_Get is not available in the current version of the ADL. The code for obtaining the clock information was excluded due to inaccurate information being reported when the card is over-clocked. Future updates of ADL SDK will reflect this change. As an alternative, ADL_Overdrive5_ODParameters_Get() can be used but this API is only supported on R600 and above ASICs.
Release notes:


The Drag and Drop Transcoding feature is a beta level feature provided only for evaluation purposes.
Note: The Drag and Drop Transcoding feature is only supported on single and dual core CPUs, and supported on devices that support Media Transfer Protocol (MTP).

ATI Catalyst™ support for DirectCompute 10.1
This release of ATI Catalyst™ provides full support for DirectCompute 10.1 for the ATI Radeon™ HD 4800 Series, ATI Radeon™ HD 4700 Series for both single card and ATI
CrossFireX™ supported configurations.

This release of ATI Catalyst™ provides support for OpenGL 3.2 extension support on the ATI Radeon™ HD 5800 Series, ATI Radeon™ HD 5700 Series, ATI Radeon™ HD
4000 Series, ATI Radeon™ HD 3000 Series and ATI Radeon™ HD 2000 Series. The
following is a list of OpenGL™ 3.2 features and extensions added in ATI Catalyst™
9.12:
* Support for OpenGL Shading Language 1.50
* BGRA vertex component ordering (GL_ARB_vertex_array_bgra)
* Drawing commands allowing modification of the base vertex index
(GL_ARB_draw_elements_base_vertex)
* Shader fragment coordinate convention control
(GL_ARB_fragment_coord_conventions)
* Provoking vertex control (GL_ARB_provoking_vertex)
* Seamless cube map filtering (GL_ARB_seamless_cube_map)
* Multisampled textures and texture samplers for specific sample locations
(GL_ARB_texture_multisample)
* Fragment depth clamping (GL_ARB_depth_clamp)
* Geometry shaders (GL_ARB_geometry_shader4)
* Fence sync objects (GL_ARB_sync)
* transform_feedback2
* texture_cubemap_array

Source code of DirectCompute benchmark (OpenCL and DirectCompute)!

Posted on 08:55 by Unknown
It includes OpenCL and DirectCompute benchmarks..
With 9.12 you have DirectCompute on 4xxx, so..
Internally it already works with AMD multi-GPU cards on OpenCL, like the AMD 4870 X2..
It seems DirectCompute already worked with multi-GPU on 5xxx..
It also supports mixing different GPUs, such as ATI and Nvidia; the dev has a 4890 and an 8800 GT..
The latest build, 0.43 (?), has fixes for setting the DirectCompute version to one supported by every device (does it set the minimum version, or a different one for each?)
Does it load balance for maximum power?
Searching now, I have found the code:
The FFT code seems to come from Volkov's CUDA FFT samples, already used in the OpenCL presentation at Siggraph 2008 and also in DirectCompute in the Nvidia Ocean demo (?):

http://dl.dropbox.com/u/1416327/directcomputebenchmarkkernels035.txt
http://dl.dropbox.com/u/1416327/directcomputebenchmarkkernels043.txt

Catalyst 9.12 adds OpenGL 3.2 support (and more..)!

Posted on 08:12 by Unknown
GPU Caps Viewer says so..
It goes up from 164 to 177 extensions, adding:
OpenGL Shading Language 1.50
gl_amd_tex_cubemap_array
gl_amdx_name_gen_delete
gl_arb_compatibility
gl_arb_depth_clamp
gl_arb_draw_elements_base_vertex
gl_arb_sample_shading
gl_arb_sync
GL_ARB_texture_cube_map_array
gl_arb_texture_gather
gl_arb_texture_multisample
gl_arb_texture_query_lod
gl_ext_timer_query
wgl_arb_create_context_profile
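
The diff above can be reproduced by comparing the extension strings reported under two driver versions. A minimal sketch in Python, assuming you have saved the space-separated extension strings from the old and the new driver. The function name `added_extensions` and the short sample strings are made up for illustration; the real strings have 160+ entries.

```python
def added_extensions(old: str, new: str) -> list[str]:
    """Return extensions present in `new` but not in `old`.

    Names are compared case-insensitively, because tools (and these notes)
    mix GL_ARB_... and gl_arb_... spellings.
    """
    old_set = {name.lower() for name in old.split()}
    new_set = {name.lower() for name in new.split()}
    return sorted(new_set - old_set)


# Hypothetical, heavily shortened extension strings, for illustration only.
old_exts = "GL_ARB_geometry_shader4 GL_ARB_provoking_vertex GL_ARB_seamless_cube_map"
new_exts = old_exts + " GL_ARB_sync GL_ARB_depth_clamp GL_ARB_texture_multisample"

for name in added_extensions(old_exts, new_exts):
    print(name)
```

With the full strings from both drivers, the length of the returned list should match the jump in the extension count.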

It already had this OpenGL 3.2 plus 10.1 ARB extension stuff:
GL_ARB_draw_buffers_blend
GL_ARB_vertex_array_bgra
gl_arb_seamless_cubemap
GL_ARB_geometry_shader4
GL_ARB_provoking_vertex
GL_ARB_seamless_cube_map

So all in all it adds OpenGL 3.2 (GLSL 1.50) plus all the newly ratified ARB extensions introducing Direct3D 10.1 hardware functionality..

It seems GL_ARB_fragment_coord_conventions isn't exposed as an extension, but I think it's supported in OpenGL 3.2 contexts..

Now all that remains is:
1. Info on the new OpenGL 5xxx extensions..
GL_AMD_gpu_shader5
GL_AMD_patch_tessellator
GL_AMDX_random_access_target
gl_amd_texture_compresion_dxt6,7
GL_EXT_tessellation_shader -> multivendor; does Nvidia have it for Fermi?
GL_AMD_video_instruction?
2. Still none of the good Nvidia extensions:
ext_separate_shader_objects
ext_direct_state_access
ext_transform_feedback2, it seems -> equivalent to the nv_ extension that came originally from Nvidia GT200
some of the Nvidia bindless stuff
texture_barrier
copy_image (copies textures among GPUs)..
some OpenGL/DX interop
3. Showing CAL and OpenCL GL interop:
wglResourceAttachAMD wglEndCLInteroperabilityAMD wglBeginCLInteroperabilityAMD
wglMakeAssociatedContextCurrent

Wednesday, 16 December 2009

16/12 news!

Posted on 12:49 by Unknown
Besides Catalyst 9.12 being delayed to the 17th
(it has compute support for 4xxx)
we have:
1. PerfKit 6.x for the 190.32 drivers..
(I can't seem to find it; is it registered-developer stuff?)
2. Fastra 2: the most powerful workstation (12 Tflops; 6 GTX 295 plus a single-GPU card = 13 GPUs)
Fastra 1 was 4 9800 GX2 (1024 cores, like a dual Fermi GPU card will have..)
http://fastra2.ua.ac.be/
http://www.youtube.com/watch?v=GOpBlYx2H1o
3. It seems compute drivers for Tesla on Vista and 7 are coming, bypassing WDDM and allowing remote desktop use, for example. See a screenshot (coming soon..):
http://forums.nvidia.com/index.php?act=attach&type=post&id=18902
4. Kaspersky Lab uses CUDA for antivirus scanning (like the GPU Gems 3 paper..)
5. The Fermi webcast came and went. I asked some questions:
All I got is that the presentations are coming to nvidia.com/tesla
Photos I grabbed:
libraries:
http://dl.dropbox.com/u/1416327/fermi1.jpg
software coming:
http://dl.dropbox.com/u/1416327/fermi2.jpg
Mellanox:
http://dl.dropbox.com/u/1416327/fermi3.jpg
6. Where are the new 5870 (Evergreen) instructions documented?
Searching the AMD forums, a guy found GDS ops in aticaldd.dll and also SAD and POPC stuff..
7. I've seen that OptiX has problems with LBVH acceleration structures, and that some double support in rays is coming..

Catalyst 9.12 released

Posted on 07:58 by Unknown
Some months ago:
As a company source reported to us, the HD 5800 and HD 5700 series are not yet running at full power. Full optimization will only happen with the Catalyst 9.12 drivers, as shown in the changelog.
We will see in a few weeks how performance shapes up with the fully optimized drivers.

People report better Dirt 2 and Shift performance, a performance gain in 3DMark Vantage (say 6-10%) and some texture fixes in games..
and better CrossFireX..

It also seems to provide DirectCompute 4.1 for ATI 4xxx cards..
CrossFire Eyefinity support?

Linux findings..
Phoronix shows
fglrx 8.68.2 OpenGL: 3.1.9210
I also found:
fglrx 8.70.0, OpenGL: 3.2.9335 Compatibility Profile Context
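
To compare driver builds like the two above, it helps to split the reported version string into numbers. A minimal Python sketch, assuming the string has the shape `major.minor[.build] optional text` seen in these notes. Treating the third number as a driver build number is my assumption, and `parse_gl_version` is a made-up helper name.

```python
import re


def parse_gl_version(version: str):
    """Split a GL_VERSION-style string into (major, minor, build, rest).

    `build` is None when the string has only two numeric fields.
    """
    match = re.match(r"\s*(\d+)\.(\d+)(?:\.(\d+))?\s*(.*)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, build, rest = match.groups()
    return int(major), int(minor), int(build) if build else None, rest.strip()


old = parse_gl_version("3.1.9210")
new = parse_gl_version("3.2.9335 Compatibility Profile Context")
# Comparing only the (major, minor) pairs tells you whether the GL version moved.
print(old[:2] < new[:2])
```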
Now:
http://www.overclock.net/ati-drivers-overclocking-software/628763-new-catalyst-9-12-uploaded-gruru3d.html
New Catalyst 9.12 WHQL driver, for Vista and Windows 7 only, 32-bit and 64-bit
dated 24/11
version 8.681
OpenGL 9232
D3d 716
2d 984

Links here:
https://a248.e.akamai.net/f/674/9206/0/www2.ati.com/drivers/9-12_vista64_win7_64_dd_ccc_wdm_enu.exe
http://www2.ati.com/relnotes/Catalyst_912_release_notes.pdf

OpenGL extensions?
still no direct_state_access
still no ext_transform_feedback2, it seems -> originally from Nvidia GT200
GL_AMD_gpu_shader5
GL_AMD_video_instruction
GL_AMD_patch_tessellator
GL_AMDX_random_access_target
GL_EXT_tessellation_shader -> multivendor; does Nvidia have it for Fermi?
shows CAL and OpenCL interop work:
wglResourceAttachAMD (?) wglEndCLInteroperabilityAMD wglBeginCLInteroperabilityAMD
wglMakeAssociatedContextCurrent
exposes 3.2.9232?

CAL?
seems better than the leaked 8.68 XP build: it was 49x, now it's 515..

includes more GL interop work
for OpenGL interop:
wglBeginCLInteroperabilityAMD wglEndCLInteroperabilityAMD wglResourceAttachAMD wglResourceDetachAMD

DirectCompute test?
test Flash 10.1 beta 2?

UPDATE soon..

PS3 OpenCL work and AMD OpenCL ICD

Posted on 07:51 by Unknown
AMD has posted an ICD KB article.
It says the production release gets ICD support.
It explains the changes the ICD requires: you can no longer create the context with a null platform, you have to pass one. Elsewhere you may still pass null as the platform, but in createcontext and for devices you use the properties struct to pass the platform..
The AMD OpenCL hotfix for Sandra 2010 includes the OpenCL ICD..
Not a good driver..
My Apple demos don't work;
only Sandra 2010 does..
On the CPU it exposes 64-bit atomics (all atomics now) and, by mistake,
some cl_amd_dx9_interop; cl_amd_dx10_interop also exists,
and all the extensions are present in the binary.
On the GPU, still really none..
You can test by renaming it to nvcuda.dll if you have the Nvidia 195 opencl.dll in the Windows system directory:
this uses the Khronos ICD and works..
I still don't know how to make it recognize more than two OpenCL DLL implementations..

PS3
Already
http://sites.google.com/site/openclps3/
has a study of OpenCL on the PS3
Remember: the PS3 Slim has no Linux..

Christmas Wish list (I): Monitors

Posted on 07:39 by Unknown
Hi,
Christmas is here, and for people lucky enough in these hard times to be able to buy themselves some reward, there are
Monitors:
LCD of course, for size, weight, etc..
Well, monitors come down to a few factors, but knowing that a 24-inch Full HD costs 200 EUR, by spending some more we want more features..
These factors:
The "classic" factors:
1. size
2. resolution
3. color depth
The new factors:
4. touch
5. 3D stereo..
Also speakers, webcam, microphone, etc..

Regarding size, 24" is cheap and 26-27" is the next thing..
Until now 27" monitors were Full HD only (1920x1200), with the exception of the new 27" iMacs, which still don't seem to have arrived..
If you prioritize size and price, you can get, say, a 30-32" Full HD LCD TV, with low prices, good color and response time (2 ms..), and also TDT, i.e. digital terrestrial TV (in Spain at least..)..
Say 300 EUR.

On to resolution: 1920x1200 is standard (200 EUR or less), and the hot stuff now is around 4 Mpixels. That said, if you avoid a 3D LCD monitor (such as the Nvidia 3D Vision Samsung 22") you get Full HD. For some years there have been 30-inch monitors with 4 Mpixels..
And now the iMac 27" is also near 4 Mpixels (2560x1440). PCs now get this panel too, via the Dell UltraSharp U2711:
http://vr-zone.com/articles/dell-ultrasharp-u2711-release-soon-/8173.html
Hopefully for less than 600 EUR.
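
To put numbers on "near 4 Mpixels", here is the arithmetic as a small Python sketch. The 2560x1600 entry is my assumption for what a typical 30-inch panel offers (the notes above only say "4 Mpixels"), and 1920x1080 is included only for comparison.

```python
def megapixels(width: int, height: int) -> float:
    """Pixel count of a panel in millions of pixels."""
    return width * height / 1_000_000


panels = {
    "1920x1080": (1920, 1080),                  # comparison only
    "1920x1200": (1920, 1200),                  # the "standard" above
    "2560x1440 (27-inch)": (2560, 1440),        # iMac 27" / Dell U2711
    "2560x1600 (assumed 30-inch)": (2560, 1600),
}

for name, (w, h) in panels.items():
    print(f"{name}: {megapixels(w, h):.2f} MP")
```

So the 27-inch panel has about 1.6 times the pixels of a 1920x1200 monitor and about 90% of the assumed 30-inch resolution.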

This is the most balanced choice if you want high resolution. You don't need to spend a fortune as with a 30-inch 4-Mpixel monitor or an iMac; hopefully it costs 50%-75% of that..

Color depth has been 8 bits per channel since forever (some stereo monitors use 6 bits) -> 10 bits here -> 12 bits coming?
HDMI 1.3 adds 10 and 12 bits per channel under the name Deep Color, but 30 bits is the easiest, as it packs into the current 32 bits per pixel (using no alpha, or restricting alpha to 2 bits). It is also supported in hardware for framebuffers (FBOs) and the GPU texture samplers, as OpenGL shows..
Supposedly H.264 also supports it (in some advanced profile)..
Blu-ray also adds this profile as optional (?), so Blu-ray players should be able to use it..
We would need a Deep Color Blu-ray player, a Deep Color Blu-ray movie, an HDMI 1.3 cable and a monitor with a 10-bit panel to support it..
Is there some Blu-ray encoded with 10 bits per channel, or a video sample on the Internet?
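
The point about 30-bit color fitting into the usual 32 bits per pixel can be sketched in a few lines of Python. This is only an illustration: the bit order used here (red in the low bits, 2-bit alpha in the top bits) is one possible layout chosen as an assumption, and real formats define their own order. The function names are made up.

```python
def pack_rgb10_a2(r: int, g: int, b: int, a: int = 3) -> int:
    """Pack 10-bit R, G, B and a 2-bit A into one 32-bit integer.

    Assumed layout (illustrative): bits 0-9 red, 10-19 green,
    20-29 blue, 30-31 alpha.
    """
    for value, bits in ((r, 10), (g, 10), (b, 10), (a, 2)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (a << 30) | (b << 20) | (g << 10) | r


def unpack_rgb10_a2(pixel: int):
    """Inverse of pack_rgb10_a2: return (r, g, b, a)."""
    return (
        pixel & 0x3FF,
        (pixel >> 10) & 0x3FF,
        (pixel >> 20) & 0x3FF,
        (pixel >> 30) & 0x3,
    )


white = pack_rgb10_a2(1023, 1023, 1023)  # 10-bit white, opaque
print(hex(white))
```

Each color channel gets 1024 levels instead of 256, at the cost of leaving only 4 alpha levels.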

In computers, more than 8 bits requires more than a monitor that supports it:
*Graphics card (artificially limited to Quadros and FireGLs..)
*Cable (HDMI 1.3? (Deep Color), DVI (with hacks), DisplayPort).. see the NVISION 08 presentation..
*OS (Linux and Windows XP have some limitations, Windows Vista/7 are good) (Mac?)
*In OpenGL, create the context with these bits
(your program perhaps needs to use HDR textures, etc. to use the full depth)..
*SceniX supports it..

So to avoid problems you need a Quadro/Radeon with HDMI 1.3 or DisplayPort, and Windows Vista/7 or Linux..

For Nvidia, see the NVISION 08 presentation..
HD is now 8MP & HDR
http://developer.nvidia.com/object/nvision08-hd8mp.html
For AMD see..
http://developer.amd.com/GPU/WGSDK/Pages/default.aspx
AMD’s 10-bit Video Output Technology
http://developer.amd.com/gpu_assets/10-Bit.pdf
http://developer.amd.com/Downloads/10bit_demo.zip
for OpenGL textures and FBOs see..

As for monitors, I know of only one "affordable" 24-inch Full HD one..
HP DreamColor LP2480zx Professional Display.. (600 EUR more or less)
What about the Dell UltraSharp™ U2410?
It says the panel supports 1 billion colors and uses 12-bit internal color processing..
But can you drive 10 bits (is the panel really 10-bit?) via HDMI 1.3? I don't know..
Also, a lot of LCD TVs claim a 10-bit color or Deep Color panel and HDMI 1.3 (is that real?). They seem to offer Full HD and big sizes relatively cheaply..

Touch
Well, here too I want at least Full HD (some are smaller than 24")..
With Windows 7 launched there are a lot of monitors, but many are 22" or smaller..
Dell™ SX2210T 21.5"W Multi-Touch (Full HD), 2 MP camera
Iiyama multi-touch 22" monitor T2250MTS (Full HD)
Elo TouchSystems 2420L 24" Touch Screen LCD Monitor

also
desktops:
Dell's multi-touch Studio One 19 PC
notebooks:
Dell Studio 17

Netbooks also include it..
APIs: Windows 7 Touch API (Windows SDK and code samples in MSDN)..
Linux: Freeglut with MPX enabled (?)
Apple ships a private framework in Snow Leopard (touch API?) for the Magic Mouse..


Linux can have it all. 10 bits is supported in OpenGL (and in distros?), with code samples, on Quadro cards with a 10-bit monitor. 3D Vision has been supported since 195.22 with Quadro cards. For touch you have the MPX patch for Freeglut with X server 1.7, there are drivers for touch panels, and there are the Qt 4.6 APIs (on Linux too?)..

Looking at the Mac, we have a good size (27"), 4 Mpixels and Quadros, so I don't know, but theoretically you have all that is needed for 10-bit color. You lose touch and 3D for sure..
Touch has been in a private API at least since 10.6.2 with the Magic Mouse, so it seems we need a Cocoa Touch for desktops and drivers for touch panels, and then we can have multitouch on the Mac. Regarding 3D, we could have support for passive displays such as the iZ3D, as it isn't a USB device and doesn't need to be configured for a special refresh rate. You only need to know the algorithm iZ3D uses to decide what to send to each DVI port. Luckily there is an open source iZ3D media player, so you can see how to do that..
For 3D Vision and active displays we would need a driver that enables OpenGL QB with 3D Vision cards..

Tuesday, 15 December 2009

3D stereoscopic players!

Posted on 12:51 by Unknown
For 3D Vision specifically you have:
1. Nvidia 3D Vision player: supports Fujifilm videos, etc.. seems to derive from 2.
Nvidia stereo photo viewer: supports slideshows, JPS and Fujifilm MPO..

2. 3D Stereoscopic player: http://www.3dtv.at/Downloads/Index_en.aspx
The best!
Has iZ3D, Nvidia, whatever..

3. TriDef:

4. iZ3D:
Has a video player derived from Media Player Classic.
Has source, so you can make your own OpenGL QB driver if you don't want to pay, or for other OSes (think Linux, Mac..)
Has a photo viewer, I think..

5. Linux
patches for OpenGL QB in MPlayer..

Changes:
3D Stereoscopic player:
*Supports the new 2009 iZ3D glasses
*supports separate left and right still image files
*fixes problems when viewing still images on graphics cards with a small amount of memory.
*Supports the Fujifilm MPO photo format.
*Optimized for Windows 7 and features new anaglyph formats.
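
For reference, the basic red-cyan anaglyph that such players offer can be sketched as follows: take the red channel from the left-eye image and the green and blue channels from the right-eye image. This is only the textbook variant, not necessarily any of the "new anaglyph formats" in that changelog, and the function name and tiny test images are made up for illustration.

```python
def red_cyan_anaglyph(left, right):
    """Combine two equally sized RGB images into a red-cyan anaglyph.

    Images are lists of rows, each row a list of (r, g, b) tuples.
    Red comes from the left eye, green and blue from the right eye.
    """
    if len(left) != len(right) or any(
        len(lrow) != len(rrow) for lrow, rrow in zip(left, right)
    ):
        raise ValueError("left and right images must have the same size")
    return [
        [(lp[0], rp[1], rp[2]) for lp, rp in zip(lrow, rrow)]
        for lrow, rrow in zip(left, right)
    ]


# Hypothetical 1x2 images, just to show the channel mixing.
left_img = [[(255, 10, 20), (100, 0, 0)]]
right_img = [[(0, 128, 64), (5, 6, 7)]]
print(red_cyan_anaglyph(left_img, right_img))
```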

TriDef 3D Media Player

* Improved video format support for Windows 7: added support for Microsoft DirectShow codecs requiring NV12 format data, including MPEG Program Stream, Microsoft Recorded TV (.wtv) and ASF.
* MPEG-4 Part 10 (AVC/H.264) and MPEG-2 video codecs are now supported on Windows 7 without requiring 3rd party DirectShow components.

Today news!

Posted on 12:42 by Unknown
Search for it:
Nvidia 195 new drivers with PhysX 12.11.09! (and Flash 10.1 beta 2)
Avatar trailer in 3D! (home made)
DirectCompute benchmark betas!
Another OpenCL mandelbrot:
http://www.bealto.com/mp-mandelbrot.html


Tridef 4.13 updated!

* TriDef 3D Media Player 6.5.5
* TriDef 3D Ignition 2.4.6

Release Notes


TriDef 3D Media Player

* Improved video format support for Windows 7: added support for Microsoft DirectShow codecs requiring NV12 format data, including MPEG Program Stream, Microsoft Recorded TV (.wtv) and ASF.
* MPEG-4 Part 10 (AVC/H.264) and MPEG-2 video codecs are now supported on Windows 7 without requiring 3rd party DirectShow components.

TriDef 3D Ignition

* Displays an error message if the game is not 32-bit or does not use DirectX 9.
* Fixed user interface in Age of Empires 3.
* Fixed a crash in Resident Evil 5.
* Fixed a crash in The Sims 3.
* Fixed reversed 3D in the windowed mode of some games.
* Fixed an issue that some games cannot be switched to 2D.
* Improved compatibility with some new games.

What I would do if I had 3D Vision OpenGL QB

Posted on 12:23 by Unknown
Needs:
*Quadro cards only..

Adds:
*It's programmable, and
*adds explicit eye management!
*It supports windowed stereo! (DX autostereo coming!)
*Adds Windows XP and Linux (in 195.22) (Mac?)
*I could play TriDef photos and TriDef movies with TriDef Media Player
See store: http://store.yabazam.com/
*I could play stereo videos on Linux with MPlayer:
see patches..
*I could add support for the OptiX Mandelbulb anaglyph demo..
*I could use TriDef Ignition (?) to use Google Earth (OGL mode) and Bing (?) (these are coming for 3D Vision too!)
*Use OpenGL QB demos
*Use OpenGL QB programs
*Write some patch for YouTube stereo videos (?).. iZ3D has something..
*Write some demo using OpenGL QB and OpenCL.. qjulia OpenCL
*WebGL through OpenGL QB!

Related: waiting for D3D11 support for the DirectCompute Nvidia Ocean demo..

GLEW, GLUT, Freeglut, MesaGLUT and more

Posted on 09:00 by Unknown
Hi

don't expect to understand most of this post; it's only my daily notes..

GLUT is old: no anti-aliased (multisampled) framebuffer through the WGL and GLX extensions, i.e. no AA without using AA FBOs..
The win32 source is from 2001..
From the freeglut web site:
Why?

The original GLUT library seems to have been abandoned with the most recent version (3.7) dating back to August 1998. Its license does not allow anyone to distribute modified library code. This would be OK, if not for the fact that GLUT is getting old and really needs improvement. Also, GLUT's license is incompatible with some software distributions (e.g. XFree86).
I have coded a patch for a Win32 AA multisampled framebuffer..
I think it doesn't work if you want 3.0 contexts (you have to create them specially)..
I think I have already patched that too.. I will try to find this megapatched GLUT :-)
Now it seems 3.2 requires new functions (core profile only?), so I will patch that too if required (all my patches are Windows stuff..)
Earlier I intended to patch for OpenCL/OpenGL interop, as it was said to require a compute-enabled OpenGL context, but now it seems that's not required..
I wanted more fixes.. (I have to think which..)
I will post it..

Related: I also added runtime hotpatches to the OpenGL libraries on Windows to provide the NV affinity extension (although it will not add any performance on Nvidia multi-GPUs).. I have to add ATI emulation (that's efficient if I use the equivalent AMD extension)..
That would be good for Equalizer, which uses it, on a 5970, say.. Also, what can I do to get CompleX to run? I will see.. remember it's now Win32 and Win64, but it says XP (is that really needed, or does it run on Seven?)
I also added runtime hotpatches to the OpenGL libraries on Windows for reporting QB stereo emulation..
Now I must add the full patch for the 3D Vision sample I have..
It seems projects are now switching to FreeGLUT and MesaGLUT, among others..

In the past others have tried:
crazy stuff (dead):
*GLUX, from the Ogre creators.. (promised to bring OGL 3.0 and AA, for example.. it's dead.. I don't know how far it got..)
http://code.google.com/p/glux/updates/list
*OpenGL Window Framework
http://sourceforge.net/projects/oglwfw/
An Alternative C++ Windowing Framework for OpenGL which initialises extensions (via GLee) and allows easy access to context features such as FSAA and the Accumulation buffer
*http://openglut.sourceforge.net/
Some changes since forking from freeglut include:

* Improved full-screen support
* Improved joystick support
* Improved Cygwin and Mingw32 support
* Improved bitmap font rendering
* Improved ANSI-C compliance
* Documentation refinements
* Experimental offscreen rendering

So if you use GLUT and require AA, use AA FBOs..
If you require OGL 3.x, switch to FreeGLUT, MesaGLUT, or SDL (in trunk), Qt (?),..

MesaGLUT
Mark Kilgard's GLUT, easily compiled and used with Mesa. Plus, other implementation of GLUT for DOS, OS/2, BeOS, etc.

MesaGLUT: Right now MESA 7.7-> MesaGLUT 7.7rc2
ftp://ftp.freedesktop.org/pub/mesa/7.7/
latest :ftp://ftp.freedesktop.org/pub/mesa/beta/

FreeGLUT 2.6.0 (released 27 Nov 2009)
(Windows binary) adds versus GLUT (seems to be all I want):
I want a patch for the Windows 7 Touch APIs, as Qt has..
MultiPointer Linux patch support (SVN 832)
*Binary compatible with GLUT DLLs:
Added the ugly ATEXIT_HACK from GLUT 3.7, making freeglut binary compatible with the GLUT DLLs out in the wild.
Use our module definition file, so we get undecorated names in the resulting DLL, just like the classic GLUT DLL from Nate Robins.
* Added support for multisampling: The number of samples per pixel to use
when GLUT_MULTISAMPLE is specified in glutInitDisplayMode() can be set via
glutSetOption() with parameter GLUT_MULTISAMPLE now. glutGet() with the
same token retrieves that value. The possible number of samples per pixels
can be queried via the new API entry

int *glutGetModeValues( GLenum mode, int *size );

with mode GLUT_MULTISAMPLE. (glutGetModeValues() currently only works for
X11)
Note:
Set the default number of samples per pixel to 4 and actually use the value set
with glutSetOption(GLUT_MULTISAMPLE,...) in Windows code.
* Added support for versioned (i.e. 3.0) OpenGL contexts: New API entries

void glutInitContextVersion( int majorVersion, int minorVersion );
void glutInitContextFlags( int flags );

with related new constants GLUT_DEBUG and GLUT_FORWARD_COMPATIBLE for the latter API entry. Added new constants GLUT_INIT_MAJOR_VERSION, GLUT_INIT_MINOR_VERSION and GLUT_INIT_FLAGS for glutGet().

*Added support for sRGB framebuffers via the GLX_ARB_framebuffer_sRGB / WGL_ARB_framebuffer_sRGB extensions. Added support for context profiles via the new parts of the GLX_ARB_create_context / WGL_ARB_create_context extensions.

*Visual Studio 2008: 2 files, one for the normal DLL build, and one for the static build.
* Added deprecated, but working Joystick API.
* Added new constant GLUT_INIT_STATE for glutGet() to check if freeglut is already initialized.
* Added new API entry for full-screen mode
void glutFullScreenToggle( void );
with a related new constant GLUT_FULL_SCREEN for glutGet().
* Added new API entry to de-initialize freeglut:
void glutExit( void );
* Added more special keys: GLUT_KEY_NUM_LOCK, GLUT_KEY_BEGIN and GLUT_KEY_DELETE
* Added support for windows without captions and/or borders via two new constants GLUT_CAPTIONLESS and GLUT_BORDERLESS for glutInitDisplayMode (currently works for Windows only).
* Added new constant GLUT_AUX for glutSetOption() to set the number of auxiliary buffers. The possible number of auxiliary buffers can be queried via glutGetModeValues with mode GLUT_AUX.

It also has a fully OpenGL 3.1 compliant example.

For using GLUT and FreeGLUT with MinGW GCC, see:
Using GLUT with MINGW


Also, how to bring in automatic loading of extensions..

GLEW 1.5.1 is old..
GLee 5.4 supports up to OpenGL 3.0, no more..

glew
====
There seems to be some new stuff every year..
Changes in SVN since the last release,
briefly: 3.0 detection fixes, 3.1 and 3.2 support, and Snow Leopard fixes:

gl.h for Snow Leopard uses __X_GL_H rather than __gl_h_ or __GL_H__
Build binaries from the object files instead of sources
Add GL 3.2 Core support
Add support for GLint64, GLuint64 and GLsync
Preliminary spec file for GL 3.1
Work around Mesa GLUT incompatibility.
Ensure OpenGL 3.0 is properly detected.
Use uname instead of arch since Debian/Ubuntu/etc do not have arch
Some bug fixes..

glee
====
In SVN there are VC2008 projects.

Nvidia 195 new drivers and Flash player beta 2!

Posted on 08:07 by Unknown
Will it work with ATI Catalyst 9.12?

http://forums.legitreviews.com/about24589.html

Nvidia released new GeForce 195.81 beta drivers with support for Adobe Flash Player 10.1 beta 2. Both desktop and notebook drivers were released.

This driver is identical to the 195.62 WHQL drivers except for the following:

* Several bug fixes to support the new Adobe Flash 10.1 Beta 2 release. Learn more here.
* Adds new SLI and multi-GPU profiles for Avatar Demo, Operation Flashpoint: Dragon Rising Demo, and Wings of Prey.
* Fixes 3D Vision display detection for CRT and DLP displays.

Desktop drivers
http://www.nvidia.com/object/winxp_195.81_beta.html
http://www.nvidia.com/object/winxp64_195.81_beta.html
http://www.nvidia.com/object/win7_winvi ... _beta.html
http://www.nvidia.com/object/win7_winvi ... _beta.html

Notebook drivers
http://www.nvidia.com/object/notebook_w ... _beta.html
http://www.nvidia.com/object/notebook_w ... _beta.html
http://www.nvidia.com/object/notebook_w ... _beta.html
http://www.nvidia.com/object/notebook_w ... _beta.html

Adobe Flash Player 10.1 beta 2
http://download.macromedia.com/pub/labs ... 121509.exe
http://download.macromedia.com/pub/labs ... 121509.exe

Sunday, 13 December 2009

Running ATI GPUs in Sisoft Sandra 2010!

Posted on 18:47 by Unknown
KB72 - Running SiSoftware Sandra 2010 OpenCL™ GPGPU benchmarks with the ATI Stream SDK v2.0-beta4
http://developer.amd.com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.aspx?ID=72

Memcheck GPUs!

Posted on 18:16 by Unknown
ALL CUDA based
https://simtk.org/home/memtest/

http://arxiv.org/pdf/0910.0505

https://cudagpumemtest.svn.sourceforge.net/svnroot/cudagpumemtest/
http://cudagpumemtest.sourceforge.net/
application
On Testing GPU Memory for Hard and Soft Errors

CUDA MemTest 0.50a

I can't find it now, but there was one that tested via both DirectX and CUDA (by a Hungarian, or at least European, author)

Emulate 3D kernel launch grid

Posted on 18:16 by Unknown
http://forums.nvidia.com/index.php?s=&showtopic=87178&view=findpost&p=493821
This is a good question!

Unfortunately on current hardware the grid is only 2D, which makes it tricky to calculate 3D indices. To make things worse, integer divide and modulo (which you would normally use to calculate your own 3D indices) are very expensive on current GPUs.

The best solution I've seen is this code (credit to Jonathan Cohen, hopefully he doesn't mind me posting it here).

The setup code is:

CODE
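__host__ void launchThreads(int nx, int ny, int nz) {
    // Threads per block in each dimension (16*4*4 = 256 threads per block).
    int threadsInX = 16;
    int threadsInY = 4;
    int threadsInZ = 4;

    // Blocks needed per dimension, rounding up.
    int blocksInX = (nx+threadsInX-1)/threadsInX;
    int blocksInY = (ny+threadsInY-1)/threadsInY;
    int blocksInZ = (nz+threadsInZ-1)/threadsInZ;

    // The grid is only 2D, so the Z blocks are folded into the Y dimension.
    dim3 Dg = dim3(blocksInX, blocksInY*blocksInZ);
    dim3 Db = dim3(threadsInX, threadsInY, threadsInZ);

    // Pass blocksInY and its reciprocal so the kernel can unfold Y and Z.
    callKernel<<<Dg, Db>>>(..., blocksInY, 1.0f/(float)blocksInY);
}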


And the kernel code looks like this:

CODE
__global__ void callKernel(..., unsigned int blocksInY, float invBlocksInY)
{
    // Recover the folded Z and Y block indices without integer divide or modulo.
    unsigned int blockIdxz = __float2uint_rd(blockIdx.y * invBlocksInY);
    unsigned int blockIdxy = blockIdx.y - __umul24(blockIdxz,blocksInY);
    // Global 3D indices; j and k use the recovered block indices.
    unsigned int i = __umul24(blockIdx.x,blockDim.x) + threadIdx.x;
    unsigned int j = __umul24(blockIdxy,blockDim.y) + threadIdx.y;
    unsigned int k = __umul24(blockIdxz,blockDim.z) + threadIdx.z;
    // use i,j,k ...
}


We should really have a sample of 3D array processing in the SDK. Anybody want to contribute one?
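
The folding of the Z blocks into the Y grid dimension can be checked on the host. The following is only a sketch in plain C++, not part of the forum post: the names `ceilDiv`, `unfoldBlock` and `coversVolumeOnce` are made up for illustration, and it uses ordinary integer division where the kernel uses the reciprocal multiply. It verifies that every cell of an nx*ny*nz volume is visited exactly once when out-of-range threads are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side mirror of the 2D-grid-to-3D-index mapping.
struct Idx3 { unsigned x, y, z; };

// Ceiling division, as used for the block counts in the setup code.
inline unsigned ceilDiv(unsigned n, unsigned d) {
    assert(d != 0);
    return (n + d - 1) / d;
}

// Unfold a 2D block index (bx, by) into a 3D block index.
// by ranges over [0, blocksInY * blocksInZ).
inline Idx3 unfoldBlock(unsigned bx, unsigned by, unsigned blocksInY) {
    unsigned bz = by / blocksInY;        // the kernel uses a reciprocal multiply here
    unsigned byy = by - bz * blocksInY;  // remainder without a modulo
    return Idx3{bx, byy, bz};
}

// Returns true if the folded launch touches every cell exactly once.
bool coversVolumeOnce(unsigned nx, unsigned ny, unsigned nz,
                      unsigned tx = 16, unsigned ty = 4, unsigned tz = 4) {
    const unsigned bxN = ceilDiv(nx, tx);
    const unsigned byN = ceilDiv(ny, ty);
    const unsigned bzN = ceilDiv(nz, tz);
    std::vector<unsigned> hits(static_cast<std::size_t>(nx) * ny * nz, 0u);
    for (unsigned by = 0; by < byN * bzN; ++by) {
        for (unsigned bx = 0; bx < bxN; ++bx) {
            const Idx3 b = unfoldBlock(bx, by, byN);
            for (unsigned z = 0; z < tz; ++z)
                for (unsigned y = 0; y < ty; ++y)
                    for (unsigned x = 0; x < tx; ++x) {
                        const unsigned i = b.x * tx + x;
                        const unsigned j = b.y * ty + y;
                        const unsigned k = b.z * tz + z;
                        // Threads past the volume edge do nothing.
                        if (i < nx && j < ny && k < nz)
                            ++hits[(static_cast<std::size_t>(k) * ny + j) * nx + i];
                    }
        }
    }
    for (unsigned h : hits)
        if (h != 1) return false;
    return true;
}
```

Calling `coversVolumeOnce(17, 5, 9)` exercises sizes that are not multiples of the block shape, which is where a wrong unfolding would show up first.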

things found in CUDA forums

Posted on 12:12 by Unknown
Also some CUDA news:

Mandelbulb stereo anaglyph
-> have to port it to 3D Vision
http://forums.nvidia.com/index.php?showtopic=150985&st=20&start=20

CUDA game of life
http://forums.nvidia.com/index.php?showtopic=152757

Sieve of Atkin in CUDA
(to do)

Benchmarking a GPUCV CUDA operator with Cuda Visual profiler

CFD CUDA paper from the Rodinia benchmark suite
http://web.cos.gmu.edu/~acorriga/pubs/gpu_cfd/aiaa_2009_4001.pdf

Radix sort for doubles:
Yep, Thrust sorts doubles with two 32-bit radix sorts using similar tricks. Here are some performance results.
http://www.meganewtons.com/2009/08/sorting-performance-optimizations.html
http://forums.nvidia.com/index.php?showtopic=152539
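
The quoted reply does not spell out the tricks. One common approach, which may or may not be exactly what Thrust does, is to map each double to a 64-bit unsigned key whose integer order matches the floating-point order, and then run two stable 32-bit radix passes (low word first, then high word). The sketch below shows only the key mapping; the function names are invented for illustration and NaNs are assumed to be absent.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a double to an unsigned key so that a < b implies key(a) < key(b).
// Positive values get their sign bit set; negative values have all bits
// flipped, which reverses their order and places them below the positives.
inline std::uint64_t doubleToSortableKey(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit doubles");
    assert(!std::isnan(d));  // NaNs have no place in the ordering
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);  // bit cast without aliasing issues
    const std::uint64_t sign = std::uint64_t{1} << 63;
    return (bits & sign) ? ~bits : (bits | sign);
}

// The two 32-bit digits for a pair of stable radix sorts:
// sort by lowWord first, then by highWord.
inline std::uint32_t lowWord(std::uint64_t key) {
    return static_cast<std::uint32_t>(key);
}
inline std::uint32_t highWord(std::uint64_t key) {
    return static_cast<std::uint32_t>(key >> 32);
}
```

Because each pass is stable, sorting on the low word and then on the high word yields the full 64-bit order, so an existing 32-bit radix sort can be reused unchanged.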

Performance of 3D Deconvolution Algorithms on Multi-Core and Many-Core Architectures
ftp://ftp.cs.unc.edu/pub/publications/techreports/09-001.pdf
8 CPUs vs GPU
code:
Clarity Deconvolution Library 1.0
http://cismm.cs.unc.edu/downloads/
manual
http://cismm.cs.unc.edu/resources/software-manuals/clarity-deconvolution-library/

Fuzzy Logic on the GPU in CUDA
http://cirl.missouri.edu/gpu/cuda_lessons/type1fl/index.html


gpu economics
http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/
http://people.maths.ox.ac.uk/~gilesm/hpc/

Monte Carlo on GPUs
http://www-atom.fysik.lth.se/MedWeb/research/montecarlo.html
http://www-atom.fysik.lth.se/MedWeb%5Cresearch%5Cmonte_carlo_files%5CCUDAMC.rar


Using 3D arrays in CUDA:
a good explanation
http://forums.nvidia.com/index.php?s=&showtopic=87178&view=findpost&p=591025
code that permutes the indices
http://forums.nvidia.com/index.php?s=&showtopic=87178&view=findpost&p=963428
http://dl.dropbox.com/u/1416327/transpose3D_revised.cu
see pdf:
http://dl.dropbox.com/u/1416327/Matrix_transpose_post.pdf
see page:
http://oz.nthu.edu.tw/~d947207/cuda
can be used for:
Bandwidth intensive 3-D FFT kernel for GPUs using CUDA

Siggraph 2009 (Asia too..)!

Posted on 11:12 by Unknown
Asia 2009

A GPU Laplacian Solver for Diffusion Curves and Poisson Image Editing

Micro-Rendering for Scalable, Parallel Final Gathering

Amortized Supersampling

Adaptive Wavelet Rendering

Stochastic Progressive Photon Mapping

Automatic Bounding of Programmable Shaders for Efficient Global Illumination

Fast Motion Deblurring

DiagSplit: Parallel, Crack-Free, Adaptive Tessellation for Micropolygon Rendering

Approximating Subdivision Surfaces with Gregory Patches for Hardware Tessellation

Ray casting of multiple volumetric datasets with polyhedral boundaries on manycore

Debugging GPU Stream Programs Through Automatic Dataflow Recording and Visualization

Real-Time Parallel Hashing on the GPU

RenderAnts: Interactive REYES Rendering on GPUs

Out-of-Core Multigrid Solver for Streaming Meshes

Dynamic Shape Capture using Multi-View Photometric Stereo

Robust Single View Geometry And Motion Reconstruction

Consolidation of Unorganized Point Clouds for Surface Reconstruction

Siggraph 2009

Gaussian KD-Trees for Fast High-Dimensional Filtering

An Empirical BSSRDF Model

Harmonic Fluids

Energy-Preserving Integrators for Fluid Animation

Modular Bases for Fluid Dynamics

Predictive-Corrective Incompressible SPH

Directable, High-Resolution Simulation of Fire on the GPU

Direct Trimming of NURBS Surfaces on the GPU

Fourier Depth-of-Field

GRAMPS: A Programming Model for Graphics Pipeline

Automatic Pre-Tessellation Culling

A Benchmark for 3D Mesh Segmentation

Deforming Meshes that Split and Merge

A Visibility Algorithm for Converting 3D Meshes Into Editable 2D Vector Graphics

Automatic and Topology-Preserving Gradient Mesh Generation for Image Vectorization

Asynchronous Contact Mechanics

An Efficient GPU-based Approach for Interactive Global Illumination

Invertible Motion Blur in Video

Architecture ideas for future GPUs!

Posted on 10:17 by Unknown
graphics related

"A Hardware Processing Unit for Point Sets"
Simon Heinzle, Gael Guennebaud, Mario Botsch, and Markus Gross

"PCU: The Programmable Culling Unit"

"A Hardware Architecture for Surface Splatting"
Tim Weyrich, Simon Heinzle, Timo Aila, Daniel Fasnacht, Stephan Oetiker, Mario Botsch, Cyril Flaig, Simon Mall, Kaspar Rohrer, Norbert Felber, Hubert Kaeslin, and Markus Gross, in "ACM Trans. Graph. (Proc. Siggraph 2007)", August 2007

"PFU: Programmable Filtering Unit for Mobile Multimedia Applications on Graphics Hardware"

GPU computing based

Inter-Block GPU Communication via Fast Barrier Synchronization
Xiao, Shucai and Feng, Wu-chun (2009) Inter-Block GPU Communication via Fast Barrier Synchronization. Technical Report TR-09-19, Computer Science, Virginia Tech.

"Increasing Memory Miss Tolerance for SIMD Cores"
David Tarjan, Jiayuan Meng, Kevin Skadron, in "Proc. Supercomputing '09", August 2009

Dynamic detection of uniform and affine vectors in GPGPU computations
Sylvain Collange, David Defour and Yao Zhang



Instructions to add
===================

Understanding the Efficiency of Ray Traversal on GPUs
HPG09 paper
Timo Aila
2 warp-wide instructions will help:
ENUM (Prefix sum) enumerates the threads (inside a warp) for which a condition
is true and returns a unique index [0;M-1] to those threads
POPC (population count)
Returns the number of threads for which a condition is true, i.e. M above

Improvements for raytracing:
With ENUM + POPC, in Fairy scene
Ambient occlusion +40%
Diffuse +80%
Iff not limited by memory speed

POPC is also useful for stream compaction; see the HPG09 paper on stream compaction on wide SIMD..


Atomic Vector Operations on Chip Multiprocessors
Vector atomics seem to be Larrabee 3 stuff

Difficulties in coding, achieving high perf and measuring MultiGPU code!

Posted on 10:08 by Unknown
A lot of CUDA software is not multi-GPU aware:
Badaboom uses multiple GPUs for multiple videos but not for a single one..
I doubt other CUDA video encoders do either..

Here is why:
The first problem is coding it:
just as CPU code needs a thread API, thread pools, etc.,
GPU code needs careful work to divide the job and send a part to every GPU..
Add async kernel execution, async memory copies and multiple streams, and you go crazy..
See GPUWorker in the CUDA forums
for an easy way..
and also the CUDA OpenMP example..

Achieving high perf: if the work is not perfectly divisible you need to exchange data from one GPU to another..
This currently requires host intervention, and the time can only be minimized if pinned memory is shared by both GPUs, so the shared pinned memory of CUDA 2.2 is needed..
For clusters, where a GPU-to-GPU transfer may need to go through NICs, wait for 2010, when transfers will use DMA from the GPU to pinned host memory by the NIC (?)..

Measuring multiGPU perf: well, let's talk about it. Of course we can add multiGPU support to some CUDA codes, but the intrinsic problem in this and in a lot of GPGPU apps lies in how you measure perf.. A lot of scores are measured with inputs and outputs kept in GPU mem.. if they are kept in CPU mem we get no linear scaling with GPU shader count, because the GPU-CPU transfers are counted and they amount to constant time (they will improve only with newer PCI Express versions)..
I think the Larrabee perf and CUDA matmul figures that vendors show us are measured with data on the GPUs.. with multiple GPUs you have to transfer at least from one GPU to another, which currently has no fast path in GPU computing APIs and requires going through the host, so you would get no apples-to-apples comparison.. you have to compare against benchmarks with inputs and outputs in CPU mem, which anyway is not a "true" benchmark since, as I said before, it does not scale with shader count..
Think of it as CPU benchmarks that accounted for the time of reading/writing the input data from/to hard disks..
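
The scaling argument above can be put into numbers with a very simple model. This is only a sketch under my own assumptions, not a measurement: compute time is assumed to divide perfectly by the number of GPUs, host transfer time is assumed to stay constant, and the function names are made up.

```cpp
#include <cassert>

// Speedup on n GPUs when compute time divides by n but the
// host<->GPU transfer time stays constant (assumed model).
inline double multiGpuSpeedup(double computeSec, double transferSec, int n) {
    assert(n >= 1);
    return (computeSec + transferSec) / (computeSec / n + transferSec);
}

// Upper bound on the speedup as n grows without limit:
// only the constant transfer time remains in the denominator.
inline double speedupLimit(double computeSec, double transferSec) {
    assert(transferSec > 0.0);
    return (computeSec + transferSec) / transferSec;
}
```

For example, with 8 s of compute and 2 s of transfers, 4 GPUs give 10 / (2 + 2) = 2.5x rather than 4x, and no number of GPUs can exceed 5x. With the transfers left out of the measurement the same code would report a perfect 4x.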
Note there have been great strides this year in using multiple GPUs, to the point of being able to transfer data between GPUs from OpenGL with AMD and Nvidia proprietary extensions:
for Nvidia see http://www.opengl.org/registry/specs/NV/copy_image.txt
and search for wglCopyImageSubDataNV
for AMD see http://www.opengl.org/registry/specs/AMD/wgl_gpu_association.txt

To facilitate high performance data communication between multiple
contexts, a new function is necessary to blit data from one context
to another.

VOID wglBlitContextFramebufferAMD(HGLRC dstCtx, GLint srcX0, GLint srcY0,
GLint srcX1, GLint srcY1, GLint dstX0,
GLint dstY0, GLint dstX1, GLint dstY1,
GLbitfield mask, GLenum filter);


We can try to exchange data between multiple GPUs using computing APIs with OpenGL interop plus these OpenGL extensions..
i.e. CUDA-OpenGL interop and OpenCL-OpenGL interop.. note CAL-OpenGL interop is surely coming for AMD cards but is currently lacking..

I have to ask the vendors (Nvidia and AMD) what they are doing so that developers can transfer data between GPUs without CPU host intervention, using the DMA engines.. in both CUDA and OpenCL..
Note that at SC09 Nvidia announced that by spring next year there will be a solution for a similar problem in the cluster
environment, i.e. transferring from GPU to NICs without host intervention, I think..

Learned from HPG09 stuff!

Posted on 09:19 by Unknown
HPG09 site with program:
there are slides and links to the ACM papers..

HPG is a merger of the Graphics Hardware and Interactive Ray Tracing events..

The first three are used by OptiX (Nvidia people):
1. Spatial Splits in Bounding Volume Hierarchies

New high-quality, GPU-friendly acceleration structure option in OptiX, recommended for dynamic data.
Faster than kd-tree ray tracing, and faster to build on the GPU than a kd-tree?..

Remember OptiX also has this very fast GPU BVH builder:
Fast BVH Construction on GPUs

For more OptiX stuff:
A. see the doc folder of OptiX..
B. Overview: http://www.nvidia.com/docs/IO/67191/NVIRT-Overview.pdf
C. search the slides and sessions in the HPG program page..
D. search Google for the NVIRT PDF

2.Image Space Gathering
..

3.Understanding the Efficiency of Ray Traversal on GPUs

Timo Aila
Fastest ray tracing CUDA kernels to date..
Memory bandwidth and the lack of a cache are not the limiting factors for ray tracing perf..
Adds persistent threads and improves on current GPU implementations..

2 new warp-wide instructions will help:
ENUM (Prefix sum) enumerates the threads (inside a warp) for which a condition
is true and returns a unique index [0;M-1] to those threads
POPC (population count)
Returns the number of threads for which a condition is true, i.e. M above

Improvements for raytracing:
With ENUM + POPC, in Fairy scene
Ambient occlusion +40%
Diffuse +80%
Iff not limited by memory speed

Stream Compaction for Deferred Shading

Deferred shading brings in the code divergence of ubershaders..
The paper schedules shading work grouped by condition or shader type..
The best option uses radix sort, etc..
As future architectures add additional register store
and better switch handling, we expect the uber-kernel approach of
implicit serialization to scale better

A Parallel Algorithm for Construction of Uniform Grids


Efficient Stream Compaction on Wide SIMD Many-Core Architectures

Code: http://www.cse.chalmers.se/~billeter/pub/pp/index.html
Presented as a CUDPP-like library, C++ oriented..

* Avoids explicit construction of a prefix sum with size = input data
* 3x speedup over previous approaches
* Presents algorithms for general SIMD widths (CUDA, CAL, Larrabee)
* Presents both prefix sum and pop count based variants..
* Presents a CUDA-optimized version that avoids scattered writes by buffering the writes

Also, I seem to remember that similar ideas, found in parallel by a group in India, are worked out in
Scalable Split and Gather Primitives for the GPU
which in turn is used for:
Fast Minimum Spanning Tree for Large Graphs on the GPU
i.e. new techniques for avoiding storing the full scan (prefix sum) and for avoiding scatter in the final pass via buffering
they report more or less


CUDA:
The paper says a warp-wide popcount instruction (for a condition evaluated for every element in a warp) is not present. It needs the same thing as Understanding the Efficiency of Ray Traversal on GPUs.

That's true: you can't get a 32-bit integer in which every bit is the condition evaluated for each element of a warp (32 bits for a warp, but an AMD wavefront would need 64 bits)..
But if you could get that integer, you would have pop count:
__popc(x) returns the number of bits that are set to 1 in the binary representation
of 32-bit integer parameter x

On compute capability 1.2 and higher you get vote functions, i.e. all-or-nothing functions for a condition..
Also, on compute capability 1.2, via shared memory atomics, each thread can compute its condition
and then do an atomic OR into shared memory of condition(threadid) lsh laneid (the thread's index within its warp)
(lsh means left shift)..
CUDA 3.0 reveals ballot, which perhaps is that function returning an integer that, used with popc, gives us pop count..
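
To make the relation between the per-warp bit mask, POPC and ENUM concrete, here is a CPU-side emulation in plain C++. It is only a sketch with made-up helper names (`popc32`, `warpBallot`, `warpEnum`, `compactWarp`): a loop over 32 lanes stands in for the warp, the OR in `warpBallot` stands in for the shared memory atomic OR described above, and `popc32` stands in for `__popc`.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count, standing in for CUDA's __popc.
inline int popc32(std::uint32_t x) {
    int n = 0;
    while (x) { x &= x - 1; ++n; }  // clears the lowest set bit each step
    return n;
}

// Build the warp mask: bit `lane` is set when that lane's condition holds.
// On the GPU each lane would OR its own shifted bit into a shared word.
inline std::uint32_t warpBallot(const bool cond[32]) {
    std::uint32_t mask = 0;
    for (int lane = 0; lane < 32; ++lane)
        if (cond[lane]) mask |= std::uint32_t{1} << lane;
    return mask;
}

// ENUM: for a lane whose condition is true, its index in [0, M-1]
// among the true lanes, i.e. the popcount of the mask bits below it.
inline int warpEnum(std::uint32_t mask, int lane) {
    assert(lane >= 0 && lane < 32);
    const std::uint32_t below = mask & ((std::uint32_t{1} << lane) - 1u);
    return popc32(below);
}

// Compact the elements whose condition holds to the front of `out`.
// Returns M, the number of elements kept (POPC of the mask).
inline int compactWarp(const int in[32], const bool cond[32], int out[32]) {
    const std::uint32_t mask = warpBallot(cond);
    for (int lane = 0; lane < 32; ++lane)
        if (cond[lane]) out[warpEnum(mask, lane)] = in[lane];
    return popc32(mask);
}
```

With the mask in hand, both quantities the ray traversal paper asks for fall out of one popcount each, which is why a ballot-style function plus `__popc` would be enough.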

speedup vs. CUDPP

compaction 2.9× (compacts 64bit elems faster than 32bit (2x data))
Radix Sort 15% faster for >500k elems
Prefix Sum ‐ 30% faster

radix sort record
Fast Minimum Spanning Tree for Large Graphs on the GPU
This group has interesting things:

Papers:
Fast and Scalable List Ranking on the GPU
Singular Value Decomposition on GPU using CUDA
High Performance Pattern Recognition on GPU
CUDA Cuts: Fast Graph Cuts on the GPU
Accelerating Large Graph Algorithms on the GPU using CUDA

Soft:
http://cvit.iiit.ac.in/index.php?page=resources
Has the CUDA Cuts source and example code for Shader Model 4.0:
Simple Geometry Shader
Simple Transform Feedback
Simple Layered Rendering
Motion Blur with Layered Rendering
Bicubic Patch Subdivision with Geometry Shader
Rendering Geometry Images with Geometry Shader
I have to test these on Catalyst 10.1 with OpenGL 3.2 and geometry shaders (the current geometry shader support has bugs with layers and integer texture fetches..)
See related:
Scalable Split and Gather Primitives for the GPU
One more thesis:
Scalable Primitives for Data Mapping and Movement on the GPU:
http://cvit.iiit.ac.in/thesis/skpMS2009/

One last thing: Nvidia people now do photon mapping in image space, similar to existing image-space shadows and caustics..

Hardware-Accelerated Global Illumination by Image Space Photon Mapping
Has code based on G3D 8.0

Efficient Depth Peeling via Bucket Sort
Fang Liu, Meng-Cheng Huang, Xue-Hui Liu, and En-Hua Wu
CUDA based. There is a short paper with another technique by the same authors at SIGGRAPH:
- "Single Pass Depth Peeling using CUDA Rasterizer" at SIGGRAPH 2009 talks

Data-Parallel Rasterization of Micropolygons With Defocus and Motion Blur
See the post on tessellation and micropolygons..

Scaling of 3D Game Engine Workloads on Modern Multi-GPU Systems
It couldn't be clearer..

Nvidia driver 187.98 adds new files!

Posted on 07:51 by Unknown
The inf has:
Nv3DAppShExt.dll
Nv3DAppShExtR.dll
The folder has:
nvinit.dll
nvinitx.dll shim initialization
nvumdshim
nvumdshimx Nvidia D3D shim drivers

Someone points to Blu-ray 3D.. I say it is some windowed-mode or other 3D Vision stuff..
Remember 195.62 adds:
npnv3dv -> 3D Vision plugin for Mozilla browsers
Nv3DVisionIePlugin -> 3D Vision plugin for IE browsers

to:
Adding to Stereo control api?
nvSCPAPI.dll
API 3D Vision extension..
nvStereoApiI.dll

What I would want to know and get from vendors part III: Apple, Microsoft and Caustics

Posted on 07:05 by Unknown
Wishes from Microsoft:
Access to the latest Direct3D SDKs:
Currently the DirectCompute compiler in the latest SDK has problems with double precision compute shaders..
It seems this is fixed internally.. can we get info on when this limiting factor for GPGPU usage is going to be fixed?..
I have seen that the shaders from the Dirt 2 game are compiled with a later version of the compiler than the one in the public August SDK..
so they ship newer versions to partners.. How can we gain access to newer builds?..

Wishes from Caustic Graphics: (email received: coming in January..)
What's going on with the CausticGL emulation SDK? It was announced at least two months ago; I signed up and I have had no news from them.. This SDK promises to put the API in developers' hands without having to pay for the current SDK with the FPGA card (it costs more than 2K euros), so I could start working with it on the CPU..
This allows building software before the ray tracing cards get into full production in April next year, as was announced some time ago..
That would allow someone to write a minimal wrapper over both the OptiX and CausticGL SDKs and then, say in April, be able to test ray tracing on "accelerators" and publish an apples-to-apples comparison of ray tracing on Caustic cards (in case you have one) vs say Fermi (which by the way is supposed to bring big gains in ray tracing, and which should be available at the same time)..
Please note that it would be good to have a ray tracing benchmark able to use Caustic hardware and Nvidia GPUs, but it's hard work (learning the APIs, writing the common code, etc.), so I think it would be good to have the SDK this year or early next..
Note also that OptiX is free but currently only works on Quadro GPUs; that should change soon, as Fermi is supposed to work with it..
Currently there are hacks to get support for GeForce cards.. I think a solution is on my blog..

Wishes from Apple:
Currently OpenCL on Apple is somewhat immature.. and that is now that 10.6.2 is released; before it, OpenCL was only "usable" on Nvidia cards.. Some of Apple's own source code demos weren't working on ATI cards..
Now, I think the most complex OpenCL code published by Apple, i.e. the Apple FFT library, says in its release notes that the OpenCL runtime and compiler have issues that prevent getting the full expected perf on the Mac, and that they have been fixed internally.. I think it is at least 2x slower.. as this was published after 10.6.2, it seems the new 10.6.3 will have the fix..
Also, Apple OpenCL is less advanced, at least for Nvidia, where we now have double precision support on Windows and Linux but not on Apple..
Apple OpenCL also lacks image support on AMD, and double precision where the hardware supports it..
What's going on with OpenGL support on Apple? In case you don't know, OpenGL support is still at 2.1, which is more than 3 years old and means DirectX 9.0 functionality.. well, with some Direct3D 10 functionality via OpenGL extensions.. but still no OpenGL 3.x, which is now more than 16 months old..
Now there is OpenGL 3.2, which together with some ARB extensions announced along with 3.2 gives you almost the Direct3D 10.1 feature set..
Note OpenGL 3.2 has been supported on Nvidia for four months, and AMD cards have supported OpenGL 3.1 since August this year and are going to support 3.2 next month, I think (the leaked Catalyst 9.12 almost supports it..)
The situation gets worse now that Direct3D 11 cards are here and vendors are going to publish OpenGL extensions to provide Direct3D 11 features (think tessellation, "scatter" in pixel shaders, double precision in shaders, etc..); Nvidia announced it at GTC 09..
Is access to the Snow Leopard developer seeds, which can get you point releases 1-2 months before the public releases, any good?.. If I can get OpenGL and OpenCL improvements, the earlier the better..
There is an API exposing GPU video decoding in Snow Leopard, but the problem is that it is so simple that it only allows (I think, having read the API quickly..) writing players.. it would be good to have an API with a fast path for getting a pointer to the decoded frames in GPU mem, i.e. as an OpenGL texture, PBO, etc., so OpenGL or OpenCL processing could be done.. note this can be done on Windows with DXVA + Direct3D shaders.. and also on Linux via VAAPI with GLX extensions..
I know seeds are currently delivered to ADC developers, which is also for deep pockets (I think more than 1K-2K every year..), so I don't want to become an ADC member; I'm only asking if there is a way of getting access to some components of the latest developer builds (graphics drivers, OpenCL and OpenGL frameworks..) that have the latest OpenCL/OpenGL improvements..
So briefly I would want to ask Apple about:

* OpenCL framework fixes for the Apple OpenCL FFT lib.. when?..
* OpenCL with double precision on GPUs.. when?..
* OpenCL image support on AMD GPUs.. when?..
* OpenGL 3.x support.. when?..
* GPU video decoding APIs that expose decoded frames as OpenGL / OpenCL buffers for postprocessing without going through the host..

What I would want to know and get from vendors part II: Nvidia

Posted on 06:43 by Unknown
1. Access to the WGL_DX_interop OpenGL extension documentation and headers: this extension has been shipping in NVIDIA drivers since late August and is very powerful, as it provides a fast path between OpenGL and DirectX (interop) so that resources from one API can be seen by the other with no host interaction (I mean no transfers through the host, as is currently needed).. it was talked about at GTC 09 but the spec was not released..
Also, can you give an expected date for when it will be available to Windows Vista/7 users, since it's only available on XP currently?..
You may wonder what that could add to the mix; at least two or three things:
* Access to GPU video decode (DXVA), feeding that to OpenCL with the lowest overhead..
Nvidia can tell us they have CUVID with OpenGL support, but assuming ATI supports a similar extension someday, we could have a cross-vendor code path via DXVA..
* Access to the current state of the art in efficient fluid rendering, which currently ships as a library that is part of the PhysX Screen Saver source code..
It only accepts a Direct3D interface, so using it from OpenGL needs some interop if it is to be done efficiently..
* Access to Direct3D 11 functionality like tessellation from OpenGL.. exchanging tessellated geometry with OpenGL, all in GPU mem..

2. Access to the NVAPI NDA SDK: this could enable a killer feature.. I think that since the 195 Nvidia drivers NVAPI has been able to report GPU load, memory bus load and video decoding unit load (think Blu-ray GPU decode).. (only GT200 and higher and 190xx)
This is used in GPU-z 0.38, so at least some developers have access to this functionality.. I think GPU-z uses NVAPI..
The public NVAPI doesn't expose this..
I think this API also gives access to 3D Vision internals.. (see below..)
I have tried to get access but you need to be in the Nvidia Registered Developer program.. I have tried many times to sign up but I get no response.. presumably this also gives access to the latest driver builds..

3. Access to the Nexus GPU debugger beta -> released
( GPGPU work would be a lot easier with a GPU debugger.. Nexus was scheduled for a beta release in October.. I signed up for the beta program but I have had no response, other than being told in late October that we would get the beta build in two weeks.. )

4. CUVENC lib headers and documentation, for GPU video encoding.. In its standard drivers Nvidia ships, alongside the CUVID library, the CUVENC library for accessing GPU hardware encoding, and it's used by a lot of commercial video encoders with CUDA support.. in fact all of them use this library..
The problem is that it is only exposed to partners, I think.. it's not public.. I think that with Windows 7 we now have the Windows MFT library for accessing GPU video encoding; I have to test it..

5. Access to documentation about the Fermi OpenGL extensions for Direct3D 11 like features: there is some info in the GTC presentation but still no headers or anything for working on "it" for real..

6. Access to the 3D Vision internal APIs; that's what the Avatar game is getting, i.e. access to ways of sending a frame to each eye bypassing the Nvidia 3D driver..

more or less the same:

About Nvidia source code
========================

1. An OpenCL port of the DirectCompute Ocean demo source code? It was shown in the OpenCL tutorial at GTC09..
Since Nvidia ships the DirectCompute Ocean demo source code, I hope the Nvidia Ocean OpenCL demo is going to ship soon in the GPU Computing SDK..
Can someone confirm that and provide us with the code in the meantime?
I would love to learn the differences between DirectCompute and OpenCL from another perspective, i.e. seeing such complex code (it has high perf FFTs in it) side by side, as
I want to make a common wrapper around DirectCompute and/or OpenCL and/or CUDA..


2. Physics demos using GPU compute APIs, either using the GPU-enabled Bullet code as a base (the rigid body work by Harada) and/or using PhysX fluids; but efficient fluid rendering is
complex to code..
I have seen that the Nvidia fluid demo (OpenGL) uses this technique:

"Screen Space Fluid Rendering with Curvature Flow"
Wladimir J. van der Laan, Simon Green, Miguel Sainz
Some of the authors are Nvidia guys..

It also seems the "PhysX Screen Saver" uses it (DirectX);
the code is available at http://files.thegamecreators.com/darkphysics/ScreenSaversource.zip
but the fluid rendering functionality is a compiled DirectX-based lib:
dxFluidRenderLib.lib
dxFluidRenderer.h
As I want multi-OS support I would love either the source code of that library, so I can modify it for OpenGL usage, or compiled OpenGL-based libraries for
Win/Lin/Mac..

3. Massimiliano Fatica of Nvidia did a port of Linpack that uses both CPU and GPU, load balancing between them..
"Accelerating linpack with CUDA on heterogenous clusters"
In the CUDA forums it was said that it is distributed to universities.. can I get it?

About kernel binaries:
=======================
I think this is the most ridiculous question, but anyway: for CUDA and OpenCL we can store "compiled" kernels as PTX and launch kernels from that code..
I know that PTX is a virtual ISA, so it allows you to target multiple architectures; my question is whether the PTX generated by nvcc or by the OpenCL built-in compiler
is mature enough that it cannot happen that, say one year from now, a new OpenCL built-in compiler or, say, the CUDA 4.0 nvcc generates PTX that in turn gives better performance..
I hope PTX generated now achieves the same performance as compiling the kernel to PTX next year would..
i.e. that all the optimization can be extracted from the PTX code..
If not, at least for OpenCL I will have to supply kernel source files and compile on the fly..

Also, compiling CUDA 2.3 kernels we get PTX 1.4, OpenCL-generated PTX is v1.5, and in the CUDA 3.0 beta (at least for the Fermi target) it seems we get PTX 2.0..
In the SDK we get the v1.4 doc; the current CUDA 3.0 SDK beta 1 provides no PTX 1.5 or PTX 2.0 info..
Can we get access to the documentation of these new PTX specs?..




About CUDA 3.0:
===============
It will be good having a module that is able to get info about specific instruction issue rate and latency similar to GPUbench
http://graphics.stanford.edu/projects/gpubench/test_instrissue.html
Well the problem lays in that there are currently some PTX instructions that aren't visible from CUDA C..
This guy for example exposes native addc instruction:
__addc / __uaddc: signed and unsigned addition-with-carry. Carry flag after addition is set automatically.
http://www.mpi-inf.mpg.de/~emeliyan/cuda-compiler/
You can find a paper where he motivates this effort with the speedup it gives in some integer-heavy scientific codes;
see "Efficient Multiplication of Polynomials on Graphics Hardware"..
He provides a diff against the CUDA Open64 sources (2.2 I think) and also new headers..
Could this support be added, so that we can perhaps measure the instruction issue rate of these instructions?
If not, I could manually compile the patched sources for every platform of our benchmark (Win, Lin, Mac; x64 and x32), but I don't think I will..
I say this because some integer multiprecision libraries have a similar problem (it's impossible to access the add-with-carry op from C without having to
add assembly code..)
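To make the add-with-carry point concrete, here is a minimal sketch in Python (used only for brevity, it is not GPU code) of multiprecision addition over 32-bit limbs. The name `add_limbs` and the little-endian limb layout are my own assumptions for illustration. The two comparisons per limb are roughly what portable C has to do when the carry flag isn't reachable; a native addc would fold them into the add itself.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit limb


def add_limbs(a, b):
    """Add two equal-length little-endian lists of 32-bit limbs.

    Returns (sum_limbs, carry_out). The carry is recovered with
    comparisons, the way portable C without a carry flag would do it.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of limbs")
    out = []
    carry = 0
    for x, y in zip(a, b):
        s = (x + y) & MASK32        # wraps like a 32-bit unsigned add
        c1 = 1 if s < x else 0      # wrapped around => carry out
        t = (s + carry) & MASK32    # add the incoming carry
        c2 = 1 if t < s else 0      # that add can wrap too
        out.append(t)
        carry = c1 | c2             # at most one of the two can be set
    return out, carry
```

For example, `add_limbs([0xFFFFFFFF, 0xFFFFFFFF], [1, 0])` ripples the carry through both limbs and out of the top.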
Now a mix of some previous questions:
Is it possible to access native add with carry on Nvidia GPUs from OpenCL?
I think the answer is no, and I believe that could be fixed if there was interop between OpenCL and CUDA-generated PTX code.. With the addc-enabled CUDA
compiler I would compile an addc function and call it from OpenCL..
Anyway, having the PTX 1.5 spec documentation would also help in finding out how to patch OpenCL-generated PTX code to use it..
Yeah, I know none of this is supported by the OpenCL spec.. but it's worth investigating anyway..
(I would love to ask AMD engineers the same thing: enabling use of add with carry, if it exists in r8xx, via AMD IL generated code..)

I see that CUDA 3.0 has surface instructions (cusurf)..
This is Fermi stuff, correct?
It seems these instructions allow "true" writable textures (I mean without having to use the CUDA 2.2 "texture from pitch linear mem" functionality)..
and so give (x,y) addressing for writes (so it's equal in concept to DirectCompute RWTexture2D?) and presumably format conversion on read/write(?)..

My only objection is that I can't find 3D surfaces in the headers, but I hope 3D surfaces are similarly supported in hardware on Fermi because of RWTexture3D in D3D 11, so
I can expect 3D surface functions in CUDA 3.0 with Fermi (i.e. I want "true" writable 3D textures..).. I want to use that for 3D stencil codes..
For GPUs without this support I can use the 3dfd code from the Nvidia GPU Computing SDK, which I think is based on:
3D finite difference computation on GPUs using CUDA
by P. Micikevicius, 2009
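For readers who haven't met a "3D stencil code": a minimal sketch in plain Python of a 7-point stencil over a 3D grid. This is much simpler than the higher-order stencil in the 3dfd sample; the function name `stencil7` and the default coefficients are hypothetical, chosen just for illustration. On a GPU each (x, y, z) cell would be one thread, and the write to `out` is exactly the part that a writable 3D texture/surface would make convenient.

```python
def stencil7(grid, c_center=-6.0, c_neighbor=1.0):
    """Apply a 7-point stencil to the interior of a 3D grid.

    `grid` is a nested list indexed as grid[z][y][x].
    Boundary cells are copied unchanged into the result.
    """
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    # Copy the input so boundary cells keep their original values.
    out = [[row[:] for row in plane] for plane in grid]
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                neighbors = (
                    grid[z][y][x - 1] + grid[z][y][x + 1]
                    + grid[z][y - 1][x] + grid[z][y + 1][x]
                    + grid[z - 1][y][x] + grid[z + 1][y][x]
                )
                out[z][y][x] = c_center * grid[z][y][x] + c_neighbor * neighbors
    return out
```

With the default coefficients this is a discrete Laplacian, so a constant field gives zero in the interior.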

About CUDA multicore:
I know that Nvidia is still working hard on it because of:
1.http://llvm.org/devmtg/2009-10/Grover_PLANG.pdf
"PLANG: Translating NVIDIA PTX language to LLVM IR Machine"
2.I have seen some strings related to multicore-llvm in the CUDA 3.0 beta nvcc binary
It seems you have switched from the idea of:
"MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs"
to a hopefully better one, i.e. translating from PTX to LLVM and then using
the efficient LLVM backends for x86..
The question is whether that is going to be available soon enough..
This would let me compare perf in this mode against CUDA code ported to OpenCL and run on the AMD OpenCL CPU backend..
or against having to write efficient CPU codes by hand..
I'm thinking of sorting as a test example:
Currently the fastest on GPU seems to be:
"Designing efficient sorting algorithms for manycore GPUs"
Nadathur Satish, Mark Harris, and Michael Garland
The code is already in CUDPP 1.1 and in a CUDA SDK sample..
The problem is that the most efficient CPU one seems to be "Efficient implementation of sorting on multi-core SIMD CPU architecture"
and care has to be taken to write SIMD-enabled code..
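As far as I understand, the GPU paper above builds on radix sort (plus a merge sort). The following is only a minimal sequential sketch in Python of an LSD radix sort, not the paper's algorithm: the name `radix_sort` and the 4-bit digit size are my own assumptions. It does show the three phases (histogram, exclusive prefix sum, scatter) that a GPU version has to parallelize per pass.

```python
def radix_sort(keys, bits_per_pass=4, key_bits=32):
    """LSD radix sort for non-negative integers below 2**key_bits."""
    radix = 1 << bits_per_pass
    mask = radix - 1
    keys = list(keys)
    for shift in range(0, key_bits, bits_per_pass):
        # Phase 1: histogram of the current digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # Phase 2: exclusive prefix sum gives each digit's start offset.
        offsets = [0] * radix
        running = 0
        for d in range(radix):
            offsets[d] = running
            running += counts[d]
        # Phase 3: stable scatter into the output array.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys
```

The scatter has to be stable, otherwise later passes destroy the order established by earlier ones.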

DirectCompute questions:
=======================
As far as I know there are basically two models, 10_x and 11_0, i.e. DirectCompute 4.x and 5.0..
The problem is that CUDA has no restrictions on writing to shared mem, and the codes I plan to use presumably use atomics on global mem (CUDA GPUs except G80 support them.. some recent codes use them..(?))
This code isn't going to translate automatically to DirectCompute 4.x..
This is no problem for Fermi and AMD 5xxx GPUs, but as I think DirectCompute 4.x takes the "greatest common divisor" between CUDA cards and ATI 4xxx,
CUDA cards are at a great disadvantage.. so my question is whether Nvidia can and wants to fix this issue..
I'm thinking of something similar to what I remember reading: Nvidia enabled a D3D 10.1 feature in some driver for FarCry2? (related to multisample)
I mean at least allowing kernels to compile to the cs_5_0 target on GT200 cards, for example..
I know some things in this target aren't available, since for example shared mem size on GT200 cards is below what's required, but I mean that if a kernel stays within GT200 hardware restrictions
(for example shared mem usage below 16K) and requires only features and hardware resources available on CUDA cards, this could be enabled..
This could be an NDA feature(?), for example enabling a cs_5_0_gt200 target (is it possible?)
Also a similar hack to enable doubles on GT200 via DirectCompute..

OpenCL:
======
Well, I have to be frank: I can't find any issue worth mentioning in the 195 drivers except:

1. I'm not happy with the OpenCL Volume3D demo. On Windows XP it goes nearly as fast as the CUDA one: I get a sustained 60fps in CUDA vs 40-60 fps in OpenCL
with an 8600gts.. But the same OpenCL Volume3D demo runs at a mediocre 14fps on a high end desktop with a gtx 275 in Win7,
while the CUDA demo runs at 60fps.. all with recent 195.55 OpenCL drivers..
I think Linux OpenCL doesn't suffer from this either..
So it seems CUDA 3D texture support is good whatever the OS, but OpenCL Image support for 3D textures has perf issues on Vista/7 systems..
Can anyone confirm whether they are going to fix it soon or it's already fixed?
It doesn't seem OK to blame WDDM, as CUDA seems not affected..
I say this because I would love to write some volumetric rendering code, perhaps with 3D Vision as an optional builtin feature, and it seems that code would suffer with the OpenCL backend..

2.I'm waiting for cl_khr_3d_image_writes..
I think this is similar in concept to RWTexture3D, correct (i.e. (x,y,z) addressing etc..)?
But I think there is going to be hardware support for it only on Fermi and higher, correct?
Assuming this is Fermi stuff, will it be available say with the Fermi launch drivers, is it already supported in 195.62 if we have a Fermi, or is there no
specific time?
I think this allows a high perf implementation of 3D stencil codes on D3D 11 architectures, as the texture is written directly using coordinates and reads
get cached, and at least this is an advantage for architectures without a global cache (AMD 5xxx cards)
Of course I'm aware of the alternative technique of caching neighborhood values in shared mem and calculating the stencil from those values..

3. Could you at least say whether there is any way (hack) to access host mem from Nvidia GPUs in the OpenCL backend (pinned system mem in CUDA parlance)?
I have no problem even if it means playing with PTX code..
I ask because I want to run kernels over a big problem,
which would perhaps take long enough that a progress bar would be welcome.. I know of the watchdog timer issue for kernels running for more
than x seconds, and I think Nvidia also recommends splitting the kernel to solve this issue..
Yeah, I know Nvidia doesn't recommend that..

Better would be some roadmap for an extension supporting this feature (by the way, it's supported by the hardware on AMD cards too)
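
On the "divide the kernel" route for the watchdog issue in point 3: a minimal host-side sketch in Python of splitting one big index range into several smaller launches and reporting progress in between. `launch` and `report` are hypothetical callbacks I made up for illustration; `launch(offset, count)` stands in for one real kernel launch over that sub-range (how the offset reaches the kernel, e.g. as a kernel argument, is left open).

```python
def chunk_ranges(total, chunk):
    """Yield (offset, count) pairs covering range(total) in pieces of at most `chunk`."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    for offset in range(0, total, chunk):
        yield offset, min(chunk, total - offset)


def run_in_chunks(total, chunk, launch, report=None):
    """Call launch(offset, count) once per chunk, reporting the fraction done."""
    done = 0
    for offset, count in chunk_ranges(total, chunk):
        launch(offset, count)      # stands in for one short kernel launch
        done += count
        if report is not None:
            report(done / total)   # e.g. update a progress bar
    return done
```

Each launch stays short enough for the watchdog, and the progress bar comes for free without needing the GPU to write to host memory while running.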

Mac stuff
=========

CUDA Mac:
Of course I plan to run OpenCL-OpenGL interop enabled codes and CUDA-OpenGL ones, and of course the perf of the CUDA benchmark on the Mac will suffer as the interop still goes through the host..
Can we expect it fixed sometime, say before April-May 2010?

OpenCL Mac:
Can someone confirm whether the double extension is going to be available, say in April-May 2010 (10.6.3-4?), on Nvidia GT200 GPUs for example, similar to the 195 Windows and Linux drivers?

OpenGL Mac:
Sorry if I'm ignorant in this matter..
but what's the problem with Nvidia Mac drivers still shipping as OpenGL 2.1 drivers
(yeah, with some 3.x stuff)? I remember seeing an NVISION 08 presentation by an Nvidia OpenGL guy saying all the OpenGL 3.0 stuff was coming to Mac at that
time..
Nvidia ships "custom" drivers for the GTX 285 Mac edition on its driver download page, so why can't they ship custom drivers, if not with 3.x support, at least with all the 3.0, 3.1, 3.2 ARB equivalent extensions and possibly other Nvidia extensions?

Optix and Physx for Mac?
Assuming we want to port some simple GPU raytracing and GPU physics code, can we have it working on Mac? Neither the Optix nor the Physx libraries are currently available for
Mac..

Some news II (post #100!)

Posted on 06:11 by Unknown
Hi, today two more news items:
1.Apple seeds Snow Leopard 10.6.3; it should be available now!
Hopefully fixing Apple FFT perf issues.
I would also love:
OpenGL 3.x
AMD: image support, building to/from AMD IL, and doubles for 4xxx..
Nvidia: doubles on gt200

I will try to get it from the IRC channel where netkas is!

2.Nvidia Fermi perf charts available (fake or not?)!
Also it seems A3 is the final one, and Nvidia has checked its first samples and sent it to manufacturing!
(This news is from my brother, who lately surfs the web more deeply than me :-) thanks brother)

What I would want to know and get from vendors part I: AMD

Posted on 05:55 by Unknown
Lots of questions for sure.. but the more near-term practical ones are:

1.Access to the Catalyst BETA program.. I tried to get info about it and I'm sure it exists, but I can't get access..
This would give at least a 1-2 month advantage vs regular Catalyst WHQL drivers, with features arriving early, so I would get the latest CAL compiler improvements (perf improvements, fixes, new features..), which in turn should also benefit the OpenCL GPU backend and DirectCompute stuff.. Also new OpenGL stuff..
I think Catalyst 10.1 is circulating among some privileged 5670 testers (GPU-z leaks last week), and I think it supports OpenGL 3.2 for the first time..

2.Access to the Catalyst Linux beta program.. for the same reason as Catalyst Windows..

3. AMD Shader Kernel Analyzer with OpenCL and/or DirectCompute support..:
This would let me get low level info on kernel issues on ATI cards..


4.Roadmap of OpenGL and OpenCL and state of OpenCL on AMD cards (expected? when?):
All of these are lacking vs Nvidia; when will they get done? (at least for 5xxx cards):
*OpenCL shipping in drivers
*OpenCL with Khronos ICD model..
*OpenCL: image support, double precision, atomics, local atomics, byte addressable store, etc..
*OpenCL kernels built to/from AMD IL..
*R8xx ISA and AMD IL updated documentation
*48xx cards: image support and doubles (AMD, Apple)
*5xxx cards (OGL interop, doubles, byte addressing, local and global atomics, 3d_image_write: when?)
*State of the AMD extensions (global sync, wave sync, GDS) said to be coming soon in the «Fermi» attack PowerPoint
*Concurrent kernel execution in 5xxx hardware exposed via OpenCL command queues?

5.OpenGL stuff:
*OpenGL 3.2 plus the 10.1 ARB extensions
*OpenGL 5xxx extensions: tessellator, random access pixel shader, etc..

6.What about DirectCompute 4.0 support for 4xxx cards?

7.Early access to OpenCL builds?: is it possible to get this?..

physics on GPU: source code!

Posted on 05:25 by Unknown
Hi, I have collected some source code related to physics on GPUs that's good to know about:
*Bullet broadphase collision detection (last year) is 2.73 stuff
*Bullet impulse code (narrowphase) (now the OpenCL one is 2D and 3D) (CUDA version announced at GDC 2009) comes from Harada (see his web site below).. it is 2.75 stuff
Search google.code for a fix for compiling on Linux
It seems a lot of interesting OpenCL porting work is done by one guy:
http://code.google.com/u/rponom/updates

Found a multi-GPU fluid simulation:
Massive Particles: Particle-based Simulations on Multiple GPUs
http://sites.google.com/site/takahiroharada/Home/multiGPUs.pdf?attredirects=0&d=1

*Fluids: well, Bullet too: its SPH in CUDA comes from http://www.rchoetzlein.com/eng/index.htm (not well optimized last time I checked)
see http://www.rchoetzlein.com/eng/graphics/fluids.htm
This seems to be the fastest CPU implementation, and there is also a new 2.0 version (does Bullet have it?) (a month old):
ver 2.0: Oct 2009
* CUDA ver 2.3 now builds with Visual Studio 2008.
The build solutions for VS2005 are no longer needed or included.
* CUDA ver 2.3 drivers are required for GPU build
* Freeglut 2.6.0 is now used instead of GLUT
* Both CPU and GPU projects are in the same Visual Studio solution.
If you do not have a CUDA GPU, you can select and build the CPU only project inside Visual Studio.
(Right click the Fluid project, select Build)
Still perf issues and bugs:

- The GPU integrator is not yet complete. Integration always takes place on the CPU, in both CPU and GPU modes. (As a result, this forces a bus transfer to and from the GPU per cycle. Once the integrator is finished, GPU simulation performance should increase significantly.)

- Occasionally, the GPU simulation will crash CUDA, causing the screen to blink and particles to move randomly. This is believed to be due to a not-yet-found out of bounds condition.


*Fluids: see the Physx Screensaver source code for an example of efficient fluids usage and rendering (though the rendering code is a binary, DirectX dependent library). The rendering code comes from the Nvidia Physx Fluid demo..
*GPU Gems 3 CD: you have rigid body code.. compiled with CUDA 1.0, so it doesn't work..
Simple fix (?), similar to the AES code, which is Linux only and which I had to fix for Windows..

Nvidia SDKs stuff:
Well, there are some demos:
*OpenGL SDK: 3D waves demo
*CUDA SDK: FFT waves (oceanFFT), fluids 2D cavity (d3d, ogl), particles, sph smokeParticles
*Of course for Tessendorf waves (FFT) you have the Nvidia Ocean DirectCompute demo and
also the OpenCL version shown at GTC09!

My code from some years ago:
*RTFSS: wave equation on GPU with Fresnel raytraced refractions, reflections and caustics on GPU.

Now there is a similar one with touch on Beyond3D.. so I will add that to mine..
What about Qt 4.6 OpenGL and touch interop?

Saturday, 12 December 2009

OpenCL with MingW! (and more)

Posted on 11:03 by Unknown
From AMD forums:
OpenCL Mingw
============
In fact, that is quite easy to make a libOpenCL.a for MingW. I've done it, and now I can compile OpenCL examples with MingW.

The idea is to use the tool reimp found in mingw, which allows you to create an import library for a DLL (i.e. create libXXX.a from a XXX.lib where XXX.lib is only the import library for XXX.dll; I insist: it won't create a libXXX.a when XXX.lib is a general library, it only works for import libraries; moreover, the name mangling in the DLL must not be C++ mangling: simple bare function names are OK).

1. Open a command prompt where the PATH contains the mingw\bin ; go to the ATIStreamSDK\lib\x86 where you find OpenCL.lib, and type reimp OpenCL.lib

You get OpenCL.def and libopencl.a -> this is what you want to link against.

2. When you compile your .c / .cpp using cl.h, add a compiler define _MSC_VER in order to define the stdcall convention, otherwise the linker will not look for the right names in libopencl.a.

That's all ; it worked for me.

In case reimp tells you "bad or corrupt import lib" or something like that, you just have to use dlltool (included in mingw) to generate libopencl.a from OpenCL.def :

dlltool -l libopencl.a -d OpenCL.def -A -k

where OpenCL.def is for instance this file (.def contain export names from DLL) :

http://pastebin.com/f2ac38b2f
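
To give an idea of what such a .def file looks like, here is a minimal sketch in Python that writes one out. It is not the content of the pastebin above, which I can't vouch for: the two export entries and their byte counts are my own illustrative assumptions for 32-bit x86, where stdcall names carry an `@N` suffix with N the total size of the arguments in bytes (three 4-byte arguments for `clGetPlatformIDs`, one for `clReleaseContext`). As I understand the dlltool flags, `-k` drops the `@N` from the exported names and `-A` adds undecorated aliases.

```python
def stdcall_name(name, arg_bytes):
    """Return a stdcall-style export name such as 'foo@12'."""
    return f"{name}@{arg_bytes}"


def make_def(dll_name, exports):
    """Build the text of a .def file.

    `exports` is a list of (function_name, argument_bytes) pairs.
    """
    lines = [f"LIBRARY {dll_name}", "EXPORTS"]
    lines += [stdcall_name(name, nbytes) for name, nbytes in exports]
    return "\n".join(lines) + "\n"


# Hypothetical two-entry example; a real file lists every exported function.
EXAMPLE_EXPORTS = [
    ("clGetPlatformIDs", 12),  # assumed: 3 arguments x 4 bytes on x86
    ("clReleaseContext", 4),   # assumed: 1 argument x 4 bytes on x86
]

if __name__ == "__main__":
    with open("OpenCL.def", "w") as f:
        f.write(make_def("OpenCL.dll", EXAMPLE_EXPORTS))
```

The resulting OpenCL.def can then be fed to the dlltool command above.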

OpenCL and AMD constant mem
==============
Q:
I have about 32KB total worth of 16-bit (short int) constants in 4 or so lookup tables of different sizes. I'd like to be able to access them in parallel from different threads in as quick a way as possible. Architecturally it would seem like the texture cache is ideal, but if I just place them in the CL kernel file and tag it with the __constant specifier, will they be located somewhere that will be accessed quickly?
A:
Although it is not in the current release, if you place data like this in a constant address space array in the kernel file, it will be placed in a constant buffer when this gets fully implemented. The constant buffer peak is around a factor of 10x faster than the L1 speed on 770, which is ~480GB/s, but slower than register file access.
See "A compiler for parallel execution of numerical Python programs on graphics processing units"

OpenCL and CAL
================
Quick answer is yes ( OpenCL is written on top of CAL, so it can't be faster ). Full answer is a little bit longer.

On the 4xxx family with CAL you can get almost the full power of the card. But be warned: it will be rather painful. Documentation is really bad or missing (with regard to optimization) and the compiler sometimes does strange things (so you need to trick it to get quality code). On the other hand OpenCL for 4xxx is really bad (lacking cached memory access and LDS): it's about 3x slower than Brook+.

With the 5xxx family it's hard to say. There are some results (search the streamsdk forum) suggesting that there is a problem with memory transfer speed (we will see if the new CAL version corrects it). So with the exception of memory transfer you can get almost the full power of 5xxx with CAL.

OpenCL on 5xxx is again a problem. In theory OpenCL on 5xxx should work like a charm (it doesn't lack LDS or the new memory access instructions) but the results don't support that (maybe memory problems again, who knows). At the moment performance for some applications is comparable to OpenCL on an 8800GT.

smallpt bugs in AMD OpenCL
==========================
smallpt 1.2 has bugs under AMD OpenCL, currently fixed in trunk:
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=123480&enterthread=y

Nvidia bug?:
writing to Image obtained via CL/GL
==================================
Is it possible yet to write to an image that's been obtained via clCreateFromGLTexture2D?

I get an error -30 (CL_INVALID_VALUE) when I try writing to it with clWriteImage or clEnqueueNDRangeKernel (after acquiring it), but writing to an image that's been created with clCreateImage2D works ok. Writing to buffers acquired from OpenGL works ok too.

Am I missing some extra step needed, or is it just not supported yet?

If it's not supported, is there a list of the current issues in NVidia's OpenCL implementation?

Some news!

Posted on 09:42 by Unknown
1.BSGP v2.0 released!
2.0.0.1: New demo: RenderAnts
Debug support (uses bsgpdbg)
Multi-GPU support (-DmultiGPU)
Multiple computation capabilities support (-DPTXdevice=detect)
CUDA 2.x support (-Dptxas=20)
Exception handling
Improved error messages
So it comes with REYES CUDA source code.. http://www.kunzhou.net/2009/renderants.pdf
and also a GPU debugger and multi-GPU support!

2.Afterburner 1.5beta2 released!
Now with GPU load usage (it says GT200 is needed and that it's a 190 driver feature..)
Also GPU mem usage with 185 drivers and higher..
So it's the ultimate GPU tool (it has AMD overclocking beyond Catalyst limits, voltage control for 5xxx, and now GPU utilization and memory use)..

3.Complex 1.6 is supported on OpenGL for both Linux (64bit) and Windows XP (32bit and 64bit)
(so adds 32bit?)
It has a stereo app also..

4.The Future of GPU Computing Bill Dally talk with GPU predictions 2017

5.WebGL Khronos page, spec draft, and demos released!

6.Info about ATI Radeon 5350 and 5570


7.CFD CUDA paper from the Rodinia bench
http://web.cos.gmu.edu/~acorriga/pubs/gpu_cfd/aiaa_2009_4001.pdf

8.Benchmarking a GPUCV CUDA operator with Cuda Visual profiler

9. OpenGL 3.2 and GLSL 1.5 minor updates.. as always see http://www.opengl.org/registry/
You can also find the updated gl3.h there..
* gl3.h last updated on $Date: 2009-11-19 17:45:11 -0800 (Thu, 19 Nov 2009) $

10. PGI insider December coming with:
Next Issue Highlights

PGI Accelerator Programming Model—Part 4 New Features in PGI 2010

CUDA Fortran Data Management

Building, running and debugging MSMPI programs using PGI Visual Fortran

PGI Accelerator programming model on many-core processors


Is that "PGI Accelerator programming model on many-core processors" using CUDA multicore, or GPUs, or an OpenCL CPU backend?

String matching on GPUs!

Posted on 06:51 by Unknown
You can find
String Matching on a Multicore GPU Using CUDA
Corfu, Greece
September 10-September 12
ISBN: 978-0-7695-3788-7

Fast Exact String Matching on the GPU
Michael C. Schatz and Cole Trapnell
http://www.cbcb.umd.edu/software/cmatch/ (code, data)


Online Approximate String Matching with CUDA
Mikael Onsjö (mikael@is.titech.ac.jp), Yoshinori Aono
group: “patternmatchers”, supervisor: Osamu Watanabe
Tokyo Institute of Technology
(code) Approximate patternmatching with CUDA and Nvidia’s HPC T1 processor
http://odinlake.net/wordpress/?s=cuda
Small experiment with CUDA on a Tesla card of cc 1.3


kmp_algorithm (code)
I have written a string matching algorithm, the classical KMP algorithm, for the case where the pattern and the main string have many..
http://odinlake.net/wordpress/?s=cuda
Protein pattern search algorithms are similar too:
Needleman-Wunsch
Smith-Waterman
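
Since KMP comes up above, a minimal sequential sketch of it in Python (the function names are mine, and this is the plain CPU algorithm, not the linked CUDA code). The failure table is what lets the search avoid re-reading text characters after a mismatch; a GPU version would typically split the text among threads, with chunks overlapping by the pattern length minus one.

```python
def build_failure(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the start indices of all (possibly overlapping) matches."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return matches
```

For instance, searching for "aba" in "abababa" finds the three overlapping matches at positions 0, 2 and 4.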

Lots of OpenCL soft coming!

Posted on 06:40 by Unknown
1.VMD 1.8.8 OpenCL support coming! (get it) See Khronos SC09 bof slides LANL!

2.nlm denoise opencl

3.MD5 OpenCL crack: oclcrack
This guy also makes OpenCL Info Browser, an OpenCL-z like tool for Mac:
http://sghctoma.extra.hu/downloads/OpenCL/OpenCL%20Info%20Browser.zip

4. Mandelbrot OpenCL: mandelgpu
5.OpenCL version of AO bench
http://code.google.com/p/aobenchcl/updates/list

6.smallpt opencl cuda:

7. Also http://graphics.stanford.edu/~yoel/notes/ (by the author of GPUbench)
offers clinfo and more

10 Raytracing GPU demos! (more or less)

Posted on 05:40 by Unknown
Almost all CUDA, but one is OpenCL!
Also one is OpenGL based (old-style GPGPU) but is new research (2009)
Also they work on Nvidia only, not AMD (would have to be ported?)

1. Nvidia Optix (demos): CUDA, closed source, multiGPU capable, Win&Lin (Mac?)
2. kernels by Timo Aila (paper) http://www.tml.tkk.fi/~samuli/ (the other author)
3. BSGP GPU SDK: source, but needs porting to CUDA manually, or needs all of BSGP..
Newer, better Bulk-Synchronous GPU Programming compiler
Has better codes in the 2009 paper..
4. tokaspt: port of smallpt to CUDA, has a hidden Cornell box scene
5. smallpt OpenCL port
6. demo by a guy from Poland: binary only, claims to be many times better than Optix!
7. "Real-Time Ray Tracing with NVIDIA CUDA GPGPU and Intel Quad-Core" and Cell demo: some primitives, on Cell, multicore SSE and CUDA (2007 stuff)
8. 2007 demo: balls demo code
9.
(see radius cuda)
http://www710.univ-lyon1.fr/~bsegovia/ broken
see here for code
Has been ported to Linux
see "nested struct" in Cuda linux
http://forums.nvidia.com/index.php?showtopic=80655&mode=threaded&pid=462339
Also interesting Brute Force Tesselation and Quadratic Approximation of Subdivision Surfaces

10. Stochastic photon mapping
(OpenGL)
see "Progressive Photon Mapping" [code], but CPU only
11.
http://cg.alexandra.dk/category/cuda-ray-tracer/
http://cg.alexandra.dk/2009/08/10/triers-cuda-ray-tracing-tutorial/
http://programmingcuda.blogspot.com/2009/08/simple-ray-tracing-example.html
http://programmingcuda.blogspot.com/
12. Caustic Emulation SDK

And of course the papers and theses:

Scalable Ray Tracing with Multiple GPGPUs by Rodrigo A. Urra
http://www.uni-koblenz.de/~hrabe/ Hanno Rabe
Timo Aila
http://sites.google.com/site/mkjeong/Home/cuda-raytracer
A straightforward CUDA implementation for interactive ray-tracing
Budge, B.C. Anderson, J.C.
High-speed volume ray casting with CUDA
by: L. Marsalek, A. Hauber, P. Slusallek

New Nvidia tools and crossvendor GPU instrumentation info!

Posted on 05:12 by Unknown
Following the SceniX 5.5 or 5.6 release of the last days, I have now found:

SceniX 1.6.5 (11 December 2009)
also old news (?):
PerfKit 6.6: added GT21x metrics support (so D3D 10.1 and OGL 3.1 extensions too, no?)
but the Linux PerfKit is still the old 6.0..
I tested it with Ubuntu 8.04 (it has no support for newer kernels..) so no Ubuntu 8.10 and higher..
Also I think it needs special instrumented drivers, as on Windows XP, and the PerfKit 6.0 drivers still don't support the GT200 arch.
Luckily one guy has backported support for newer kernels to this driver.. but that can't fix the lack of GT200 support, no?

Some guy uses PerfKit to get GPU load on Windows x64 with plots, but as I told him it's now also in GPU-z 0.3.8..
The problem with PerfKit is that you need to install the PerfKit SDK (runtime?), which is no small download..
At least Windows Vista and higher come with integrated support for it in normal driver builds.. so no more downloads for normal users..
Remember Windows XP needs the special driver that ships with it, so updates aren't so timely..
You have to enable instrumentation with a toolbar..
Also remember gDebugger needs it for gathering extended info about Nvidia cards..
Both on Linux and Windows..
gDebugger Mac luckily accesses cross vendor OGL metrics through the Apple drivers (?)
Also note I have had issues with gDebugger on x64 Windows with the Nvidia instrumentation files, as gDebugger was Win x32 only.. I needed to copy some Nvidia instrumentation DLLs from the System folder to the gDebugger folder, or move them between System and SysWow64.. you can check all this more quickly with the included demos..
ATI instrumentation is similar.. it's built right into the drivers for Vista and higher and gDebugger uses it.. the problem with Catalyst is that some releases work and some don't.. I think 9.10 or 9.11 is OK with 5xxx..
The problem is that ATI support in gDebugger is Win and Mac only, so no Linux?
Well, I don't know if it's the same, but for OpenGL at least you have a Catalyst extension providing metrics, also on Linux
Also, getting GPU load is better on Nvidia now with mem load, but ATI load is in ADL, so public(?) and also Linux compatible?
I have to check whether OpenCL and DirectCompute affect the load metric as on Nvidia..
I also need to check the GPU video acceleration metric on Nvidia with Bluray, CUVID (Nvidia SDK) and CUVENC (using the CUDA H.264 encoder in MediaCoder) stuff..

Thursday, 10 December 2009

About Catalyst 9.12 and 10.1!

Posted on 12:18 by Unknown
Catalyst 9.12: 16 December!
Catalyst 10.1: 7 January.
The biggest change will be the option to enable 3D display in movies / videos, which has already been reported.
Furthermore, Catalyst 10.1 should include full OpenGL 3.2 support.

Wednesday, 9 December 2009

CUDA 3.0 forums stuff!

Posted on 15:33 by Unknown
1.Getting CUBIN instead of ELF
If you need the older text format, you can disable ELF cubins in nvcc.profile by changing "CUBINS_ARE_ELF" and use your old tools like decuda and what not.

2.Does anyone know if the CUDA 3.0 docs have been added to the 3.0 beta toolkit yet?
they haven't been refreshed, no.

3. Bugs in Linux
Has anyone managed to compile code with 3.0 beta without -fpermissive flag on linux? It bugs me to use that flag, without it however I can't compile my applications.

Upcoming GPU tutorials!

Posted on 15:29 by Unknown
Siggraph Asia 2009
http://www.siggraph.org/asia2009/gpu/

Introduction to OpenCL by John Stone
Thursday, December 10, 2009 3:00 PM - 4:30 PM CST
https://www2.gotomeeting.com/register/437880426

more..

News from the web! (9 December)

Posted on 15:21 by Unknown
Nvidia news:
1.Nexus beta1 released (GPU debugger): See posts below
2.Fermi stuff: This is old news, but I forgot to say that at SC09, while announcing the Tesla Fermi cards, Nvidia also posted a photo of a Fermi Geforce card running the Direct3D 11 based Heaven benchmark on its Facebook channel.. It seems to be DX11 mode as the character is tessellated.. Also the Tesla Fermi cards announcement said Fermi Geforce GPUs are coming in Q1/2010.. so before April.. At the time Fudzilla claimed December/January availability.. now VR-Zone reports partners giving a March launch date..
Yesterday a photo of an SLI Fermi based setup running Heaven was also posted on the Nvidia Twitter page..
http://www.tcmagazine.com/comments.php?id=31382
And today rumors of graphics specs of GTX 360 and GTX 380:
http://dl.dropbox.com/u/1416327/nvidia_geforce_gtx_380_360_specs.jpg
3.Physx runtime 12/11/2009 was posted in the PhysX member area some days ago.. it's supposed to bring APEX features and more..
Now shipping as WHQL with full release notes!
In addition, according to our sources, PhysX System Software 9.09.1112 will be included into next release of public GeForce drivers.
http://physxinfo.com/news/976/physx-system-software-9-09-1112-available/
4.Also a Q/A by Nvidia confirms Nvidia doesn't care for now about an OpenCL or DirectCompute port for cross vendor portability
http://physxinfo.com/news/966/nvidia-qa-roundup-physx-wont-support-opencl-or-directcompute-any-time-soon/
5. Bluray 3D support with GT240 and Fermi cards (see below)..

gbauschene keeps posting new builds of the XvBA VAAPI backend (0.5.4), so perhaps there is 58xx compatibility now..

New SC09 CUDA presentation from the Supercomputing 2009 birds-of-a-feather session on “The Art of Performance Tuning for CUDA and Manycore Architectures”:
http://www.cs.virginia.edu/~skadron/Papers/cuda_tuning_bof_sc09_final.pdf
following last week's
Supercomputing 2009 Tutorial: "High-Performance Computing with CUDA"

These past days we have seen two important things about 3D Bluray, which confirm they are working hard on getting it ready for prime time, possibly in Q4/10..
First was the announcement by Sony that they are going to film some matches of the FIFA World Cup 2010 in Full HD 3D and, better, that they are producing an official 3D film on Bluray 3D.. so perhaps this is going to be one of the first Bluray 3D films.. jointly with Avatar?
From press:
Sony Pictures Home Entertainment plans to produce and distribute the Official 3D Film on the Blu-ray Disc*[1] and other formats.

*[1] 3D specification of BD is under consideration by the BDA (Blu-ray Disc Association).
Also there is an announcement of AMD showcasing Bluray 3D jointly with Cyberlink (the player software?) at CES 2010, and I think support for GPU video decoding of the H.264 MVC format in upcoming AMD hardware (r9xx?)..
Today Nvidia also said that GT240 and Fermi have Bluray 3D support, with a lot of players
(Corel WinDVD, Arcsoft and Cyberlink) supporting it..
http://www.engadget.com/2009/12/09/nvidia-shows-its-3d-blu-ray-readiness-in-run-up-to-ces-acer-dem/
Acer 3D Vision Full HD 24-inch monitor:
http://money.cnn.com/news/newsfeeds/articles/marketwire/0567133.htm

It seems the Bullet GPU-enabled physics demo possibly doesn't compile on non-Windows systems:
bullet gpu3ddemo
It fixes the problem for me when I add 'pthread' to the list of libraries
in the CMakeLists.txt file in the Demos/Gpu3dDemo/ directory.

It seems Voxilla (the guy doing exciting DirectCompute demos and at the same time extracting the power available in the new ATI cards) is working on a new demo: now going from a 400^3 wave simulation to a 200^3 fluid simulation. I don't know if it's particle based (the grid resolution could be for neighborhood info or surface extraction), but I think there is no free surface, so a smoke simulation seems more plausible:
The real challenge IMHO is simulation of volume fluids, like on a 3d grid.
I've been working on one with a 200x200x200 grid, I'll post it in the near future.


For users interested in the latest Gallium3D architecture, which will bring OpenGL ES 2.0, OpenCL and OpenGL 3.x to open source drivers, there are presentations:
http://www.lunarg.com/wordpress/technologies/gallium-3d/gallium3d-online-developers-workshop/

http://physxinfo.com/news/1000/popular-physics-engines-comparison-physx-havok-and-ode/
Read More
Posted in | No comments
Newer Posts Older Posts Home
Subscribe to: Posts (Atom)


