GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information

Thursday, 26 November 2009

News from the web II (big compilation)

Posted on 02:25 by Unknown
*SC09 news:
*Larrabee perf exposed on matmul and sparse matvec mul..
On the SGEMM single precision, dense matrix multiply test, Rattner showed Larrabee running at a peak of 417 gigaflops with half of its cores activated (presumably the 80-core processor the company was showing off last year); and with all of the cores turned on, it was able to hit 805 gigaflops. As the keynote was winding down, Rattner told the techies to overclock it, and was able to push a single Larrabee chip up to just over 1 teraflops, which is the design goal for the initial Larrabee co-processors.
Here's the next problem. Sparse matrix math is what is commonly needed in simulations involving cloth and water. And on that test, a Larrabee chip that was not overclocked was able to do between 7.9 and 8.1 gigaflops, depending on the test and the size of the matrices.
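To see why the sparse number can sit so far below the dense one, here is a minimal sketch of a sparse matrix-vector product in CSR (compressed sparse row) form, with a rough count of flops against bytes moved. This is not Intel's benchmark code: the function names, the tiny 3 x 3 matrix, and the 4-byte value and index sizes are all assumptions made for illustration.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx, so the access pattern is irregular.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def spmv_flops(values):
    """One multiply and one add per stored nonzero."""
    return 2 * len(values)


def spmv_min_bytes(row_ptr, values, bytes_per_value=4, bytes_per_index=4):
    """Optimistic memory traffic: matrix arrays read once, y written once.

    Reads of x are ignored, as if x were perfectly cached (an assumption).
    """
    nnz = len(values)
    rows = len(row_ptr) - 1
    return (nnz * (bytes_per_value + bytes_per_index)
            + len(row_ptr) * bytes_per_index
            + rows * bytes_per_value)


# Hypothetical matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]

y = csr_matvec(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
flops = spmv_flops(values)
traffic = spmv_min_bytes(row_ptr, values)
print(y, "flops per byte:", flops / traffic)
```

Even under these optimistic assumptions the kernel does well under one flop per byte loaded, whereas a dense matrix multiply reuses each loaded element many times. That makes sparse matvec a memory-bandwidth test much more than an arithmetic one.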

But what he did say is that the Ct dialect of C++ that Intel has created will be going into beta soon to help with the parallelization of C++ code to run on multicore and multithreaded processors, and more importantly, to spread code across CPUs and GPU-based co-processors in workstations and services to maximize performance as transparently as possible. Ct will work in conjunction with the CUDA environment from Nvidia for its GPUs and for the OpenCL environment being pushed by Advanced Micro Devices and others.

Intel is also cracking the issue of sharing data between Core and Xeon CPUs and Larrabee GPU co-processors. Future Core and Xeon chips will be able to create a virtual shared memory pool that both the CPU and GPU can access so datasets are not crunched down, serialized, and moved over the PCI-Express bus from the CPU to the GPU and then back again after calculations are done. The shared virtual memory allows the CPU and GPU to work off the same data in sequence without any movement, which should radically improve performance and smooth out simulations.
*Improved Nvidia GPU and InfiniBand interop: the same pinned memory can be used by both the GPU and InfiniBand devices (Mellanox drivers and a CUDA release expected around Q2 2010). This avoids an extra copy in host memory, or having to give up pinned memory. There is still no general way of using GPU DMA to send to other DMA devices.
*CUDA 3.0 beta and drivers are public..
*OpenMP work towards 3.1-4.0

*MAGMA 0.2 released without source (expected in December); still no OpenCL support..
* LU, QR, and Cholesky factorizations in both real and complex arithmetic (single and double);
* LQ and QL factorizations in real arithmetic (single and double);
* Linear solvers based on LU, QR, and Cholesky in real arithmetic (single and double);
* Mixed-precision iterative refinement solvers based on LU, QR, and Cholesky in real arithmetic;
* Reduction to upper Hessenberg form in real arithmetic (single and double);
* MAGMA BLAS in real arithmetic (single and double), including gemm, gemv, symv, and trsm.
See:
http://icl.cs.utk.edu/projectsfiles/magma/pubs/MAGMA-BLAS-SC09.pdf
http://icl.cs.utk.edu/projectsfiles/magma/docs/magma_roadmap.pdf
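The mixed-precision iterative refinement solvers in the list above rest on a simple idea: do the expensive factorization in fast single precision, then recover double-precision accuracy with cheap residual corrections. The sketch below illustrates that idea only; it is not MAGMA's code or API. Single precision is emulated on the CPU by rounding through `struct`, and every function name and the 3 x 3 test system are made up for the example.

```python
import struct


def f32(v):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def lu_factor_f32(a):
    """LU with partial pivoting; every arithmetic result is rounded to float32."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_f32(lu, perm, b):
    """Forward and back substitution in float32 using the factors above."""
    n = len(lu)
    x = [f32(b[p]) for p in perm]
    for i in range(n):
        for j in range(i):
            x[i] = f32(x[i] - f32(lu[i][j] * x[j]))
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] = f32(x[i] - f32(lu[i][j] * x[j]))
        x[i] = f32(x[i] / lu[i][i])
    return x


def residual(a, x, b):
    """r = b - A x, computed in double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(a, b)]


def refine(a, b, iterations=5):
    """Return (plain single-precision solution, refined solution)."""
    lu, perm = lu_factor_f32(a)          # O(n^3) work, low precision
    x = lu_solve_f32(lu, perm, b)
    x_single = list(x)
    for _ in range(iterations):
        r = residual(a, x, b)            # O(n^2) work, high precision
        d = lu_solve_f32(lu, perm, r)    # reuse the cheap factors
        x = [xi + di for xi, di in zip(x, d)]
    return x_single, x


# Hypothetical well-conditioned test system with a known solution.
a = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.5], [2.0, 0.5, 5.0]]
x_true = [1.0 / 3.0, -2.0 / 7.0, 0.1]
b = [sum(aij * xj for aij, xj in zip(row, x_true)) for row in a]
x_single, x_refined = refine(a, b)
print("single-only error:", max(abs(u - v) for u, v in zip(x_single, x_true)))
print("refined error:", max(abs(u - v) for u, v in zip(x_refined, x_true)))
```

The appeal on a GPU of this generation is that single precision runs much faster than double, and only the cheap residual step needs double. The approach assumes the matrix is reasonably well conditioned; otherwise the corrections may fail to converge.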
*CULA 1.1:
Here is a subset of the improvements that have made it into this release:

* Exciting new functions including general Eigensolver (Premium Feature)
* Bridge interface for migrating currently existing LAPACK/MKL code
* Better documentation including a full API reference
* New examples constructed from user feedback
* More performance!
* Mac OS X support (Preview)
Eigensolvers are in the pro version.
It now supports Mac, though only Leopard and single precision: what about Snow Leopard and double precision?
*OpenMM 1.0 beta released: preliminary OpenCL support.. still no binaries with it!
This release adds support for Particle Mesh Ewald, arbitrary forms for non-bonded interactions, and preliminary support for OpenCL.
*Apple OpenCL FFT lib: seems very high perf. Mac only.. perf issues until 10.6.3?
Currently supports 1D, 2D, 3D batched complex-to-complex transforms (inverse and forward) both in-place and out-of-place transforms.

It uses planar and interleaved data formats, but currently only supports transforms on a GPU device. The Accelerate framework can be used on the CPU.

The current version supports sizes that fit in device global memory, although a "Twist Kernel" is included in the FFT plan if the user wants to virtualize (implement sizes larger than what can fit in GPU global memory).
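For readers unfamiliar with the two complex data layouts mentioned above, here is a small sketch of the difference as it is usually understood: interleaved storage alternates real and imaginary parts in one buffer, while planar storage keeps them in two separate buffers. The helper names are hypothetical and this is not the Apple library's API.

```python
def interleaved_to_planar(data):
    """[re0, im0, re1, im1, ...] -> ([re0, re1, ...], [im0, im1, ...])"""
    if len(data) % 2:
        raise ValueError("interleaved buffer needs an even number of floats")
    return data[0::2], data[1::2]


def planar_to_interleaved(real, imag):
    """([re0, re1, ...], [im0, im1, ...]) -> [re0, im0, re1, im1, ...]"""
    if len(real) != len(imag):
        raise ValueError("real and imaginary planes must have equal length")
    out = [0.0] * (2 * len(real))
    out[0::2] = real
    out[1::2] = imag
    return out


# Three complex samples: 1+2j, 3-4j, -5+0.5j
interleaved = [1.0, 2.0, 3.0, -4.0, -5.0, 0.5]
real, imag = interleaved_to_planar(interleaved)
round_trip = planar_to_interleaved(real, imag)
print(real, imag, round_trip)
```

Which layout is faster depends on how a kernel reads memory, so a library that accepts both saves the caller a conversion pass like the one above.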
*gpu-z 0.37: shows DirectCompute (with the supported version) and OpenCL check boxes.. OpenCL on ATI is not detected..
*gbench 1.0 released: based on the Jacket product for MATLAB, similar to MATLAB's built-in bench function, and it also works without MATLAB..
Checks FFT, dense BLAS, bench..
Benchmarks include six different tasks, common to the technical computing community:

1. LU: LU decomposition of 1024 x 1024 matrix
2. FFT: Fast Fourier Transform of a 2^20 x 1 vector
3. BLAS: Matrix multiplication of two 1024x1024 matrices
4. 3D Conv: Convolution of 64x64x64 array with 3x3x3 kernel
5. FOR/GFOR: Matrix-vector multiplication of 1024x1024x32 array
6. Equations: Solution of a system of 1024 equations
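To turn timings for tasks like these into gigaflops, one needs a nominal operation count per task. The sketch below uses the conventional textbook counts for the first four tasks; whether gbench itself uses these counts is an assumption, and tasks 5 and 6 are left out because their exact formulation is not clear from the list. All function names are made up for illustration.

```python
import math


def lu_flops(n):
    """Leading-order count for LU decomposition of an n x n matrix."""
    return 2 * n ** 3 / 3


def fft_flops(n):
    """Conventional 5 * n * log2(n) count for a complex FFT of length n."""
    return 5 * n * math.log2(n)


def gemm_flops(n):
    """Multiply-add count for the product of two n x n matrices."""
    return 2 * n ** 3


def conv3d_flops(size, kernel):
    """size^3 outputs, each a kernel^3 multiply-add (boundaries ignored)."""
    return 2 * size ** 3 * kernel ** 3


def gflops(flops, seconds):
    """Convert an operation count and a wall-clock time to gigaflops."""
    return flops / seconds / 1e9


nominal = {
    "LU 1024x1024": lu_flops(1024),
    "FFT 2^20": fft_flops(2 ** 20),
    "BLAS 1024x1024": gemm_flops(1024),
    "3D Conv 64^3 with 3^3": conv3d_flops(64, 3),
}
for name, count in nominal.items():
    print(name, int(count), "flops")
```

With these counts, a measured time plugs straight in, for example `gflops(gemm_flops(1024), t)` for a matrix multiply that took `t` seconds. Note how much smaller the FFT and convolution counts are than the dense linear algebra ones, so the tasks are not comparable on raw time alone.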
*3D Vision news:
->Avatar demo with built-in 3D Vision is impressive, though it goes from 60 to 20 fps,
though you have to use the D3D10 path (D3D9 seems fixed in 195.62)
->3D Vision on Linux is supported for Quadro cards on 195.22 (Quadro only; requires the mini-DIN connector, which must be connected before X starts, no hotplug)
->3D Vision 195.55 and higher ship with browser plugins (IE, Firefox) for 3D photos, and windowed support is upcoming.. see TweakTown..
*Nvidia released 195.62 WHQL candidate and 195.22 for Linux public..
*AMD released 9.11 WHQL: CAL supports OpenCL
*Direct3D 11 benchmark for Stalker
*PGI 2010: CUDA Fortran and the accelerator model for Windows and Mac, and stable for Linux
*Khronos OpenCL BOF presentations posted: especially interesting is the LANL PDF showing the perf of the VMD molecular code (electrostatic potential) on Intel SSE multicore and on OpenCL (CPU, AMD, NVIDIA and also Cell)..
What you learn:
* it shows perf issues on Cell caused by the lack of __constant, and how to overcome them;
* it shows tables of perf for all these architectures;
* it points out key issues in OpenCL right now.
Fermi as a GPU:
http://techreport.com/articles.x/17815

Posters about GPU computing
of GTC
of SC09

Porting an efficient bit library to CUDA (with preliminary perf)
http://bmagic.sourceforge.net/bmcudasse2.html


Implementing integer multiprecision in OpenCL
on cuda: "Implementation of Multiple-precision Modular Multiplication on GPU" Kaiyong Zhao
see poster:
http://www.nvidia.com/content/GTC/posters/87__Kaiyong_Implementation_of_Multiple-precision.png

There is also work by Daniel Bernstein on elliptic curves, and also on RSA, both at the Eurocrypt 2009 conference..
Also of interest is MPIR on GPU.

Source code of DCGN – Message Passing on GPUs released (old news):
http://jeff.bleugris.com/journal/2009/06/02/looking-for-dcgn/
http://jeff.bleugris.com/journal/projects/
Note that I don't know if the code has been updated. If not, that is somewhat bad, since CUDA 2.2 introduced pinned host memory that the GPU can access directly. That avoids having the CPU poll by doing a cudaMemcpy GPU->CPU to inspect whether the GPU has new things to do: now the polling is done on CPU memory and the GPU writes to CPU memory.

fem codes on CUDA:
http://sites.google.com/site/monkology/gpuprogramming-project3-final


papers/posters:
fluids on GPU by Michael Griebel, as a poster at GTC09
indexing the internet with gpu (cuda zone)


Posted on Apple OpenGL forums:
Here is a simple example that uses GLUT: it reads a PNG image (arg1), creates a source and a dest texture/image, then uses a kernel to clip out the red.
example of simple cg-gl interop

This is interesting: a year ago I was looking into building WRF on Windows. There had been some efforts some years before, but overall it was a hacky port, and based on an old code base.
This was for testing the perf of CUDA ports of the WSM5 physics microkernels in WRF.
There is a web page:
Now PGI has made my dreams come true and provides a very clean patch file for the latest WRF (3.1.1) for compiling with the latest PGI compilers: I think 9.0-4 or higher, and 10.0 should now also support it. Anyway, it's good news for Windows users, and I want to obtain a VS2008 port from this. It may need some work, but a lot less than ever. See the PGI October newsletter:
"Porting the Weather Research and Forecasting Application to Microsoft Windows Using PGI Workstation"
It is also good to know that this has been done for the same purpose I had in mind: to test WRF working on the GPU, now with the Accelerator model.
There is another article in the same newsletter..

ATI 9.12 beta (8.68), XP only
includes ATI CAL 1.4.492 vs. OpenCL beta4 CAL (1.4.467)

Windows guest drivers for KVM
http://www.linux-kvm.org/page/WindowsGuestDrivers/Download_Drivers

Virtual texturing demos
http://linedef.com/personal/demos/?p=virtual-texturing

Hierarchical voxel rendering demo
http://linedef.com/personal/demos/?p=hierarchical-voxel-rendering