GPU computing: Stay up to date on OpenCL, DirectCompute, CUDA, CAL and OpenGL information

Sunday, 13 December 2009

Difficulties in coding, achieving high performance and measuring multi-GPU code!

Posted on 10:08 by Unknown
A lot of CUDA software is not multi-GPU aware:
Badaboom uses multiple GPUs for multiple videos, but not for a single one.
I doubt other CUDA video encoders do any better.

Here is why.
The first problem is coding it:
just as CPU code needs a thread API, thread pools, etc.,
GPU code needs careful coding to divide the work and send it to every GPU.
Add asynchronous kernel execution, asynchronous memory copies and multiple streams, and it quickly drives you crazy.
For an easier way, see GPUWorker on the CUDA forums,
and also the cudaOpenMP example in the CUDA SDK.
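The basic pattern behind helpers like GPUWorker and the cudaOpenMP sample is one host thread per GPU, each owning a slice of the data. Below is a minimal sketch of that pattern under stated assumptions: the "devices" are simulated with plain CPU threads, and `process_on_device` is a hypothetical placeholder for the real copy/launch/copy-back sequence, so it runs without a GPU. The even split with the remainder spread over the first devices is one reasonable choice, not the method those tools necessarily use.

```cpp
#include <cstddef>
#include <functional>  // std::ref
#include <thread>
#include <vector>

// Half-open range [begin, end) of elements assigned to one device.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Split n elements across num_devices as evenly as possible: the first
// (n % num_devices) devices get one extra element each.
std::vector<Range> partition(std::size_t n, std::size_t num_devices) {
    std::vector<Range> ranges;
    if (num_devices == 0) return ranges;  // nothing to schedule on
    const std::size_t base = n / num_devices;
    const std::size_t extra = n % num_devices;
    std::size_t begin = 0;
    for (std::size_t d = 0; d < num_devices; ++d) {
        const std::size_t len = base + (d < extra ? 1 : 0);
        ranges.push_back({begin, begin + len});
        begin += len;
    }
    return ranges;
}

// Hypothetical stand-in for "copy chunk to GPU d, launch kernel, copy back".
// Real CUDA 2.x/3.0 code would call cudaSetDevice(device) first in this
// thread, since each host thread drives one device context.
// Here the "kernel" is just a CPU loop.
void process_on_device(std::size_t device, std::vector<float>& data, Range r) {
    (void)device;  // unused in this CPU-only simulation
    for (std::size_t i = r.begin; i < r.end; ++i) {
        data[i] *= 2.0f;  // placeholder kernel: double each element
    }
}

// One host worker thread per device; each writes only its own range,
// so no locking is needed.
void run_multi_gpu(std::vector<float>& data, std::size_t num_devices) {
    const std::vector<Range> ranges = partition(data.size(), num_devices);
    std::vector<std::thread> workers;
    for (std::size_t d = 0; d < ranges.size(); ++d) {
        workers.emplace_back(process_on_device, d, std::ref(data), ranges[d]);
    }
    for (std::thread& w : workers) w.join();
}
```

For example, 10 elements on 3 devices are split as [0,4), [4,7), [7,10). The async streams mentioned above would come on top of this, inside each worker.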

The second problem is achieving high performance. If the work cannot be divided into fully independent parts, data must be exchanged from one GPU to another.
This currently requires host intervention, and the transfer time can only be minimized if pinned memory is shared by both GPUs, so the portable pinned memory introduced in CUDA 2.2 is needed.
For clusters, where a GPU-to-GPU transfer may have to go through the NICs, wait for 2010, when (if I understand correctly) transfers will go by DMA from the GPU into pinned host memory that the NIC can use directly.
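To see why the shared pinned buffer matters, here is a minimal simulation of a boundary ("halo") exchange between two neighbouring GPUs staged through host memory. It is a sketch under stated assumptions: the device buffers are plain `std::vector`s and each assignment stands for one PCIe copy; in real CUDA code the staging buffer would be allocated once with `cudaHostAlloc(..., cudaHostAllocPortable)` so both device contexts can copy to and from it. The `DeviceBuffer` layout with one ghost cell per side is a hypothetical illustration.

```cpp
#include <cstddef>
#include <vector>

// Simulated per-device buffer with one ghost cell on each side:
// [ghost_left | interior (n cells) | ghost_right]
struct DeviceBuffer {
    std::vector<float> cells;
    explicit DeviceBuffer(std::size_t n) : cells(n + 2, 0.0f) {}
    std::size_t interior() const { return cells.size() - 2; }
    float& first() { return cells[1]; }
    float& last() { return cells[interior()]; }
    float& ghost_left() { return cells[0]; }
    float& ghost_right() { return cells[interior() + 1]; }
};

// Exchange boundary values between two neighbouring "GPUs" through a host
// staging buffer (must hold at least 2 floats). Every value crosses PCIe
// twice: device -> host, then host -> other device.
void exchange_halos(DeviceBuffer& left, DeviceBuffer& right,
                    std::vector<float>& staging) {
    // Step 1: device -> host (stands for two device-to-host copies).
    staging[0] = left.last();
    staging[1] = right.first();
    // Step 2: host -> device (stands for two host-to-device copies).
    right.ghost_left() = staging[0];
    left.ghost_right() = staging[1];
}
```

The double hop through host memory is exactly the host intervention described above; a direct GPU-to-GPU DMA would remove step 1.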

The third problem is measuring multi-GPU performance. Of course we can add multi-GPU support to some CUDA codes, but the intrinsic problem, in this and in a lot of GPGPU apps, lies in how performance is measured. Many scores are measured with inputs and outputs already in GPU memory. If they live in CPU memory instead, we get no linear scaling with GPU shader count, because the GPU-CPU transfers are counted, and they take roughly constant time (they will only improve with newer PCI Express versions).
I think the Larrabee performance figures and the CUDA matmul figures vendors show us are measured with data already on the GPUs. With multiple GPUs you have to transfer at least from one GPU to another, and currently there is no fast way of doing that in GPU computing APIs: it requires going through the host, so you would not get an apples-to-apples comparison. You would have to compare against benchmarks with inputs and outputs in CPU memory, which is not a "true" benchmark either since, as I said before, it does not scale with shader count.
Think of it as CPU benchmarks that accounted for the time of reading/writing the input data from/to hard disks.
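A back-of-the-envelope model makes the scaling problem concrete. Assume (a simplification, not measured data) that the host-GPU transfer time t is fixed by PCI Express and the kernel time c splits evenly over G GPUs. A data-on-GPU benchmark then reports time c/G, while the end-to-end time is t + c/G. The function names below are illustrative.

```cpp
#include <cstddef>

// End-to-end time in ms under the model: fixed transfer cost plus kernel
// time divided evenly across gpus (gpus must be >= 1).
double end_to_end_ms(double transfer_ms, double compute_ms, std::size_t gpus) {
    return transfer_ms + compute_ms / static_cast<double>(gpus);
}

// Speedup of gpus GPUs over one GPU when transfers are counted.
// A benchmark with data already on the GPU would report exactly gpus.
double end_to_end_speedup(double transfer_ms, double compute_ms, std::size_t gpus) {
    return end_to_end_ms(transfer_ms, compute_ms, 1) /
           end_to_end_ms(transfer_ms, compute_ms, gpus);
}
```

With made-up numbers t = 2 ms and c = 8 ms, one GPU takes 10 ms and four take 4 ms: a 2.5x speedup instead of the 4x a data-on-GPU score would show, and no number of GPUs can beat (t + c)/t = 5x.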
Note that there have been great strides this year in using multiple GPUs, to the point of being able to transfer data between GPUs in OpenGL with AMD and Nvidia proprietary extensions:
for Nvidia see http://www.opengl.org/registry/specs/NV/copy_image.txt
(search for wglCopyImageSubDataNV)
for AMD see http://www.opengl.org/registry/specs/AMD/wgl_gpu_association.txt, which says:

To facilitate high performance data communication between multiple
contexts, a new function is necessary to blit data from one context
to another.

VOID wglBlitContextFramebufferAMD(HGLRC dstCtx, GLint srcX0, GLint srcY0,
GLint srcX1, GLint srcY1, GLint dstX0,
GLint dstY0, GLint dstX1, GLint dstY1,
GLbitfield mask, GLenum filter);


We can try to exchange data between multiple GPUs using the computing APIs' OpenGL interop together with these OpenGL extensions,
i.e. CUDA/OpenGL interop and OpenCL/OpenGL interop. Note that CAL/OpenGL interop is surely coming for AMD GPUs but is currently lacking.

I have to ask the vendors (Nvidia and AMD) what they are doing so that developers can transfer data between GPUs using the DMA engines, without CPU host intervention, in both CUDA and OpenCL.
Note that at SC09 Nvidia announced that next spring there will be a solution for a similar problem in the cluster
environment, i.e. transferring data from GPUs to NICs without host intervention, I think.