GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information


Saturday, 12 December 2009

OpenCL with MingW! (and more)

Posted on 11:03 by Unknown
From AMD forums:
OpenCL Mingw
============
In fact, it is quite easy to make a libOpenCL.a for MinGW. I've done it, and now I can compile the OpenCL examples with MinGW.

The idea is to use the reimp tool found in MinGW, which creates an import library for a DLL (i.e. it creates libXXX.a from XXX.lib, where XXX.lib is only the import library for XXX.dll). Note that it won't create a libXXX.a when XXX.lib is a general library; it only works for import libraries. Moreover, the names exported by the DLL must not use C++ mangling: simple bare function names are OK.

1. Open a command prompt whose PATH contains mingw\bin, go to ATIStreamSDK\lib\x86, where OpenCL.lib is located, and type:

reimp OpenCL.lib

You get OpenCL.def and libopencl.a; the latter is what you want to link against.

2. When you compile your .c / .cpp files that include cl.h, add the compiler define _MSC_VER so that the stdcall calling convention is selected; otherwise the linker will not look for the right names in libopencl.a.

That's all; it worked for me.

In case reimp tells you "bad or corrupt import lib" or something like that, you can use dlltool (included in MinGW) to generate libopencl.a from OpenCL.def:

dlltool -l libopencl.a -d OpenCL.def -A -k

where OpenCL.def is, for instance, this file (a .def file contains the names exported by the DLL):

http://pastebin.com/f2ac38b2f
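
The name mismatch behind step 2 can be illustrated with a small C sketch. On 32-bit x86 Windows, a function declared __stdcall is referenced as _name@N (N being the number of argument bytes), while a default __cdecl function is referenced as _name. An object file compiled with the wrong convention therefore asks for a symbol that the import library does not provide. The helper names below are hypothetical and only build the strings; they do not inspect a real library. The value 12 for clGetPlatformIDs assumes its three 4-byte arguments on 32-bit x86.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Symbol requested for a __cdecl function on 32-bit x86:
 * an underscore followed by the bare name. */
int cdecl_symbol(char *out, size_t cap, const char *name)
{
    assert(out != NULL && name != NULL);
    return snprintf(out, cap, "_%s", name);
}

/* Symbol requested for a __stdcall function on 32-bit x86:
 * underscore, name, '@', then the bytes of arguments on the stack. */
int stdcall_symbol(char *out, size_t cap, const char *name,
                   unsigned arg_bytes)
{
    assert(out != NULL && name != NULL);
    return snprintf(out, cap, "_%s@%u", name, arg_bytes);
}

/* The linker resolves a reference only if the names are identical. */
int symbols_match(const char *wanted, const char *provided)
{
    return strcmp(wanted, provided) == 0;
}
```

For example, stdcall_symbol(buf, sizeof buf, "clGetPlatformIDs", 12) yields "_clGetPlatformIDs@12", whereas cdecl_symbol gives "_clGetPlatformIDs", and the two do not match. This is also why the dlltool line uses -k (--kill-at) and -A (--add-stdcall-alias), which deal with the @N suffix.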

OpenCL and AMD constant mem
==============
Q:
I have about 32KB total worth of 16-bit (short int) constants in 4 or so lookup tables of different sizes. I'd like to be able to access them in parallel from different threads in as quick a way as possible. Architecturally it would seem like the texture cache is ideal, but if I just place them in the CL kernel file and tag it with the __constant specifier, will they be located somewhere that will be accessed quickly?
A:
This is not in the current release, but once it is fully implemented, data like this placed in a constant address space array in the kernel file will be stored in a constant buffer. Peak constant buffer bandwidth is roughly 10x the L1 speed on the 770 (the L1 figure being ~480 GB/s), but it is slower than register file access.
See "A compiler for parallel execution of numerical Python programs on graphics processing units"
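
As a rough sketch of what the question describes, a lookup table declared in the kernel file with the __constant qualifier could look like the following. The table contents, the kernel name lut_lookup and the host-side shims are assumptions made for illustration. The shims only let a plain C compiler check the indexing logic; whether the table actually ends up in a hardware constant buffer depends on the implementation, as the answer above notes.

```c
#ifndef __OPENCL_VERSION__
/* Host-side shims (assumption, not part of OpenCL): the qualifiers become
 * no-ops and get_global_id() returns a value set by the caller, so the
 * kernel body can be checked with an ordinary C compiler. */
#include <assert.h>
#include <stddef.h>
#define __kernel
#define __global
#define __constant const
size_t host_global_id = 0;
size_t get_global_id(unsigned dim)
{
    assert(dim == 0); /* this sketch only models a 1D range */
    return host_global_id;
}
#endif

/* Hypothetical 16-bit lookup table in the constant address space. */
__constant short square_lut[8] = { 0, 1, 4, 9, 16, 25, 36, 49 };

/* Each work-item reads one index and writes the matching table entry. */
__kernel void lut_lookup(__global const unsigned char *indices,
                         __global short *out)
{
    size_t i = get_global_id(0);
    out[i] = square_lut[indices[i] & 7]; /* mask keeps the index in range */
}
```

When the file is built as OpenCL C, __OPENCL_VERSION__ is predefined, so the shim section is skipped and the qualifiers keep their real meaning.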

OpenCL and CAL
================
Quick answer is yes (OpenCL is written on top of CAL, so it can't be faster). The full answer is a little bit longer.

On the 4xxx family you can get almost the full power of the card with CAL. But be warned: it will be rather painful. Documentation is really bad or missing (with regard to optimization) and the compiler sometimes does strange things (so you need to trick it to get quality code). On the other hand, OpenCL on 4xxx is really bad (it lacks cached memory access and LDS): it's about 3x slower than Brook+.

With the 5xxx family it's hard to say. Some results (search the streamsdk forum) suggest that there is a problem with memory transfer speed (we will see if the new CAL version corrects it). So, with the exception of memory transfer, you can get almost the full power of 5xxx with CAL.

OpenCL on 5xxx is again a problem. In theory OpenCL on 5xxx should work like a charm (it has LDS and the new memory access instructions), but the results do not support this (maybe problems with memory again, who knows). At the moment, performance for some applications is comparable to OpenCL on an 8800GT.

smallpt bugs in AMD OpenCL
==========================
smallpt 1.2 has bugs on AMD OpenCL, currently fixed in trunk:
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=123480&enterthread=y

Nvidia bug?:
writing to Image obtained via CL/GL
==================================
Is it possible yet to write to an image that's been obtained via clCreateFromGLTexture2D?

I get an error -30 (CL_INVALID_VALUE) when I try writing to it with clEnqueueWriteImage or clEnqueueNDRangeKernel (after acquiring it), but writing to an image that's been created with clCreateImage2D works OK. Writing to buffers acquired from OpenGL works OK too.

Am I missing some extra step needed, or is it just not supported yet?

If it's not supported, is there a list of the current issues in NVidia's OpenCL implementation?