GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information

Saturday, 16 January 2010

Some suggestions, questions and problems I have..

Posted on 13:23 by Unknown
Please fix these issues to make an almost perfect OpenCL SDK..

These are the things I most wish to see fixed:

Improvements:
0. Support kernels with a loop containing a lot of MADs for testing peak flops: these currently take a long time to compile, while the equivalent kernel in CUDA compiles fast.
1. Ship an up-to-date ICD compatible with the AMD one, i.e. fix the ICD so it also detects the AMD backend (or AMD should ship a fixed OCL ICD DLL).
2. Expose
clGetGLContextInfoKHR(cl_context_properties *properties,
cl_gl_context_info param_name,
size_t param_value_size,
void *param_value,
size_t *param_value_size_ret)

It is not in the headers or the .lib, and it is not exported by the Khronos .dlls either.

3. Publish the OpenCL port of the DirectCompute ocean demo shown at GTC09, i.e. are there plans to release the port that was shown in the GTC OpenCL course?
4. Ship a driver compatible with the new Nvidia DirectX interop extensions.
5. cl_khr_fp16 and cl_khr_3d_image_writes extensions?
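To make item 0 concrete, here is a rough sketch of the kind of MAD loop meant there, written as plain host C rather than as an OpenCL kernel so it can be compiled anywhere. The function names, the choice of four independent chains and the flop-counting convention (2 flops per MAD) are assumptions for illustration, not taken from any SDK sample.

```c
/* Runs `iters` iterations of four independent multiply-add chains.
 * Each statement has the a = a * x + y shape that an OpenCL mad() call
 * would express in a peak-flops kernel. */
float mad_chain(float x, float y, int iters)
{
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    for (int i = 0; i < iters; ++i) {
        a0 = a0 * x + y;
        a1 = a1 * x + y;
        a2 = a2 * x + y;
        a3 = a3 * x + y;
    }
    /* Summing the chains keeps a compiler from discarding the work. */
    return a0 + a1 + a2 + a3;
}

/* Flops performed by mad_chain: 4 MADs per iteration, counting each MAD
 * as 2 flops (one multiply plus one add). */
double mad_chain_flops(int iters)
{
    return 2.0 * 4.0 * (double)iters;
}
```

A real test kernel would unroll many more MADs per iteration, which is presumably what stresses the compiler.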


OpenCL compiler bugs:

1. A bug exposed by the ATI AES sample. See this forum reply:

Thanks. Also, I've found a way to fix the AESEncryptDecrypt sample so it passes the test on Nvidia: just replace

CODE
unsigned char hiBitSet = (a & 0x80);
with
unsigned char hiBitSet = ((a>127)?128:0);
in AESEncryptDecrypt_Kernels.cl
It looks weird, but it works
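
For an unsigned char input the two expressions in the quoted fix should give the same value, which suggests the failure comes from the compiler and not from the sample. A minimal sketch in plain host C (not OpenCL C; the function names are made up for illustration) that compares both forms over every byte value:

```c
/* Original form from the sample: mask off the top bit. */
unsigned char hi_bit_mask(unsigned char a)
{
    return (unsigned char)(a & 0x80);
}

/* Workaround form from the forum reply: compare, then pick 128 or 0. */
unsigned char hi_bit_compare(unsigned char a)
{
    return (unsigned char)((a > 127) ? 128 : 0);
}

/* Returns how many of the 256 byte values make the two forms disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int v = 0; v < 256; ++v) {
        if (hi_bit_mask((unsigned char)v) != hi_bit_compare((unsigned char)v)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

On a conforming C compiler count_mismatches() returns 0. Running the same comparison inside a kernel would be one way to confirm the miscompile on the device.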


2. Apple FFT library, see: http://forums.nvidia.com/index.php?showtopic=153544
Take a look at fft_base_kernels.h, see line 4 of "baseKernels", the complexMul line.
The define seems to be too complicated for the Nvidia OpenCL compiler. I replaced the define with a function and it's now working:

CODE
float2 complexMul(float2 a,float2 B) { return (float2)(mad(-(a).y, (B).y, (a).x * (B).x), mad((a).y, (B).x, (a).x * (B).y));}
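
As a sanity check of what that function computes, here is a small host-side C sketch of the same arithmetic. The float2_t struct is a hypothetical stand-in for OpenCL's float2, and mad(a, b, c) is written out as a * b + c, so this only mirrors the math and not the kernel compiler's behaviour.

```c
/* Hypothetical stand-in for OpenCL's float2 vector type. */
typedef struct {
    float x; /* real part */
    float y; /* imaginary part */
} float2_t;

/* Complex product a * b, with each OpenCL mad(p, q, r) written as p * q + r:
 *   real = -a.y * b.y + a.x * b.x
 *   imag =  a.y * b.x + a.x * b.y */
float2_t complex_mul(float2_t a, float2_t b)
{
    float2_t r;
    r.x = -a.y * b.y + a.x * b.x;
    r.y = a.y * b.x + a.x * b.y;
    return r;
}
```

For example, (1 + 2i) * (3 + 4i) gives -5 + 10i.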


3. Kernels without parameters don't compile.

Bugs in the SDK:
1. The samples get a platform ID, but the parameter has to be set to NULL for them to work on non-Nvidia implementations (e.g. AMD's). Alternatively, fix the function so it tries NULL first.
2. oclUtils: getdevice(i) checks the number of devices but returns wrong data if i == numdevices, because the check is if(i>numdevices) error instead of if(i>=numdevices).
3. shrUtils: findfilepath fails if you pass an absolute path such as "c:\..", because it prepends ".\"; you have to add "" to the search paths to make it work.
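
SDK bugs 2 and 3 can be sketched in plain C as follows. None of these functions come from oclUtils or shrUtils: the names, signatures and the absolute-path rule are assumptions made only to show the corrected bounds check and the "do not prefix absolute paths" idea.

```c
#include <stddef.h>
#include <stdio.h>
#include <ctype.h>

/* Hypothetical device lookup: valid indices are 0 .. num_devices - 1,
 * so the rejection test must be i >= num_devices, not i > num_devices. */
int get_device_checked(const int *devices, unsigned num_devices,
                       unsigned i, int *out)
{
    if (devices == NULL || out == NULL || i >= num_devices) {
        return -1;
    }
    *out = devices[i];
    return 0;
}

/* Returns 1 for paths starting with a slash or backslash, or with a
 * drive letter followed by ":\" or ":/"; returns 0 otherwise. */
int is_absolute_path(const char *p)
{
    if (p == NULL || p[0] == '\0') {
        return 0;
    }
    if (p[0] == '/' || p[0] == '\\') {
        return 1;
    }
    return isalpha((unsigned char)p[0]) && p[1] == ':' &&
           (p[2] == '\\' || p[2] == '/');
}

/* Copies absolute paths unchanged and prepends ".\" to relative ones.
 * Returns 0 on success, -1 on bad arguments or a too-small buffer. */
int build_search_path(const char *name, char *out, size_t out_size)
{
    if (name == NULL || out == NULL) {
        return -1;
    }
    const char *prefix = is_absolute_path(name) ? "" : ".\\";
    int n = snprintf(out, out_size, "%s%s", prefix, name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With three devices, index 3 is rejected while index 2 still works, and "c:\data\k.cl" is returned unchanged while "k.cl" becomes ".\k.cl".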


About the DX11 OIT demo.. crashing for me..
I have seen the DX11 OIT demo on the AMD forums..
Well, the demo crashes with:
DXGI_ERROR_INVALID_CALL
Failed to resize swap chain..
I have:

Windows 7 x64
AMD 5850
Catalyst 9.12 hotfix and 10.1
DirectX runtime August 2009

What's the problem?


Update: answer from the author:
Hi rtfss!

Yes, there was a serious bug around BGRA/RGBA formats. I don't know why
it works on Windows 7 32-bit.

Here is the fixed and slightly optimized demo (actually I will optimize
it further as much as possible):

http://rapidshare.de/files/49006316/oit_dx11.zip.html

If it works, please let me know, and I will re-upload it for the public
community as soon as possible.

Some questions about VAAPI, VDPAU, XvBA?

I want to learn a lot more about GPU video decoding on Linux, so I'm asking some questions..
Basically I'm unsure whether to use/learn VDPAU or VAAPI, depending on these features:
*OpenGL interop overhead..
*Dual HD stream decode support..
*(this one is your opinion) possible future API support for H.264 MVC (Multiview Video Coding)..

First, I read some time ago that VAAPI added GL interop, so it can now return all frames as OpenGL textures, right? I think I have also read that
the AMD implementation only works through the OGL backend but has lower CPU usage, while the Nvidia OGL backend has some CPU overhead. Is that still right with current drivers?
Some perf figures would help..
Also, I first tested the XvBA backend with 9.10 and fglrx 9.11, and it didn't work with my 5850 but was OK with 4850 cards..
So, with the Catalyst 9.12 hotfix or the upcoming 10.1, does the XvBA backend work with the 5850? And if not, is it an AMD issue or an issue in the AMD VAAPI backend?
More questions:
New cards like the AMD 5850, Nvidia GT 240 and Intel HD Graphics support dual stream decode, so is this exposed/supported in VAAPI, i.e. is the API capable of exposing such a hardware feature?
Assuming not, is someone working to add that support to VAAPI?
Also, does VDPAU expose and accelerate dual HD streams on supported GPUs? If yes, that would bring parity with DXVA HD.
Also, assuming it is not an NDA matter, can someone tell me whether I can decode dual HD streams using the XvBA VAAPI backend, i.e. does XvBA expose that capability?

Also more "futuristic" things:
I have seen that an H.264 MVC (Multiview Video Coding) reference encoder/decoder currently exists..
Nokia also ships an encoder/decoder..
I would like to encode some samples to MVC..
Anyway, with all of Nvidia's motivation around 3D Vision, I expect VDPAU to add support for it sometime this year..
Does someone plan to patch/improve VAAPI to expose that support, i.e. exposing VDPAU MVC support via VAAPI?
Also, does anyone know whether FFmpeg has this support in trunk, or of any effort/patches for playing this codec?

The last question is more about expectations for GPU video encoding:
Nvidia ships CUVENC.DLL for Windows, providing GPU H.264 video encoding..
Now it seems that with Windows 7 you get cross-vendor GPU H.264 video encoding via MFT, which I think is supported at least by Nvidia..
Does anyone know whether Nvidia is working on it, or whether VDPAU currently exposes GPU video encoding?
If not, I think the latest VAAPI at least exposes the interfaces, right? I think it's hard to add, say, a backend that uses x264 or the H.264 reference encoder as an example..

It seems Broadcom has provided open source Linux drivers for the Crystal HD that decode all HD formats, so do you plan to add a VAAPI backend for these cards?
It also seems they provide open source drivers for Mac, so the last question is..
how hard is it to get a Mac or Windows port of VAAPI?
Basically, I would like to have GPU decoding from the same source using VAAPI: on Windows via a DXVA VAAPI backend, and on Mac (at least on Snow Leopard) via its GPU decoding backend..