These are the things I would most like to see fixed:
Improvements:
0. Support kernels containing a loop with a lot of MADs for testing peak FLOPS: these currently take a long time to compile, while the same kernel in CUDA compiles fast.
1. Ship an up-to-date ICD that is compatible with AMD's, i.e. fix the ICD so that it also detects the AMD backend (or AMD should ship a fixed OpenCL ICD DLL).
2. Expose
cl_int clGetGLContextInfoKHR(const cl_context_properties *properties,
cl_gl_context_info param_name,
size_t param_value_size,
void *param_value,
size_t *param_value_size_ret)
It is not in the headers or the .lib, and it is not exported by the Khronos .dlls either.
3. Publish the OpenCL port of the DirectCompute ocean demo shown at GTC09, i.e. are there plans to publish the port that was shown in the GTC OpenCL course?
4. Ship a driver compatible with the new Nvidia DirectX interop extensions.
5. Support for the fp16 and 3D image write extensions (cl_khr_fp16 and cl_khr_3d_image_writes)?
OpenCL compiler bugs:
1. A bug exposed by the ATI AES sample; see this forum reply:
Thanks. Also, I've found a way to fix AESEncryptDecrypt sample to pass test on nvidia: just replace
CODE
unsigned char hiBitSet = (a & 0x80);
with
unsigned char hiBitSet = ((a>127)?128:0);
in AESEncryptDecrypt_Kernels.cl
It looks weird, but it works
2. Apple's FFT library; see: http://forums.nvidia.com/index.php?showtopic=153544
Take a look at fft_base_kernels.h, see line 4 of "baseKernels", the complexMul line.
The define seems to be too complicated for the Nvidia OpenCL compiler. I replaced the define with a function and it's now working:
CODE
float2 complexMul(float2 a, float2 b) { return (float2)(mad(-a.y, b.y, a.x * b.x), mad(a.y, b.x, a.x * b.y)); }
3. Kernels without parameters don't compile.
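Both workarounds quoted in items 1 and 2 are source-level rewrites that should not change any result, so they can be sanity-checked on the host in plain C. This is only a minimal sketch: float2_t and mad_f are hypothetical stand-ins for OpenCL's float2 and mad, and all the function names are mine, not from the samples.

```c
#include <assert.h>

/* Original expression from the AES sample: mask out the top bit. */
static unsigned char hi_bit_mask(unsigned char a)
{
    return (unsigned char)(a & 0x80);
}

/* Workaround from the forum reply: a comparison instead of a bit mask. */
static unsigned char hi_bit_compare(unsigned char a)
{
    return (unsigned char)((a > 127) ? 128 : 0);
}

/* Returns 1 if both forms agree for every possible 8-bit input. */
static int hi_bit_forms_agree(void)
{
    int v;
    for (v = 0; v < 256; ++v) {
        if (hi_bit_mask((unsigned char)v) != hi_bit_compare((unsigned char)v))
            return 0;
    }
    return 1;
}

/* Hypothetical stand-in for OpenCL's float2 vector type. */
typedef struct { float x, y; } float2_t;

/* Hypothetical stand-in for OpenCL's mad(a, b, c), i.e. a * b + c. */
static float mad_f(float a, float b, float c)
{
    return a * b + c;
}

/* Function form of complexMul, mirroring the FFT workaround. */
static float2_t complex_mul(float2_t a, float2_t b)
{
    float2_t r;
    r.x = mad_f(-a.y, b.y, a.x * b.x); /* real part: ax*bx - ay*by */
    r.y = mad_f(a.y, b.x, a.x * b.y);  /* imaginary part: ax*by + ay*bx */
    return r;
}

/* Checks both workarounds; aborts through assert on a mismatch. */
static void check_workarounds(void)
{
    float2_t a = { 1.0f, 2.0f };
    float2_t b = { 3.0f, 4.0f };
    float2_t p = complex_mul(a, b); /* (1+2i)*(3+4i) = -5+10i */

    assert(hi_bit_forms_agree());
    assert(p.x == -5.0f && p.y == 10.0f);
}
```

Calling check_workarounds() from a small test program is enough. Since both hiBitSet forms agree on every input, the failure of the original form on Nvidia points at the compiler rather than at the sample.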
Bugs in the SDK:
1. The samples get a platform ID, but the parameter has to be set to NULL for them to work on non-Nvidia implementations (e.g. AMD's). Alternatively, fix the function so that it sets the parameter to NULL first.
2. oclUtils: getdevice(i) checks the number of devices but returns wrong data when i equals the number of devices, because the check is if(i>numdevices) instead of if(i>=numdevices).
3. shrUtils: findfilepath fails if you pass an absolute path such as "c:\.." because it prepends ".\"; you have to add "" to the search paths to make it work.
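Items 2 and 3 come down to two small checks, which can be sketched in plain C. The helper names device_index_valid and is_absolute_path are hypothetical and do not exist in oclUtils or shrUtils; this only illustrates the checks I would expect, assuming device indices are zero-based.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 only when i is a valid zero-based index. */
static int device_index_valid(unsigned int i, unsigned int num_devices)
{
    /* The reported check was (i > num_devices), which lets
       i == num_devices through and reads past the last device. */
    return i < num_devices;
}

/* Hypothetical helper: returns 1 for Windows drive paths such as "c:\..."
   and for paths that start with a slash or backslash. */
static int is_absolute_path(const char *path)
{
    if (path == NULL || path[0] == '\0')
        return 0;
    if (path[0] == '/' || path[0] == '\\')
        return 1;
    return isalpha((unsigned char)path[0]) && path[1] == ':';
}
```

With a check like is_absolute_path, a path lookup could skip the ".\" prefix for absolute paths and only prepend it for relative ones.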
About the DX11 OIT demo: crashing for me
I have seen the DX11 OIT demo on the AMD forums.
Well, the demo crashes with:
DXGI_ERROR_INVALID_CALL
Failed to resize swap chain.
I have:
Windows 7 x64
AMD 5850
Catalyst 9.12 hotfix and 10.1
DirectX runtime August 2009
What is the problem?
Update: answer from the author:
Hi rtfss!
Yes, there was a serious bug around BGRA/RGBA formats. I don't know why
it works on Windows 7 32-bit.
Here is a fixed and slightly optimized demo (actually I am optimizing it
further, as much as possible):
http://rapidshare.de/files/49006316/oit_dx11.zip.html
If it works, please let me know, and I will re-upload it for the public
community as soon as possible.
Some questions about VAAPI, VDPAU and XvBA
I want to learn a lot more about GPU video decoding on Linux, so I'm asking some questions.
Basically, I have doubts about whether to use/learn VDPAU or VAAPI, depending on these features:
* OpenGL interop overhead
* Dual HD stream decode support
* (this is your opinion) Possible future support in the API for H.264 MVC (Multiview Video Coding)
First, I read some time ago that VAAPI added GL interop, so it can now return all frames as OpenGL textures, right? I think I also read that
AMD only works through the OpenGL backend but has lower CPU usage, while the Nvidia OpenGL backend has some CPU overhead. Is that still right with current drivers?
Are there any performance figures?
Also, I tested the first XvBA backends with 9.10 and fglrx 9.11, and they didn't work with my 5850 but were OK with 4850 cards.
So, with the Catalyst 9.12 hotfix or the upcoming 10.1, does the XvBA backend work with a 5850 card? If not, is it an AMD issue or an issue in the AMD VAAPI backend?
More questions:
New cards like the AMD 5850, Nvidia GT 240 and Intel HD Graphics support dual stream decode. Is this exposed/supported in VAAPI, i.e. is the API capable of exposing this hardware feature?
If not, is someone working on adding that support to VAAPI?
Also, does VDPAU expose and accelerate dual HD streams on supported GPUs? If yes, it would bring parity with DXVA HD.
Also, assuming it is not an NDA thing, can someone tell me whether I can decode dual HD streams using XvBA through VAAPI, i.e. does XvBA expose that capability?
Also, some more "futuristic" things:
I have seen that an H.264 MVC (Multiview Video Coding) reference encoder/decoder currently exists.
Nokia also ships an encoder/decoder.
I would like to encode some samples to MVC.
Anyway, I expect that VDPAU, with all of Nvidia's motivation around 3D Vision, will add support for it sometime this year.
Does anyone plan to patch/improve VAAPI to expose that support, i.e. exposing VDPAU MVC support via VAAPI?
Also, does anyone know whether FFmpeg has this support in trunk, or of any effort/patches towards playing this codec?
The last question is more about expectations for GPU video encoding:
Nvidia ships CUVENC.DLL for Windows, providing GPU H.264 video encoding.
Now it seems that with Windows 7 you get cross-vendor GPU H.264 video encoding via MFT, which I think is supported at least on Nvidia.
Does anyone know whether Nvidia is working on it, or whether VDPAU currently exposes GPU video encoding?
If not, I think the latest VAAPI API at least exposes the interfaces, right? I think it would be hard to add, say, a backend that uses x264 or the H.264 reference encoder as an example.
It seems Broadcom has provided open source Crystal HD drivers for decoding all HD formats on Linux, so do you plan to add a VAAPI backend for these cards?
It also seems they provide open source drivers for Mac, so the last question is:
how hard would it be to get a Mac or Windows port of VAAPI?
Basically, I would like to use VAAPI from the same source code to get GPU decoding on Windows via a DXVA VAAPI backend, and on Mac (at least on Snow Leopard) via its own GPU decoding backend.