Mandelbrot using OGL 4.0 features (double precision and precise keyword) ~ GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information

http://dl.dropbox.com/u/1416327/mandeldouble.rar
The executable above contains three implementations:
*a float-float implementation that uses GL_ARB_gpu_shader5, with the precise keyword to work around the aggressive Nvidia compiler
*a double-precision implementation that uses GL_ARB_gpu_shader_fp64, falling back to doublepAMD on Catalyst drivers without OpenGL 4.0 support
*a normal (single-precision) Mandelbrot implementation
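To make the float-float idea concrete: each value is stored as two single-precision floats, a leading part plus a small correction. Here is a minimal Python sketch of that representation, not the shader code itself. The helper names f32 and split_double are made up for illustration, single precision is emulated by rounding through struct, and the (low, high) ordering is assumed from the GLSL in this post, where .y carries the leading part.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def split_double(d):
    """Split a double into a (low, high) pair of single-precision floats."""
    hi = f32(d)        # leading part: the plain float32 approximation
    lo = f32(d - hi)   # trailing part: what float32 alone could not hold
    return lo, hi

value = 0.1
lo, hi = split_double(value)
err_single = abs(value - hi)        # error when using one float
err_pair = abs(value - (hi + lo))   # error when using the float-float pair
print(err_single, err_pair)
```

The pair gets much closer to the original double than a single float does, which is why the float-float image stays sharp at zoom levels where fp32 turns blocky.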

On an AMD 5850 at 1920x1080 with the ATI GL 4.0 drivers, I get:
*15 fps using the float-float approach
*50 fps using doubles
*130 fps using single precision
Note: with pre-GL 4.0 drivers, doublepAMD reaches 36 fps in double precision. With the GL 4.0 drivers, both doublepAMD and double reach 50 fps.
You can work out the Gflop/s from the GLSL code. It's very high.
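A rough way to do that estimate, as a Python sketch. The operation counts (11 per dblsgl_add, 24 per dblsgl_mul) come from counting the adds and multiplies in the two functions listed at the end of this post. Everything else is an assumption for illustration only: the full shader is not shown here, so the per-iteration mix (3 multiplies and 4 additions) and the average of 256 iterations per pixel are hypothetical, and the result is not a measurement.

```python
ADD_FLOPS = 11   # single-precision operations in one dblsgl_add
MUL_FLOPS = 24   # single-precision operations in one dblsgl_mul

def gflops(fps, width, height, avg_iterations, flops_per_iteration):
    """Sustained Gflop/s = frames/s * pixels * iterations/pixel * flops/iteration."""
    return fps * width * height * avg_iterations * flops_per_iteration / 1e9

# Assumed iteration body: x*x, y*y, x*y, then four float-float additions.
flops_per_iter = 3 * MUL_FLOPS + 4 * ADD_FLOPS

# 15 fps at 1920x1080 is the float-float figure above; 256 is a made-up average.
estimate = gflops(15, 1920, 1080, 256, flops_per_iter)
print(flops_per_iter, estimate)
```

Plug in the real iteration count of the view you are rendering to get a meaningful number.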

I use #if 1 instead of #ifdef GL_ARB_gpu_shader5 (or the gpu_shader_fp64 equivalent) so that the shaders also work on Nvidia GL 3.3 drivers, although without double precision and without the precise keyword, so float-float is still bad there!
I.e. I always force the extension enable directive (#extension <name> : enable in GLSL).

Sorry for the big exe: it is linked against Cg, although Cg is not used right now. It was used to correctly disable optimizations on Nvidia, but that is not working at the moment.
The program arguments are, in order: the horizontal pixel offset where the window starts (for multi-monitor setups), fullscreen or not, the fragment shader and the vertex shader, and then the zoom and the x and y offsets in the Mandelbrot set.
The zoom and offsets are there to show a zoom deep enough to see the difference between single precision and higher precision (either double precision or float-float). The last argument selects the GLSL or Cg backend,
but as said, Cg is broken.

It seems AMD doesn't optimize as aggressively, since float-float works fine there even without precise!

AMD 5850 with OpenGL 4.0 drivers on Windows 7 (screenshots show fps):
http://dl.dropbox.com/u/1416327/float-float.jpg
http://dl.dropbox.com/u/1416327/fp32.jpg
http://dl.dropbox.com/u/1416327/fp64.jpg
NVIDIA:
the bad float-float result looks similar to the AMD fp32 screenshot.

Fix for float-float: use precise.
I hope this works well with the Fermi OpenGL 4.0 drivers, and that the precise keyword also gets enabled for GL 3.0 hardware.
Cg has a trick for disabling optimizations, so precise is not needed there.
Search the blog for more info.
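Why an aggressive optimizer hurts here: the error term in float-float addition is an expression that equals zero in exact real arithmetic, so a compiler that simplifies it algebraically throws the correction away. A small Python sketch of the effect, with single precision emulated by rounding through struct. The names f32, two_sum_error and two_sum_error_simplified are made up for this illustration, and the "simplified" version is only my guess at what such an optimization amounts to, not actual compiler output.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum_error(a, b):
    """Rounding error of the float32 sum a + b, with every step rounded in order."""
    t1 = f32(a + b)
    e = f32(t1 - a)
    return f32(f32(b - e) + f32(a - f32(t1 - e)))

def two_sum_error_simplified(a, b):
    """The same expression after algebraic simplification: it cancels to zero."""
    return 0.0

a = f32(1.0)
b = f32(1e-8)                            # too small to survive in a float32 sum with 1.0
t1 = f32(a + b)                          # rounds back to 1.0, b is lost
err = two_sum_error(a, b)                # recovers exactly the part that was lost
bad = two_sum_error_simplified(a, b)     # the correction is gone
print(t1, err, bad)
```

With the correction gone, the pair carries no more information than a plain float, which matches the broken float-float output looking like fp32.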

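// Float-float addition. Each vec2 holds one extended-precision value:
// .y is the high (leading) part, .x is the low (trailing) part.
vec2 dblsgl_add (vec2 x, vec2 y)
{
    // precise: operations contributing to z must not be reordered or fused
    precise vec2 z;
    float t1, t2, e;

    t1 = x.y + y.y;                                    // sum of the high parts
    e = t1 - x.y;
    t2 = ((y.y - e) + (x.y - (t1 - e))) + x.x + y.x;   // rounding error of t1, plus both low parts
    z.y = e = t1 + t2;                                 // renormalize: new high part
    z.x = t2 - (e - t1);                               // new low part
    return z;
}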

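// Float-float multiplication (same layout: .y high part, .x low part).
vec2 dblsgl_mul (vec2 x, vec2 y)
{
    precise vec2 z;
    float up, vp, u1, u2, v1, v2, mh, ml;

    // Split each high part into two halves (4097 = 2^12 + 1) so the
    // partial products below are exact in single precision.
    up = x.y * 4097.0;
    u1 = (x.y - up) + up;
    u2 = x.y - u1;
    vp = y.y * 4097.0;
    v1 = (y.y - vp) + vp;
    v2 = y.y - v1;

    // CUDA original: mh = __fmul_rn(x.y,y.y);
    mh = x.y * y.y;                                           // rounded product of the high parts
    ml = (((u1 * v1 - mh) + u1 * v2) + u2 * v1) + u2 * v2;    // its rounding error
    // CUDA original: ml = (__fmul_rn(x.y,y.x) + __fmul_rn(x.x,y.y)) + ml;
    ml = (x.y * y.x + x.x * y.y) + ml;                        // add the cross terms

    z.y = up = mh + ml;                                       // renormalize: new high part
    z.x = (mh - up) + ml;                                     // new low part
    return z;
}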
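To check the two functions outside a shader, here is a Python port as a sketch. Single precision is emulated by rounding every operation through struct, and pairs are (low, high) tuples to mirror .x and .y. The helpers f32, split, value and mandel_step are made up for this example, and mandel_step is only my assumption of how one z = z*z + c iteration would be assembled from the two functions, since the full shader is not listed here.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def split(d):
    """Double -> (low, high) pair, laid out like the vec2 (.x low, .y high)."""
    hi = f32(d)
    return (f32(d - hi), hi)

def value(p):
    """(low, high) pair -> double, for checking results."""
    return p[1] + p[0]

def dblsgl_add(x, y):
    t1 = f32(x[1] + y[1])
    e = f32(t1 - x[1])
    t2 = f32(f32(y[1] - e) + f32(x[1] - f32(t1 - e)))
    t2 = f32(f32(t2 + x[0]) + y[0])
    hi = f32(t1 + t2)
    return (f32(t2 - f32(hi - t1)), hi)

def dblsgl_mul(x, y):
    up = f32(x[1] * 4097.0)
    u1 = f32(f32(x[1] - up) + up)
    u2 = f32(x[1] - u1)
    vp = f32(y[1] * 4097.0)
    v1 = f32(f32(y[1] - vp) + vp)
    v2 = f32(y[1] - v1)
    mh = f32(x[1] * y[1])
    ml = f32(f32(u1 * v1) - mh)
    ml = f32(f32(f32(ml + f32(u1 * v2)) + f32(u2 * v1)) + f32(u2 * v2))
    ml = f32(f32(f32(x[1] * y[0]) + f32(x[0] * y[1])) + ml)
    hi = f32(mh + ml)
    return (f32(f32(mh - hi) + ml), hi)

def mandel_step(zr, zi, cr, ci):
    """One iteration z <- z*z + c on float-float pairs (assumed structure)."""
    rr = dblsgl_mul(zr, zr)
    ii = dblsgl_mul(zi, zi)
    ri = dblsgl_mul(zr, zi)
    neg_ii = (-ii[0], -ii[1])
    new_r = dblsgl_add(dblsgl_add(rr, neg_ii), cr)   # re: zr^2 - zi^2 + cr
    new_i = dblsgl_add(dblsgl_add(ri, ri), ci)       # im: 2*zr*zi + ci
    return new_r, new_i

# Compare three iterations against plain double-precision complex arithmetic.
cr, ci = split(-0.75), split(0.1)
zr, zi = split(0.0), split(0.0)
z_ref, c_ref = 0j, complex(-0.75, 0.1)
for _ in range(3):
    zr, zi = mandel_step(zr, zi, cr, ci)
    z_ref = z_ref * z_ref + c_ref
print(value(zr) - z_ref.real, value(zi) - z_ref.imag)
```

The differences from the double-precision reference should be far below what a single float can resolve, which is the whole point of the float-float path.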

Tuesday, 6 April 2010
