GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information


Friday, 5 March 2010

CUDA 3.0 and Nexus in VS 2010, CUDA on FreeBSD 8.0 and much more!

Posted on 08:54 by Unknown
Interesting Nvidia threads:
1. Nexus: Unofficial Nexus / Visual Studio 2010 integration
http://forums.nvidia.com/index.php?showtopic=161096
-> It also enables compiling CUDA 3.x with VS 2010!
This is awesome: it brings together VS 2010 RC + CUDA 3.0 + Nexus, plus project templates for CUDA and Nexus apps!

It patches the vsvars32.bat file to read "Setting environment for using Microsoft Visual Studio 2008 x86 tools" instead of "Setting environment for using Microsoft Visual Studio 2010 x86 tools" to get around nvcc's Visual C++ version detection; otherwise it fails with this message: "nvcc fatal : nvcc cannot find a supported cl version. Only MSVC 8.0 and MSVC 9.0 are supported". It also creates the vcvarsamd64.bat file to make 64-bit builds work; otherwise nvcc fails with "nvcc fatal : Visual Studio configuration file '(null)' could not be found" (see this thread).
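The banner swap described above can be sketched as a plain text replacement. This is only an illustration under assumptions: the two banner strings are the ones quoted in the thread, but the function name `patch_vsvars_text` and the sample file content are hypothetical, and the unofficial integration may well do the patching differently.

```python
# Banner strings as quoted in the forum thread.
VS2010_BANNER = "Setting environment for using Microsoft Visual Studio 2010 x86 tools"
VS2008_BANNER = "Setting environment for using Microsoft Visual Studio 2008 x86 tools"


def patch_vsvars_text(text: str) -> str:
    """Return batch-file text with the VS 2010 banner swapped for the VS 2008 one."""
    return text.replace(VS2010_BANNER, VS2008_BANNER)


# Hypothetical file content, for illustration only (not the real vsvars32.bat).
sample = "@echo " + VS2010_BANNER + ".\n@set EXAMPLE=1\n"
patched = patch_vsvars_text(sample)
print(patched)
```

Applying the function twice leaves the text unchanged, so re-running such a patch would be harmless. Back up the original file before trying anything like this.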
Nexus news: no DX9 and OGL in initial release:

DX10 is currently supported and DX11 will be available in the Beta 2 release which is scheduled for early March. DX10 and DX11 are the graphics APIs of choice for 1.0. Interesting that you feel OpenGL is favored, as full support for OpenGL won't be in the 1.0 release. The Beta 1 was focused on Compute. The Beta 2 will be released just before GDC and will bring full DX10, DX11 debugging and profiling into Visual Studio. This is a dream for game and graphics developers - perfhud on steroids. OpenGL support will come out *sometime* in late 2010.
Pro version (paid version): In addition to premium support, platform analysis (cpu+gpu correlated timeline) and advanced debugging capabilities will be available only in the pro version.
2. Feature Request: Support simultaneous native and CUDA debugging
I have noticed that it is not possible to debug CUDA and native code simultaneously on the same Visual Studio instance. I tried starting debugging through the Start CUDA Debugging option and then attaching the native debugger to the running process (inserting a 10 second sleep at the start of the program helped make this easier), but as soon as a breakpoint on a CUDA kernel is hit Visual Studio freezes.
I've been able to debug native and CUDA code on the same process simultaneously by having two Visual Studio instances open, and it works very well. I think it would be very valuable to be able to step through native code and device code on the same session, much like mixed debugging works with .NET.

3. Eclipse Plugin for CUDA and QT development

http://forums.nvidia.com/index.php?showtopic=160564

We developed a plugin for Eclipse that comfortably supports CUDA and QT development. It provides three toolchains, which can be used to compile CUDA and/or QT sources.

Features include:

- Error Parsing
- Dependency Calculation
- Automatic invocation of all tools
- ...

http://www.ai3.uni-bayreuth.de/software/eclipsecudaqt/index.php
Fastest CUDA reduction code to date! (This follows last week's news of the fastest matmul for GT200 and for AMD in a C-like language.)
http://forums.nvidia.com/index.php?showtopic=160196

My simple but speedy reduction code runs at 106.4GB/s on a GTX 295, i.e. 106.4/111.9 = 95.1% of the peak bandwidth. Good reduction code: about 5ms for 150M integers.
Testing with different input size I can see that your code is significantly slower if size is less than 16M, about the same speed with 32M and faster with more than 32M on the GTX 260.
It seems my code can beat the SDK reduction on every input size, provided that the parameters M and K are properly chosen. Here is a detailed result for different M and K on different input sizes; the performance of the SDK reduction with the same sizes is also listed.
GTX 295, 1 core:
[size=~512K]
My code (M=240, N=64, K=34): 60.3GB/s (23.3% faster)
SDK reduction (size=1<<19): 48.9GB/s
[size=~1M]
My code (M=240, N=64, K=69): 76.0GB/s (15.8% faster)
SDK reduction (size=1<<20): 65.6GB/s
[size=~2M]
My Code (M=240, N=64, K=137): 86.6GB/s (9.3% faster)
SDK reduction (size=1<<21): 79.2GB/s
[size=~4M]
My Code (M=240, N=64, K=273): 94.2GB/s (5.8% faster)
SDK reduction (size=1<<22): 89.0GB/s
[size=~8M]
My Code (M=240, N=64, K=546): 99.5GB/s (5.0% faster)
SDK reduction (size=1<<23): 94.8GB/s
[size=~16M]
My Code (M=240, N=64, K=1092): 103.1GB/s (5.5% faster)
SDK reduction (size=1<<24): 97.7GB/s
[size=~32M]
My Code (M=240, N=64, K=2184): 104.9GB/s (6.3% faster)
SDK reduction (size=1<<25): 98.7GB/s
[size=~64M]
My Code (M=480, N=64, K=2184): 105.8GB/s (7.3% faster)
SDK reduction (size=1<<26): 98.6GB/s
[size=~128M]
My Code (M=720, N=64, K=2912): 106.4GB/s (9.1% faster)
SDK reduction (size=1<<27): 97.5GB/s
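The figures in the listing above can be cross-checked with a few lines of arithmetic. The sketch below rests on assumptions not stated in the post: that the input size is M * N * K elements, that integers are 4 bytes, and that 1 GB means 1e9 bytes. The helper names are hypothetical.

```python
# Rows copied from the listing above:
# (M, N, K, my GB/s, log2 of the SDK input size, SDK GB/s).
RESULTS = [
    (240, 64, 34, 60.3, 19, 48.9),
    (240, 64, 69, 76.0, 20, 65.6),
    (240, 64, 137, 86.6, 21, 79.2),
    (240, 64, 273, 94.2, 22, 89.0),
    (240, 64, 546, 99.5, 23, 94.8),
    (240, 64, 1092, 103.1, 24, 97.7),
    (240, 64, 2184, 104.9, 25, 98.7),
    (480, 64, 2184, 105.8, 26, 98.6),
    (720, 64, 2912, 106.4, 27, 97.5),
]
PEAK_GBPS = 111.9  # peak bandwidth figure quoted in the post


def elements(m, n, k):
    # Assumption (not stated in the post): the input holds M * N * K elements.
    return m * n * k


def percent_faster(mine, sdk):
    # Relative bandwidth advantage over the SDK reduction, in percent.
    return (mine / sdk - 1.0) * 100.0


def reduction_ms(num_ints, gbps, bytes_per_int=4):
    # Time to stream the input once; assumes 4-byte ints and 1 GB = 1e9 bytes.
    return num_ints * bytes_per_int / (gbps * 1e9) * 1e3


for m, n, k, mine, log2_size, sdk in RESULTS:
    ratio = elements(m, n, k) / (1 << log2_size)
    print(f"M={m} K={k}: M*N*K / 2^{log2_size} = {ratio:.4f}, "
          f"{percent_faster(mine, sdk):.1f}% faster than SDK")

print(f"Fraction of peak: {106.4 / PEAK_GBPS:.1%}")
print(f"150M ints at 106.4 GB/s: {reduction_ms(150_000_000, 106.4):.2f} ms")
```

Under these assumptions M * N * K lands within about 1% of each power-of-two size, which suggests (but does not prove) that K is the number of elements each of the M * N threads handles. The computed speedups agree with the posted percentages to within rounding.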
*cuda_wrapper
The CUDA wrapper library provides means for an efficient resource sharing and resource protection on multi-user GPU clusters. It implements the following functionality:
1) Virtualization of the physical GPU devices
2) Ensuring NUMA affinity for GPUs
http://sourceforge.net/projects/cudawrapper/

It's supposed to show that after freeing resources and allocating new ones, the memory is still intact as the last object left it, so there is no privacy in this sense!

*Seems depth buffers/renderbuffers are not supported by GL interop in CUDA 3.1

So no texture whose format is GL_DEPTH_COMPONENT32 can be used in:
cudaGraphicsGLRegisterImage (&resource, tex , GL_TEXTURE_2D , cudaGraphicsMapFlagsNone);
Also remember there is a post showing the currently supported color formats: they seem to be R, RG, RGB and RGBA in float, float16 and uint8, more or less similar to the published OpenCL DX interop formats. It would be good to write a tool that lists the current formats on OpenCL (there is a function for that), to see whether DX interop disables some formats, at least on Nvidia hardware.

*cuda on freebsd 8.0!
Inter-kernel communication is not supported under pain of me glaring at you really hard.
The recipe is:
FreeBSD 8.0 + NVidia driver 195.22 + CUDA 3.0
Also linprocfs and linsysfs should be mounted

uname -a
FreeBSD av429635.oops 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sun Feb 7 17:30:12 MSK 2010 root@av429635.oops:/usr/src/sys/i386/compile/ALECN80 i386

mount /compat/linux/proc/
mount /compat/linux/sys/

Well, it's not exactly "CUDA works on FreeBSD": it's Linux programs using Linux libs under the Linuxulator on FreeBSD that work.
Also, I didn't try to compile CUDA programs yet; I've just launched programs pre-compiled on Linux (Debian).

Also, it seems the 190 drivers showed info with cudadeviceinfodrv but could not create a context.

*iPad: the CPU is a single-core 1 GHz Cortex-A8 (same as the 3GS) but stripped down.
The GPU is a PowerVR SGX variant, but slow for the pixel resolution (perhaps it is a 535, or a 530, which is worse; a 540 I doubt).
So Tegra 2 seems a lot better on both the CPU and GPU side.
Perhaps Flash doesn't work because of the custom GPU, although it uses PowerVR IP, since OMAP3 or OMAP4 has been shown with Flash 10.1 video acceleration at MWC.

*"Optimus Works Perfectly With Intel Wireless Display (WiDi)"

A perfect notebook must have it!
Still, in WiDi mode you lose 3D at 120 Hz via HDMI, and it also has no HDCP, so no Blu-ray.
I hope the next WiDi has HDCP and also HDMI 1.4 so that 3D works too, but that will require double the bandwidth and seems likely to stress current WiFi.
My question about Optimus, where the Nvidia GPU sends its frames to the Intel IGP: if the notebook theoretically has a built-in 120 Hz 3D screen, will Nvidia 3D Vision work? And what if it has a dual-link DVI output and I connect a 120 Hz 3D display? I suspect the answer is the same; at least the technical hurdles seem to be. I think doing it correctly is hard because it is a PCI Express transfer, and it seems 1 GB/s is currently used for 60 Hz? So this would at least put more stress on it, but it is entirely doable if the Intel IGP recognizes the special 120 Hz modes of the LCD and acts accordingly.
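To get a feel for the PCI Express numbers in that question, here is a rough back-of-the-envelope sketch. The panel resolution (1920x1080) and pixel size (4 bytes) are assumptions chosen for illustration, not figures from the post, and the function name is hypothetical. It only counts one raw copy of each frame and ignores protocol overhead.

```python
def frame_copy_bytes_per_second(width, height, bytes_per_pixel, refresh_hz):
    """Raw bytes per second needed to copy every frame once, uncompressed."""
    return width * height * bytes_per_pixel * refresh_hz


# Assumed panel: 1920x1080 at 4 bytes per pixel (neither figure is from the post).
for hz in (60, 120):
    rate = frame_copy_bytes_per_second(1920, 1080, 4, hz)
    print(f"{hz} Hz: {rate / 1e9:.3f} GB/s")
```

Whatever the real per-frame cost is, going from 60 Hz to 120 Hz doubles it, which is the point the paragraph above makes about extra stress on the link.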
Also, all of this requires Windows 7 (Optimus requires it because it runs two graphics drivers from different IHVs at the same time, and WiDi seems to require 7 x64).
Also, will WiDi work with Optimus MacBook Pro laptops? It would require support from Intel, as it uses some My WiFi tech, so we must wait and see. Perhaps Apple is waiting for Light Peak optical video outputs rather than wireless tech.
A dream graphics notebook must have D3D11 with 3D (so Fermi), standard 3D outputs (so HDMI 1.4), a built-in 120 Hz 3D screen, Optimus, and possibly WiDi, preferably with HDCP support. Let's see how long it takes to get there; I hope less than a year.
*Optimus has an nvgpustateviewer tool that shows whether the Nvidia GPU is activated or not. Where can it be downloaded?

Intel WiDi has no HDCP, so no Blu-ray viewing and of course no 3D, but it is Optimus compatible now.
Similarly, a PERFECT 3D PROJECTOR needs:
*720p at least
*Broad 3d support: 3d via hdmi 1.4, dlplink, 3d vision compatible
*hdcp support
so it can output 3d vision, PS3 3d games (HDMI 1.4) and Bluray 3d(HDMI 1.4+hdcp)
Right now Acer and Viewsonic support everything except HDMI 1.4, so no PS3 or Blu-ray 3D support.

Current projectors have HDCP (so Blu-ray) and 3D at 120 Hz via dual-link DVI or HDMI, but no HDMI 1.4 3D spec support, so they cannot project the 3D output of PS3 games, Blu-ray players, etc.

iZ3D 1.11 is coming soon, using the Catalyst 10.3 3D hooks for better multi-monitor support (3D Vision Surround?) and possibly CrossFire, and also bringing D3D10 support for games.

Hydra on an AMD chipset, shown with a GTX 275 + 5870, is using the improved Hydra 1.5 driver with a better Mix mode. It would be interesting to see how performance and compatibility improve over time (i.e. to see the hardware's potential once all software issues are solved/tuned).

Regarding WiDi (http://www.intel.com/support/wireless/wtech/iwd/sb/CS-031109.htm):
"The software drivers that work with Intel® Wireless Display only apply to Microsoft Windows 7 64-bit*."
It requires a special PROSet wireless driver: Intel® PROSet/Wireless WiFi Connection Utility for Windows 7 64-Bit for Intel Wireless Display (drivers and management software for Microsoft Windows 7 64-bit OS*).
NOTES:

-The ZIP file is provided with Intel® My WiFi Technology enabled.
-Intel® My WiFi Technology has the following requirements:
-Intel® Centrino® Ultimate-N 6300, Intel® Centrino® Advanced-N 6200, Intel® Centrino® Advanced-N + WiMAX 6250, Intel® WiFi Link 1000, Intel® WiFi Link 5300, or Intel® WiFi Link 5100
-Minimum of Intel® PROSet/Wireless WiFi Connection Utility 13.0.0.0 on Microsoft Windows 7*
NOTE: Intel® Wireless Display requires one of the following products:
-Intel® Centrino® Ultimate-N 6300
-Intel® Centrino® Advanced-N 6200
-Intel® Centrino® Advanced-N+WiMAX 6250
NOTE: Features removed from this version:
Wake on Wireless LAN is not present in this version of the application.
The Intel® My WiFi Technology application is not supported for Windows Vista. This feature is available on Windows 7 only.
For the latest driver for the Intel® PROSet/Wireless WiFi Connection Utility (for Intel® Centrino® Advanced-N 6200). Intel recommends that you use the latest drivers for best performance.

*Intel Media SDK 1.5 RC

See http://software.intel.com/en-us/articles/intel-media-software-development-kit-intel-media-sdk/

The Intel Media SDK is going to support the H.264 MVC codec of 3D Blu-ray, either via the GPU video processors or, if they don't support it, via optimized multithreaded SSE-enabled code.
Also, similar to the CPU H.264 encoding support, is there going to be a 3D MVC encoder?


*Shader Model 5 vs OpenCL kernels:

Common (more or less):
Doubles with denorms
Reduced-precision reciprocal
Shader conversion instructions - fp16 to fp32 and vice versa
Structured buffer, which is a new type of buffer containing structured elements.
Of these, some things are not present in OpenCL kernels:
Resinfo on buffers
Count bits set instruction
Find first bit set instruction
Carry/Overflow handling
Bit reversal instructions for FFTs
Conditional Swap intrinsic
Also Dispatch Indirect:
Remember, it is about reading the grid size to launch from a GPU buffer; it still requires the CPU to launch the kernel.
But it doesn't require reading back the 3 integers of the grid size, which, small as they are, would still be a PCI transaction of about 1k(?) and would add a lot of latency and a CPU-GPU sync point. Remember there is still no block size
at runtime: the kernel must be compiled for a fixed block size.
It's an evolution(?) of Draw Indirect: Direct3D 10 implements DrawAuto, which takes content (generated by the GPU) and renders it (on the GPU). Direct3D 11 generalizes DrawAuto so that it can be called by a Compute Shader using DrawInstanced and DrawIndexedInstanced.
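For readers unfamiliar with the bit instructions listed above (count bits set, find first bit set, bit reversal for FFTs), here is a small reference sketch of what they compute. It only illustrates the semantics: the function names are hypothetical, and returning -1 for a zero input is an assumed convention, not something taken from the post.

```python
def count_bits(x: int, width: int = 32) -> int:
    """Number of set bits in the low `width` bits of x."""
    return bin(x & ((1 << width) - 1)).count("1")


def first_bit_low(x: int, width: int = 32) -> int:
    """Index of the lowest set bit, or -1 if none (convention assumed here)."""
    x &= (1 << width) - 1
    if x == 0:
        return -1
    return (x & -x).bit_length() - 1


def reverse_bits(x: int, width: int = 32) -> int:
    """Reverse the order of the low `width` bits of x."""
    x &= (1 << width) - 1
    result = 0
    for _ in range(width):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result


def bit_reversal_permutation(n: int) -> list:
    """Index order used by radix-2 FFTs; n must be a power of two."""
    width = n.bit_length() - 1
    return [reverse_bits(i, width) for i in range(n)]


print(bit_reversal_permutation(8))
```

The last function shows why bit reversal matters for FFTs: a radix-2 FFT reads or writes its data in bit-reversed index order, so a single hardware instruction replaces the loop in `reverse_bits`.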

* gDEBugger CL is a new and exciting product; it brings all of gDEBugger's Debugging and Profiling capabilities to the OpenCL developer's world. gDEBugger CL, now in beta testing, supports all OpenCL implementations on Windows, Mac OS X and Linux. The upcoming gDEBugger iPhone version includes on-device debugging and profiling abilities, running in real-time and letting developers optimize their game on the actual iPhone device. gDEBugger iPhone displays invaluable inside information such as iPhone's GPU, CPU, graphic driver and operating system performance counters.