CUDA 3.0 and Nexus in VS 2010, CUDA on FreeBSD 8.0 and much more! ~ GPU computing Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information

Interesting Nvidia threads:
1. Nexus: Unofficial Nexus / Visual Studio 2010 integration
http://forums.nvidia.com/index.php?showtopic=161096
-> It also enables compiling CUDA 3.x code with VS 2010!
This is awesome: it brings together VS 2010 RC + CUDA 3.0 + Nexus, plus project templates for CUDA and Nexus apps!

It patches the vsvars32.bat file to read "Setting environment for using Microsoft Visual Studio 2008 x86 tools" instead of "Setting environment for using Microsoft Visual Studio 2010 x86 tools" to get around nvcc's Visual C++ version detection; otherwise nvcc fails with this message: "nvcc fatal : nvcc cannot find a supported cl version. Only MSVC 8.0 and MSVC 9.0 are supported". It also creates the vcvarsamd64.bat file to make 64-bit builds work; otherwise nvcc fails with "nvcc fatal : Visual Studio configuration file '(null)' could not be found" (see this thread).

Nexus news: no DX9 or OpenGL support in the initial release:

DX10 is currently supported and DX11 will be available in the Beta 2 release, which is scheduled for early March. DX10 and DX11 are the graphics APIs of choice for 1.0. Interesting that you feel OpenGL is favored, as full support for OpenGL won't be in the 1.0 release. Beta 1 was focused on Compute; Beta 2 will be released just before GDC and will bring full DX10 and DX11 debugging and profiling into Visual Studio. This is a dream for game and graphics developers: PerfHUD on steroids. OpenGL support will come out *sometime* in late 2010.
Pro (paid) version: in addition to premium support, platform analysis (CPU+GPU correlated timeline) and advanced debugging capabilities will be available only in the Pro version.

2. Feature request: support simultaneous native and CUDA debugging

I have noticed that it is not possible to debug CUDA and native code simultaneously in the same Visual Studio instance. I tried starting debugging through the Start CUDA Debugging option and then attaching the native debugger to the running process (inserting a 10-second sleep at the start of the program made this easier), but as soon as a breakpoint in a CUDA kernel is hit, Visual Studio freezes.

I've been able to debug native and CUDA code in the same process simultaneously by having two Visual Studio instances open, and it works very well. I think it would be very valuable to be able to step through native code and device code in the same session, much like mixed-mode debugging works with .NET.

3. Eclipse plugin for CUDA and Qt development

http://forums.nvidia.com/index.php?showtopic=160564

We developed a plugin for Eclipse that comfortably supports CUDA and Qt development. It provides three toolchains, which can be used to compile CUDA and/or Qt sources.

Features include:

- Error Parsing
- Dependency Calculation
- Automatic invocation of all tools
- ...

http://www.ai3.uni-bayreuth.de/software/eclipsecudaqt/index.php

Fastest CUDA reduction code to date! (Following last week's news of the fastest matmul for GT200, and for AMD in a C-like language.)
http://forums.nvidia.com/index.php?showtopic=160196

My simple but speedy reduction code runs at 106.4 GB/s on a GTX 295, i.e. 106.4/111.9 = 95.1% of peak bandwidth. Good reduction code: roughly 5 ms for 150M integers.
Testing with different input sizes, I can see that your code is significantly slower when the size is below 16M, about the same speed at 32M, and faster above 32M on the GTX 260.
It seems my code can beat the SDK reduction at every input size, provided that the parameters M and K are chosen properly. Here are detailed results for different M and K at different input sizes, with the performance of the SDK reduction at the same sizes listed for comparison.
GTX 295, one GPU:
Size   M    N   K     My code                   SDK reduction
~512K  240  64  34    60.3 GB/s (23.3% faster)  48.9 GB/s (size=1<<19)
~1M    240  64  69    76.0 GB/s (15.8% faster)  65.6 GB/s (size=1<<20)
~2M    240  64  137   86.6 GB/s (9.3% faster)   79.2 GB/s (size=1<<21)
~4M    240  64  273   94.2 GB/s (5.8% faster)   89.0 GB/s (size=1<<22)
~8M    240  64  546   99.5 GB/s (5.0% faster)   94.8 GB/s (size=1<<23)
~16M   240  64  1092  103.1 GB/s (5.5% faster)  97.7 GB/s (size=1<<24)
~32M   240  64  2184  104.9 GB/s (6.3% faster)  98.7 GB/s (size=1<<25)
~64M   480  64  2184  105.8 GB/s (7.3% faster)  98.6 GB/s (size=1<<26)
~128M  720  64  2912  106.4 GB/s (9.1% faster)  97.5 GB/s (size=1<<27)
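The parameters in these results appear to satisfy size = M x N x K (for example 240 x 64 x 34 = 522,240, just under 1<<19 = 524,288), which suggests M blocks of N threads, each thread summing K elements. The poster's kernel isn't shown, so the following is only a CPU-side C sketch of that assumed decomposition; reduce_mnk, bandwidth_gbs and the strided indexing are my assumptions, not the poster's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU emulation of an ASSUMED layout: M blocks x N threads, K elements per
 * thread, so n must equal M*N*K. Thread g of the T = M*N grid reads
 * g, g+T, g+2T, ... so neighbouring threads touch neighbouring addresses,
 * which is what would make the loads coalesced on a GPU. */
int64_t reduce_mnk(const int32_t *data, size_t n, size_t M, size_t N, size_t K)
{
    const size_t T = M * N;                   /* total threads in the grid */
    int64_t total = 0;

    assert(n == M * N * K);
    for (size_t b = 0; b < M; ++b) {          /* one iteration per block */
        int64_t block_sum = 0;                /* a shared-memory tree on a GPU */
        for (size_t t = 0; t < N; ++t) {
            const size_t g = b * N + t;       /* global thread index */
            int64_t partial = 0;              /* per-thread register sum */
            for (size_t k = 0; k < K; ++k)
                partial += data[g + k * T];   /* grid-stride access */
            block_sum += partial;
        }
        total += block_sum;                   /* second pass or atomics on a GPU */
    }
    return total;
}

/* Effective bandwidth in GB/s (1 GB = 1e9 bytes) for reading n int32 values. */
double bandwidth_gbs(size_t n, double seconds)
{
    assert(seconds > 0.0);
    return (double)n * sizeof(int32_t) / seconds / 1e9;
}
```

The bandwidth helper also makes the headline figure easy to sanity-check: 150M integers are 600 MB, and 600e6 / 106.4e9 is about 5.6 ms.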

*cuda_wrapper
The CUDA wrapper library provides a means for efficient resource sharing and resource protection on multi-user GPU clusters. It implements the following functionality:
1) Virtualization of the physical GPU devices
2) Ensuring NUMA affinity for GPUs
http://sourceforge.net/projects/cudawrapper/

It's supposed to show that if you allocate resources, free them and then allocate new ones, the new memory is still intact as left by the previous object, so there is no privacy in this sense!
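The two ideas in this item, virtualizing physical GPUs and protecting one user's data from the next, can be pictured with a small C sketch. Everything here is hypothetical and simulated on the CPU: the "2,3" allocation string, parse_allocation, the single-slot Arena and the scrub-on-release policy are my illustrations of the concepts, not cuda_wrapper's actual interface or implementation.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define MAX_GPUS 16

/* HYPOTHETICAL allocation format: "2,3" = physical GPUs granted to this job.
 * Fills phys[] so that virtual device i maps to phys[i].
 * Returns the number of virtual devices, or -1 on malformed input. */
int parse_allocation(const char *alloc, int phys[MAX_GPUS])
{
    int count = 0;
    const char *p = alloc;
    while (*p != '\0') {
        char *end;
        long id = strtol(p, &end, 10);
        if (end == p || id < 0 || id > INT_MAX || count == MAX_GPUS)
            return -1;                 /* not a number, out of range, too many */
        phys[count++] = (int)id;
        if (*end == ',')
            ++end;
        else if (*end != '\0')
            return -1;                 /* unexpected character */
        p = end;
    }
    return count;
}

/* Virtualization: the application only ever sees indices 0..count-1;
 * anything outside its allocation is refused (resource protection). */
int virtual_to_physical(const int phys[], int count, int virtual_id)
{
    if (virtual_id < 0 || virtual_id >= count)
        return -1;
    return phys[virtual_id];
}

/* HYPOTHETICAL single-slot "device memory" reused by successive users. */
typedef struct {
    unsigned char mem[256];
    int in_use;
} Arena;

/* Without scrubbing, the new owner would see whatever the last one left. */
unsigned char *arena_alloc(Arena *a, size_t n)
{
    if (a->in_use || n > sizeof a->mem)
        return NULL;
    a->in_use = 1;
    return a->mem;
}

/* Scrub on release so the next owner cannot read the previous data. */
void arena_release_scrubbed(Arena *a)
{
    memset(a->mem, 0, sizeof a->mem);
    a->in_use = 0;
}
```

A real wrapper would presumably apply the translation inside intercepted device-selection calls and scrub device memory (for example with cudaMemset) before another user gets it; that is an assumption about the design, not a description of cuda_wrapper.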

*It seems depth buffers/renderbuffers are not supported by GL interop in CUDA 3.1

so a texture whose format is GL_DEPTH_COMPONENT32 cannot be registered with
cudaGraphicsGLRegisterImage(&resource, tex, GL_TEXTURE_2D, cudaGraphicsMapFlagsNone);
Also remember there is a post showing that the currently supported color formats seem to be R, RG, RGB and RGBA in float, float16 and uint8, more or less similar to the published OpenCL DX interop formats. It would be good to write a tool that lists the formats currently supported in OpenCL (there is a function for that, clGetSupportedImageFormats) to see whether DX interop disables some formats, at least on NVIDIA hardware.

*CUDA on FreeBSD 8.0!
Inter-kernel communication is not supported under pain of me glaring at you really hard.
The recipe is:
FreeBSD 8.0 + NVIDIA driver 195.22 + CUDA 3.0
linprocfs and linsysfs should also be mounted:

uname -a
FreeBSD av429635.oops 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sun Feb 7 17:30:12 MSK 2010 root@av429635.oops:/usr/src/sys/i386/compile/ALECN80 i386

mount /compat/linux/proc/
mount /compat/linux/sys/

Well, it's not exactly "CUDA works on FreeBSD": what works is a Linux program using Linux libraries under the Linuxulator on FreeBSD.
Also, I haven't tried compiling CUDA programs yet; I've just launched programs precompiled on Linux (Debian).

Also, it seems the 190 drivers showed device info with cudadeviceinfodrv but did not create a context.

*iPad: the CPU is a single-core 1 GHz Cortex-A8 (same core as the 3GS), but stripped down.
The GPU is a PowerVR SGX variant, but slow for this pixel resolution (perhaps an SGX535, or the weaker 530; a 540 I doubt).
So Tegra 2 seems a lot better on both the CPU and GPU side.
Perhaps Flash doesn't work because of the custom GPU, although it uses PowerVR IP, since OMAP3 and OMAP4 have been shown with Flash 10.1 video acceleration at MWC.

*"Optimus Works Perfectly With Intel Wireless Display (WiDi)"

A perfect notebook must have it!
Still, in WiDi mode you lose 3D at 120 Hz via HDMI, and it also lacks HDCP, so no Blu-ray.
I hope the next WiDi adds HDCP and HDMI 1.4 so 3D also works, but that will require double the bandwidth, which seems likely to stress current Wi-Fi.
My question: with Optimus, where the NVIDIA GPU sends its frames to the Intel IGP, will NVIDIA 3D Vision work if the notebook (theoretically) has a built-in 3D 120 Hz screen? And what if it has a dual-link DVI output and I connect a 3D 120 Hz display? I suspect the answer is the same in both cases; at least the technical hurdles seem to be. I think doing it correctly is hard, since it involves a PCI Express transfer and about 1 GB/s seems to be used currently for 60 Hz? So 120 Hz would put more stress on it, but it is entirely doable if the Intel IGP recognizes the special 120 Hz LCD modes and acts accordingly.
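To put that bandwidth guess in perspective, here is a back-of-the-envelope C sketch of the raw traffic needed to copy every finished frame from the discrete GPU to the IGP. It assumes uncompressed 32-bit pixels and a 1920x1080 panel; those are illustrative assumptions, not Optimus specifications.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second needed to copy every displayed frame across the bus,
 * ASSUMING uncompressed full frames (no compression or partial updates). */
uint64_t frame_copy_bytes_per_sec(uint32_t width, uint32_t height,
                                  uint32_t bytes_per_pixel, uint32_t hz)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0 && hz > 0);
    return (uint64_t)width * height * bytes_per_pixel * hz;
}
```

Under these assumptions 1080p at 60 Hz needs 497,664,000 bytes/s (about 0.50 GB/s) and 120 Hz needs 995,328,000 bytes/s (about 1.0 GB/s), so stereo 120 Hz doubles the copy traffic.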
Also, all of this requires Windows 7 (Optimus does because it runs two graphics drivers from different IHVs at the same time, and WiDi seems to require Windows 7 x64).
Also, will WiDi work with MacBook Pro laptops with Optimus? It would require support from Intel, as it uses My WiFi technology, so we'll see. Perhaps Apple is waiting for Light Peak optical video outputs rather than wireless tech.
A dream graphics notebook must have a D3D11 GPU with 3D (so Fermi), standard 3D outputs (HDMI 1.4), a built-in 3D 120 Hz screen, Optimus, and possibly WiDi, ideally with HDCP support. Let's see how long it takes to get there; I hope less than a year.
*Optimus has an nvgpustateviewer tool that shows whether the NVIDIA GPU is active or not. Where can it be downloaded?

Intel WiDi: no HDCP, so no Blu-ray viewing and of course no 3D, but it is Optimus-compatible now.
Similarly, a PERFECT 3D PROJECTOR needs:
*720p at least
*Broad 3D support: 3D via HDMI 1.4, DLP Link, 3D Vision compatible
*HDCP support
so it can output 3D Vision, PS3 3D games (HDMI 1.4) and Blu-ray 3D (HDMI 1.4 + HDCP).
Right now Acer and ViewSonic support everything but HDMI 1.4, so no PS3 or Blu-ray 3D support.

Current projectors have HDCP, so Blu-ray works, plus 3D at 120 Hz via dual-link DVI or HDMI, but there is no HDMI 1.4 3D spec support for projecting PS3 games, Blu-ray player output, etc.

iZ3D 1.11 is coming soon, using the Catalyst 10.3 3D hooks for better multi-monitor support (3D Vision Surround?) and possibly CrossFire, and it also brings D3D10 support for games.

Hydra on an AMD chipset, shown with a GTX 275 + 5870, uses the improved Hydra 1.5 driver with a better Mix mode. It would be interesting to see how performance and compatibility improve over time (i.e. to see the hardware's potential once all the software issues are solved/tuned).

Regarding WiDi:
The software drivers that work with Intel® Wireless Display only apply to Microsoft Windows 7 64-bit*.
Intel® PROSet/Wireless WiFi Connection Utility for Windows 7 64-Bit for Intel Wireless Display
Requires a special PROSet driver:
Wireless Driver:
Drivers and management software for Microsoft Windows 7 64-bit OS*.
NOTES:
http://www.intel.com/support/wireless/wtech/iwd/sb/CS-031109.htm

-The ZIP file is provided with Intel® My WiFi Technology enabled.
-Intel® My WiFi Technology has the following requirements:
-Intel® Centrino® Ultimate-N 6300, Intel® Centrino® Advanced-N 6200, Intel® Centrino® Advanced-N + WiMAX 6250, Intel® WiFi Link 1000, Intel® WiFi Link 5300, or Intel® WiFi Link 5100
-Minimum of Intel® PROSet/Wireless WiFi Connection Utility 13.0.0.0 on Microsoft Windows 7*
NOTE: Intel® Wireless Display requires one of the following products:
-Intel® Centrino® Ultimate-N 6300
-Intel® Centrino® Advanced-N 6200
-Intel® Centrino® Advanced-N+WiMAX 6250
NOTE: Features removed from this version:
Wake on Wireless LAN is not present in this version of the application.
The Intel® My WiFi Technology application is not supported for Windows Vista. This feature is available on Windows 7 only.
For the latest driver for the Intel® PROSet/Wireless WiFi Connection Utility (for Intel® Centrino® Advanced-N 6200). Intel recommends that you use the latest drivers for best performance.

Intel Media SDK 1.5 RC

See http://software.intel.com/en-us/articles/intel-media-software-development-kit-intel-media-sdk/

It's going to support the H.264 MVC codec of 3D Blu-ray, either via GPU video processors or, where those don't support it, via optimized multithreaded SSE-enabled code.
Also, similar to the CPU H.264 encoding support, is there going to be a 3D MVC encoder?

*Shader Model 5 vs OpenCL kernels:

Common (more or less):
Doubles with denorms
Reduced-precision reciprocal
Shader conversion instructions - fp16 to fp32 and vice versa
Structured buffer, which is a new type of buffer containing structured elements.
Some things not present in OpenCL kernels:
Resinfo on buffers
Count bits set instruction
Find first bit set instruction
Carry/Overflow handling
Bit reversal instructions for FFTs
Conditional Swap intrinsic
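Several of the instructions listed above have simple portable C equivalents, which helps show what they compute and why having them in hardware matters. This sketch covers fp16-to-fp32 conversion (HLSL f16tof32; vload_half in OpenCL), count bits set (HLSL countbits), find first bit set (HLSL firstbitlow), bit reversal (HLSL reversebits) as used for FFT index permutation, and a 32-bit add that reports its carry-out. The C function names are my own, and the sentinel first_bit_low returns for zero input is this sketch's choice, not a documented HLSL value.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* IEEE 754 binary16 -> binary32 (1 sign, 5 exponent, 10 mantissa bits). */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) & 0x1u;
    uint32_t exp  = (uint32_t)(h >> 10) & 0x1Fu;
    uint32_t mant = (uint32_t)h & 0x3FFu;
    float value;

    if (exp == 0) {                     /* zero or denormal: mant * 2^-24 */
        value = (float)mant * (1.0f / 16777216.0f);
    } else if (exp == 31) {             /* infinity (mant == 0) or NaN */
        value = mant ? NAN : INFINITY;
    } else {                            /* normal: rebias 15 -> 127 (+112) */
        uint32_t bits = ((exp + 112u) << 23) | (mant << 13);
        memcpy(&value, &bits, sizeof value);
    }
    return sign ? -value : value;
}

/* Number of set bits (like HLSL countbits). */
uint32_t count_bits(uint32_t x)
{
    uint32_t n = 0;
    while (x) {
        x &= x - 1;                     /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Index of the lowest set bit; UINT32_MAX if none (sketch's convention). */
uint32_t first_bit_low(uint32_t x)
{
    uint32_t i = 0;
    if (x == 0)
        return UINT32_MAX;
    while (!(x & 1u)) {
        x >>= 1;
        ++i;
    }
    return i;
}

/* Reverse all 32 bits by swapping progressively larger groups. */
uint32_t reverse_bits(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* FFT reordering: bit-reversed position of i in a 2^log2n-point transform.
 * Requires 1 <= log2n <= 32 (a shift by 32 would be undefined). */
uint32_t fft_bitrev_index(uint32_t i, unsigned log2n)
{
    return reverse_bits(i) >> (32u - log2n);
}

/* 32-bit add with carry-out, the building block of multi-word arithmetic. */
uint32_t add_with_carry(uint32_t a, uint32_t b, uint32_t *carry_out)
{
    uint32_t sum = a + b;               /* unsigned wraparound is well defined */
    *carry_out = (sum < a) ? 1u : 0u;
    return sum;
}
```

For example, in an 8-point FFT (log2n = 3) index 6 (binary 110) is swapped with index 3 (binary 011); a single hardware reversebits replaces the whole shift-and-mask sequence.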
Also, Dispatch Indirect:
Remember, it's about reading the grid size to launch from a GPU buffer; the CPU is still required to launch the kernel.
But it doesn't require reading back the three grid integers, a transfer that, small as it is, would still be a PCI Express transaction (1 KB?), adding a lot of latency and a CPU-GPU sync point. Remember there is still no runtime block size: the kernel must be compiled for a fixed block size.
It's an evolution(?) of Draw Indirect: Direct3D 10 implements DrawAuto, which takes content (generated by the GPU) and renders it (on the GPU). Direct3D 11 generalizes this so that a Compute Shader can generate the arguments consumed by DrawInstancedIndirect and DrawIndexedInstancedIndirect.
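The Dispatch Indirect argument record is tiny: D3D11's DispatchIndirect reads three 32-bit unsigned thread-group counts (X, Y, Z) from a GPU buffer at a 4-byte-aligned byte offset, while the threads per group stay fixed by the shader's compile-time [numthreads] attribute. The C sketch below only simulates that flow on the CPU; DispatchArgs, write_args and dispatch_indirect_threads are hypothetical stand-ins, not D3D11 types or calls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL mirror of the 12-byte argument record. */
typedef struct {
    uint32_t groups_x, groups_y, groups_z;
} DispatchArgs;

/* Fixed at compile time, like [numthreads(64, 1, 1)] in HLSL. */
enum { THREADS_X = 64, THREADS_Y = 1, THREADS_Z = 1 };

/* Simulated earlier GPU pass: it decides how much work the next kernel
 * needs (enough 64-thread groups to cover 'items') and writes the counts
 * into the argument buffer. Nothing is read back by the CPU. */
void write_args(unsigned char *gpu_buffer, size_t offset, uint32_t items)
{
    DispatchArgs a = { (items + THREADS_X - 1) / THREADS_X, 1, 1 };
    memcpy(gpu_buffer + offset, &a, sizeof a);
}

/* Simulated indirect dispatch: read the group counts at the aligned offset
 * and return how many threads would run. */
uint64_t dispatch_indirect_threads(const unsigned char *gpu_buffer, size_t offset)
{
    DispatchArgs a;
    assert(offset % 4 == 0);
    memcpy(&a, gpu_buffer + offset, sizeof a);
    return (uint64_t)a.groups_x * THREADS_X
         * a.groups_y * THREADS_Y
         * a.groups_z * THREADS_Z;
}
```

For 1000 items the producer writes 16 groups, so 16 x 64 = 1024 threads run; the block size itself never changes at runtime, matching the limitation noted above.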

* gDEBugger CL is a new and exciting product; it brings all of gDEBugger's Debugging and Profiling capabilities to the OpenCL developer's world. gDEBugger CL, now in beta testing, supports all OpenCL implementations on Windows, Mac OS X and Linux. The upcoming gDEBugger iPhone version includes on-device debugging and profiling abilities, running in real-time and letting developers optimize their game on the actual iPhone device. gDEBugger iPhone displays invaluable inside information such as iPhone's GPU, CPU, graphic driver and operating system performance counters.


Friday, 5 March 2010
