See it!
Working with SiSoftware, AMD has optimized the performance of the OpenCL benchmarks for its GPU implementations, and for some problems has demonstrated significant performance advantages using AMD’s ATI Stream Software Development Kit (SDK) for OpenCL. When compared to NVIDIA’s CUDA running on its GeForce GTX 295 featuring two GPUs, the ATI Radeon™ HD 5870 graphics card with one GPU delivers up to 2.7 times faster performance on certain benchmark tests. For the "native float shader" results, the ATI Radeon 5870 posted a score of 1820 megapixels per second, compared to the GTX 295 at 680 megapixels per second!
Based on native float shader results of 1820 megapixels per second for AMD compared to 680 megapixels for NVIDIA on AMD Phenom™ II X4 940 processor-based system, 3 GHz, ASUSTek M3A79-T DELUXE, 4GB DDR2-1066, Windows® 7 64-bit Enterprise operating system
Driver: 8.680.0.0, OpenCL base build, SDK 2.0 Beta 4
NVIDIA driver: 8.15.11.9038
I don't like the use of the NVIDIA 190.38 driver here, as I think the 195 drivers improve numerically intensive OpenCL tests by about 30%. (SiSoft Sandra 2010 uses a Mandelbrot calculation as its test, which is compute bound, not memory bound, if programmed right.)
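To illustrate why a Mandelbrot test tends to be compute bound, here is a minimal Python sketch that estimates the ratio of arithmetic to memory traffic. It is not Sandra's actual kernel: the function names, the viewport, the 4 bytes written per pixel and the rough 10 floating-point operations per iteration are all assumptions made for illustration.

```python
def mandelbrot_iterations(cx, cy, max_iter=256):
    """Escape-time iteration count for the point c = cx + i*cy."""
    x = y = 0.0
    for i in range(max_iter):
        if x * x + y * y > 4.0:
            return i
        # z = z^2 + c, written out on real and imaginary parts
        x, y = x * x - y * y + cx, 2.0 * x * y + cy
    return max_iter


def arithmetic_intensity(width, height, max_iter=256,
                         bytes_per_pixel=4, flops_per_iter=10):
    """Estimated flops per byte written (both cost figures are assumptions)."""
    total_iters = 0
    for py in range(height):
        for px in range(width):
            # Map the pixel to the region [-2, 1] x [-1, 1] (assumed viewport).
            cx = -2.0 + 3.0 * px / width
            cy = -1.0 + 2.0 * py / height
            total_iters += mandelbrot_iterations(cx, cy, max_iter)
    flops = total_iters * flops_per_iter
    bytes_written = width * height * bytes_per_pixel
    return flops / bytes_written


intensity = arithmetic_intensity(32, 32)
print(f"estimated flops per byte written: {intensity:.1f}")
```

Each pixel reads nothing from memory and writes a single value, while points inside the set run the full iteration count, so the work is dominated by arithmetic rather than memory traffic.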
I also don't like using a multi-GPU card for the comparison, as multi-GPU scaling seemed to be slow on Vista/Win7 with the 190 OpenCL drivers. NVIDIA lists it as a known issue in its GPU Computing SDK.
Although, assuming perfect multi-GPU scaling in OpenCL (I get 445 Mpixels/s with a GTX 275), you should get around 850-900 Mpixels/s, which is still 2x slower than ATI.
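The arithmetic behind these figures can be checked with a short Python sketch. The scores are the ones quoted above. The helper name is hypothetical, and treating each GPU of the GTX 295 as equivalent to a single GTX 275 is the assumption made in the estimate above, not a measured result.

```python
def perfect_scaling(single_gpu_mpix, gpu_count):
    """Ideal multi-GPU score assuming linear scaling (an assumption)."""
    return single_gpu_mpix * gpu_count


gtx275_single = 445.0     # Mpixels/s, single-GPU result quoted above
radeon_5870 = 1820.0      # Mpixels/s, AMD's published score
gtx295_measured = 680.0   # Mpixels/s, AMD's published GTX 295 score

gtx295_ideal = perfect_scaling(gtx275_single, 2)
ratio_measured = radeon_5870 / gtx295_measured
ratio_ideal = radeon_5870 / gtx295_ideal

print(f"ideal dual-GPU score: {gtx295_ideal:.0f} Mpixels/s")
print(f"5870 vs measured GTX 295: {ratio_measured:.2f}x")
print(f"5870 vs ideal dual GPU: {ratio_ideal:.2f}x")
```

The published 680 Mpixels/s is well below the ideal two-GPU figure, which is consistent with the slow multi-GPU scaling mentioned above.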
2. Some time ago I mentioned this excellent tutorial about integer multiprecision with OpenCL.
It is now even better, with 5870 results and results for both the AMD and NVIDIA OpenCL implementations on Windows and Linux. You can see that the Windows backend is always slower on NVIDIA, and it possibly shows performance issues on AMD GPUs too.
I have also contacted the author about the source code, and the reply is that it is coming soon.
Remember the available memory bandwidth in the GTX285 is 158 GB/s, and 153 GB/s for the HD5870.
Things learned:
Memory copy
===========
Note the difference in driver efficiency between Linux and Windows for the GTX285 board: the Linux curve rises earlier, meaning the latency of a call to clEnqueueCopyBuffer is much lower on Linux. At the end of the curve, the "asymptotic speed" (pure copy speed) is the same, at 66 GB/s as seen earlier.
The last thing to note on the diagram is the lack of proper support for clEnqueueCopyBuffer in this version of the ATI driver. The Linux version reaches 8.1 GB/s while the Windows version remains under a pathetic 3 GB/s. Hopefully, the next versions of the drivers will fix this, and match the GTX285 results as they should. The host-device copy speeds for the ATI board follow the same tendency.
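The shape of such curves can be sketched with a simple model in which every copy pays a fixed call latency plus the transfer time at the asymptotic speed. This is a rough Python illustration, not the tutorial's benchmark code: the function name and both latency values are hypothetical, and only the 66 GB/s figure comes from the text above.

```python
def effective_bandwidth_gbs(size_bytes, latency_s, peak_gbs):
    """Throughput seen by the caller: bytes / (fixed latency + transfer time)."""
    transfer_s = size_bytes / (peak_gbs * 1e9)
    return size_bytes / (latency_s + transfer_s) / 1e9


PEAK_GBS = 66.0          # asymptotic copy speed quoted above
LOW_LATENCY_S = 10e-6    # hypothetical call latency, lower-overhead driver
HIGH_LATENCY_S = 50e-6   # hypothetical call latency, higher-overhead driver

small = 64 * 1024        # 64 KiB copy, dominated by call latency
large = 1024 ** 3        # 1 GiB copy, dominated by transfer time

small_low = effective_bandwidth_gbs(small, LOW_LATENCY_S, PEAK_GBS)
small_high = effective_bandwidth_gbs(small, HIGH_LATENCY_S, PEAK_GBS)
large_high = effective_bandwidth_gbs(large, HIGH_LATENCY_S, PEAK_GBS)

print(f"64 KiB, low latency:  {small_low:.2f} GB/s")
print(f"64 KiB, high latency: {small_high:.2f} GB/s")
print(f"1 GiB, high latency:  {large_high:.2f} GB/s")
```

Under this model the lower-latency driver reaches a given throughput at smaller buffer sizes, so its curve rises earlier, while both converge to the same asymptotic speed for large copies.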
Zero mem set
============
One big difference is that the behaviour of the ATI board differs significantly between Linux and Windows. Under Windows, it reaches a catastrophic 585 MB/s (does it actually compute something on the CPU? Maybe I installed mixed components of the driver...) while the Linux implementation shows some signs of activity and reaches 53 GB/s.
3. It seems Pyrit's OpenCL support is not working on the ATI OpenCL backend.
AMD engineers have reproduced it and are working on it!
See: http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=123060&enterthread=y
4. Confirmed that the next OpenCL Stream SDK release will have documentation of the r8xx architecture and IL instructions (low-level CAL stuff).
Q:Any plans to document uav_raw_load_id, uav_raw_store_id? To publish R800 ISA? To answer some questions on this forum?
A:Our next release will have updated documentation that should cover all the newer hardware.
5. AMD newsletters:
Check ATI Stream Team Quarterly which is a regular newsletter to keep you up-to-date about ATI Stream.
http://amd-member.com/Newsletters/ATIStream/09Q3.html
Also check AMD Developer Central Newsletters
latest: http://amd-member.com/newsletters/DevCentral/0911.html
6. Geeks3d are providing a lot of GPU computing and OpenCL news:
latest: http://www.geeks3d.com/20091203/opencl-and-gpu-computing-industry-news/
7. Although GPU acceleration in the Flash 10.1 beta seems not to be working on AMD with the 9.11 driver (AnandTech), it seems ATI has it working. See:
Adobe Flash Player 10.1 Accelerated by ATI Stream Technology
http://www.youtube.com/watch?v=BTOOr2fQ4KA