graphics related
"A Hardware Processing Unit for Point Sets"
Simon Heinzle, Gael Guennebaud, Mario Botsch, and Markus Gross
"PCU: The Programmable Culling Unit"
"A Hardware Architecture for Surface Splatting"
Tim Weyrich, Simon Heinzle, Timo Aila, Daniel Fasnacht, Stephan Oetiker, Mario Botsch, Cyril Flaig, Simon Mall, Kaspar Rohrer, Norbert Felber, Hubert Kaeslin, and Markus Gross, in "ACM Trans. Graph. (Proc. Siggraph 2007)", August 2007
"PFU: Programmable Filtering Unit for Mobile Multimedia Applications on Graphics Hardware"
GPU computing based
"Inter-Block GPU Communication via Fast Barrier Synchronization"
Shucai Xiao and Wu-chun Feng, Technical Report TR-09-19, Computer Science, Virginia Tech, 2009
"Increasing Memory Miss Tolerance for SIMD Cores"
David Tarjan, Jiayuan Meng, Kevin Skadron, in "Proc. Supercomputing '09", August 2009
"Dynamic detection of uniform and affine vectors in GPGPU computations"
Sylvain Collange, David Defour, and Yao Zhang
Instructions to add
===================
Understanding the Efficiency of Ray Traversal on GPUs
HPG09 paper
Timo Aila
Two warp-wide instructions would help:
ENUM (prefix sum): enumerates the threads (inside a warp) for which a condition is true and returns a unique index in [0, M-1] to each of those threads
POPC (population count): returns the number of threads for which a condition is true, i.e. M above
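To make the two definitions concrete, here is a minimal CPU-side sketch of what ENUM and POPC compute. It is only an emulation under my own assumptions: a 32-lane warp, the per-lane conditions packed into a 32-bit mask, and the names `WarpMask`, `warp_popc` and `warp_enum` are hypothetical, not taken from the paper.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

constexpr int kWarpSize = 32;  // assumed warp width

// Hypothetical representation: bit i is set if lane i's condition is true.
using WarpMask = std::uint32_t;

// POPC: number of lanes whose condition is true (the M from the notes).
inline int warp_popc(WarpMask pred) {
    return static_cast<int>(std::bitset<32>(pred).count());
}

// ENUM: for a lane whose condition is true, return its rank among the
// true lanes, i.e. how many true lanes have a lower lane index.
// The result lies in [0, M-1]. Lanes whose condition is false get -1
// (a choice made for this sketch only).
inline int warp_enum(WarpMask pred, int lane) {
    assert(lane >= 0 && lane < kWarpSize);
    if (((pred >> lane) & 1u) == 0) {
        return -1;
    }
    // Keep only the bits of the lanes below this one, then count them.
    WarpMask lower = pred & ((WarpMask{1} << lane) - 1u);
    return warp_popc(lower);
}
```

For example, with lanes 1, 2 and 4 true (mask `0x16`), `warp_popc` gives 3 and `warp_enum` hands out the indices 0, 1 and 2 to those lanes in order.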
Improvements for raytracing:
With ENUM + POPC, in Fairy scene
Ambient occlusion +40%
Diffuse +80%
Iff not limited by memory speed
POPC is also useful for stream compaction; see the HPG09 paper on stream compaction on wide SIMD.
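A rough sketch of why these two operations fit stream compaction: within each warp, the per-lane rank (what ENUM would return) gives every surviving element its output slot, and the count of survivors (what POPC would return) advances the output offset for the next warp. The code below is a sequential CPU emulation under my own assumptions (32-lane warps, a hypothetical `compact` helper); it is not the algorithm from the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kWarpSize = 32;  // assumed warp width

// Keep only the elements for which keep(x) is true, preserving order.
// Each outer iteration models one warp; the inner loops model its lanes.
template <typename T, typename Pred>
std::vector<T> compact(const std::vector<T>& in, Pred keep) {
    std::vector<T> out;
    std::size_t base = 0;  // running output offset across warps
    for (std::size_t start = 0; start < in.size(); start += kWarpSize) {
        const std::size_t end = std::min(start + kWarpSize, in.size());

        // POPC step: count the lanes in this warp that survive.
        std::size_t m = 0;
        for (std::size_t i = start; i < end; ++i) {
            if (keep(in[i])) ++m;
        }
        out.resize(base + m);

        // ENUM step: each surviving lane's rank is the number of
        // surviving lanes before it, which is its slot in the output.
        std::size_t rank = 0;
        for (std::size_t i = start; i < end; ++i) {
            if (keep(in[i])) {
                out[base + rank] = in[i];
                ++rank;
            }
        }
        assert(rank == m);
        base += m;  // the next warp writes after this warp's survivors
    }
    return out;
}
```

On real hardware the lanes of a warp would do this in parallel, and the offset across warps would need its own scan or atomic counter; the loop here only shows the indexing.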
Atomic Vector Operations on Chip Multiprocessors
Vector atomics seem to be Larrabee 3 material.
Sunday, 13 December 2009