* Includes OpenCL profiler 1.1
New Fermi features:
Surface functions (readable and writable textures):
__device__ __surf{1D,2D}{read,write}{s,u,c}{1,2,4}
__device__ __surf{1D,2D}{read,write}l{1,2}
c=char
u=uint
s=ushort
Where are 3D surfaces, i.e. 3D writable textures?
Device functions:
extern __device__ void __threadfence_system(void);
extern __device__ double __ddiv_rn(double, double);
extern __device__ double __ddiv_rz(double, double);
extern __device__ double __ddiv_ru(double, double);
extern __device__ double __ddiv_rd(double, double);
extern __device__ double __drcp_rn(double);
extern __device__ double __drcp_rz(double);
extern __device__ double __drcp_ru(double);
extern __device__ double __drcp_rd(double);
extern __device__ double __dsqrt_rn(double);
extern __device__ double __dsqrt_rz(double);
extern __device__ double __dsqrt_ru(double);
extern __device__ double __dsqrt_rd(double);
extern __device__ unsigned int __ballot(int);
extern __device__ int __syncthreads_count(int);
extern __device__ int __syncthreads_and(int);
extern __device__ int __syncthreads_or(int);
extern __device__ long long int clock64(void);
extern __device__ float __fmaf_ieee_rn(float, float, float);
extern __device__ float __fmaf_ieee_rz(float, float, float);
extern __device__ float __fmaf_ieee_ru(float, float, float);
extern __device__ float __fmaf_ieee_rd(float, float, float);
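As a rough illustration of one of the new barrier variants listed above, here is a minimal sketch that uses __syncthreads_count to count the positive values handled by each block. The kernel name countPositive, the host helper countPositivePerBlock and the test data are made up for this example and do not come from the release notes. It assumes a device of compute capability 2.0 (Fermi) or later.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each block counts how many of its elements are positive.
// __syncthreads_count(pred) acts as a block-wide barrier and returns, to every
// thread in the block, the number of threads whose predicate was non-zero.
__global__ void countPositive(const float *in, int *blockCounts, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Out-of-range threads pass a false predicate but still reach the barrier.
    int pred = (i < n) && (in[i] > 0.0f);
    int count = __syncthreads_count(pred);
    if (threadIdx.x == 0)
        blockCounts[blockIdx.x] = count;
}

// Hypothetical host helper: returns one count per block (empty on CUDA error).
std::vector<int> countPositivePerBlock(const std::vector<float> &h_in, int threads)
{
    const int n = static_cast<int>(h_in.size());
    const int blocks = (n + threads - 1) / threads;
    std::vector<int> h_counts(blocks, 0);

    float *d_in = NULL;
    int *d_counts = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_counts, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    countPositive<<<blocks, threads>>>(d_in, d_counts, n);

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_counts.data(), d_counts,
                                 blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_counts);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        h_counts.clear();
    }
    return h_counts;
}

int main()
{
    // 256 values, alternating positive and negative.
    std::vector<float> data(256);
    for (size_t i = 0; i < data.size(); ++i)
        data[i] = (i % 2 == 0) ? 1.0f : -1.0f;

    std::vector<int> counts = countPositivePerBlock(data, 128);
    for (size_t b = 0; b < counts.size(); ++b)
        printf("block %d: %d positive values\n", (int)b, counts[b]);
    return counts.empty() ? 1 : 0;
}
```

With 128 threads per block and alternating signs, each of the two blocks should report 64 positive values. __syncthreads_and and __syncthreads_or follow the same pattern but return whether the predicate held for all threads or for at least one thread.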
Key changes in cudaprof v3.0 beta with respect to v2.3:
1) New "NOP Triggers" counters have been added to the "Profiler counters" tab
of the "Session Settings" dialog.
2) A new memory copy option, "host mem transfer type", has been added to the
"Other Options" tab of the "Session Settings" dialog. It specifies whether a
memory transfer uses "Pageable" or "Page-locked" host memory.
3) Device level summary plot:
There is one bar for each method. Bars are sorted by decreasing GPU time. Bar length
is proportional to the cumulative GPU time for a method across all contexts for a device.
4) Session level summary plot:
There is one bar for each device. Bar length is proportional to GPU utilization.
GPU utilization is the ratio of the time the GPU was actually executing some method
to the total time interval from GPU start to end. The values are shown as percentages.
5) User interface changes:
"Session Settings" dialog:
a) A new device selection option has been added to the "Session" tab.
This option determines which counters can be selected on the "Profiler Counter" tab.
In the "multi-device" case, only counters supported by all devices can be selected.
b) All counters on the "Profiler Counter" tab and all options on the "Other Options" tab
are shown in a tree view under different groups.
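The GPU utilization figure from item 4 can be written as a formula. The symbols below are my own shorthand, not names used by the profiler: T_busy is the time the GPU spent executing some method, and T_start and T_end mark the first and last GPU activity in the session. How the profiler treats overlapping work is not stated in the notes, so this is only a reading of the description.

```latex
\[
\text{GPU utilization (\%)} = 100 \times \frac{T_{\text{busy}}}{T_{\text{end}} - T_{\text{start}}}
\]
```

For example, if a device was busy for 30 ms within a 40 ms interval, the bar would show 100 x 30 / 40 = 75%.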
Saturday, 7 November 2009