How do I find the maximum number of threads a GPU supports?

hyslam 2009-07-24 04:45:14
Is there a parameter I can use to work this out?
4 replies
hyslam 2009-07-24
A further question: how do I find out how many SMs the GPU has?
ljyha 2009-07-24
Check it with this:

cudaDeviceProp prop;
// CUDA_SAFE_CALL is a macro from the old cutil SDK headers;
// a plain error check does the same job:
cudaError_t err = cudaGetDeviceProperties(&prop, 0);
if (err != cudaSuccess) { /* handle the error */ }

prop then holds the maximum supported thread count.
Maximum threads per block:
maxThreadsPerBlock

SM count:
multiProcessorCount. (Note: on compute capability 1.x hardware each SM contains 8 scalar cores, so multiProcessorCount * 8 gives the number of cores, not the number of SMs.)

The full parameter list is as follows:

cudaError_t cudaGetDeviceProperties(struct cudaDeviceProp *prop, int device)

Returns in *prop the properties of device dev. The cudaDeviceProp structure is defined as:

struct cudaDeviceProp {
    char   name[256];
    size_t totalGlobalMem;
    size_t sharedMemPerBlock;
    int    regsPerBlock;
    int    warpSize;
    size_t memPitch;
    int    maxThreadsPerBlock;
    int    maxThreadsDim[3];
    int    maxGridSize[3];
    size_t totalConstMem;
    int    major;
    int    minor;
    int    clockRate;
    size_t textureAlignment;
    int    deviceOverlap;
    int    multiProcessorCount;
    int    kernelExecTimeoutEnabled;
    int    integrated;
    int    canMapHostMemory;
    int    computeMode;
};

where:
name is an ASCII string identifying the device;
totalGlobalMem is the total amount of global memory available on the device in bytes;
sharedMemPerBlock is the maximum amount of shared memory available to a thread block in bytes; this amount is shared by all thread blocks simultaneously resident on a multiprocessor;
regsPerBlock is the maximum number of 32-bit registers available to a thread block; this number is shared by all thread blocks simultaneously resident on a multiprocessor;
warpSize is the warp size in threads;
memPitch is the maximum pitch in bytes allowed by the memory copy functions that involve memory regions allocated through cudaMallocPitch();
maxThreadsPerBlock is the maximum number of threads per block;
maxThreadsDim[3] contains the maximum size of each dimension of a block;
maxGridSize[3] contains the maximum size of each dimension of a grid;
clockRate is the clock frequency in kilohertz;
totalConstMem is the total amount of constant memory available on the device in bytes;
major, minor are the major and minor revision numbers defining the device's compute capability;
textureAlignment is the alignment requirement; texture base addresses that are aligned to textureAlignment bytes do not need an offset applied to texture fetches;
deviceOverlap is 1 if the device can concurrently copy memory between host and device while executing a kernel, or 0 if not;
multiProcessorCount is the number of multiprocessors on the device;
kernelExecTimeoutEnabled is 1 if there is a run time limit for kernels executed on the device, or 0 if not;
integrated is 1 if the device is an integrated (motherboard) GPU and 0 if it is a discrete (card) component;
canMapHostMemory is 1 if the device can map host memory into the CUDA address space for use with cudaHostAlloc()/cudaHostGetDevicePointer(), or 0 if not;
computeMode is the compute mode that the device is currently in. Available modes are as follows:
cudaComputeModeDefault: Default mode - Device is not restricted and multiple threads can use cudaSetDevice() with this device.
cudaComputeModeExclusive: Compute-exclusive mode - Only one thread will be able to use cudaSetDevice() with this device.
cudaComputeModeProhibited: Compute-prohibited mode - No threads can use cudaSetDevice() with this device. Any errors from calling cudaSetDevice() with an exclusive (and occupied) or prohibited device will only show up after a non-device management runtime function is called. At that time, cudaErrorNoDevice will be returned.

Parameters:
prop - Properties for the specified device
device - Device number to get properties for

ljyha 2009-07-24
As above, that struct carries a lot of information; alternatively, run the deviceQuery example that ships with the SDK.
