What is the maximum number of blocks that can be configured in CUDA?

BinGo 2012-03-09 08:27:15
Output of running the SDK sample on Linux:


Device 0: "Tesla T10 Processor"
CUDA Driver Version: 4.0
CUDA Runtime Version: 4.0
CUDA Capability Major/Minor version number: 1.3
Total amount of global memory: 4294770688 bytes
(30) Multiprocessors x ( 8) CUDA Cores/MP: 240 CUDA Cores
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.30 GHz
Concurrent copy and execution: Yes
# of Asynchronous Copy Engines: 1
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: No
Device has ECC support enabled: No
Device is using TCC driver mode: No


From these lines:
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

each block can contain at most 512 threads.
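The per-dimension block limits (512 x 512 x 64) and the 512-thread total apply at the same time, so 512 x 512 x 64 is not a legal block. The sketch below is a minimal host-side check that assumes the limits are hardcoded from the output above. On real hardware they would be read from `cudaDeviceProp::maxThreadsDim` and `cudaDeviceProp::maxThreadsPerBlock` via `cudaGetDeviceProperties`. The helper name `isValidBlockShape` is hypothetical.

```cpp
#include <cstdint>

// Assumed limits, copied from the deviceQuery output above
// (Tesla T10, compute capability 1.3).
constexpr unsigned kMaxThreadsPerBlock = 512;
constexpr unsigned kMaxBlockDim[3] = {512, 512, 64};

// Hypothetical helper: a block shape is legal only if every dimension is
// within its own limit AND the total x*y*z does not exceed 512 threads.
constexpr bool isValidBlockShape(unsigned x, unsigned y, unsigned z) {
    if (x == 0 || y == 0 || z == 0) return false;
    if (x > kMaxBlockDim[0] || y > kMaxBlockDim[1] || z > kMaxBlockDim[2])
        return false;
    // 64-bit product so the multiplication cannot overflow.
    return static_cast<std::uint64_t>(x) * y * z <= kMaxThreadsPerBlock;
}
```

For example, 16 x 16 x 2 (512 threads) passes. 32 x 32 x 1 (1024 threads) fails even though each dimension is individually within its limit.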

So what is the maximum number of blocks a grid can have?
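The grid limits are per dimension, just like the block limits. On this device the maximum block count is therefore the product 65535 x 65535 x 1 = 4,294,836,225 blocks. With 512 threads per block, that allows at most 2,198,956,147,200 threads in one launch. The sketch below only restates that arithmetic, using values copied from the posted output as an assumption. Other GPUs report different values in `cudaDeviceProp::maxGridSize`.

```cpp
#include <cstdint>

// Assumed grid limits from the output above (compute capability 1.3):
// "Maximum sizes of each dimension of a grid: 65535 x 65535 x 1".
constexpr std::uint64_t kMaxGrid[3] = {65535, 65535, 1};
constexpr std::uint64_t kMaxThreadsPerBlock = 512;

// Limits are per dimension, so the largest grid is their product.
constexpr std::uint64_t maxBlocksPerGrid() {
    return kMaxGrid[0] * kMaxGrid[1] * kMaxGrid[2];
}

// Upper bound on threads in one launch: every block at full size.
constexpr std::uint64_t maxThreadsPerGrid() {
    return maxBlocksPerGrid() * kMaxThreadsPerBlock;
}

static_assert(maxBlocksPerGrid() == 4294836225ULL, "65535 * 65535 * 1");
```

Note that a purely one-dimensional grid on this device still tops out at 65535 blocks. Reaching the larger total requires using `gridDim.y` as well.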
BinGo 2012-03-09
The number of thread blocks in a grid is usually dictated by the size of the data being processed or the number of processors in the system, which it can greatly exceed.
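In practice this means computing the block count from the problem size with a ceiling division, then checking it against the device limits. The sketch below assumes the compute capability 1.3 limits posted above. It fills `grid.x` first and spills into `grid.y` once more than 65535 blocks are needed. `LaunchShape` and `shapeFor` are hypothetical names, not part of the CUDA API.

```cpp
#include <cstdint>
#include <stdexcept>

// Assumed limits from the deviceQuery output above (compute capability 1.3).
constexpr std::uint64_t kMaxGridDim = 65535;  // max blocks in grid.x and in grid.y
constexpr std::uint32_t kMaxThreadsPerBlock = 512;

// Hypothetical launch shape: a 2D grid of 1D blocks.
struct LaunchShape {
    std::uint32_t gridX;
    std::uint32_t gridY;
    std::uint32_t threadsPerBlock;
};

// Hypothetical helper: enough blocks to cover n elements.
LaunchShape shapeFor(std::uint64_t n, std::uint32_t threadsPerBlock = kMaxThreadsPerBlock) {
    if (n == 0 || threadsPerBlock == 0 || threadsPerBlock > kMaxThreadsPerBlock)
        throw std::invalid_argument("invalid element count or block size");

    // Ceiling division written so it cannot overflow.
    std::uint64_t blocks = n / threadsPerBlock + (n % threadsPerBlock != 0 ? 1 : 0);

    // Fill grid.x first, then add as many rows in grid.y as required.
    std::uint64_t gridX = blocks < kMaxGridDim ? blocks : kMaxGridDim;
    std::uint64_t gridY = blocks / gridX + (blocks % gridX != 0 ? 1 : 0);
    if (gridY > kMaxGridDim)
        throw std::length_error("n exceeds what a single launch can cover");

    return {static_cast<std::uint32_t>(gridX),
            static_cast<std::uint32_t>(gridY),
            threadsPerBlock};
}
```

For example, `shapeFor(1000)` yields a 2 x 1 grid of 512-thread blocks. `shapeFor(512ULL * 65535 + 1)` needs 65,536 blocks and yields a 65535 x 2 grid. Because a folded grid can contain surplus blocks, each thread in the kernel would compute its global index as `(blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x` and return early when that index is `>= n`.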
