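Has anyone seen a parallel BFS algorithm like the one below (pseudo-code)? I'd appreciate some help understanding it.
How should I read line 3 of the code, {r, r_end} = Qvfront[cta_offset + thread_id]; ? What is {r, r_end}?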
Algorithm 5. GPU pseudo-code for a warp-based, strip-mined neighbor-gathering approach.
Input: Vertex-frontier Qvfront, column-indices array C, and the offset cta_offset for the current tile within Qvfront
Functions: WarpAny(pred_i) returns true if any pred_i is set for any thread t_i within the warp.
1  GatherWarp(cta_offset, Qvfront, C) {
2      volatile shared comm[WARPS][3];
3      {r, r_end} = Qvfront[cta_offset + thread_id];
4      while (WarpAny(r_end - r)) {
5
6          // vie for control of warp
7          if (r_end - r)
8              comm[warp_id][0] = lane_id;
9
10         // winner describes adjlist
11         if (comm[warp_id][0] == lane_id) {
12             comm[warp_id][1] = r;
13             comm[warp_id][2] = r_end;
14             r = r_end;
15         }
16
17         // strip-mine winner's adjlist
18         r_gather = comm[warp_id][1] + lane_id;
19         r_gather_end = comm[warp_id][2];
20         while (r_gather < r_gather_end) {
21             volatile neighbor = C[r_gather];
22             r_gather += WARP_SIZE;
23         }
24     }
25 }
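One possible reading of line 3, offered as an assumption rather than something the listing spells out: {r, r_end} is a pair of offsets into C that delimits the adjacency list of the frontier vertex this thread is responsible for. If the graph is stored in compressed sparse row (CSR) form, that pair would come from a row-offsets array. The sketch below illustrates the idea on the CPU. The array R, the helper row_range, and the sample graph are all made up for illustration and do not appear in the listing.

```python
# Hypothetical CSR graph with 4 vertices. R and row_range are illustrative
# names; the listing itself only names C, the column-indices array.
R = [0, 2, 5, 5, 7]        # row offsets: neighbors of v are C[R[v]:R[v + 1]]
C = [1, 2, 0, 2, 3, 0, 1]  # column indices (all adjacency lists, concatenated)


def row_range(v):
    """Return the (r, r_end) pair for vertex v: a half-open range into C."""
    return R[v], R[v + 1]


frontier = [1, 3]          # hypothetical vertex frontier
for v in frontier:
    r, r_end = row_range(v)
    # r_end - r is the number of neighbors still to gather for this vertex
    print(v, (r, r_end), C[r:r_end])
```

Under this reading, the test r_end - r on lines 4 and 7 simply asks whether the thread still has neighbors left to gather, and r = r_end on line 14 marks the list as finished.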
The paper describes the algorithm as follows:
Coarse-grained, warp-based gathering. Threads enlist the entire warp to assist in gathering. As described in Algorithm 5, each thread attempts to vie for control of its warp by writing its thread-identifier into a single word shared by all threads of that warp. Only one write will succeed, thus determining which is allowed to subsequently enlist the warp as a whole to read its corresponding neighbors. This process repeats for every warp until its threads have all had their adjacent neighbors gathered.
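To make the description above more concrete, here is a rough CPU simulation of a single warp running the loop. It is only a sketch under stated assumptions: the warp has 4 lanes instead of 32, the lanes run in lockstep, and the "winning" write is modeled as the last writer, whereas on real hardware which write survives is not specified. The function gather_warp and the sample data are hypothetical.

```python
WARP_SIZE = 4  # tiny warp for readability; real NVIDIA warps have 32 lanes


def gather_warp(ranges, C):
    """Simulate one warp; ranges[lane] is that lane's (r, r_end) pair.

    Returns (winner_lane, gathering_lane, neighbor) records in gather order.
    """
    r = [start for start, _ in ranges]
    r_end = [end for _, end in ranges]
    log = []
    # WarpAny(r_end - r): keep looping while any lane has an ungathered list
    while any(r_end[lane] - r[lane] for lane in range(WARP_SIZE)):
        comm = [None, None, None]  # the warp's three shared words
        # Vie for control: every lane with work writes its id to comm[0].
        # Assumption: the last writer wins in this simulation.
        for lane in range(WARP_SIZE):
            if r_end[lane] - r[lane]:
                comm[0] = lane
        winner = comm[0]
        # The winner publishes its range, then marks its own list as done
        comm[1], comm[2] = r[winner], r_end[winner]
        r[winner] = r_end[winner]
        # Strip-mine: all lanes walk the winner's list, WARP_SIZE items a step
        for base in range(comm[1], comm[2], WARP_SIZE):
            for lane in range(WARP_SIZE):
                r_gather = base + lane
                if r_gather < comm[2]:
                    log.append((winner, lane, C[r_gather]))
    return log


demo_C = list(range(100, 109))                 # 9 fake neighbor ids
demo_ranges = [(0, 2), (2, 8), (8, 8), (8, 9)]  # lane 2 has an empty list
for record in gather_warp(demo_ranges, demo_C):
    print(record)
```

In this run the warp serves one lane's adjacency list per outer iteration, and the list of length 6 takes two strips of up to 4 reads each. That seems to be the point of the approach: a long adjacency list is read by the whole warp together instead of by one thread alone.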
My head is spinning from reading this. The algorithm comes from the paper "High Performance and Scalable GPU Graph Traversal". Has anyone here read it?