Kernel execution crashes the GPU

Kill_Console 2014-10-15 09:16:57

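I wrote a kernel that computes a normal map. As soon as it runs, the screen flickers for a while and then the machine blue-screens and reboots; sometimes it recovers instead.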

// Types
typedef struct
{
        ...
        float3* dev_normal;

        ...
        // CUDA stream
        cudaStream_t stream;
} GPUplan;

Allocation:

checkCudaErrors(cudaMalloc((void**)&plan[i].dev_normal, IMG_HEIGHT * IMG_WIDTH * sizeof(float3)));

Kernel:

__global__ void getNormalMapKernel(const ushort *dev_depth, float3 *normalMap, const float *K ,const float *T)
{
        int x = threadIdx.x;                                // thread index (pixel column)
        int y = blockIdx.x;                                 // block index (pixel row)

        float3 normal = make_float3(0.0, 0.0, 0.0);

        if (x < IMG_WIDTH - 1 && y < IMG_HEIGHT - 1)
        {
                ushort depth = dev_depth[y * IMG_WIDTH + x];
                ushort depth_right = dev_depth[y * IMG_WIDTH + x + 1];
                ushort depth_down = dev_depth[(y + 1) * IMG_WIDTH + x];

                if ( depth && absDevUshort(depth, depth_right) < 20 && absDevUshort(depth, depth_down) < 20)
                {
                        /* Compute camera-space coordinates and subtract the translation vector
                         * 
                         * cx = (x - px) * depth / fx - T1
                         * cy = (y - py) * depth / fy - T2
                         * cz = depth - T3
                         */
                        float3 cameraPos = make_float3((x - K[1]) * depth / K[0] - T[0], (y - K[3]) * depth / K[2] - T[1], depth - T[2]);
                        float3 cameraPosRight = make_float3((x + 1 - K[1]) * depth_right / K[0] - T[0], (y - K[3]) * depth_right / K[2] - T[1], depth_right - T[2]);
                        float3 cameraPosDown = make_float3((x - K[1]) * depth_down / K[0] - T[0], (y + 1 - K[3]) * depth_down / K[2] - T[1], depth_down - T[2]);

                        // Cross product (cameraPosRight - cameraPos) x (cameraPosDown - cameraPos), then normalize
                        normal = normalize(cross(cameraPosRight - cameraPos, cameraPosDown - cameraPos));
                }
        }

        __syncthreads();
        normalMap[y * IMG_WIDTH + x] = normal;
}
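The back-projection formulas in the kernel comment can be checked on the host before suspecting the arithmetic. The following is only a minimal sketch: the helper names `backProject` and `safeNormal` are hypothetical (they are not in the original post), and it assumes K is laid out as {fx, px, fy, py} and T as {T1, T2, T3}, which is what the indexing in the kernel suggests. It also returns a zero vector when the cross product has (near-)zero length, since normalizing a zero vector would divide by zero.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Hypothetical helper: back-project pixel (x, y) with the given depth.
// Assumes K = {fx, px, fy, py} and T = {T1, T2, T3}, as indexed in the kernel.
__host__ __device__ inline float3 backProject(int x, int y, float depth,
                                              const float *K, const float *T)
{
    return make_float3((x - K[1]) * depth / K[0] - T[0],
                       (y - K[3]) * depth / K[2] - T[1],
                       depth - T[2]);
}

// Hypothetical helper: unit normal from a point and its right/down neighbours.
// Returns (0, 0, 0) when the cross product has (near-)zero length.
__host__ __device__ inline float3 safeNormal(float3 p, float3 right, float3 down)
{
    float3 a = make_float3(right.x - p.x, right.y - p.y, right.z - p.z);
    float3 b = make_float3(down.x - p.x, down.y - p.y, down.z - p.z);
    // Cross product a x b.
    float3 n = make_float3(a.y * b.z - a.z * b.y,
                           a.z * b.x - a.x * b.z,
                           a.x * b.y - a.y * b.x);
    float len = sqrtf(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len < 1e-12f) return make_float3(0.0f, 0.0f, 0.0f);
    return make_float3(n.x / len, n.y / len, n.z / len);
}
```

Because both helpers are marked `__host__ __device__`, the same code can be called from a kernel and from a plain CPU test, which makes it easier to separate arithmetic problems from memory-access problems.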

Launch function:

extern "C" void launch_getNormalMapKernel(const ushort *dev_depth, float3 *normalMap, const float *K, const float *T, cudaStream_t &stream)
{
        // Grid and block dimensions: one block per image row, one thread per pixel in the row
        dim3 dimGrid(IMG_HEIGHT, 1, 1);
        dim3 dimBlock(IMG_WIDTH, 1, 1);

        getNormalMapKernel<<<dimGrid, dimBlock, 0, stream>>>(dev_depth, normalMap, K, T);
        getLastCudaError("getNormalMapKernel() execution failed.\n");
}
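A check made immediately after the launch generally reports only launch-time problems (for example an invalid grid or block size). Errors that occur while the kernel runs, such as an invalid memory access, usually surface only after a synchronization. The sketch below illustrates that pattern under stated assumptions: `fillKernel` is a placeholder standing in for the real kernel, and `launchAndCheck` is a hypothetical wrapper name, neither of which appears in the original post.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Placeholder kernel standing in for getNormalMapKernel (hypothetical).
__global__ void fillKernel(float *out, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;   // guard against threads past the end of the buffer
}

// Hypothetical wrapper: launch on a stream, then report both kinds of errors.
cudaError_t launchAndCheck(float *dev_out, int n, float value, cudaStream_t stream)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    fillKernel<<<blocks, threads, 0, stream>>>(dev_out, n, value);

    // Launch-time errors (bad configuration and similar).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    // Execution errors only become visible once the stream has been synchronized.
    err = cudaStreamSynchronize(stream);
    if (err != cudaSuccess)
        fprintf(stderr, "execution failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

Synchronizing after every launch costs performance, so this is probably best kept to debugging builds.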

dev_depth, K, and T are the inputs; normalMap is the output.
These three inputs are also read by other kernels, and no problem shows up there.

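Today I commented out the kernel line by line and found that whenever the value of normal is changed inside the if block, the kernel crashes the GPU driver. I also tried writing to the output array directly inside the if block, and that crashes the driver too. However, if the result of normalize() is assigned to a different local variable inside the if block, the program does not crash, even though I believe the amount of computation is the same in both cases. I am out of ideas; could someone help me analyze this?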


6 replies


adagio_chen 2014-10-22


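Quoting reply #5 by cobralw (with the earlier quotes it contains):

Reply #1, seinedeparis: normalMap is a host-side pointer, isn't it?

Reply #3, cobralw: No, it is allocated with cudaMalloc.

Reply #4, seinedeparis: Try allocating the memory with cudaMallocHost and copying with cudaMemcpyAsync.

Reply #5, cobralw: normalMap is empty right after allocation; this kernel is what fills it. It is only an intermediate result and involves no copy between host memory and device memory, so there should be no need for those two functions, right?

Because, as I recall, streams require asynchronous memory allocation. I am not entirely sure about this, though.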
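On the host-pointer question raised in this exchange: the runtime can be asked directly what kind of memory a pointer refers to. As far as I know, a kernel launched on a stream can write to a buffer from plain cudaMalloc, and pinned host memory from cudaMallocHost matters mainly when cudaMemcpyAsync is expected to overlap with other work. The sketch below uses the real runtime call `cudaPointerGetAttributes`; the helper name `isDevicePointer` is hypothetical, and the `type` field assumes CUDA 10 or later (older toolkits exposed it as `memoryType`).

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: true if ptr refers to device memory (e.g. from cudaMalloc).
bool isDevicePointer(const void *ptr)
{
    cudaPointerAttributes attr;
    cudaError_t err = cudaPointerGetAttributes(&attr, ptr);
    if (err != cudaSuccess) {
        // Older runtimes return an error for ordinary host pointers; clear it.
        cudaGetLastError();
        return false;
    }
    // Assumes CUDA 10 or later, where the field is named `type`.
    return attr.type == cudaMemoryTypeDevice;
}
```

Calling such a check on dev_depth, normalMap, K, and T just before the launch would be one way to rule out a host pointer being passed to the kernel by mistake.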

Kill_Console 2014-10-20