Code for implementing an FFT in CUDA without the cuFFT library

sabin541 2015-11-16 05:07:32

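I want to implement an FFT without using the cuFFT library, that is, write it myself. Below is a kernel function given in a paper. However, I could not find the part that shows how to call the kernel. Does anyone know how? A completely different approach would also be welcome. Any expert advice is appreciated!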
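// Radix-2 decimation-in-time FFT of N points, computed by a single thread block.
// Complex, ComplexMul/ComplexAdd/ComplexSub and the twiddle-factor texture texRef
// come from the paper and are not shown in this post.
// Expected launch: one block of at least N/2 threads with N * sizeof(Complex)
// bytes of dynamic shared memory. Since the butterfly span Ns grows from 1 to N/2,
// DataIn presumably has to be in bit-reversed order for a natural-order result.
static __global__ void FFT(Complex *DataIn, Complex *DataOut, const unsigned int N)
{
    extern __shared__ Complex sdata[];
    const unsigned int tid_in_block = threadIdx.x;
    // Each active thread owns one butterfly (two points) per stage.
    const bool active = tid_in_block < N / 2;

    // Stage the input in shared memory. The bound is N/2, not N, so that
    // tid_in_block + N/2 never runs past the end of the array.
    if (active)
    {
        sdata[tid_in_block] = DataIn[tid_in_block];
        sdata[tid_in_block + N / 2] = DataIn[tid_in_block + N / 2];
    }
    __syncthreads();

    unsigned int p = 0, q = 0;
    float stage = 0.0f;
    for (unsigned int Ns = 1; Ns < N; Ns = Ns * 2)
    {
        if (active)
        {
            // Indices of the two points this thread combines in this stage.
            p = tid_in_block / Ns * Ns * 2 + tid_in_block % Ns;
            q = p + Ns;
            // Twiddle factor looked up by (thread, stage) in the texture table.
            Complex Wn = tex2D(texRef, tid_in_block, stage);
            Complex XqWn = ComplexMul(sdata[q], Wn);
            Complex Xp = sdata[p];
            sdata[p] = ComplexAdd(Xp, XqWn);
            sdata[q] = ComplexSub(Xp, XqWn);
        }
        stage += 1.0f;
        // The barrier sits outside the branch so every thread in the block reaches it.
        __syncthreads();
    }

    if (active)
    {
        DataOut[p] = sdata[p];
        DataOut[q] = sdata[q];
    }
}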
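On the question of how to call the kernel: its indexing suggests a launch with a single block of N/2 threads and N * sizeof(Complex) bytes of dynamic shared memory. The paper's Complex type, helper functions and texRef table are not shown in the thread, so the sketch below does not reproduce them. It uses a hypothetical stand-in kernel, fft_radix2, that keeps the same p/q butterfly indexing but computes the twiddle factor on the fly and bit-reverses the input while loading (legacy texture references are also no longer available in CUDA 12 and later). The names fft_radix2 and run_fft, and the choice of float2 for Complex, are assumptions made for illustration.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

typedef float2 Complex;  // assumption: the paper's Complex is a pair of floats

// Stand-in for the paper's kernel (not the paper's code): same butterfly
// indexing, twiddles computed with sincospif instead of a texture lookup.
__global__ void fft_radix2(const Complex *in, Complex *out,
                           unsigned int N, unsigned int log2N)
{
    extern __shared__ Complex sdata[];
    const unsigned int tid = threadIdx.x;
    const bool active = tid < N / 2;
    if (active) {
        // __brev reverses all 32 bits; the shift keeps the low log2N bits.
        sdata[__brev(tid) >> (32 - log2N)] = in[tid];
        sdata[__brev(tid + N / 2) >> (32 - log2N)] = in[tid + N / 2];
    }
    __syncthreads();
    for (unsigned int Ns = 1; Ns < N; Ns *= 2) {
        if (active) {
            unsigned int p = tid / Ns * Ns * 2 + tid % Ns;
            unsigned int q = p + Ns;
            float s, c;  // W = exp(-2*pi*i * (tid % Ns) / (2 * Ns))
            sincospif(-(float)(tid % Ns) / (float)Ns, &s, &c);
            Complex xq = sdata[q], xp = sdata[p];
            Complex t = make_float2(xq.x * c - xq.y * s, xq.x * s + xq.y * c);
            sdata[p] = make_float2(xp.x + t.x, xp.y + t.y);
            sdata[q] = make_float2(xp.x - t.x, xp.y - t.y);
        }
        __syncthreads();  // all threads reach the barrier once per stage
    }
    if (active) {
        out[tid] = sdata[tid];
        out[tid + N / 2] = sdata[tid + N / 2];
    }
}

// Host side: copy in, launch one block of N/2 threads, copy out.
// Returns an empty vector if any CUDA call reports an error.
std::vector<Complex> run_fft(const std::vector<Complex> &h_in)
{
    const unsigned int N = (unsigned int)h_in.size();
    // One block only: N must be a power of two between 2 and 2048.
    assert(N >= 2 && N <= 2048 && (N & (N - 1)) == 0);
    unsigned int log2N = 0;
    while ((1u << log2N) < N) ++log2N;

    const size_t bytes = N * sizeof(Complex);
    Complex *d_in = nullptr, *d_out = nullptr;
    std::vector<Complex> h_out;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return h_out;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return h_out; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    // <<<blocks, threads per block, dynamic shared memory in bytes>>>
    fft_radix2<<<1, N / 2, bytes>>>(d_in, d_out, N, log2N);

    if (cudaGetLastError() == cudaSuccess) {
        h_out.resize(N);
        // This blocking copy also waits for the kernel to finish.
        cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling run_fft on an 8-point unit impulse (first element 1, the rest 0) should return eight values close to 1 + 0i, which is a quick sanity check. The single-block design caps N at twice the maximum block size; larger transforms would need a multi-block scheme, which this sketch does not attempt.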


2 replies


NickelCao 2016-10-16


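The paper uses texture memory to look up Wn from a table, and the part that computes p and q is presumably there to locate the two points that this thread has to process. I don't really understand those two parts. Did you build on top of that code? Could you share some code for reference? Thanks!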
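The two parts asked about here can be looked at in isolation. The sketch below repeats the kernel's p/q arithmetic in a small function and builds one plausible twiddle table on the host. The table layout (one row per stage, one column per thread, holding exp(-2*pi*i * (tid % Ns) / (2*Ns))) is only a guess at what the paper stores in texRef, since the paper's table is not shown in the thread. The names butterfly_pair and build_twiddle_table are invented for this illustration.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Same index arithmetic as the quoted kernel: in the stage with butterfly
// half-size Ns, thread tid combines elements p and q = p + Ns.
__host__ __device__ inline void butterfly_pair(unsigned int tid, unsigned int Ns,
                                               unsigned int *p, unsigned int *q)
{
    // tid / Ns selects a group of 2*Ns points; tid % Ns is the offset inside it.
    *p = tid / Ns * Ns * 2 + tid % Ns;
    *q = *p + Ns;
}

// Hypothetical twiddle table (an assumption, not the paper's layout):
// table[stage * (N/2) + tid] = exp(-2*pi*i * (tid % Ns) / (2*Ns)), Ns = 2^stage.
std::vector<float2> build_twiddle_table(unsigned int N)
{
    const float kPi = 3.14159265358979f;
    std::vector<float2> table;
    for (unsigned int Ns = 1; Ns < N; Ns *= 2) {
        for (unsigned int tid = 0; tid < N / 2; ++tid) {
            float angle = -kPi * (float)(tid % Ns) / (float)Ns;
            table.push_back(make_float2(cosf(angle), sinf(angle)));
        }
    }
    return table;
}
```

For N = 8 the four threads get these (p, q) pairs: with Ns = 1, (0,1) (2,3) (4,5) (6,7); with Ns = 2, (0,2) (1,3) (4,6) (5,7); with Ns = 4, (0,4) (1,5) (2,6) (3,7). Every element is touched by exactly one thread per stage, which is why a barrier between stages is enough to avoid races.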

sabin541 2016-04-27


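Replying to myself: in the end I never figured out the CUDA code from the paper. When the number of points is a power of two, I use a fast Fourier transform and then optimize it with CUDA. When the number of points is not a power of two, I use the mixdft algorithm, a variant of the DFT, and then optimize that with CUDA. I won't post the actual code. In short, the final speed is acceptable.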
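The poster's code and the mixdft algorithm are not shown, so the following is not a reconstruction of either. As a baseline for sizes that are not a power of two, a direct O(N^2) DFT with one thread per output bin is easy to write and useful for checking a faster implementation. The names dft_naive and run_dft are invented for this sketch.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Direct DFT: thread k computes X[k] = sum over n of x[n] * exp(-2*pi*i*k*n/N).
// Works for any N, but costs O(N^2) operations in total.
__global__ void dft_naive(const float2 *in, float2 *out, unsigned int N)
{
    const unsigned int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= N) return;
    float2 acc = make_float2(0.0f, 0.0f);
    for (unsigned int n = 0; n < N; ++n) {
        // Reduce k*n modulo N first so the angle stays small and accurate.
        unsigned int kn = (unsigned int)(((unsigned long long)k * n) % N);
        float s, c;
        sincospif(-2.0f * (float)kn / (float)N, &s, &c);
        acc.x += in[n].x * c - in[n].y * s;
        acc.y += in[n].x * s + in[n].y * c;
    }
    out[k] = acc;
}

// Host wrapper; returns an empty vector if a CUDA call reports an error.
std::vector<float2> run_dft(const std::vector<float2> &h_in)
{
    const unsigned int N = (unsigned int)h_in.size();
    const size_t bytes = N * sizeof(float2);
    std::vector<float2> h_out;
    float2 *d_in = nullptr, *d_out = nullptr;
    if (N == 0 || cudaMalloc(&d_in, bytes) != cudaSuccess) return h_out;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return h_out; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    const unsigned int threads = 128;
    const unsigned int blocks = (N + threads - 1) / threads;  // round up
    dft_naive<<<blocks, threads>>>(d_in, d_out, N);

    if (cudaGetLastError() == cudaSuccess) {
        h_out.resize(N);
        cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

For a 6-point input of all ones, run_dft should return a value close to 6 in bin 0 and values close to 0 elsewhere. Mixed-radix or Bluestein-style methods are the usual way to get below O(N^2) for such sizes; whether mixdft is related to them is not stated in the thread.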

This document describes CUFFT, the NVIDIA® CUDA™ Fast Fourier Transform (FFT) library. The FFT is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of complex or real-valued data sets. It is one of the most important and widely used numerical algorithms in computational physics and general signal processing. The CUFFT library provides a simple interface for computing parallel FFTs on an NVIDIA GPU, which allows users to leverage the floating-point power and parallelism of the GPU without having to develop a custom CUDA FFT implementation.

FFT libraries typically vary in terms of supported transform sizes and data types. For example, some libraries only implement radix-2 FFTs, restricting the transform size to a power of two. The CUFFT library aims to support a wide range of FFT options efficiently on NVIDIA GPUs. This version of the CUFFT library supports the following features:
- Complex and real-valued input and output
- 1D, 2D, and 3D transforms
- Batch execution for doing multiple transforms of any dimension in parallel
- Transform sizes up to 64 million elements in single precision and up to 128 million elements in double precision in any dimension, limited by the available GPU memory
- In-place and out-of-place transforms
- Double-precision (64-bit floating point) on compatible hardware (sm1.3 and later)
- Support for streamed execution, enabling asynchronous computation and data movement
- FFTW compatible data layouts
- Arbitrary intra- and inter-dimension element strides
- Thread-safe API that can be called from multiple independent host threads

Nvidia CUDA FFT test

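The main purpose of the CUFFT library is high-performance computation of Fourier transforms. The Fourier transform is a mathematical transform that converts a signal from the time domain to the frequency domain, and it is widely used in signal processing, image processing, communications and other fields. By exploiting the parallel computing power of the GPU, the CUFFT library accelerates Fourier transforms on large data sets. It provides several kinds of transform functions, covering one-, two- and three-dimensional real and complex transforms, and it supports several data layouts and data types, such as single-precision real and complex and double-precision real and complex, to suit different applications. CUFFT also provides helper functions for configuring and managing transform parameters. In short, the CUFFT library implements high-performance Fourier transforms on the CUDA platform and speeds up the related algorithms in signal processing, image processing and similar fields.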
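For comparison with a hand-written kernel, a cuFFT call is also a convenient reference for checking results. The sketch below runs a forward single-precision complex-to-complex 1D transform in place, using cufftPlan1d, cufftExecC2C and cufftDestroy. The wrapper name cufft_forward is invented for this illustration, and the program needs to be linked against cuFFT (for example with nvcc and -lcufft).

```cuda
#include <cuda_runtime.h>
#include <cufft.h>
#include <vector>

// Forward complex-to-complex 1D FFT with cuFFT, computed in place on the GPU.
// Returns an empty vector if allocation, planning or execution fails.
std::vector<cufftComplex> cufft_forward(const std::vector<cufftComplex> &h_in)
{
    const int N = (int)h_in.size();
    const size_t bytes = (size_t)N * sizeof(cufftComplex);
    std::vector<cufftComplex> h_out;
    cufftComplex *d_data = nullptr;
    if (N == 0 || cudaMalloc(&d_data, bytes) != cudaSuccess) return h_out;
    cudaMemcpy(d_data, h_in.data(), bytes, cudaMemcpyHostToDevice);

    cufftHandle plan;
    // One transform (batch = 1) of length N; N need not be a power of two.
    if (cufftPlan1d(&plan, N, CUFFT_C2C, 1) == CUFFT_SUCCESS) {
        if (cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD) == CUFFT_SUCCESS) {
            h_out.resize(N);
            // The blocking copy waits for the transform to finish.
            cudaMemcpy(h_out.data(), d_data, bytes, cudaMemcpyDeviceToHost);
        }
        cufftDestroy(plan);
    }
    cudaFree(d_data);
    return h_out;
}
```

The forward transform is unnormalized, so a 6-point input of all ones should come back with a value close to 6 in bin 0 and values close to 0 elsewhere.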

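matlab fft code: 1D-4096-FFT-with-CUDA. Measurements show that on the Maxwell architecture the FFT sits right between compute-bound and memory-bound algorithms; with sufficient optimization, computation time can hide memory access time. This project implements a parallel FFT with the Stockham structure and reaches the same speed as cuFFT. By fusing kernels, the overall algorithm can run faster than one that calls cuFFT. In addition, cuFFT allocates device memory that the user cannot access, and this project avoids that problem. The project tests one-dimensional FFTs on 8192 batches of 4096-point time-domain data with incrementing values. The results are saved to a txt file and can be compared against MATLAB for verification. For now only the 4096-point FFT implementation is provided; contact the author for documentation. The runtime environment is WIN7 x64 + CUDA 7.5.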

CUFFT.jl: a wrapper for the CUDA FFT library
