FFT code in CUDA without the cuFFT library

sabin541 2015-11-16 05:07:32
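I want to implement an FFT without using the cuFFT library, i.e. write it myself. Below is a kernel function given in a paper, but I couldn't find the part that shows how the kernel is called. Does anyone know? Approaches completely different from the one below are also welcome. Any advice from the experts is appreciated!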
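// Complex type and helpers were not shown in the post; float2 is assumed here.
typedef float2 Complex;
__device__ inline Complex ComplexAdd(Complex a, Complex b) { return make_float2(a.x + b.x, a.y + b.y); }
__device__ inline Complex ComplexSub(Complex a, Complex b) { return make_float2(a.x - b.x, a.y - b.y); }
__device__ inline Complex ComplexMul(Complex a, Complex b)
{
    return make_float2(a.x * b.x - a.y * b.y, a.x * b.y + a.y * b.x);
}

// Radix-2 decimation-in-time FFT of N points in a single block.
// Assumptions: N is a power of two with 2 <= N <= 2048; DataIn is already in
// bit-reversed order (DataOut then comes out in natural order); the kernel is
// launched with exactly N/2 threads and N * sizeof(Complex) bytes of dynamic
// shared memory:
//     FFT<<<1, N / 2, N * sizeof(Complex)>>>(d_in, d_out, N);
static __global__ void FFT(const Complex *DataIn, Complex *DataOut, const unsigned int N)
{
    extern __shared__ Complex sdata[];
    const unsigned int tid_in_block = threadIdx.x;

    // Each of the N/2 threads loads two points into shared memory.
    sdata[tid_in_block] = DataIn[tid_in_block];
    sdata[tid_in_block + N/2] = DataIn[tid_in_block + N/2];
    // Kept outside any branch so that every thread reaches __syncthreads().
    __syncthreads();

    unsigned int p = 0, q = 0;
    for (unsigned int Ns = 1; Ns < N; Ns *= 2)
    {
        // The two points this thread combines: points form groups of 2*Ns,
        // p is the (tid % Ns)-th point of group tid / Ns, and q is Ns after p.
        p = tid_in_block / Ns * Ns * 2 + tid_in_block % Ns;
        q = p + Ns;
        // Twiddle factor Wn = exp(-2*pi*i * (tid % Ns) / (2*Ns)). The original code
        // looked it up with tex2D(texRef, tid_in_block, stage); texture references
        // were removed in CUDA 12, so it is computed directly here.
        float s, c;
        sincospif(-(float)(tid_in_block % Ns) / Ns, &s, &c);
        const Complex Wn = make_float2(c, s);
        const Complex XqWn = ComplexMul(sdata[q], Wn);
        const Complex Xp = sdata[p];
        sdata[p] = ComplexAdd(Xp, XqWn);
        sdata[q] = ComplexSub(Xp, XqWn);
        __syncthreads();   // finish this stage before the next one reads sdata
    }
    // After the last stage (Ns = N/2), p = tid and q = tid + N/2, so the
    // threads together write all N outputs.
    DataOut[p] = sdata[p];
    DataOut[q] = sdata[q];
}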
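As far as I can tell, the butterfly indexing in this kernel (Ns = 1, 2, 4, ..., with q = p + Ns) is an in-place radix-2 decimation-in-time FFT, which produces output in natural order only if the input has already been permuted into bit-reversed order; the kernel itself does not do that permutation. A minimal host-side sketch of the permutation in plain C++ follows (the name `bit_reverse_permute` is hypothetical, not from the paper):

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// Permutes x in place into bit-reversed order: the element at index i moves to
// the index obtained by reversing the low log2(N) bits of i.
// Hypothetical helper; N = x.size() must be a power of two.
void bit_reverse_permute(std::vector<std::complex<float>>& x) {
    const std::size_t n = x.size();
    assert(n == 0 || (n & (n - 1)) == 0);
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        // j steps through bitrev(1), bitrev(2), ... by adding 1 from the top bit down.
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;  // clear the carried bits
        j |= bit;                             // set the first zero bit
        if (i < j) std::swap(x[i], x[j]);     // swap each pair only once
    }
}
```

After the permutation, the data would presumably be copied to the device with `cudaMemcpy` and the kernel launched in a single block with N/2 threads and N * sizeof(Complex) bytes of dynamic shared memory. With at most 1024 threads per block, that limits N to 2048.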
2 replies
NickelCao 2016-10-16
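The paper uses texture memory to implement a table lookup for Wn, and the part that computes p and q is presumably there to locate the two points that each thread computes. I don't really understand these two parts. Did you build on his code? Could you share some code for reference? Thanks!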
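On the two parts asked about here, my reading (to be checked against the paper) is that the texture held one row of twiddle factors per stage, so `tex2D(texRef, tid_in_block, stage)` returned W = exp(-2πi·(tid mod Ns)/(2·Ns)); and that p and q are the two points of the butterfly one thread handles: the points form groups of 2·Ns, p is the (tid mod Ns)-th point of group tid/Ns, and q sits Ns places after p. The host C++ sketch below emulates that schedule, with the inner `tid` loop standing in for the N/2 GPU threads of one stage. The table layout and the names `make_twiddle_table` and `emulate_fft_kernel` are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cf = std::complex<float>;

// Hypothetical host version of the twiddle table kept in texture memory:
// row = stage, column = tid, mirroring tex2D(texRef, tid, stage).
// Entry [stage][tid] = exp(-2*pi*i * (tid % Ns) / (2*Ns)) with Ns = 2^stage.
std::vector<std::vector<cf>> make_twiddle_table(std::size_t n) {
    std::vector<std::vector<cf>> table;
    const double pi = std::acos(-1.0);
    for (std::size_t Ns = 1; Ns < n; Ns *= 2) {
        std::vector<cf> row(n / 2);
        for (std::size_t tid = 0; tid < n / 2; ++tid) {
            const double angle = -pi * double(tid % Ns) / double(Ns);
            row[tid] = cf(float(std::cos(angle)), float(std::sin(angle)));
        }
        table.push_back(row);
    }
    return table;
}

// Runs the kernel's butterfly schedule on the CPU and returns the FFT of x.
std::vector<cf> emulate_fft_kernel(std::vector<cf> x) {
    const std::size_t n = x.size();
    assert(n >= 2 && (n & (n - 1)) == 0);  // power of two
    // Bit-reversed reordering, which the kernel expects its input to have.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j |= bit;
        if (i < j) std::swap(x[i], x[j]);
    }
    const auto twiddle = make_twiddle_table(n);
    std::size_t stage = 0;
    for (std::size_t Ns = 1; Ns < n; Ns *= 2, ++stage) {
        for (std::size_t tid = 0; tid < n / 2; ++tid) {  // one "thread" per butterfly
            const std::size_t p = tid / Ns * Ns * 2 + tid % Ns;  // first point
            const std::size_t q = p + Ns;                        // its partner
            const cf XqWn = x[q] * twiddle[stage][tid];
            const cf Xp = x[p];
            x[p] = Xp + XqWn;
            x[q] = Xp - XqWn;
        }
    }
    return x;
}
```

A quick check of the twiddle signs: for x[n] = exp(2πi·3n/8) with N = 8, the result should be 8 at index 3 and close to 0 everywhere else.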
sabin541 2016-04-27
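Replying to myself: I never did figure out the CUDA code from the paper above. When the number of points is a power of two, I used a fast Fourier transform and then optimized it for CUDA; when it is not a power of two, I used mixdft, a variant of the DFT, and then optimized that for CUDA. I won't post the actual code. In the end, the speed was acceptable.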
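For lengths that are not a power of two, the reply above switches to a DFT-based method (mixdft) whose code was not posted. As a hedged baseline only: a direct O(N²) DFT works for any N and is handy for checking a faster implementation. It is not the mixdft algorithm, and the names `direct_dft` and `is_power_of_two` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Direct DFT: X[k] = sum_m x[m] * exp(-2*pi*i*k*m/N), valid for any N > 0.
std::vector<std::complex<double>> direct_dft(const std::vector<std::complex<double>>& x) {
    const std::size_t n = x.size();
    assert(n > 0);
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> X(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum = 0.0;
        for (std::size_t m = 0; m < n; ++m) {
            // Reduce k*m modulo n first so the angle stays in [0, 2*pi).
            const double angle = -2.0 * pi * double((k * m) % n) / double(n);
            sum += x[m] * std::polar(1.0, angle);
        }
        X[k] = sum;
    }
    return X;
}

// Chooses the path described above: radix-2 FFT for powers of two, DFT otherwise.
bool is_power_of_two(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }
```

Reducing k·m modulo N before forming the angle keeps the argument passed to `std::polar` small, which helps accuracy for larger N. One natural way to port this to CUDA would be to compute one X[k] per thread.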
This document describes CUFFT, the NVIDIA® CUDA™ Fast Fourier Transform (FFT) library. The FFT is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of complex or real-valued data sets. It is one of the most important and widely used numerical algorithms in computational physics and general signal processing. The CUFFT library provides a simple interface for computing parallel FFTs on an NVIDIA GPU, which allows users to leverage the floating-point power and parallelism of the GPU without having to develop a custom CUDA FFT implementation. FFT libraries typically vary in terms of supported transform sizes and data types. For example, some libraries only implement radix-2 FFTs, restricting the transform size to a power of two. The CUFFT library aims to support a wide range of FFT options efficiently on NVIDIA GPUs. This version of the CUFFT library supports the following features:
  • Complex and real-valued input and output
  • 1D, 2D, and 3D transforms
  • Batch execution for doing multiple transforms of any dimension in parallel
  • Transform sizes up to 64 million elements in single precision and up to 128 million elements in double precision in any dimension, limited by the available GPU memory
  • In-place and out-of-place transforms
  • Double precision (64-bit floating point) on compatible hardware (compute capability 1.3 and later)
  • Support for streamed execution, enabling asynchronous computation and data movement
  • FFTW-compatible data layouts
  • Arbitrary intra- and inter-dimension element strides
  • Thread-safe API that can be called from multiple independent host threads
