With the same OpenCV code, C# runs three times faster than C++? How should I configure my C++ project to speed it up?

Peki.L 2020-09-02 09:03:23
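First, the C# code. I wrote it back when I was a beginner; it implements the contraharmonic mean filter from the Gonzalez book.
In x86 Debug mode, processing a given image takes 45 ms.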

long t1 = Cv2.GetTickCount();   // start of the timed region
double Q = nudContraharmonicMeanFilterQ.Value.ToDouble();   // filter order Q
int kernelSize = nudMeanFilter.Value.ToInt();
Mat source = Input.Clone();
source.ConvertTo(source, MatType.CV_8U);

// Pad by the kernel radius so the window never leaves the image.
Mat source_Padded = new Mat();
int radius = kernelSize / 2;
Cv2.CopyMakeBorder(source, source_Padded, radius, radius, radius, radius, BorderTypes.Replicate);
int paddedHeight = source_Padded.Rows;
int paddedWidth = source_Padded.Cols;

Mat[] mats = source_Padded.Split();
Mat[] mats1 = new Mat[mats.Length];
// Outer parallel loop: one iteration per channel (needs an unsafe context for the pointers).
Parallel.For(0, mats.Length, channel =>
{
    Mat padded = mats[channel];
    padded.ConvertTo(padded, MatType.CV_64FC1);

    // Read from the untouched copy (p1), write results into padded (p).
    Mat padded_bak = padded.Clone();
    double* p = (double*)padded.DataPointer;
    double* p1 = (double*)padded_bak.DataPointer;

    // Inner parallel loop: one iteration per image row.
    Parallel.For(radius, paddedHeight - radius, j =>
    {
        for (int i = radius; i < paddedWidth - radius; i++)
        {
            double s1 = 0;
            double s2 = 0;
            for (int m = j - radius; m <= j + radius; m++)
            {
                for (int n = i - radius; n <= i + radius; n++)
                {
                    double tmp = *(p1 + m * paddedWidth + n);
                    s1 += Math.Pow(tmp, Q + 1);
                    s2 += Math.Pow(tmp, Q);
                }
            }
            s2 = s2 == 0 ? 0.0000001 : s2;   // avoid division by zero
            s1 /= s2;
            *(p + j * paddedWidth + i) = s1;
        }
    });
    // Crop the padding away again.
    mats1[channel] = padded[new Rect(radius, radius, source.Width, source.Height)];
});
Mat dst = new Mat();
Cv2.Merge(mats1, dst);
dst.ConvertTo(dst, MatType.CV_8U);
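
For reference, this is the textbook definition of the contraharmonic mean filter that the `s1 / s2` ratio in the listing accumulates. The symbols are the usual textbook ones and do not appear in the post: $g$ is the input image, $S_{xy}$ is the window centred on $(x, y)$, and $Q$ is the filter order.

```latex
\hat{f}(x, y) =
  \frac{\sum_{(s,t) \in S_{xy}} g(s,t)^{Q+1}}
       {\sum_{(s,t) \in S_{xy}} g(s,t)^{Q}}
```

A positive $Q$ suppresses pepper noise and a negative $Q$ suppresses salt noise; $Q = 0$ reduces to the arithmetic mean.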

After I rewrote it in C++, the code looks like this.
In x64 Release mode, one run takes 195 ms, more than three times slower than the C# version.

int64 t1 = cv::getTickCount();   // start of the timed region

cv::Mat src = QImageToMat(image1);
std::vector<cv::Mat> nsrc;
cv::split(src, nsrc);

// Process at most three channels (an alpha channel is dropped).
int channel = src.channels();
channel = channel > 3 ? 3 : channel;
nsrc.resize(channel);   // cv::merge needs every remaining plane to have the same depth

cv::Mat dst;
cv::Mat padded;

// Pad by the kernel radius so the window never leaves the image.
int ks = ui.nudMeanFilter->value();
int radius = ks / 2;
cv::copyMakeBorder(src, padded, radius, radius, radius, radius, cv::BORDER_REPLICATE);
int srcW = src.cols;
int srcH = src.rows;
int paddedW = padded.cols;
int paddedH = padded.rows;

padded.convertTo(padded, CV_64F);
std::vector<cv::Mat> npadded;
cv::split(padded, npadded);

double Q = ui.nudContraharmonicMeanFilterQ->value();   // filter order Q

// Outer parallel loop: one iteration per channel.
cv::parallel_for_(cv::Range(0, channel), [&](const cv::Range& range) {
    for (int ch = range.start; ch < range.end; ch++)
    {
        cv::Mat _ori = npadded[ch];
        cv::Mat _new(_ori.size(), _ori.type());

        double* op = (double*)_ori.data;   // read from the padded input
        double* np = (double*)_new.data;   // write into a separate output

        // Inner parallel loop: one iteration per image row.
        cv::parallel_for_(cv::Range(radius, paddedH - radius), [&](const cv::Range& rows) {
            for (int i = rows.start; i < rows.end; i++)
            {
                for (int j = radius; j < paddedW - radius; j++)
                {
                    double s1 = 0;
                    double s2 = 0;
                    for (int m = i - radius; m <= i + radius; m++)
                    {
                        for (int n = j - radius; n <= j + radius; n++)
                        {
                            double tmp = *(op + m * paddedW + n);
                            s1 += std::pow(tmp, Q + 1);
                            s2 += std::pow(tmp, Q);
                        }
                    }
                    s1 /= s2 + 0.000000001;   // avoid division by zero
                    *(np + i * paddedW + j) = s1;
                }
            }});
        // Crop the padding away again.
        nsrc[ch] = cv::Mat(_new, cv::Rect(radius, radius, srcW, srcH));
    }});
cv::merge(nsrc, dst);
dst.convertTo(dst, CV_8U);

int64 t2 = cv::getTickCount();   // end of the timed region


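Both the C# and the C++ project use the OpenCV static libraries from the same folder.
My suspicions:
1. The C++ project settings are wrong; there may be options that would improve execution speed.
2. C#'s parallel loops allocate computing resources more intelligently.
3. There is something wrong with my C++ code.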
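
On the third suspicion, one cost that both listings share is that `pow` is evaluated twice for every window tap, so each pixel is raised to the same powers once per window that covers it. A minimal sketch of one way to avoid that, assuming a single-channel, already padded, row-major `double` buffer (the function names here are illustrative and not from the post): raise every pixel to `Q` and `Q + 1` once, then only add inside the window. It is independent of the threading question and trades two extra buffers for fewer `pow` calls.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Baseline: two std::pow calls per window tap, as in the listings above.
// `img` is an already padded, row-major, single-channel buffer; only pixels
// at least `radius` away from the edge are written, the rest stay 0.
std::vector<double> contraharmonicNaive(const std::vector<double>& img,
                                        int width, int height,
                                        int radius, double Q)
{
    assert(img.size() == static_cast<std::size_t>(width) * height);
    std::vector<double> out(img.size(), 0.0);
    for (int y = radius; y < height - radius; ++y) {
        for (int x = radius; x < width - radius; ++x) {
            double s1 = 0.0, s2 = 0.0;
            for (int m = y - radius; m <= y + radius; ++m) {
                for (int n = x - radius; n <= x + radius; ++n) {
                    const double v = img[static_cast<std::size_t>(m) * width + n];
                    s1 += std::pow(v, Q + 1);
                    s2 += std::pow(v, Q);
                }
            }
            out[static_cast<std::size_t>(y) * width + x] =
                s1 / (s2 == 0.0 ? 1e-7 : s2);
        }
    }
    return out;
}

// Variant: compute v^Q and v^(Q+1) once per pixel, then only sum in the window.
std::vector<double> contraharmonicPrecomputed(const std::vector<double>& img,
                                              int width, int height,
                                              int radius, double Q)
{
    assert(img.size() == static_cast<std::size_t>(width) * height);
    std::vector<double> powQ(img.size()), powQ1(img.size());
    for (std::size_t i = 0; i < img.size(); ++i) {
        powQ[i] = std::pow(img[i], Q);
        powQ1[i] = std::pow(img[i], Q + 1);
    }
    std::vector<double> out(img.size(), 0.0);
    for (int y = radius; y < height - radius; ++y) {
        for (int x = radius; x < width - radius; ++x) {
            double s1 = 0.0, s2 = 0.0;
            for (int m = y - radius; m <= y + radius; ++m) {
                for (int n = x - radius; n <= x + radius; ++n) {
                    const std::size_t idx = static_cast<std::size_t>(m) * width + n;
                    s1 += powQ1[idx];
                    s2 += powQ[idx];
                }
            }
            out[static_cast<std::size_t>(y) * width + x] =
                s1 / (s2 == 0.0 ? 1e-7 : s2);
        }
    }
    return out;
}
```

With a kernel of size k, the baseline makes 2·k² `pow` calls per output pixel and the variant makes 2 per input pixel. Whether that matters more than the threading issue would have to be measured on the actual image.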

This is an important question for me: why does parallel computation in C++ perform so poorly?
4 replies
mmcanyu 2020-09-03
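The problem usually lies in memory management. Sun used to compare Java's execution speed with C++ all the time. In C++, every new requests memory from the system and every delete hands it back, which is inefficient and time-consuming. Java and C# have their own memory management mechanisms: put simply, memory that has been allocated is not released but kept for reuse next time. To make C++ efficient, use memory-pool techniques.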
Peki.L 2020-09-03
Found the cause: replacing cv::parallel_for_() with Concurrency::parallel_for() fixes it. This may be related to which parallel framework was selected when OpenCV was built.
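
One plausible explanation, which this thread does not confirm, is that cv::parallel_for_ does not parallelise a call made from inside another cv::parallel_for_ body. If so, the inner row loop would run serially and at most one thread per channel would be busy. A way to sidestep nesting altogether is to flatten (channel, row) into a single index and use one parallel level. The sketch below uses an OpenMP pragma as a portable stand-in for Concurrency::parallel_for; `forEachChannelRow` and `rowSums` are hypothetical names, not part of OpenCV or of the post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls rowBody(channel, row) once for every channel in
// [0, channels) and every row in [rowBegin, rowEnd), using ONE parallel loop.
template <typename RowBody>
void forEachChannelRow(int channels, int rowBegin, int rowEnd, RowBody rowBody)
{
    const int rows = rowEnd - rowBegin;
    if (channels <= 0 || rows <= 0) return;
    const int total = channels * rows;
    // Runs in parallel only when OpenMP is enabled (/openmp on MSVC,
    // -fopenmp on GCC/Clang); otherwise the pragma is ignored and the
    // loop simply runs serially.
#pragma omp parallel for schedule(static)
    for (int idx = 0; idx < total; ++idx) {
        const int channel = idx / rows;           // which plane
        const int row = rowBegin + idx % rows;    // which row of that plane
        rowBody(channel, row);
    }
}

// Demo workload: per-channel, per-row sums of row-major planes.
// Every (channel, row) pair writes its own output slot, so no locking is needed.
std::vector<double> rowSums(const std::vector<std::vector<double>>& planes,
                            int width, int height)
{
    const int channels = static_cast<int>(planes.size());
    std::vector<double> sums(static_cast<std::size_t>(channels) * height, 0.0);
    forEachChannelRow(channels, 0, height, [&](int ch, int row) {
        double s = 0.0;
        for (int x = 0; x < width; ++x) {
            s += planes[ch][static_cast<std::size_t>(row) * width + x];
        }
        sums[static_cast<std::size_t>(ch) * height + row] = s;
    });
    return sums;
}
```

In the filter from the question, the body passed to `forEachChannelRow` would be the per-row loop over `j`, with `radius` and `paddedH - radius` as the row bounds. The same flattening also works with a single cv::parallel_for_ or Concurrency::parallel_for call over `channels * rows`.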
Peki.L 2020-09-02
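Here is the OpenCV build information. Could everyone check whether anything looks wrong? Are optimization options that should be enabled turned off? Or is there a problem with the code above?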

General configuration for OpenCV 4.4.0 =====================================
  Version control:               unknown

  Extra modules:
    Location (extra):            D:/Project/opencv/opencv_files/opencv_contrib/modules
    Version control (extra):     unknown

  Platform:
    Timestamp:                   2020-08-19T10:54:43Z
    Host:                        Windows 10.0.19041 AMD64
    CMake:                       3.15.1
    CMake generator:             Visual Studio 16 2019
    CMake build tool:            C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/MSBuild/Current/Bin/MSBuild.exe
    MSVC:                        1927

  CPU/HW features:
    Baseline:                    SSE SSE2
      requested:                 SSE2
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX
      requested:                 SSE4_1 SSE4_2 AVX FP16
      SSE4_1 (14 files):         + SSE3 SSSE3 SSE4_1
      SSE4_2 (1 files):          + SSE3 SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (0 files):            + SSE3 SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (4 files):             + SSE3 SSSE3 SSE4_1 POPCNT SSE4_2 AVX

  C/C++:
    Built as dynamic libs?:      NO
    C++ standard:                11
    C++ Compiler:                C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.27.29110/bin/Hostx64/x86/cl.exe  (ver 19.27.29111.0)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise  /arch:SSE /arch:SSE2 /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP  /MT /O2 /Ob2 /DNDEBUG 
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise  /arch:SSE /arch:SSE2 /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP  /MTd /Zi /Ob0 /Od /RTC1 
    C Compiler:                  C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.27.29110/bin/Hostx64/x86/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise  /arch:SSE /arch:SSE2 /MP   /MT /O2 /Ob2 /DNDEBUG 
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise  /arch:SSE /arch:SSE2 /MP /MTd /Zi /Ob0 /Od /RTC1 
    Linker flags (Release):      /machine:X86  /NODEFAULTLIB:atlthunk.lib /INCREMENTAL:NO  /NODEFAULTLIB:libcmtd.lib /NODEFAULTLIB:libcpmtd.lib /NODEFAULTLIB:msvcrtd.lib
    Linker flags (Debug):        /machine:X86  /NODEFAULTLIB:atlthunk.lib /debug /INCREMENTAL  /NODEFAULTLIB:libcmt.lib /NODEFAULTLIB:libcpmt.lib /NODEFAULTLIB:msvcrt.lib
    ccache:                      NO
    Precompiled headers:         YES
    Extra dependencies:          comctl32 gdi32 ole32 setupapi ws2_32
    3rdparty dependencies:       ittnotify libprotobuf zlib libjpeg-turbo libwebp libpng libtiff libjasper IlmImf quirc

  OpenCV modules:
    To be built:                 aruco bgsegm bioinspired calib3d ccalib core dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking video videoio videostab xfeatures2d ximgproc xobjdetect xphoto
    Disabled:                    datasets gapi java_bindings_generator python_bindings_generator python_tests world
    Disabled by dependency:      -
    Unavailable:                 alphamat cnn_3dobj cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev cvv freetype hdf java js julia matlab ovis python2 python3 sfm ts viz
    Applications:                -
    Documentation:               NO
    Non-free algorithms:         YES

  Windows RT support:            NO

  GUI: 
    Win32 UI:                    YES
    VTK support:                 NO

  Media I/O: 
    ZLib:                        build (ver 1.2.11)
    JPEG:                        build-libjpeg-turbo (ver 2.0.5-62)
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         build (ver 1.6.37)
    TIFF:                        build (ver 42 - 4.0.10)
    JPEG 2000:                   build Jasper (ver 1.900.1)
    OpenEXR:                     build (ver 2.3.0)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    DC1394:                      NO
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (58.54.100)
      avformat:                  YES (58.29.100)
      avutil:                    YES (56.31.100)
      swscale:                   YES (5.5.100)
      avresample:                YES (4.0.0)
    GStreamer:                   NO
    DirectShow:                  YES

  Parallel framework:            Concurrency

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Lapack:                      NO
    Eigen:                       NO
    Custom HAL:                  NO
    Protobuf:                    build (3.5.1)

  OpenCL:                        YES (NVD3D11)
    Include path:                D:/Project/opencv/opencv_files/opencv/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python (for build):            C:/Anaconda/python.exe

  Install to:                    D:/Project/opencv/opencv_files/build_win_x86/install
-----------------------------------------------------------------
Peki.L 2020-09-02
If the C++ project uses the x86 build of OpenCV, it is even slower.
