Running inception_v3 on QRB5165 is slow; how can it reach the officially advertised throughput?

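weixin_32299171 2024-07-05 11:21:29

"Running inception_v3 on QRB5165 is slow; how can it reach the officially advertised throughput?

Qualcomm's official documentation claims 337 inf/s, but our measured result is only about 90 inf/s. This is the test command: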

sh-5.0# snpe-parallel-run --container dlc-model/inception-v3/inception_v3_quantized.dlc \
> --input_list dlc-model/inception-v3/target_raw_list.txt \
> --perf_profile burst --cpu_fallback false --enable_init_cache \
> --userbuffer_tf8 --profiling_level basic --use_aip \
> --perf_profile burst --cpu_fallback false --enable_init_cache \
> --userbuffer_tf8 --profiling_level basic --use_aip \
> --perf_profile burst --cpu_fallback false --enable_init_cache \
> --userbuffer_tf8 --profiling_level basic --use_aip \
> --perf_profile burst --cpu_fallback false --enable_init_cache \
> --userbuffer_tf8 --profiling_level basic --use_aip \
> --duration 10

The number of input image is: 1

CONTAINER SAVE SUCCESS

Saved container into archive successfully

PSNPE inputDimensions is: [ 1 299 299 3 ]

Batch size for the container is: 1

Input/output buffer number is:  1

Processing DNN input(s):

./dlc-model/inception-v3/chairs.raw

PSNPE start executing...

runtimes: aip_fixed8_tf aip_fixed8_tf aip_fixed8_tf aip_fixed8_tf CPU Fxp Mode: 0 - Mode :0- Number of images processed: 903

 Build time: 0.099488 seconds.

 Start timestamp of the first input loading (0.0s): 1717741692765047

 End time of the last input loading: 0.005729

 Start time of the first execution: 0.005755

 Start time of the last getOutputCallback: 10.0094

 Start time of the first getOutputCallback: -1.71774e+09

 End time of the last getOutputCallback: 10.01

 Execution Time: 10.0036

 Execution Time + getOutput Time: 10.0043

 LoadInput time + Execution Time + getOutput Time: 10.01

 Mean output time: 10.01

90.2673 infs/sec

Successfully executed!"

weixin_38498942 2024-07-08

1. Use the snpe-dlc-graph-prepare command to process your model so that it can run on AIP + DSP.
2. Experiment to find the most effective split of instances between AIP and DSP.
Testing shows that the following command, with one DSP instance and four AIP instances, gives the highest throughput:

snpe-throughput-net-run \
--container inception_v3_quantized.dlc --perf_profile burst --userbuffer_tf8 --use_dsp \
--container inception_v3_quantized.dlc --perf_profile burst --userbuffer_tf8 --use_aip \
--container inception_v3_quantized.dlc --perf_profile burst --userbuffer_tf8 --use_aip \
--container inception_v3_quantized.dlc --perf_profile burst --userbuffer_tf8 --use_aip \
--container inception_v3_quantized.dlc --perf_profile burst --userbuffer_tf8 --use_aip \
--duration 20

The result is roughly:

Output:

/prj/qct/webtech_hyd18/mlg_user_admin/qaisw_source_repo/qaisw_repo_release/snpe_src/avante-tools/prebuilt/dsp/hexagon-sdk-4.1.0/ipc/fastrpc/rpcmem/src/rpcmem_android.c:38:dummy call to rpcmem_init, rpcmem APIs will be used from libxdsprpc

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

Processing DNN input(s):

Processing DNN input(s):

Processing DNN input(s):

Processing DNN input(s):

Processing DNN input(s):

[Thread 0 - dsp_fixed8_tf] 82.7196 infs/sec - Number of images processed: 1655 - Build time: 342806 microseconds - Elapsed time: 20009778 microseconds - Real time: 20007353 microseconds - Teardown time: 98853 microseconds - Batch : 1

[Thread 1 - aip_fixed8_tf] 62.4897 infs/sec - Number of images processed: 1250 - Build time: 42899 microseconds - Elapsed time: 20005349 microseconds - Real time: 20003308 microseconds - Teardown time: 14370 microseconds - Batch : 1

[Thread 2 - aip_fixed8_tf] 62.9518 infs/sec - Number of images processed: 1259 - Build time: 24320 microseconds - Elapsed time: 20001755 microseconds - Real time: 19999433 microseconds - Teardown time: 10066 microseconds - Batch : 1

[Thread 3 - aip_fixed8_tf] 62.7556 infs/sec - Number of images processed: 1256 - Build time: 13685 microseconds - Elapsed time: 20016268 microseconds - Real time: 20014137 microseconds - Teardown time: 103485 microseconds - Batch : 1

[Thread 4 - aip_fixed8_tf] 62.7268 infs/sec - Number of images processed: 1255 - Build time: 14063 microseconds - Elapsed time: 20009735 microseconds - Real time: 20007390 microseconds - Teardown time: 99701 microseconds - Batch : 1

Total throughput: 333.644 infs/sec
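
The total printed by the tool matches the sum of the five per-thread rates (one DSP thread at about 82.7 infs/sec plus four AIP threads at about 62.5 to 63.0 infs/sec each). As a rough sketch for checking this on a saved log, the helper below adds up the per-thread figures with awk. The function name sum_throughput and the log file name are hypothetical, and it assumes every thread line has the same layout as the output above.

```shell
#!/bin/sh
# Sum the per-thread "infs/sec" figures from a saved snpe-throughput-net-run log.
# Assumption: each thread line looks like
#   [Thread N - <runtime>] <rate> infs/sec - Number of images processed: ...
# Reads the files given as arguments, or stdin when none are given.
sum_throughput() {
    awk '/^\[Thread [0-9]+ - / {
             # The rate is the field immediately before the "infs/sec" token.
             for (i = 1; i < NF; i++)
                 if ($(i + 1) == "infs/sec") { total += $i; break }
         }
         END { printf "%.2f\n", total }' "$@"
}
```

For example, `sum_throughput throughput.log` (a hypothetical file holding the five thread lines above) prints 333.64, which agrees with the reported total.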

This essentially reaches the officially advertised figure of 330+ inf/s.
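
To try other DSP/AIP splits as suggested in step 2, a small helper can generate the command lines rather than typing them by hand. This is only a sketch: build_cmd is a hypothetical name, the candidate splits in the loop are arbitrary examples, and it uses only the flags that appear in the command above. It prints the commands without running them; whether a given split is accepted by the tool or performs well has to be checked on the target.

```shell
#!/bin/sh
# Build a snpe-throughput-net-run command line with a chosen number of
# DSP and AIP instances, reusing only the flags shown in the reply above.
# Usage: build_cmd <dlc> <num_dsp> <num_aip> <duration_seconds>
build_cmd() {
    dlc=$1; n_dsp=$2; n_aip=$3; duration=$4
    cmd="snpe-throughput-net-run"
    i=0
    while [ "$i" -lt "$n_dsp" ]; do
        cmd="$cmd --container $dlc --perf_profile burst --userbuffer_tf8 --use_dsp"
        i=$((i + 1))
    done
    i=0
    while [ "$i" -lt "$n_aip" ]; do
        cmd="$cmd --container $dlc --perf_profile burst --userbuffer_tf8 --use_aip"
        i=$((i + 1))
    done
    printf '%s --duration %s\n' "$cmd" "$duration"
}

# Print candidate commands for a few example "<num_dsp> <num_aip>" splits.
# "1 4" reproduces the command from the reply; the others are illustrative only.
for split in "1 4" "1 3" "2 3" "0 4"; do
    set -- $split
    build_cmd inception_v3_quantized.dlc "$1" "$2" 20
done
```

Each printed line can be run on the device in turn, and the "Total throughput" lines compared to pick the best split.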
