inception_v3 runs slowly on the QRB5165; how can it reach the officially advertised throughput?

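weixin_32299171 2024-07-05 11:21:29

"inception_v3 runs slowly on the QRB5165; how can it reach the officially advertised throughput?

Qualcomm's official documentation claims 337 inf/s, but our own tests reach only about 90 inf/s. This is the command we used: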

sh-5.0# snpe-parallel-run --container dlc-model/inception-v3/inception_v3_quantized.dlc \

> --input_list dlc-model/inception-v3/target_raw_list.txt \

> --perf_profile burst --cpu_fallback false --enable_init_cache \

> --userbuffer_tf8 --profiling_level basic --use_aip \

> --perf_profile burst --cpu_fallback false --enable_init_cache \

> --userbuffer_tf8 --profiling_level basic --use_aip \

> --perf_profile burst --cpu_fallback false --enable_init_cache \

> --userbuffer_tf8 --profiling_level basic --use_aip \

> --perf_profile burst --cpu_fallback false --enable_init_cache \

> --userbuffer_tf8 --profiling_level basic --use_aip \

> --duration 10

The number of input image is: 1

CONTAINER SAVE SUCCESS

Saved container into archive successfully

PSNPE inputDimensions is: [ 1 299 299 3 ]

Batch size for the container is: 1

Input/output buffer number is: 1

Processing DNN input(s):

./dlc-model/inception-v3/chairs.raw

PSNPE start executing...

runtimes: aip_fixed8_tf aip_fixed8_tf aip_fixed8_tf aip_fixed8_tf CPU Fxp Mode: 0 - Mode :0- Number of images processed: 903

Build time: 0.099488 seconds.

Start timestamp of the first input loading (0.0s): 1717741692765047

End time of the last input loading: 0.005729

Start time of the first execution: 0.005755

Start time of the last getOutputCallback: 10.0094

Start time of the first getOutputCallback: -1.71774e+09

End time of the last getOutputCallback: 10.01

Execution Time: 10.0036

Execution Time + getOutput Time: 10.0043

LoadInput time + Execution Time + getOutput Time: 10.01

Mean output time: 10.01

90.2673 infs/sec

Successfully executed!"
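As a quick sanity check on the numbers in this log, the reported rate can be recomputed from the image count and the execution time, and compared with the advertised figure. This is only an arithmetic sketch: the variable names are illustrative, and the 337 inf/s value is the one the poster attributes to Qualcomm's documentation.

```python
# Figures copied from the snpe-parallel-run log above.
images_processed = 903
execution_time_s = 10.0036
reported_infs_per_sec = 90.2673

# Figure the poster attributes to Qualcomm's documentation (not verified here).
advertised_infs_per_sec = 337.0

# Throughput is simply images processed divided by the execution time.
measured = images_processed / execution_time_s
fraction_of_claim = measured / advertised_infs_per_sec

print(f"measured throughput: {measured:.2f} inf/s")
print(f"fraction of advertised figure: {fraction_of_claim:.1%}")
```

The recomputed value agrees with the tool's own figure to within rounding, so the gap to the advertised number is real and not a reporting error: the four AIP instances together deliver a little over a quarter of it.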


weixin_38498942 2024-07-08


1. Process your model with snpe-dlc-graph-prepare so that it can run on AIP+DSP.
2. Experiment to find the best split of threads between AIP and DSP.
In our tests, the following command (one DSP thread plus four AIP threads):

snpe-throughput-net-run \
  --container inception_v3_quantized.dlc --perf_profile burst --userbuffer_tf8 --use_dsp \
  --container inception_v3_quantized.dlc --perf_profile burst --userbuffer_tf8 --use_aip \
  --container inception_v3_quantized.dlc --perf_profile burst --userbuffer_tf8 --use_aip \
  --container inception_v3_quantized.dlc --perf_profile burst --userbuffer_tf8 --use_aip \
  --container inception_v3_quantized.dlc --perf_profile burst --userbuffer_tf8 --use_aip \
  --duration 20

gave the highest throughput. The result was roughly:

Output:

/prj/qct/webtech_hyd18/mlg_user_admin/qaisw_source_repo/qaisw_repo_release/snpe_src/avante-tools/prebuilt/dsp/hexagon-sdk-4.1.0/ipc/fastrpc/rpcmem/src/rpcmem_android.c:38:dummy call to rpcmem_init, rpcmem APIs will be used from libxdsprpc

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

Processing DNN input(s):

Processing DNN input(s):

Processing DNN input(s):

Processing DNN input(s):

Processing DNN input(s):

[Thread 0 - dsp_fixed8_tf] 82.7196 infs/sec - Number of images processed: 1655 - Build time: 342806 microseconds - Elapsed time: 20009778 microseconds - Real time: 20007353 microseconds - Teardown time: 98853 microseconds - Batch : 1

[Thread 1 - aip_fixed8_tf] 62.4897 infs/sec - Number of images processed: 1250 - Build time: 42899 microseconds - Elapsed time: 20005349 microseconds - Real time: 20003308 microseconds - Teardown time: 14370 microseconds - Batch : 1

[Thread 2 - aip_fixed8_tf] 62.9518 infs/sec - Number of images processed: 1259 - Build time: 24320 microseconds - Elapsed time: 20001755 microseconds - Real time: 19999433 microseconds - Teardown time: 10066 microseconds - Batch : 1

[Thread 3 - aip_fixed8_tf] 62.7556 infs/sec - Number of images processed: 1256 - Build time: 13685 microseconds - Elapsed time: 20016268 microseconds - Real time: 20014137 microseconds - Teardown time: 103485 microseconds - Batch : 1

[Thread 4 - aip_fixed8_tf] 62.7268 infs/sec - Number of images processed: 1255 - Build time: 14063 microseconds - Elapsed time: 20009735 microseconds - Real time: 20007390 microseconds - Teardown time: 99701 microseconds - Batch : 1

Total throughput: 333.644 infs/sec
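To try other DSP/AIP splits, a small helper can generate the command for a given thread mix and add up the per-thread figures from the output. The sketch below is illustrative only: `build_command` and `sum_throughput` are hypothetical names, the flags are exactly those used in the command above, and the sample log lines are abbreviated copies of the output shown here. It does not run the tool itself.

```python
import re
import shlex

# Per-thread flags copied from the command in this reply; nothing else is
# assumed about snpe-throughput-net-run.
PER_THREAD_FLAGS = ["--perf_profile", "burst", "--userbuffer_tf8"]


def build_command(container, n_dsp, n_aip, duration_s):
    """Return the argv for one DSP/AIP thread mix (hypothetical helper)."""
    argv = ["snpe-throughput-net-run"]
    for runtime_flag in ["--use_dsp"] * n_dsp + ["--use_aip"] * n_aip:
        argv += ["--container", container, *PER_THREAD_FLAGS, runtime_flag]
    argv += ["--duration", str(duration_s)]
    return argv


# Matches lines such as "[Thread 0 - dsp_fixed8_tf] 82.7196 infs/sec - ...".
THREAD_LINE = re.compile(r"\[Thread \d+ - (\w+)\] ([\d.]+) infs/sec")


def sum_throughput(log_text):
    """Sum the per-thread infs/sec figures, grouped by runtime name."""
    totals = {}
    for runtime, rate in THREAD_LINE.findall(log_text):
        totals[runtime] = totals.get(runtime, 0.0) + float(rate)
    return totals


# Abbreviated copies of the per-thread lines from the output above.
sample_log = """\
[Thread 0 - dsp_fixed8_tf] 82.7196 infs/sec - Number of images processed: 1655
[Thread 1 - aip_fixed8_tf] 62.4897 infs/sec - Number of images processed: 1250
[Thread 2 - aip_fixed8_tf] 62.9518 infs/sec - Number of images processed: 1259
[Thread 3 - aip_fixed8_tf] 62.7556 infs/sec - Number of images processed: 1256
[Thread 4 - aip_fixed8_tf] 62.7268 infs/sec - Number of images processed: 1255
"""

command = build_command("inception_v3_quantized.dlc", n_dsp=1, n_aip=4, duration_s=20)
print(shlex.join(command))

per_runtime = sum_throughput(sample_log)
print(per_runtime, sum(per_runtime.values()))
```

Summing the five per-thread rates reproduces the reported total of about 333.6 inf/s, of which the single DSP thread contributes about 82.7 inf/s and the four AIP threads the rest.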