Community member

Inference with llama-v2-7b-chat on the NPU fails on a QCS8550 development board.
The model export process is roughly as follows:
python -m qai_hub_models.models.llama_v2_7b_chat_quantized.export --skip-downloading --skip-profiling --skip-inferencing
python qai_hub_models/models/llama_v2_7b_chat_quantized/demo.py --on-device --hub-model-id {model_id} --device "Samsung Galaxy S24 (Family)"
python gen_ondevice_llama.py --hub-model-id {model_id} --output-dir ./export --tokenizer-zip-path ./tokenizer.zip --target-gen snapdragon-gen2 --target-os android
All of the above steps completed successfully.
At runtime, the following errors are reported:
[WARN] "Unable to initialize logging in backend extensions."
[INFO] "Using create From Binary"
[INFO] "Allocated total size = 300255744 across 8 buffers"
[ERROR] "Could not create context from binary for context index = 0 : err 1009"
[ERROR] "Create From Binary FAILED!"
Failure to initialize model
ERROR at line 234: Failed to create the dialog.
The commands run on Android are as follows:
cd /data/local/tmp/shared_bins_snapdragon-gen2_android
export LD_LIBRARY_PATH=./
chmod a+x genie-t2t-run
./genie-t2t-run -c ./htp-model-config-llama2-7b.json -p "What is the most popular cookie in the world?"
We can see that this model was actually compiled for the S24. The S24 uses Snapdragon 8 Gen 3, while the QCS8550 is based on Snapdragon 8 Gen 2, so the context binary cannot be loaded on this device. Switch your model target to the S23 (Snapdragon 8 Gen 2) and it will run.
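Based on that suggestion, a re-run of the same three steps re-targeted at an 8 Gen 2 device might look like the sketch below. This is only a sketch: it assumes the export script accepts the same --device option that demo.py uses, that "Samsung Galaxy S23 (Family)" is a valid AI Hub device name, and that {model_id} is replaced with the model ID produced by the new compile job; verify the exact option names against your qai_hub_models version.
# Re-compile for an 8 Gen 2 target instead of the S24 (8 Gen 3)
python -m qai_hub_models.models.llama_v2_7b_chat_quantized.export --device "Samsung Galaxy S23 (Family)" --skip-downloading --skip-profiling --skip-inferencing
# Sanity-check the new model on a hosted S23 before packaging it
python qai_hub_models/models/llama_v2_7b_chat_quantized/demo.py --on-device --hub-model-id {model_id} --device "Samsung Galaxy S23 (Family)"
# Package for Snapdragon 8 Gen 2 / Android, then push the bundle to the QCS8550 as before
python gen_ondevice_llama.py --hub-model-id {model_id} --output-dir ./export --tokenizer-zip-path ./tokenizer.zip --target-gen snapdragon-gen2 --target-os android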