Community member

Inference with llama-v2-7b-chat on the NPU fails on a QCS8550 development board.
The model export process is roughly as follows:
python -m qai_hub_models.models.llama_v2_7b_chat_quantized.export --skip-downloading --skip-profiling --skip-inferencing
python qai_hub_models/models/llama_v2_7b_chat_quantized/demo.py --on-device --hub-model-id {model_id} --device "Samsung Galaxy S24 (Family)"
python gen_ondevice_llama.py --hub-model-id {model_id} --output-dir ./export --tokenizer-zip-path ./tokenizer.zip --target-gen snapdragon-gen2 --target-os android
All of the above steps completed successfully.
At runtime, the following errors are reported:
[WARN] "Unable to initialize logging in backend extensions."
[INFO] "Using create From Binary"
[INFO] "Allocated total size = 300255744 across 8 buffers"
[ERROR] "Could not create context from binary for context index = 0 : err 1009"
[ERROR] "Create From Binary FAILED!"
Failure to initialize model
ERROR at line 234: Failed to create the dialog.
The commands run on Android are as follows:
cd /data/local/tmp/shared_bins_snapdragon-gen2_android
export LD_LIBRARY_PATH=./
chmod a+x genie-t2t-run
./genie-t2t-run -c ./htp-model-config-llama2-7b.json -p "What is the most popular cookie in the world?"
We can see that this model was actually compiled for the S24. The S24 uses Snapdragon 8 Gen 3, while the QCS8550 is based on Snapdragon 8 Gen 2, so the context binary cannot be loaded on this device. Switch your model target to the S23 (Snapdragon 8 Gen 2) and it will run.
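Based on that suggestion, a re-run of the same three steps re-targeted at an 8 Gen 2 device might look like the sketch below. This is only a sketch: it assumes the export script accepts the same --device option that demo.py uses, that "Samsung Galaxy S23 (Family)" is a valid AI Hub device name, and that {model_id} is replaced with the model ID produced by the new compile job; verify the exact option names against your qai_hub_models version.
# Re-compile for an 8 Gen 2 target instead of the S24 (8 Gen 3)
python -m qai_hub_models.models.llama_v2_7b_chat_quantized.export --device "Samsung Galaxy S23 (Family)" --skip-downloading --skip-profiling --skip-inferencing
# Sanity-check the new model on a hosted S23 before packaging it
python qai_hub_models/models/llama_v2_7b_chat_quantized/demo.py --on-device --hub-model-id {model_id} --device "Samsung Galaxy S23 (Family)"
# Package for Snapdragon 8 Gen 2 / Android, then push the bundle to the QCS8550 as before
python gen_ondevice_llama.py --hub-model-id {model_id} --output-dir ./export --tokenizer-zip-path ./tokenizer.zip --target-gen snapdragon-gen2 --target-os android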