Performance Testing of YOLOv8 Series Models on the Qualcomm QCS6490 Platform

Posted by the official enterprise account

2025-07-14 13:35:00

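Introduction

As edge computing and computer vision converge, end devices increasingly need real-time perception and on-device decision-making, which depends on pairing capable hardware with efficient AI models. The Qualcomm QCS6490 is a widely used platform for this class of device. It is built on a 6 nm process and carries an octa-core Kryo 670 CPU (four high-performance Cortex-A78 cores and four power-efficient Cortex-A55 cores) clocked at up to 2.7 GHz. Its 6th-generation Qualcomm AI Engine, combining the Hexagon processor with a fused AI accelerator, delivers up to 12 TOPS of AI compute for real-time inference of complex models. The QCS6490 also supports enterprise-grade Wi-Fi 6/6E with multi-gigabit throughput and low latency, and a triple ISP (up to 5 concurrent cameras and 192 MP image capture), which together provide the basis for processing and transmitting multi-source visual data.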

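These capabilities matter most in robotics and drones. In service robots, the QCS6490 can process data from lidar, cameras, and other sensors in real time, combining SLAM with AI models for obstacle avoidance and dynamic path planning, while natural-language and human-machine interaction algorithms handle the user-facing side. In industrial robots, it can drive a vision system that inspects products on the line at high speed: a defect detection model identifies surface flaws and dimensional deviations within milliseconds, and low-latency communication links the result to the robot arm controller, improving both throughput and quality control.

In drones, the QCS6490 supports several typical workloads. In power-line inspection, multiple cameras capture the transmission line simultaneously while object detection and defect recognition models run on board to locate damaged insulators, broken conductor strands, and similar hazards. In agricultural spraying, aerial images of the field are processed to identify crop condition and pest or disease areas, and the result feeds the flight controller for variable-rate application. In surveying, aerial images can be stitched in real time into a 3D map, and Wi-Fi 6/6E sends key data back to the ground station promptly, which helps both safety and timeliness.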

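YOLOv8, developed by Ultralytics, is one of the reference models in object detection. It keeps the "fast and accurate" character of the YOLO family while improving the network architecture. It uses an improved CSPDarknet-style backbone, a PAN-FPN neck for multi-scale feature fusion, the dynamic Task-Aligned Assigner for label assignment, and the CIoU loss, which together improve the balance between accuracy and speed. YOLOv8 supports object detection, instance segmentation, pose estimation, and other tasks, and ships in n/s/m/l/x sizes, so it can be deployed anywhere from embedded devices to cloud servers. It is widely used in security monitoring (real-time abnormal behavior detection), intelligent transportation (vehicle recognition and traffic counting), industrial inspection (part defect detection), and medical imaging (lesion recognition), and has become a common building block of edge vision solutions.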

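Given the hardware strengths of the QCS6490 and the maturity of YOLOv8 for vision tasks, how well the two work together directly affects robots, drones, and similar devices. This article tests the YOLOv8 series on the Qualcomm QCS6490 and reports the inference latency and throughput of each model size on the CPU and on the NPU, to provide data for model selection and deployment on edge devices.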

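Qualcomm QCS6490 hardware overview

See the CSDN blog post "深度解析 QCS6490：硬件性能全揭秘" (An In-Depth Look at the QCS6490: Hardware Performance Revealed).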

YOLOv8 model performance metrics

YOLOv8 series performance on QCS6490 (model input size 640×640)
Model	CPU FP32 latency	CPU FP32 throughput	NPU INT8 latency (QNN 2.31)	NPU INT8 throughput (QNN 2.31)
YOLOv8n	215.19 ms	4.65 FPS	4.76 ms	210.08 FPS
YOLOv8s	590.92 ms	1.69 FPS	7.84 ms	127.55 FPS
YOLOv8m	1310.2 ms	0.76 FPS	17.41 ms	57.44 FPS
YOLOv8l	2697.88 ms	0.37 FPS	29.71 ms	33.66 FPS
YOLOv8x	4022.64 ms	0.25 FPS	47.35 ms	21.12 FPS
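
The throughput columns follow directly from the mean latency (FPS = 1000 / latency in ms), which is also how the test script later in this article computes FPS. The short sketch below recomputes FPS from the latencies in the table and derives the NPU-over-CPU speedup. The speedup is a derived figure, not a separately measured one, and the helper names `fps` and `speedup` are only illustrative.

```python
# Mean latency in milliseconds, copied from the table: (CPU FP32, NPU INT8)
latency_ms = {
    "YOLOv8n": (215.19, 4.76),
    "YOLOv8s": (590.92, 7.84),
    "YOLOv8m": (1310.2, 17.41),
    "YOLOv8l": (2697.88, 29.71),
    "YOLOv8x": (4022.64, 47.35),
}


def fps(ms):
    """Frames per second for a given mean latency in milliseconds."""
    return 1000.0 / ms


def speedup(cpu_ms, npu_ms):
    """How many times faster the NPU run is than the CPU run."""
    return cpu_ms / npu_ms


for name, (cpu, npu) in latency_ms.items():
    print(f"{name}: CPU {fps(cpu):.2f} FPS, NPU {fps(npu):.2f} FPS, "
          f"speedup {speedup(cpu, npu):.1f}x")
```

Note that the comparison mixes precision (FP32 on the CPU, INT8 on the NPU), so the ratio reflects both the hardware and the quantization.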

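The YOLOv8 models in .pt format can be downloaded from the link below. For other input sizes, convert the model with AIMO and change the input size (the size variable, together with the matching output shapes) in the reference code below before testing.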

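Model conversion and deployment

Model download site: YOLO - Ultralytics YOLO documentation

Model conversion site: AI Model Optimizer (AIMO)

Model hub: 端侧AI生态门户 (on-device AI ecosystem portal)

API reference: AidLite Python API documentation | APLUX Doc Center

Model viewer: Netron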

(1) Convert the .pt model to ONNX

Step 1: Upgrade pip to 25.1.1

python3.10 -m pip install --upgrade pip
pip -V
aidlux@aidlux:~/aidcode$ pip -V
pip 25.1.1 from /home/aidlux/.local/lib/python3.10/site-packages/pip (python 3.10)

Step 2: Install YOLOv8 (the ultralytics package) and onnx

pip install ultralytics onnx

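Step 3: Add the yolo command to PATH

Method 1: Add it temporarily (takes effect immediately)

Run the following command in the terminal to add ~/.local/bin to PATH for the current session:

export PATH="$PATH:$HOME/.local/bin"

Note: this only applies to the current terminal session and is lost when the terminal is closed.
Check: run yolo --version. If it prints a version number (for example 8.3.152), the command is available.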

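Method 2: Add it permanently

echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc
source ~/.bashrc  # apply the change immediately

Check: run yolo --version. If it prints a version number (for example 8.3.152), the command is available.

The yolo version installed in the test environment is 8.3.152.

Tip: if you run into a user/group permission problem, it can be ignored, because the yolo command falls back to a temporary directory. Alternatively, run the commands below to change the owner of the config directory; the warning shown further down then disappears: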

sudo chown -R aidlux:aidlux ~/.config/
sudo chown -R aidlux:aidlux ~/.config/Ultralytics

The warning you may see looks like this:

WARNING ⚠️ user config directory '/home/aidlux/.config/Ultralytics' is not writeable, defaulting to '/tmp' or CWD.Alternatively you can define a YOLO_CONFIG_DIR environment variable for this path.
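
As the warning itself says, another option is to define the YOLO_CONFIG_DIR environment variable. A minimal sketch, assuming the variable is read when ultralytics is imported (so it has to be set first); the directory `~/ultralytics_config` and the helper name `use_writable_config_dir` are hypothetical:

```python
import os
from pathlib import Path


def use_writable_config_dir(path="~/ultralytics_config"):
    """Create a directory owned by the current user and point YOLO_CONFIG_DIR at it."""
    cfg = Path(path).expanduser()
    cfg.mkdir(parents=True, exist_ok=True)
    os.environ["YOLO_CONFIG_DIR"] = str(cfg)
    return cfg


# Call use_writable_config_dir() before `from ultralytics import YOLO`.
```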

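Step 4: Convert the YOLOv8 .pt models to ONNX

Create a Python file (any name) to convert and export the model:

from ultralytics import YOLO

# Load the .pt model file from the current directory
model = YOLO('./yolov8x.pt')  # replace with the actual model file name

# ONNX export parameters
export_params = {
    'format': 'onnx',
    'opset': 12,          # recommended opset version
    'simplify': True,     # simplify the exported graph
    'dynamic': False,     # fixed input size
    'imgsz': 640,         # standard input size
    'half': False         # keep FP32 precision
}

# Run the export; the .onnx file is saved next to the .pt file
model.export(**export_params)

Run the script to export the .pt model as an ONNX model.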

Tip: for YOLOv8n, YOLOv8s, YOLOv8m, and YOLOv8l, replace yolov8x in the code with the corresponding model name.
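
Instead of editing the file name by hand for each size, the export can be looped over all five variants. The following is a sketch under the assumption that the five .pt files sit in the current directory; `weight_files` and `export_all` are illustrative names, and the export parameters are the same ones used above.

```python
VARIANTS = ["n", "s", "m", "l", "x"]

# Same export settings as in the single-model script
EXPORT_PARAMS = {
    "format": "onnx",
    "opset": 12,
    "simplify": True,
    "dynamic": False,
    "imgsz": 640,
    "half": False,
}


def weight_files(variants=VARIANTS):
    """File names of the .pt weights for the given model sizes."""
    return [f"yolov8{v}.pt" for v in variants]


def export_all(variants=VARIANTS):
    """Export every variant to ONNX and collect what export() returns."""
    from ultralytics import YOLO  # imported lazily so the helpers above work without it

    exported = []
    for weights in weight_files(variants):
        model = YOLO(f"./{weights}")
        exported.append(model.export(**EXPORT_PARAMS))
    return exported


# export_all()  # uncomment to run the export on a machine with ultralytics installed
```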

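(2) Use AIMO to convert the ONNX model into a format that runs on the Qualcomm NPU

Step 1: Choose model optimization, select ONNX as the model format, and upload the model

Step 2: Choose the chip and the target framework; here we select QCS6490 + QNN 2.31

Step 3: Click to view the model, inspect its structure with Netron, and fill in the inputs and outputs

Inspect the ONNX model structure in Netron and choose the cut points, that is, the nodes at which the graph is pruned: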

/model.22/Mul_2_output_0

/model.22/Sigmoid_output_0

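Fill in the fields as shown in the red box in the screenshot and leave everything else unchanged. Make sure automatic quantization is enabled. For more AIMO operations, see the user guide or the AIMO platform.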
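
Cutting the graph at these two nodes leaves the model with two outputs instead of one. In a standard Ultralytics YOLOv8 detection export, the first node is expected to carry the box tensor (4 values per candidate) and the second the class scores (80 per candidate for COCO); this reading of the node names is an assumption and should be confirmed in Netron. For a 640×640 input the head predicts on strides 8, 16, and 32, giving 80×80 + 40×40 + 20×20 = 8400 candidates. The sketch below uses random data to show how the two tensors can be merged into one row per candidate; `num_candidates` and `merge_head_outputs` are illustrative names.

```python
import numpy as np


def num_candidates(imgsz=640, strides=(8, 16, 32)):
    """Number of predictions produced by a three-scale detection head."""
    return sum((imgsz // s) ** 2 for s in strides)


def merge_head_outputs(boxes, scores):
    """boxes: (1, 4, N), scores: (1, C, N) -> array of shape (N, 4 + C)."""
    merged = np.concatenate((boxes, scores), axis=1)  # (1, 4 + C, N)
    return merged.transpose(0, 2, 1)[0]               # (N, 4 + C)


n = num_candidates(640)
boxes = np.random.rand(1, 4, n).astype(np.float32)    # stand-in for the box output
scores = np.random.rand(1, 80, n).astype(np.float32)  # stand-in for the class scores
rows = merge_head_outputs(boxes, scores)
print(n, rows.shape)
```

Each row then has the layout [cx, cy, w, h, 80 class scores], which is what the post-processing in the test code expects.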

Step 4: Submit the job. When the conversion finishes, download the target model package and unzip it; the .bin.aidem file inside is the model file.

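QNN test code

The code below is a YOLOv8 object detection inference program for the Qualcomm QCS6490 platform. It runs the converted model on the edge device, measures inference performance, and draws the detections. Its main functions and characteristics are: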

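Main functions

Model configuration and initialization: loads a quantized YOLOv8 model (QNN/SNPE format) and configures DSP hardware acceleration.
Image preprocessing: pads the image to a square and resizes it to the model input size, which preserves the original aspect ratio, then converts BGR to RGB and normalizes. A letterbox helper is also included.
Inference and performance testing: runs 100 inference iterations and reports mean latency, FPS, and related statistics, after a warm-up phase so the results are stable.
Post-processing: filters detections by a confidence threshold and removes overlapping boxes with NMS.
Visualization: draws bounding boxes, class labels, and confidence on the original image, plus the inference time and FPS.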
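
The warm-up-then-measure pattern can be sketched independently of the inference API. In the following example the function under test is a stand-in lambda rather than the real interpreter call, and `benchmark` is an illustrative name. The statistics match the ones the script prints: mean, minimum, maximum, population variance, and FPS.

```python
import statistics
import time


def benchmark(fn, warmup=10, iters=100):
    """Run fn() a few times untimed, then time `iters` calls individually."""
    for _ in range(warmup):
        fn()

    times_ms = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - t0) * 1000.0)

    mean_ms = statistics.fmean(times_ms)
    return {
        "mean_ms": mean_ms,
        "min_ms": min(times_ms),
        "max_ms": max(times_ms),
        "var": statistics.pvariance(times_ms),  # population variance, like np.var
        "fps": 1000.0 / mean_ms,
    }


stats = benchmark(lambda: sum(range(10000)))
print(f"mean {stats['mean_ms']:.3f} ms, {stats['fps']:.1f} FPS")
```

Only the call itself sits between the two timestamps, so input preparation and post-processing do not count toward the reported latency.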

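Technical characteristics

Hardware acceleration: uses the DSP of the Qualcomm QCS6490 for AI inference.
Accuracy/speed trade-off: supports quantized models (8-bit weights and activations), which improves efficiency while keeping accuracy acceptable.
Multiple backends: can switch between the QNN and SNPE inference frameworks.
Configurability: the model path, input image, and inference backend are set through command-line arguments.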
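
To make "8-bit weights and activations" concrete, the sketch below shows generic affine (scale and zero-point) quantization to uint8 and back. It illustrates the general idea only and is not necessarily the exact scheme that AIMO or QNN applies; `quantize` and `dequantize` are illustrative names.

```python
import numpy as np


def quantize(x, num_bits=8):
    """Map float values to unsigned integers with a scale and a zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range always includes 0 so that zero maps to an integer
    lo, hi = min(float(x.min()), 0.0), max(float(x.max()), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the quantized integers."""
    return (q.astype(np.float32) - zero_point) * scale


x = np.linspace(-1.0, 1.0, 11, dtype=np.float32)
q, scale, zp = quantize(x)
err = np.abs(dequantize(q, scale, zp) - x).max()
print(f"scale={scale:.5f}, zero_point={zp}, max abs error={err:.5f}")
```

The round-trip error stays within about one quantization step, which is the precision given up in exchange for the smaller, faster INT8 model.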

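Application scenarios

The code suits scenarios that need real-time object detection on an edge device, such as security monitoring, industrial inspection, and drone inspection, and gives developers an end-to-end example of deploying YOLOv8 on the Qualcomm platform.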

import time
import numpy as np
import cv2
import os
import aidlite
import argparse

# The 80 class names of the COCO dataset
coco_class = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
              'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
              'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
              'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
              'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
              'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
              'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
              'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
              'hair drier', 'toothbrush']

# Assign a random color to each class so boxes of different classes can be told apart
# (values are converted to plain Python ints for OpenCV drawing calls)
colors = {name: [int(np.random.randint(0, 255)) for _ in range(3)] for name in coco_class}


def xywh2xyxy(x):
    '''
    Convert boxes from (center x, center y, width, height) to (x1, y1, x2, y2),
    i.e. top-left and bottom-right corners, which is more convenient for
    later processing and drawing.
    '''
    y = np.copy(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2  # top-left x
    y[:, 1] = x[:, 1] - x[:, 3] / 2  # top-left y
    y[:, 2] = x[:, 0] + x[:, 2] / 2  # bottom-right x
    y[:, 3] = x[:, 1] + x[:, 3] / 2  # bottom-right y
    return y


def xyxy2xywh(box):
    '''
    Convert boxes from (x1, y1, x2, y2) to (top-left x, top-left y, width, height).
    Note that the input array is modified in place.
    '''
    box[:, 2:] = box[:, 2:] - box[:, :2]
    return box


def NMS(dets, thresh):
    '''
    Single-class non-maximum suppression (NMS).
    Among overlapping boxes, keep the one with the highest confidence so that
    one object does not produce several overlapping detections.
    dets.shape = (N, 5), columns are (x1, y1, x2, y2, confidence)
    '''
    dets = np.array(dets)
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    areas = (y2 - y1 + 1) * (x2 - x1 + 1)  # area of each box
    scores = dets[:, 4]  # confidence of each box
    keep = []  # indices of the boxes to keep
    index = scores.argsort()[::-1]  # sort by confidence, highest first

    # Process boxes until none are left
    while index.size > 0:
        i = index[0]  # box with the highest remaining confidence
        keep.append(i)  # keep it

        # Intersection of the current box with all remaining boxes
        x11 = np.maximum(x1[i], x1[index[1:]])
        y11 = np.maximum(y1[i], y1[index[1:]])
        x22 = np.minimum(x2[i], x2[index[1:]])
        y22 = np.minimum(y2[i], y2[index[1:]])

        w = np.maximum(0, x22 - x11 + 1)  # intersection width
        h = np.maximum(0, y22 - y11 + 1)  # intersection height
        overlaps = w * h  # intersection area

        # IoU (Intersection over Union)
        ious = overlaps / (areas[i] + areas[index[1:]] - overlaps)

        # Keep only the boxes whose IoU with the current box is within the threshold
        idx = np.where(ious <= thresh)[0]
        index = index[idx + 1]  # +1 because ious was computed against index[1:]
    return dets[keep]


def letterbox(img, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    '''
    Resize and pad an image while keeping its aspect ratio.
    Commonly used in detection preprocessing so the input matches the size the
    model expects without distorting the image.
    '''
    # Original image size
    shape = img.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale factor (aspect ratio preserved)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, never up (gives better test mAP)
        r = min(r, 1.0)

    # Unpadded size after scaling, and the padding still needed
    ratio = r, r  # width and height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # total width and height padding

    if auto:  # minimum-rectangle padding
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # pad only up to the next multiple of stride
    elif scaleFill:  # stretch to fill, no padding
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width and height ratios

    dw /= 2  # split the padding between left and right
    dh /= 2  # split the padding between top and bottom

    # Resize if needed
    if shape[::-1] != new_unpad:
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)

    # Padding on each side
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))

    # Add the border
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)
    return img, ratio, (dw, dh)


def preprocess_img(img, target_shape, means=[0, 0, 0], stds=[255, 255, 255]):
    '''
    Image preprocessing:
    1. Pad the image to a square
    2. Convert the color space
    3. Resize and normalize
    target_shape: side length of the model input in pixels
    means: per-channel means used for normalization
    stds: per-channel standard deviations used for normalization
    '''
    img_processed = np.copy(img)
    # Image dimensions
    [height, width, _] = img_processed.shape
    length = max((height, width))  # longer side
    scale = length / target_shape  # factor that maps model coordinates back to the original image
    ratio = [scale, scale]

    # Square canvas with the original image placed at the top-left corner
    image = np.zeros((length, length, 3), np.uint8)
    image[0:height, 0:width] = img_processed

    # Convert to RGB (OpenCV reads images as BGR)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # Resize to the model input size
    img_input = cv2.resize(image, (target_shape, target_shape))

    # Normalize the resized image: (pixel - mean) / std
    img_input = (img_input - means) / stds
    img_input = img_input.astype(np.float32)

    return img_input, ratio


def scale_coords(img1_shape, coords, img0_shape, ratio_pad=None):
    '''
    Scale box coordinates from the preprocessed image back to the original image.
    img1_shape: shape of the preprocessed image
    coords: box coordinates
    img0_shape: shape of the original image
    ratio_pad: scale and padding information
    '''
    if ratio_pad is None:  # compute scale and padding if they were not provided
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # scale factor
        pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2  # padding
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]

    # Remove the padding, then undo the scaling
    coords[:, [0, 2]] -= pad[0]  # x padding
    coords[:, [1, 3]] -= pad[1]  # y padding
    coords[:, :4] /= gain

    # Clip so the boxes stay inside the image
    clip_coords(coords, img0_shape)
    return coords


def clip_coords(boxes, img_shape):
    '''
    Clip box coordinates in place so they stay inside the image.
    boxes: box coordinates (x1, y1, x2, y2)
    img_shape: image shape (height, width)
    '''
    boxes[:, 0].clip(0, img_shape[1], out=boxes[:, 0])  # clip x1
    boxes[:, 1].clip(0, img_shape[0], out=boxes[:, 1])  # clip y1
    boxes[:, 2].clip(0, img_shape[1], out=boxes[:, 2])  # clip x2
    boxes[:, 3].clip(0, img_shape[0], out=boxes[:, 3])  # clip y2


def postprocess(outputs, ratio, conf_threshold=0.5, nms_threshold=0.45):
    '''
    Post-process the model output:
    1. Drop low-confidence detections
    2. Apply non-maximum suppression
    3. Scale the boxes back to the original image size
    outputs: model output, one row per candidate: [cx, cy, w, h, class scores...]
    ratio: scale factors back to the original image
    conf_threshold: confidence threshold
    nms_threshold: NMS IoU threshold
    '''
    rows = outputs.shape[0]  # number of candidates
    boxes = []  # boxes as (top-left x, top-left y, w, h)
    scores = []  # confidences
    class_ids = []  # class IDs

    # Go through all candidates
    for i in range(rows):
        classes_scores = outputs[i][4:]  # class scores (the first 4 values are the box)
        (minScore, maxScore, minClassLoc, (x, maxClassIndex)) = cv2.minMaxLoc(classes_scores)  # best score and its class

        if maxScore >= conf_threshold:
            # Convert (center x, center y, w, h) to (top-left x, top-left y, w, h)
            box = [
                outputs[i][0] - (0.5 * outputs[i][2]), outputs[i][1] - (0.5 * outputs[i][3]),
                outputs[i][2], outputs[i][3]]
            boxes.append(box)
            scores.append(maxScore)
            class_ids.append(maxClassIndex)

    # Non-maximum suppression with OpenCV
    result_boxes = cv2.dnn.NMSBoxes(boxes, scores, score_threshold=conf_threshold, nms_threshold=nms_threshold, eta=0.5)
    result_boxes = np.array(result_boxes).reshape(-1)  # also handles an empty result

    # Collect the boxes that survived NMS
    new_bboxes = []
    new_scores = []
    new_class_ids = []
    for i in range(len(result_boxes)):
        index = result_boxes[i]
        bbox = boxes[index]
        x, y, w, h = float(bbox[0]), float(bbox[1]), float(bbox[2]), float(bbox[3])
        # Scale the coordinates back to the original image size
        new_bboxes.append([round(x * ratio[0]), round(y * ratio[1]), round(w * ratio[0]), round(h * ratio[1])])
        new_scores.append(scores[index])
        new_class_ids.append(class_ids[index])

    # Nothing detected: return an empty (0, 6) array
    if not new_bboxes:
        return np.empty((0, 6), dtype=np.float32)

    # Assemble rows of [x, y, w, h, score, class_id]
    new_scores = np.expand_dims(new_scores, 1)
    new_class_ids = np.expand_dims(new_class_ids, 1)

    boxes = np.concatenate((new_bboxes, new_scores), axis=1)
    boxes = np.concatenate((boxes, new_class_ids), axis=1)

    return boxes


def draw_res(img, boxes):
    '''
    Draw the detections on the image:
    1. Draw the bounding boxes
    2. Add class label and confidence
    img: original image
    boxes: detections, each as [x, y, w, h, score, class_id]
    '''
    img = img.astype(np.uint8)  # make sure the image type is uint8
    for i, [x, y, w, h, scores, class_ids] in enumerate(boxes):
        x = int(x)
        y = int(y)
        w = int(w)
        h = int(h)
        name = coco_class[int(class_ids)]  # class name
        print(i + 1, [x, y, w, h], round(scores, 4), name)  # print the detection

        label = f'{name} ({scores:.2f})'  # label text
        W, H = cv2.getTextSize(label, 0, fontScale=1, thickness=2)[0]  # text size
        color = colors[name]  # color of this class

        # Bounding box
        cv2.rectangle(img, (x, y), (int(x + w), int(y + h)), color, thickness=2)

        # Label background
        cv2.rectangle(img, (x, int(y - H)), (int(x + W / 2), y), (0, 255, 0), -1, cv2.LINE_AA)

        # Label text
        cv2.putText(img, label, (x, int(y) - 6), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1)
    return img


def main(args):
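    '''
    Main function:
    1. Initialize the model and configuration
    2. Read and preprocess the image
    3. Run model inference
    4. Post-process and visualize the results
    '''
    print("Start image inference ... ...")

    # Model initialization
    size = 640  # model input size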
    config = aidlite.Config.create_instance()
    if config is None:
        print("Create config failed !")
        return False

    config.implement_type = aidlite.ImplementType.TYPE_LOCAL

    # Select the inference framework from the command-line argument
    if args.model_type.lower() == "qnn":
        config.framework_type = aidlite.FrameworkType.TYPE_QNN231  # QNN version 2.31
    elif args.model_type.lower() == "snpe2" or args.model_type.lower() == "snpe":
        config.framework_type = aidlite.FrameworkType.TYPE_SNPE2
    else:
        print(f"Unsupported model_type: {args.model_type}")
        return False

    config.accelerate_type = aidlite.AccelerateType.TYPE_DSP  # run on the DSP
    config.is_quantify_model = 1  # the model is quantized

    # Create and configure the model
    model = aidlite.Model.create_instance(args.target_model)
    if model is None:
        print("Create model failed !")
        return False
    input_shapes = [[1, size, size, 3]]  # model input shape (NHWC)
    output_shapes = [[1, 80, 8400], [1, 4, 8400]]  # model output shapes

    # Set the model properties
    model.set_model_properties(input_shapes, aidlite.DataType.TYPE_FLOAT32,
                               output_shapes, aidlite.DataType.TYPE_FLOAT32)

    # Build and initialize the interpreter
    interpreter = aidlite.InterpreterBuilder.build_interpretper_from_model_and_config(model, config)
    if interpreter is None:
        print("build_interpretper_from_model_and_config failed !")
        return None
    result = interpreter.init()
    if result != 0:
        print(f"interpreter init failed !")
        return False
    result = interpreter.load_model()
    if result != 0:
        print("interpreter load model failed !")
        return False
    print("detect model load success!")

    # Read the image
    img = cv2.imread(args.image_path)
    if img is None:
        print("Error: Could not open image file")
        return False

    # Preprocess the image
    img_processed = np.copy(img)
    [h, w, _] = img_processed.shape
    length = max((h, w))
    scale = length / size
    ratio = [scale, scale]

    # Square canvas with the original image placed at the top-left corner
    image = np.zeros((length, length, 3), np.uint8)
    image[0:h, 0:w] = img_processed

    # Convert the color space and resize
    img_input = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    img_input = cv2.resize(img_input, (size, size))

    # Normalize
    mean_data = [0, 0, 0]
    std_data = [255, 255, 255]
    img_input = (img_input - mean_data) / std_data  # HWC layout
    img_input = img_input.astype(np.float32)

    # Warm-up runs
    warmup_iters = 10
    print(f"Warming up with {warmup_iters} iterations...")
    for _ in range(warmup_iters):
        interpreter.set_input_tensor(0, img_input.data)
        interpreter.invoke()

    # Performance test
    invoke_nums = 100
    invoke_times = []

    print(f"Running performance test with {invoke_nums} iterations...")
    for i in range(invoke_nums):
        # Set the input tensor
        interpreter.set_input_tensor(0, img_input.data)

        # Time only the model inference
        t1 = time.time()
        result = interpreter.invoke()
        t2 = time.time()

        if result != 0:
            print("interpreter invoke() failed")
            return False

        invoke_time = (t2 - t1) * 1000  # convert to milliseconds
        invoke_times.append(invoke_time)

        # Print progress every 10 iterations
        if (i + 1) % 10 == 0:
            print(f"Completed {i + 1}/{invoke_nums} iterations")

    # Compute statistics
    mean_invoke_time = np.mean(invoke_times)
    max_invoke_time = np.max(invoke_times)
    min_invoke_time = np.min(invoke_times)
    var_invoke_time = np.var(invoke_times)
    fps = 1000 / mean_invoke_time  # frames per second

    # Print the performance results
    print(f"\nInference {invoke_nums} times:\n"
          f"-- mean_invoke_time is {mean_invoke_time:.2f} ms\n"
          f"-- max_invoke_time is {max_invoke_time:.2f} ms\n"
          f"-- min_invoke_time is {min_invoke_time:.2f} ms\n"
          f"-- var_invoke_time is {var_invoke_time:.2f}\n"
          f"-- FPS: {fps:.2f}\n")

    # Fetch the outputs of the last inference for post-processing
    out0 = interpreter.get_output_tensor(0).reshape(*output_shapes[0])
    out1 = interpreter.get_output_tensor(1).reshape(*output_shapes[1])

    # postprocess() expects each row as [cx, cy, w, h, 80 class scores],
    # so the 4-channel box tensor must come before the 80-channel score tensor
    qnn_local, qnn_conf = (out0, out1) if out0.shape[1] == 4 else (out1, out0)
    qnn_result = np.concatenate((qnn_local, qnn_conf), axis=1)
    qnn_result = qnn_result.transpose(0, 2, 1)
    qnn_result = qnn_result[0]

    # Run post-processing to get the final detections
    detect = postprocess(qnn_result, ratio, conf_threshold=0.5, nms_threshold=0.45)
    print(f"Detected {len(detect)} targets in the image")

    # Draw the detections on the original image
    res_img = draw_res(img, list(detect))

    # Add the timing text
    cv2.putText(res_img, f"Inference Time: {mean_invoke_time:.2f} ms | FPS: {fps:.2f}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    # Save the result image
    cv2.imwrite('output.jpg', res_img)
    print("Output image saved as 'output.jpg'")

    # Release resources
    result = interpreter.destory()


def parser_args():
    '''
    Parse the command-line arguments
    '''
    parser = argparse.ArgumentParser(description="Run image inference benchmarks")
    parser.add_argument('--target_model', type=str,
                        default='yolov8s/cutoff_yolov8s_qcs6490_w8a8.qnn231.ctx.bin',
                        help="inference model path")
    parser.add_argument('--image_path', type=str, default='bus.jpg', help="Input image path")
    parser.add_argument('--model_type', type=str, default='QNN', help="run backend")
    args = parser.parse_args()
    return args


if __name__ == "__main__":
    args = parser_args()
    main(args)
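
One detail of the script is worth spelling out: because the image is padded to a square at the top-left corner (not centered, as a letterbox would do), mapping a box back to the original image only needs a multiplication by ratio = max(h, w) / size, with no padding offset to subtract. A small sketch with made-up numbers; the helper names `square_pad_ratio` and `to_original` are illustrative:

```python
def square_pad_ratio(height, width, size=640):
    """Scale factor from model-input pixels back to original-image pixels."""
    return max(height, width) / size


def to_original(box_xywh, ratio):
    """Map (x, y, w, h) given in model-input pixels to original-image pixels."""
    return [round(v * ratio) for v in box_xywh]


# Hypothetical example: an image 810 px wide and 1080 px high
ratio = square_pad_ratio(1080, 810)
box = to_original([100.0, 200.0, 50.0, 80.0], ratio)
print(ratio, box)
```

With a centered letterbox, the horizontal and vertical padding would have to be subtracted before scaling, which is what scale_coords in the script does.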