高通 QCS6490 平台上 YOLOv11 系列模型的性能测试

企业官方账号

2025-07-19 10:33:47

加精

前言

随着边缘智能计算的迅猛发展，机器人与无人机领域对硬件平台的性能提出了极高要求，既要满足复杂任务下的算力需求，又要在低功耗状态下实现高效运行。高通 QCS6490 平台应运而生，成为这一领域的关键驱动力。它基于先进的 6nm 制程工艺打造，融合了八核 Kryo 670 CPU，能在高达 2.7GHz 的主频下稳定工作，在实现强劲算力的同时，将功耗有效控制在较低水平，为机器人和无人机长时间运行提供坚实保障。其集成的第 6 代高通 AI Engine，搭配 Hexagon 处理器与融合 AI 加速器，可释放高达 12 TOPS 的 AI 算力，赋予设备强大的实时推理能力，使其能够快速处理复杂的视觉数据与决策任务。

在机器人应用场景中，无论是服务机器人对环境的实时感知与路径规划，还是工业机器人在生产线上的高速视觉质检，QCS6490 都能凭借出色的计算性能与低功耗优势，保障系统稳定高效运行。在无人机领域，面对电力巡检、农业植保、测绘等复杂任务，QCS6490 支持多摄像头并发采集数据，并实时运行各类检测与分析模型，同时借助企业级 Wi-Fi 6/6E 实现低延迟、多千兆位的数据传输，确保任务的精准与高效执行。

作为目标检测领域的前沿代表，YOLOv11 系列模型在网络架构、检测精度与速度等方面实现了重大突破，为边缘设备的智能化视觉感知提供了有力支持。在此背景下，深入探究高通 QCS6490 平台上 YOLOv11 系列模型的性能表现，对于推动机器人与无人机在智能化道路上的深度发展，挖掘硬件与算法的协同潜力，具有至关重要的现实意义。

高通6490硬件介绍

深度解析 QCS6490：硬件性能全揭秘-CSDN博客

YOLOv11模型性能指标

YOLOv11系列性能指标-QCS6490
模型尺寸640*640	CPU		NPU QNN2.31
模型尺寸640*640	FP32		INT8
YOLO11n	182.86 ms	5.47 FPS	5.02 ms	199.20 FPS
YOLO11s	458.7 ms	2.18 FPS	8.4 ms	119.05 FPS
YOLO11m	1266 ms	0.79 FPS	20.92 ms	47.80 FPS
YOLO11l	1601.65 ms	0.62 FPS	25.13 ms	39.79 FPS
YOLO11x	3196.28 ms	0.31 FPS	64.63 ms	15.47 FPS

点击链接可以下载YOLOv11系列模型的pt格式，其他模型尺寸可以通过AIMO转换模型，并修改下面参考代码中的model_size测试即可。

（一）将pt模型转换为onnx格式

Step1：升级pip版本为25.1.1

python3.10 -m pip install --upgrade pip
pip -V
aidlux@aidlux:~/aidcode$ pip -V
pip 25.1.1 from /home/aidlux/.local/lib/python3.10/site-packages/pip (python 3.10)

Step2：安装ultralytics和onnx

pip install ultralytics onnx

Step3:设置yolo命令的环境变量

方法 1：临时添加环境变量（立即生效）

在终端中执行以下命令，将 ~/.local/bin 添加到当前会话的环境变量中

export PATH="$PATH:$HOME/.local/bin"

说明：此操作仅对当前终端会话有效，关闭终端后失效。
验证：执行 yolo --version，若输出版本号（如 0.0.2），则说明命令已生效。

方法 2：永久添加环境变量（长期有效）

echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc
source ~/.bashrc  # 使修改立即生效

验证：执行 yolo --version，若输出版本号（如 0.0.2），则说明命令已生效。

测试环境中安装yolo版本为8.3.152

提示：如果遇到用户组权限问题，可以忽悠，因为yolo命令会另外构建临时文件，也可以执行下面命令更改用户组，执行后下面的警告会消失：

sudo chown -R aidlux:aidlux ~/.config/
sudo chown -R aidlux:aidlux ~/.config/Ultralytics

可能遇见的报错如下：

WARNING ⚠️ user config directory '/home/aidlux/.config/Ultralytics' is not writeable, defaulting to '/tmp' or CWD.Alternatively you can define a YOLO_CONFIG_DIR environment variable for this path.

Step4：将Yolov11系列模型的pt格式转换为onnx格式

新建一个python文件，命名自定义即可，用于模型转换以及导出：

from ultralytics import YOLO

# 加载同级目录下的.pt模型文件
model = YOLO('yolo11n.pt')  # 替换为实际模型文件名

# 导出ONNX配置参数
export_params = {
    'format': 'onnx',
    'opset': 12,          # 推荐算子集版本
    'simplify': True,     # 启用模型简化
    'dynamic': False,     # 固定输入尺寸
    'imgsz': 640,         # 标准输入尺寸
    'half': False         # 保持FP32精度
}

# 执行转换并保存到同级目录
model.export(**export_params)

执行该程序完成将pt模型导出为onnx模型。

提示:Yolo11s,Yolo11m,Yolo11l替换代码中Yolo11n即可；

（二）使用AIMO将onnx模型转换高通NPU可以运行的模型格式

Step1：选择模型优化，模型格式选择onnx格式上传模型

Step2：选择芯片型号以及目标框架，这里我们选择QCS6490+Qnn2.31

Step3：点击查看模型，使用Netron查看模型结构，进行输入输出的填写

使用Netron工具查看onnx模型结构，选择剪枝位置

/model.23/Sigmoid_output_0

/model.23/Mul_2_output_0

参考上图中红色框部分填写，其他不变，注意开启自动量化功能，AIMO更多操作查看使用说明或参考AIMO平台

Step4：接下来进行提交即可，转换完成后将目标模型文件下载，解压缩后其中的.bin.aidem文件即为模型文件

QNN代码

该代码实现了基于 YOLOv8 模型的目标检测流程，使用aidlite框架调用量化模型（QNN 或 SNPE 格式）在嵌入式设备（如 DSP 加速）上进行推理。核心流程包括：

图像预处理：调整尺寸、填充、归一化，确保符合模型输入要求。
模型推理：使用aidlite加载模型并执行推理，支持性能测试（计算平均耗时、FPS 等）。
后处理：过滤低置信度检测框、应用非极大值抑制（NMS）、将坐标映射回原始图像。
结果可视化：在图像上绘制检测框、类别标签和性能信息，并保存输出图像。

代码注释详细说明了每个函数的作用、参数含义和关键步骤，便于理解目标检测的完整流程及嵌入式部署的优化细节。

import time
import numpy as np
import cv2
import os
import aidlite
import argparse
 
# COCO数据集的80个类别名称（与模型训练时使用的类别对应）
coco_class = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
              'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
              'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
              'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
              'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
              'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
              'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
              'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
              'hair drier', 'toothbrush']
 
# 为每个类别随机分配唯一颜色（用于绘制检测框时区分不同类别）
colors = {name: [np.random.randint(0, 255) for _ in range(3)] for i, name in enumerate(coco_class)}
 
 
def xywh2xyxy(x):
    '''
    将边界框格式从(中心x, 中心y, 宽度, 高度)转换为(左上角x, 左上角y, 右下角x, 右下角y)
    这是YOLO系列模型输出的边界框格式转换，便于后续绘制和计算
    
    参数:
        x: 输入边界框数组，形状为(N, 4)，每个元素为[cx, cy, w, h]
    返回:
        y: 转换后的边界框数组，形状为(N, 4)，每个元素为[x1, y1, x2, y2]
    '''
    y = np.copy(x)  # 复制输入避免修改原数据
    y[:, 0] = x[:, 0] - x[:, 2] / 2  # 计算左上角x坐标
    y[:, 1] = x[:, 1] - x[:, 3] / 2  # 计算左上角y坐标
    y[:, 2] = x[:, 0] + x[:, 2] / 2  # 计算右下角x坐标
    y[:, 3] = x[:, 1] + x[:, 3] / 2  # 计算右下角y坐标
    return y
 
 
def xyxy2xywh(box):
    '''
    将边界框格式从(左上角x, 左上角y, 右下角x, 右下角y)转换为(左上角x, 左上角y, 宽度, 高度)
    转换后的格式适合OpenCV的矩形绘制函数cv2.rectangle()
    
    参数:
        box: 输入边界框数组，形状为(N, 4)，每个元素为[x1, y1, x2, y2]
    返回:
        转换后的边界框数组，形状为(N, 4)，每个元素为[x1, y1, w, h]
    '''
    box[:, 2:] = box[:, 2:] - box[:, :2]  # 宽度=右下角x-左上角x，高度=右下角y-左上角y
    return box
 
 
def NMS(dets, thresh):
    '''
    单类非极大值抑制(NMS)算法：去除重叠度高的冗余检测框，保留置信度最高的框
    
    参数:
        dets: 检测框数组，形状为(N, 5)，每个元素为[x1, y1, x2, y2, score]
        thresh: IoU阈值，大于该值的重叠框会被过滤
    返回:
        过滤后的检测框数组
    '''
    dets = np.array(dets)
    x1 = dets[:, 0]  # 所有框的左上角x
    y1 = dets[:, 1]  # 所有框的左上角y
    x2 = dets[:, 2]  # 所有框的右下角x
    y2 = dets[:, 3]  # 所有框的右下角y
    areas = (y2 - y1 + 1) * (x2 - x1 + 1)  # 计算每个框的面积（+1避免面积为0）
    scores = dets[:, 4]  # 提取所有框的置信度
    keep = []  # 保存最终保留的框的索引
    index = scores.argsort()[::-1]  # 按置信度从高到低排序的索引
    
    # 迭代处理所有框
    while index.size > 0:
        i = index[0]  # 当前置信度最高的框的索引
        keep.append(i)  # 保留该框
        
        # 计算当前框与其他框的重叠区域坐标
        x11 = np.maximum(x1[i], x1[index[1:]])  # 重叠区域左上角x
        y11 = np.maximum(y1[i], y1[index[1:]])  # 重叠区域左上角y
        x22 = np.minimum(x2[i], x2[index[1:]])  # 重叠区域右下角x
        y22 = np.minimum(y2[i], y2[index[1:]])  # 重叠区域右下角y
        
        w = np.maximum(0, x22 - x11 + 1)  # 重叠区域宽度（确保非负）
        h = np.maximum(0, y22 - y11 + 1)  # 重叠区域高度（确保非负）
        overlaps = w * h  # 重叠区域面积
        
        # 计算IoU（交并比）= 重叠面积 / (当前框面积 + 其他框面积 - 重叠面积)
        ious = overlaps / (areas[i] + areas[index[1:]] - overlaps)
        
        # 保留IoU小于阈值的框（这些框与当前框重叠度低）
        idx = np.where(ious <= thresh)[0]
        index = index[idx + 1]  # +1是因为index[0]已处理
    
    return dets[keep]  # 返回保留的框
 
 
def letterbox(img, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    '''
    调整图像大小并填充，保持原始宽高比（避免图像拉伸变形）
    目标检测预处理核心函数，确保输入图像尺寸符合模型要求的同时保留物体比例
    
    参数:
        img: 输入图像（H, W, C）
        new_shape: 目标尺寸，默认(640, 640)
        color: 填充区域的颜色（BGR格式）
        auto: 是否自动填充为最小矩形（确保填充量为stride的倍数）
        scaleFill: 是否直接拉伸图像至目标尺寸（可能导致变形）
        scaleup: 是否允许放大图像（测试时通常关闭以提高精度）
        stride: 模型步长，填充量需为其倍数以满足模型要求
    返回:
        img: 处理后的图像
        ratio: 缩放比例 [w_ratio, h_ratio]
        (dw, dh): 宽度和高度方向的填充量
    '''
    shape = img.shape[:2]  # 原始图像尺寸 [高度, 宽度]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)  # 若输入为整数，则转为正方形尺寸
 
    # 计算缩放比例（取宽和高中较小的比例，确保图像完全放入新尺寸）
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # 禁止放大（仅缩小）
        r = min(r, 1.0)
 
    # 计算缩放后的尺寸和填充量
    ratio = r, r  # 宽、高缩放比例
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))  # 缩放后的宽、高
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # 宽、高方向需填充的像素数
    
    if auto:  # 自动填充为最小矩形（填充量为stride的倍数）
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)
    elif scaleFill:  # 拉伸图像至目标尺寸（不保留比例）
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # 强制缩放比例
 
    dw /= 2  # 填充量分为左右两侧
    dh /= 2  # 填充量分为上下两侧
 
    # 调整图像大小
    if shape[::-1] != new_unpad:  # 若尺寸变化则 resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    
    # 计算上下左右填充量（取整避免浮点误差）
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    
    # 添加填充（边界扩展）
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)
    return img, ratio, (dw, dh)
 
 
def preprocess_img(img, target_shape, means=[0, 0, 0], stds=[255, 255, 255]):
    '''
    图像预处理：调整尺寸、转换颜色空间、归一化
    将原始图像转换为模型可接受的输入格式
    
    参数:
        img: 原始图像（BGR格式）
        target_shape: 目标尺寸（正方形边长）
        means: 归一化均值（RGB通道）
        stds: 归一化标准差（RGB通道）
    返回:
        img_processed: 预处理后的图像（可直接输入模型）
        ratio: 缩放比例 [宽比例, 高比例]
    '''
    img_processed = np.copy(img)
    [height, width, _] = img_processed.shape
    length = max((height, width))  # 取宽高最大值，构建正方形画布
    scale = length / target_shape  # 计算缩放比例（将正方形缩放到目标尺寸）
    ratio = [scale, scale]  # 宽和高的缩放比例（因使用正方形画布，比例相同）
    
    # 创建正方形画布并将原始图像居中放置
    image = np.zeros((length, length, 3), np.uint8)
    image[0:height, 0:width] = img_processed
    
    # 转换颜色空间（OpenCV默认BGR，模型通常需要RGB）
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    
    # 调整图像大小至目标尺寸
    img_input = cv2.resize(image, (target_shape, target_shape))
    print("image.shape==", image.shape)  # 调试信息：打印处理中的图像尺寸
 
    # 归一化处理（减均值除标准差）
    img_processed = (img_processed - means) / stds
    img_processed = img_processed.astype(np.float32)  # 转换为float32类型（模型输入要求）
 
    return img_processed, ratio
 
 
def scale_coords(img1_shape, coords, img0_shape, ratio_pad=None):
    '''
    将检测框从处理后的图像尺寸映射回原始图像尺寸
    后处理核心函数，解决因预处理缩放和填充导致的坐标偏移问题
    
    参数:
        img1_shape: 处理后的图像尺寸 (H, W)
        coords: 检测框坐标（x1, y1, x2, y2）
        img0_shape: 原始图像尺寸 (H, W)
        ratio_pad: 缩放比例和填充量（若为None则自动计算）
    返回:
        coords: 映射后的原始图像坐标
    '''
    if ratio_pad is None:  # 计算缩放比例和填充量
        # 缩放比例 = 处理后尺寸 / 原始尺寸（取宽高比例中的较小值）
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])
        # 填充量 = (处理后尺寸 - 原始尺寸*缩放比例) / 2
        pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]
 
    # 调整坐标：减去填充量，再除以缩放比例（反向还原预处理的缩放和填充）
    coords[:, [0, 2]] -= pad[0]  # x方向减去填充
    coords[:, [1, 3]] -= pad[1]  # y方向减去填充
    coords[:, :4] /= gain  # 除以缩放比例
    
    # 裁剪坐标至图像范围内（避免超出图像边界）
    clip_coords(coords, img0_shape)
    return coords
 
 
def clip_coords(boxes, img_shape):
    '''
    将边界框坐标限制在图像范围内（防止坐标超出图像宽高）
    
    参数:
        boxes: 边界框数组（x1, y1, x2, y2）
        img_shape: 图像尺寸 (H, W)
    '''
    boxes[:, 0].clip(0, img_shape[1], out=boxes[:, 0])  # x1 >= 0 且 <= 图像宽度
    boxes[:, 1].clip(0, img_shape[0], out=boxes[:, 1])  # y1 >= 0 且 <= 图像高度
    boxes[:, 2].clip(0, img_shape[1], out=boxes[:, 2])  # x2 >= 0 且 <= 图像宽度
    boxes[:, 3].clip(0, img_shape[0], out=boxes[:, 3])  # y2 >= 0 且 <= 图像高度
 
 
def postprocess(outputs, ratio, conf_threshold=0.5, nms_threshold=0.45):
    '''
    模型输出后处理：过滤低置信度检测框、应用NMS、映射坐标至原始图像
    
    参数:
        outputs: 模型输出（通常为(N, 85)，前4为框坐标，第5为置信度，后80为类别分数）
        ratio: 缩放比例（用于还原坐标）
        conf_threshold: 置信度阈值（过滤低置信度框）
        nms_threshold: NMS的IoU阈值（过滤重叠框）
    返回:
        boxes: 处理后的检测框数组，每个元素为[x, y, w, h, score, class_id]
    '''
    rows = outputs.shape[0]  # 检测框数量
    boxes = []  # 存储边界框（x, y, w, h）
    scores = []  # 存储置信度
    class_ids = []  # 存储类别ID
    
    # 遍历所有检测框
    for i in range(rows):
        classes_scores = outputs[i][4:]  # 提取类别分数（前4为框坐标，第5为置信度）
        # 获取最大类别分数及对应类别索引
        (minScore, maxScore, minClassLoc, (x, maxClassIndex)) = cv2.minMaxLoc(classes_scores)
        
        if maxScore >= conf_threshold:  # 过滤低置信度框
            # 提取边界框信息（中心坐标和宽高）并转换为左上角坐标和宽高
            box = [
                outputs[i][0] - (0.5 * outputs[i][2]),  # x = cx - w/2
                outputs[i][1] - (0.5 * outputs[i][3]),  # y = cy - h/2
                outputs[i][2],  # w
                outputs[i][3]   # h
            ]
            boxes.append(box)
            scores.append(maxScore)
            class_ids.append(maxClassIndex)
 
    # 应用非极大值抑制（使用OpenCV内置函数）
    result_boxes = cv2.dnn.NMSBoxes(
        boxes, scores, 
        score_threshold=conf_threshold, 
        nms_threshold=nms_threshold, 
        eta=0.5  # 用于迭代NMS的参数
    )
    result_boxes = result_boxes.reshape(-1)  # 调整形状便于遍历
    
    # 处理NMS后的结果
    new_bboxes = []
    new_scores = []
    new_class_ids = []
    for i in range(len(result_boxes)):
        index = result_boxes[i]
        bbox = boxes[index]
        x, y, w, h = float(bbox[0]), float(bbox[1]), float(bbox[2]), float(bbox[3])
        # 将坐标缩放回原始图像尺寸
        new_bboxes.append([
            round(x * ratio[0]),  # x
            round(y * ratio[1]),  # y
            round(w * ratio[0]),  # w
            round(h * ratio[1])   # h
        ])
        new_scores.append(scores[index])
        new_class_ids.append(class_ids[index])
 
    # 整理结果格式（拼接边界框、置信度、类别ID）
    new_scores = np.expand_dims(new_scores, 1)  # 扩展维度便于拼接
    new_class_ids = np.expand_dims(new_class_ids, 1)
 
    boxes = np.concatenate((new_bboxes, new_scores), axis=1)
    boxes = np.concatenate((boxes, new_class_ids), axis=1)
 
    return boxes
 
 
def draw_res(img, boxes):
    '''
    在图像上绘制检测结果：边界框、类别标签和置信度
    
    参数:
        img: 原始图像
        boxes: 检测框数组（x, y, w, h, score, class_id）
    返回:
        img: 绘制后的图像
    '''
    img = img.astype(np.uint8)  # 确保图像类型正确（防止绘制异常）
    for i, [x, y, w, h, scores, class_ids] in enumerate(boxes):
        x = int(x)
        y = int(y)
        w = int(w)
        h = int(h)
        name = coco_class[int(class_ids)]  # 获取类别名称
        print(i + 1, [x, y, w, h], round(scores, 4), name)  # 打印检测信息
        
        # 构建标签文本（类别 + 置信度）
        label = f'{name} ({scores:.2f})'
        # 获取文本尺寸（用于绘制标签背景）
        W, H = cv2.getTextSize(label, 0, fontScale=1, thickness=2)[0]
        color = colors[name]  # 获取该类别的颜色
        
        # 绘制边界框
        cv2.rectangle(img, (x, y), (int(x + w), int(y + h)), color, thickness=2)
        
        # 绘制标签背景（半透明矩形）
        cv2.rectangle(img, (x, int(y - H)), (int(x + W / 2), y), (0, 255,), -1, cv2.LINE_AA)
        
        # 绘制标签文本
        cv2.putText(
            img, label, (x, int(y) - 6), 
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1
        )
    return img
 
 
def main(args):
    '''
    主函数：整合预处理、模型推理、后处理和结果可视化的完整流程
    
    参数:
        args: 命令行参数（包含模型路径、图像路径等）
    '''
    print("Start image inference ... ...")
    
    # 模型配置参数
    size = 640  # 模型输入尺寸（YOLOv8默认640）
    config = aidlite.Config.create_instance()  # 创建配置实例
    if config is None:
        print("Create config failed !")
        return False

    config.implement_type = aidlite.ImplementType.TYPE_LOCAL  # 本地部署模式
    
    # 根据命令行参数选择模型框架（QNN或SNPE）
    if args.model_type.lower() == "qnn":
        config.framework_type = aidlite.FrameworkType.TYPE_QNN231  # QNN框架（版本231）
    elif args.model_type.lower() == "snpe2" or args.model_type.lower() == "snpe":
        config.framework_type = aidlite.FrameworkType.TYPE_SNPE2  # SNPE框架

    config.accelerate_type = aidlite.AccelerateType.TYPE_DSP  # 使用DSP加速推理
    config.is_quantify_model = 1  # 启用量化模型（提升速度，降低精度损失）

    # 创建模型实例并加载模型
    model = aidlite.Model.create_instance(args.target_model)
    if model is None:
        print("Create model failed !")
        return False
    # 定义输入输出形状（需与模型实际输入输出匹配）
    input_shapes = [[1, size, size, 3]]  # 输入形状：[批次, 高, 宽, 通道]
    output_shapes = [[1, 4, 8400], [1, 80, 8400]]  # 输出形状：[批次, 4/80, 检测框数量]
    
    # 设置模型输入输出属性（数据类型为float32）
    model.set_model_properties(
        input_shapes, aidlite.DataType.TYPE_FLOAT32,
        output_shapes, aidlite.DataType.TYPE_FLOAT32
    )

    # 构建解释器（用于执行推理）
    interpreter = aidlite.InterpreterBuilder.build_interpretper_from_model_and_config(model, config)
    if interpreter is None:
        print("build_interpretper_from_model_and_config failed !")
        return None
    # 初始化解释器
    result = interpreter.init()
    if result != 0:
        print(f"interpreter init failed !")
        return False
    # 加载模型到设备（DSP/NPU等）
    result = interpreter.load_model()
    if result != 0:
        print("interpreter load model failed !")
        return False
    print("detect model load success!")

    # 读取输入图像
    img = cv2.imread(args.image_path)
    if img is None:
        print("Error: Could not open image file")
        return False

    # 图像预处理
    img_processed = np.copy(img)
    [h, w, _] = img_processed.shape
    length = max((h, w))  # 构建正方形画布的边长
    scale = length / size  # 计算缩放比例
    ratio = [scale, scale]  # 宽高缩放比例（因使用正方形画布，比例相同）
    
    # 创建正方形画布并放置原始图像（避免拉伸）
    image = np.zeros((length, length, 3), np.uint8)
    image[0:h, 0:w] = img_processed
    
    # 颜色空间转换（BGR->RGB）和尺寸调整
    img_input = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    img_input = cv2.resize(img_input, (size, size))

    # 归一化处理（减均值除标准差）
    mean_data = [0, 0, 0]
    std_data = [255, 255, 255]
    img_input = (img_input - mean_data) / std_data  # 归一化到[0, 1]
    img_input = img_input.astype(np.float32)  # 转换为模型输入要求的类型

    # 预热运行（消除首次推理的初始化耗时影响）
    warmup_iters = 10
    print(f"Warming up with {warmup_iters} iterations...")
    for _ in range(warmup_iters):
        interpreter.set_input_tensor(0, img_input.data)  # 设置输入
        interpreter.invoke()  # 执行推理

    # 性能测试（多次推理计算平均耗时）
    invoke_nums = 100  # 推理次数
    invoke_times = []  # 存储每次推理耗时（毫秒）
    
    print(f"Running performance test with {invoke_nums} iterations...")
    for i in range(invoke_nums):
        # 设置输入张量
        interpreter.set_input_tensor(0, img_input.data)
        
        # 记录推理耗时（仅计算模型执行时间）
        t1 = time.time()
        result = interpreter.invoke()  # 执行推理
        t2 = time.time()
        
        if result != 0:
            print("interpreter invoke() failed")
            return False
        
        invoke_time = (t2 - t1) * 1000  # 转换为毫秒
        invoke_times.append(invoke_time)
        
        # 打印进度（每10次迭代）
        if (i + 1) % 10 == 0:
            print(f"Completed {i + 1}/{invoke_nums} iterations")

    # 计算性能指标
    mean_invoke_time = np.mean(invoke_times)  # 平均耗时
    max_invoke_time = np.max(invoke_times)    # 最大耗时
    min_invoke_time = np.min(invoke_times)    # 最小耗时
    var_invoke_time = np.var(invoke_times)    # 方差（反映耗时稳定性）
    fps = 1000 / mean_invoke_time  # 计算FPS（每秒帧率）

    # 打印性能结果
    print(f"\nInference {invoke_nums} times:\n"
          f"-- mean_invoke_time is {mean_invoke_time:.2f} ms\n"
          f"-- max_invoke_time is {max_invoke_time:.2f} ms\n"
          f"-- min_invoke_time is {min_invoke_time:.2f} ms\n"
          f"-- var_invoke_time is {var_invoke_time:.2f}\n"
          f"-- FPS: {fps:.2f}\n")

    # 获取模型输出并处理（以最后一次推理结果为例）
    # 输出张量0：边界框坐标（1,4,8400）-> 重塑为输出形状
    qnn_local = interpreter.get_output_tensor(1).reshape(*output_shapes[0])
    # 输出张量1：类别分数（1,80,8400）-> 重塑为输出形状
    qnn_conf = interpreter.get_output_tensor(0).reshape(*output_shapes[1])

    # 拼接输出（将坐标和类别分数合并）
    qnn_result = np.concatenate((qnn_local, qnn_conf), axis=1)
    qnn_result = qnn_result.transpose(0, 2, 1)  # 转置为（1, 8400, 84）
    qnn_result = qnn_result[0]  # 去除批次维度（仅单张图像）

    # 后处理：过滤和NMS
    detect = postprocess(qnn_result, ratio, conf_threshold=0.5, nms_threshold=0.45)
    print(f"Detected {len(detect)} targets in the image")

    # 绘制检测结果
    res_img = draw_res(img, list(detect))

    # 在图像上添加性能信息（平均耗时和FPS）
    cv2.putText(
        res_img, 
        f"Inference Time: {mean_invoke_time:.2f} ms | FPS: {fps:.2f}", 
        (10, 30),
        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2
    )

    # 保存结果图像
    cv2.imwrite('output.jpg', res_img)
    print("Output image saved as 'output.jpg'")

    # 释放资源
    result = interpreter.destory()
 
 
def parser_args():
    '''
    解析命令行参数：定义可配置的输入项（模型路径、图像路径、框架类型等）
    
    返回:
        args: 解析后的命令行参数
    '''
    parser = argparse.ArgumentParser(description="Run image inference benchmarks")
    parser.add_argument(
        '--target_model', type=str,
        default='yolov8n/cutoff_yolov8n_qcs8550_fp16.qnn231.ctx.bin',
        help="inference model path"
    )
    parser.add_argument(
        '--image_path', type=str, default='bus.jpg', 
        help="Input image path"
    )
    parser.add_argument(
        '--model_type', type=str, default='QNN', 
        help="run backend (QNN or SNPE)"
    )
    args = parser.parse_args()
    return args
 
 
if __name__ == "__main__":
    args = parser_args()  # 解析命令行参数
    main(args)  # 执行主函数