高通手机跑AI系列之——手部姿势跟踪

伊利丹~怒风

企业官方账号

2025-06-16 14:44:39

环境准备

手机

测试手机型号：Redmi K60 Pro

处理器：第二代骁龙8移动--8gen2

运行内存：8.0GB ，LPDDR5X-8400，67.0 GB/s

摄像头：前置16MP+后置50MP+8MP+2MP

AI算力：NPU 48Tops INT8 && GPU 1536ALU x 2 x 680MHz = 2.089 TFLOPS

软件

APP：AidLux2.0

系统环境：Ubuntu 20.04.3 LTS

提示：AidLux登录后代码运行更流畅，在代码运行时保持AidLux APP在前台运行，避免代码运行过程中被系统回收进程，另外屏幕保持常亮，一般息屏后一段时间，手机系统会进入休眠状态，如需长驻后台需要给APP权限。

算法Demo

代码功能分析

这段代码是一个基于 AI 的手部检测与识别程序，它通过摄像头实时捕捉画面，检测画面中的手部，并识别出手部的关键点位置，最终在画面上绘制出手部轮廓和关键点。

代码应用场景分析

这段代码实现了实时手部检测和关键点识别功能，可应用于多种场景：

手势识别与交互系统
- 智能家居控制：通过特定手势控制灯光、电器等设备
- 虚拟现实 / 增强现实 (VR/AR)：手部动作作为交互输入，增强沉浸感
- 游戏控制：替代传统控制器，实现更自然的游戏操作
人机协作与机器人控制
- 工业机器人引导：通过手势指挥机器人执行任务
- 远程操作：操作人员通过手势控制远程设备
医疗与康复领域
- 手部运动康复训练：监测患者手部动作，评估康复进度
- 手术辅助：医生通过手势操作医疗设备或查看影像资料
教育与演示
- 互动教学：教师通过手势控制教学内容展示
- 演示系统：演讲者通过手势控制幻灯片或其他演示内容
安防监控
- 异常行为检测：分析人员手部动作识别潜在威胁行为
- 身份验证：结合手部特征进行身份识别

AidLite 推理引擎功能

AidLite 推理引擎是专为边缘设备优化的轻量级 AI 推理引擎，具有以下核心功能：

多框架支持
- 支持 TensorFlow Lite、ONNX 等多种模型格式
- 代码中使用了 TensorFlow Lite 格式的模型 (.tflite)
硬件加速
- 支持 GPU、CPU 、NPU等多种加速方式
- 手掌检测模型使用 GPU 加速，提高实时性能
- 关键点识别模型使用 CPU 加速，保证精度
轻量化设计
- 专为资源受限的边缘设备优化
- 低内存占用，高效的模型推理能力
易用的 API 接口
- 提供模型加载、输入设置、推理执行、输出获取等完整流程的 API
- 代码中通过aidlite.Model、aidlite.Config和aidlite.InterpreterBuilder等类实现模型管理和推理

OpenCV (代码中的 CV) 功能

OpenCV 是计算机视觉领域的经典库，在这段代码中主要用于以下功能：

图像采集与处理
- 通过cv2.VideoCapture获取摄像头实时视频流
- 图像预处理：颜色空间转换 (cv2.cvtColor)、缩放 (cv2.resize) 等
图像显示与可视化
- 使用cv2.imshow显示处理后的图像
- 绘制检测框 (cv2.rectangle)、关键点 (cv2.circle) 和连接线 (cv2.line)
辅助计算功能
- 计算手掌重心：cv2.moments计算图像矩
- 边界框计算：cv2.boundingRect计算包围关键点的最小矩形
图像操作
- 图像翻转：使用cv2.flip处理前置摄像头的镜像问题
- 区域提取：从原始图像中提取手部区域进行单独处理

OpenCV 提供的这些功能为 AI 模型的输入准备和输出结果可视化提供了基础支持，使整个系统能够实现从图像采集到结果展示的完整流程。

AI 模型作用介绍

代码中使用了两个 AI 模型协同工作：

手掌检测模型 (palm_detection.tflite)
- 这是一个轻量级的目标检测模型，专门用于检测图像中的手掌。
- 模型输入：128×128 像素的 RGB 图像
- 模型输出：包含手掌位置和边界框信息
- 作用：快速定位图像中的手掌，为后续的关键点识别提供感兴趣区域 (ROI)
- 加速方式：使用 GPU 加速，提高检测速度
手部关键点检测模型 (hand_landmark.tflite)
- 该模型对手部区域进行更精细的分析，识别 21 个关键点
- 模型输入：224×224 像素的 RGB 图像（通常是手掌检测模型输出的 ROI）
- 模型输出：21 个三维关键点坐标，表示手部的详细姿态
- 作用：精确识别手指关节、指尖等部位的位置
- 加速方式：使用 CPU 加速，保证识别精度

这两个模型结合使用，实现了从图像中检测手掌位置，到精确识别手部 21 个关键点的完整流程，能够实时跟踪手部动作和姿态。

import cv2
import time 
from time import sleep
import subprocess
import math
import sys
import numpy as np
from blazeface import *  # 导入BlazeFace人脸/手部检测模型相关函数
import aidlite  # AidLux平台的AI推理框架
import os

# 获取摄像头设备ID，优先选择USB摄像头
def get_cap_id():
    try:
        # 构造命令，使用awk处理输出
        cmd = "ls -l /sys/class/video4linux | awk -F ' -> ' '/usb/{sub(/.*video/, \"\", $2); print $2}'"
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        output = result.stdout.strip().split()

        # 转换所有捕获的编号为整数，找出最小值
        video_numbers = list(map(int, output))
        if video_numbers:
            return min(video_numbers)
        else:
            return None
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

# 图像预处理函数，将图像转换为适合TFLite模型输入的格式
def preprocess_image_for_tflite32(image, model_image_size=300):
    try:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # 转换颜色空间从BGR到RGB
        image = cv2.resize(image, (model_image_size, model_image_size))  # 调整图像大小
        image = np.expand_dims(image, axis=0)  # 添加批次维度
        image = (2.0 / 255.0) * image - 1.0  # 归一化处理，将像素值缩放到[-1,1]范围
        image = image.astype('float32')  # 转换数据类型为float32
    except cv2.error as e:
        print(str(e))
    return image
        
# 在图像上绘制手部检测框
def plot_detections(img, detections, with_keypoints=True):
        output_img = img
        print(img.shape)
        x_min=[0,0]  # 存储两只手的最小x坐标
        x_max=[0,0]  # 存储两只手的最大x坐标
        y_min=[0,0]  # 存储两只手的最小y坐标
        y_max=[0,0]  # 存储两只手的最大y坐标
        hand_nums=len(detections)  # 检测到的手的数量
        print("Found %d hands" % hand_nums)
        if hand_nums >2:
            hand_nums=2  # 最多处理两只手        
        for i in range(hand_nums):
            ymin = detections[i][ 0] * img.shape[0]  # 计算边界框的y最小值
            xmin = detections[i][ 1] * img.shape[1]  # 计算边界框的x最小值
            ymax = detections[i][ 2] * img.shape[0]  # 计算边界框的y最大值
            xmax = detections[i][ 3] * img.shape[1]  # 计算边界框的x最大值
            w=int(xmax-xmin)  # 计算边界框宽度
            h=int(ymax-ymin)  # 计算边界框高度
            h=max(h,w)  # 取宽高的最大值
            h=h*224./128.  # 调整高度尺寸
            
            x=(xmin+xmax)/2.  # 计算中心点x坐标
            y=(ymin+ymax)/2.  # 计算中心点y坐标
            
            # 调整边界框大小和位置
            xmin=x-h/2.
            xmax=x+h/2.
            ymin=y-h/2.-0.18*h
            ymax=y+h/2.-0.18*h
            
            # 存储边界框坐标
            x_min[i]=int(xmin)
            y_min[i]=int(ymin)
            x_max[i]=int(xmax)
            y_max[i]=int(ymax)            
            p1 = (int(xmin),int(ymin))  # 边界框左上角坐标
            p2 = (int(xmax),int(ymax))  # 边界框右下角坐标
            cv2.rectangle(output_img, p1, p2, (0,255,255),2,1)  # 在图像上绘制边界框
            
        return x_min,y_min,x_max,y_max
        
# 在图像上绘制手部网格关键点
def draw_mesh(image, mesh, mark_size=4, line_width=1):
    """Draw the mesh on an image"""
    # The mesh are normalized which means we need to convert it back to fit
    # the image size.
    image_size = image.shape[0]
    mesh = mesh * image_size
    for point in mesh:
        cv2.circle(image, (point[0], point[1]),
                   mark_size, (255, 0, 0), 4)

# 计算手掌的重心
def calc_palm_moment(image, landmarks):
    image_width, image_height = image.shape[1], image.shape[0]

    palm_array = np.empty((0, 2), int)

    # 收集手掌区域的关键点
    for index, landmark in enumerate(landmarks):
        if math.isnan(landmark[0]):
            landmark[0] = 0
        if math.isnan(landmark[1]):
            landmark[1] = 0
        landmark_x = min(int(landmark[0] * image_width), image_width - 1)
        landmark_y = min(int(landmark[1] * image_height), image_height - 1)

        landmark_point = [np.array((landmark_x, landmark_y))]

        if index == 0:  # 手腕1
            palm_array = np.append(palm_array, landmark_point, axis=0)
        if index == 1:  # 手腕2
            palm_array = np.append(palm_array, landmark_point, axis=0)
        if index == 5:  # 食指：根部
            palm_array = np.append(palm_array, landmark_point, axis=0)
        if index == 9:  # 中指：根部
            palm_array = np.append(palm_array, landmark_point, axis=0)
        if index == 13:  # 无名指：根部
            palm_array = np.append(palm_array, landmark_point, axis=0)
        if index == 17:  # 小指：根部
            palm_array = np.append(palm_array, landmark_point, axis=0)
    
    # 计算重心
    M = cv2.moments(palm_array)
    cx, cy = 0, 0
    if M['m00'] != 0:
        cx = int(M['m10'] / M['m00'])
        cy = int(M['m01'] / M['m00'])

    return cx, cy

# 计算包围手部关键点的矩形框
def calc_bounding_rect(image, landmarks):
    image_width, image_height = image.shape[1], image.shape[0]

    landmark_array = np.empty((0, 2), int)

    # 收集所有关键点坐标
    for _, landmark in enumerate(landmarks):
        landmark_x = min(int(landmark[0] * image_width), image_width - 1)
        landmark_y = min(int(landmark[0] * image_height), image_height - 1)

        landmark_point = [np.array((landmark_x, landmark_y))]

        landmark_array = np.append(landmark_array, landmark_point, axis=0)

    # 计算包围矩形
    x, y, w, h = cv2.boundingRect(landmark_array)

    return [x, y, x + w, y + h]

# 在图像上绘制包围矩形
def draw_bounding_rect(use_brect, image, brect):
    if use_brect:
        # 外接矩形
        cv2.rectangle(image, (brect[0], brect[1]), (brect[2], brect[3]),
                     (0, 255, 0), 2)

    return image
    
# 在图像上绘制手部关键点和连接线
def draw_landmarks(image, cx, cy, landmarks):
    
    image_width, image_height = image.shape[1], image.shape[0]

    landmark_point = []

    # 绘制关键点
    for index, landmark in enumerate(landmarks):
        # if landmark.visibility < 0 or landmark.presence < 0:
        #     continue

        landmark_x = min(int(landmark[0] * image_width), image_width - 1)
        landmark_y = min(int(landmark[1] * image_height), image_height - 1)

        landmark_point.append((landmark_x, landmark_y))

        # 根据关键点类型绘制不同大小和颜色的点
        if index == 0:  # 手腕1
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 1:  # 手腕2
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 2:  # 拇指：根部
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 3:  # 拇指：第1关节
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 4:  # 拇指：指尖
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
            cv2.circle(image, (landmark_x, landmark_y), 12, (0, 255, 0), 2)
        if index == 5:  # 食指：根部
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 6:  # 食指：第2关节
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 7:  # 食指：第1关节
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 8:  # 食指：指尖
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
            cv2.circle(image, (landmark_x, landmark_y), 12, (0, 255, 0), 2)
        if index == 9:  # 中指：根部
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 10:  # 中指：第2关节
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 11:  # 中指：第1关节
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 12:  # 中指：指尖
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
            cv2.circle(image, (landmark_x, landmark_y), 12, (0, 255, 0), 2)
        if index == 13:  # 无名指：根部
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 14:  # 无名指：第2关节
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 15:  # 无名指：第1关节
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 16:  # 无名指：指尖
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
            cv2.circle(image, (landmark_x, landmark_y), 12, (0, 255, 0), 2)
        if index == 17:  # 小指：根部
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 18:  # 小指：第2关节
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 19:  # 小指：第1关节
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
        if index == 20:  # 小指：指尖
            cv2.circle(image, (landmark_x, landmark_y), 5, (0, 255, 0), 2)
            cv2.circle(image, (landmark_x, landmark_y), 12, (0, 255, 0), 2)

    # 绘制连接线
    if len(landmark_point) > 0:
        # 拇指
        cv2.line(image, landmark_point[2], landmark_point[3], (0, 255, 0), 2)
        cv2.line(image, landmark_point[3], landmark_point[4], (0, 255, 0), 2)

        # 食指
        cv2.line(image, landmark_point[5], landmark_point[6], (0, 255, 0), 2)
        cv2.line(image, landmark_point[6], landmark_point[7], (0, 255, 0), 2)
        cv2.line(image, landmark_point[7], landmark_point[8], (0, 255, 0), 2)

        # 中指
        cv2.line(image, landmark_point[9], landmark_point[10], (0, 255, 0), 2)
        cv2.line(image, landmark_point[10], landmark_point[11], (0, 255, 0), 2)
        cv2.line(image, landmark_point[11], landmark_point[12], (0, 255, 0), 2)

        # 无名指
        cv2.line(image, landmark_point[13], landmark_point[14], (0, 255, 0), 2)
        cv2.line(image, landmark_point[14], landmark_point[15], (0, 255, 0), 2)
        cv2.line(image, landmark_point[15], landmark_point[16], (0, 255, 0), 2)

        # 小指
        cv2.line(image, landmark_point[17], landmark_point[18], (0, 255, 0), 2)
        cv2.line(image, landmark_point[18], landmark_point[19], (0, 255, 0), 2)
        cv2.line(image, landmark_point[19], landmark_point[20], (0, 255, 0), 2)

        # 手掌
        cv2.line(image, landmark_point[0], landmark_point[1], (0, 255, 0), 2)
        cv2.line(image, landmark_point[1], landmark_point[2], (0, 255, 0), 2)
        cv2.line(image, landmark_point[2], landmark_point[5], (0, 255, 0), 2)
        cv2.line(image, landmark_point[5], landmark_point[9], (0, 255, 0), 2)
        cv2.line(image, landmark_point[9], landmark_point[13], (0, 255, 0), 2)
        cv2.line(image, landmark_point[13], landmark_point[17], (0, 255, 0), 2)
        cv2.line(image, landmark_point[17], landmark_point[0], (0, 255, 0), 2)

    # 绘制重心点
    if len(landmark_point) > 0:
        cv2.circle(image, (cx, cy), 12, (0, 255, 0), 2)

    return image


# 初始化手掌检测模型
inShape =[[1 , 128 ,128 ,3]]  # 模型输入形状
outShape= [[1 , 896,18],[1,896,1]]  # 模型输出形状
model_path="models/palm_detection.tflite"  # 手掌检测模型路径

# 创建Model实例对象，并设置模型相关参数
model = aidlite.Model.create_instance(model_path)
if model is None:
    print("Create palm_detection model failed !")

# 设置模型属性
model.set_model_properties(inShape, aidlite.DataType.TYPE_FLOAT32, outShape,aidlite.DataType.TYPE_FLOAT32)

# 创建Config实例对象，并设置配置信息
config = aidlite.Config.create_instance()
config.implement_type = aidlite.ImplementType.TYPE_FAST  # 快速推理实现类型
config.framework_type = aidlite.FrameworkType.TYPE_TFLITE  # TensorFlow Lite框架类型
config.accelerate_type = aidlite.AccelerateType.TYPE_GPU  # GPU加速
config.number_of_threads = 4  # 线程数

# 创建推理解释器对象
fast_interpreter = aidlite.InterpreterBuilder.build_interpretper_from_model_and_config(model, config)
if fast_interpreter is None:
    print("palm_detection model build_interpretper_from_model_and_config failed !")
    
# 完成解释器初始化
result = fast_interpreter.init()
if result != 0:
    print("palm_detection model interpreter init failed !")
    
# 加载模型
result = fast_interpreter.load_model()
if result != 0:
    print("palm_detection model interpreter load model failed !")
    
print("palm_detection model load success!")


# 初始化手部关键点检测模型
model_path1="models/hand_landmark.tflite"  # 手部关键点检测模型路径
inShape1 =[[1 , 224 , 224 ,3]]  # 模型输入形状
outShape1= [[1 , 63],[1],[1]]  # 模型输出形状

# 创建Model实例对象，并设置模型相关参数
model1 = aidlite.Model.create_instance(model_path1)
if model1 is None:
    print("Create hand_landmark model failed !")
    
# 设置模型属性
model1.set_model_properties(inShape1, aidlite.DataType.TYPE_FLOAT32, outShape1,
                           aidlite.DataType.TYPE_FLOAT32)

# 创建Config实例对象，并设置配置信息
config1 = aidlite.Config.create_instance()
config1.implement_type = aidlite.ImplementType.TYPE_FAST  # 快速推理实现类型
config1.framework_type = aidlite.FrameworkType.TYPE_TFLITE  # TensorFlow Lite框架类型
config1.accelerate_type = aidlite.AccelerateType.TYPE_CPU  # CPU加速
config.number_of_threads = 4  # 线程数

# 创建推理解释器对象
fast_interpreter1 = aidlite.InterpreterBuilder.build_interpretper_from_model_and_config(model1, config1)
if fast_interpreter1 is None:
    print("hand_landmark model build_interpretper_from_model_and_config failed !")
    
# 完成解释器初始化
result = fast_interpreter1.init()
if result != 0:
    print("hand_landmark model interpreter init failed !")
    
# 加载模型
result = fast_interpreter1.load_model()
if result != 0:
    print("hand_landmark model interpreter load model failed !")
    
print("hand_landmark model load success!")


# 加载锚点数据，用于模型推理
anchors = np.load('models/anchors.npy').astype(np.float32)

# 设置Aidlux平台类型和摄像头ID
aidlux_type="basic"
# 0-后置，1-前置
camId = 1
opened = False

# 尝试打开摄像头
while not opened:
    if aidlux_type == "basic":
        cap=cv2.VideoCapture(camId, device='mipi')
    else:
        capId = get_cap_id()
        print("usb camera id: ", capId)
        if capId is None:
            print ("no found usb camera")
            # 默认用1-前置摄像头打开相机，若打开失败，请尝试修改为0-后置
            cap=cv2.VideoCapture(1, device='mipi')
        else:
            camId = capId
            cap = cv2.VideoCapture(camId)
            cap.set(6, cv2.VideoWriter.fourcc('M','J','P','G'))
    
    if cap.isOpened():
        opened = True
    else:
        print("open camera failed")
        cap.release()
        time.sleep(0.5)
    
# 手检测标志和坐标初始化
bHand=False
x_min=[0,0]
x_max=[0,0]
y_min=[0,0]
y_max=[0,0]
fface=0.0
use_brect=True

# 主循环：持续捕获视频帧并进行手部检测和关键点识别
while True:
    
    ret, frame=cap.read()  # 读取一帧视频
    if not ret:
        continue
    if frame is None:
        continue
        
    # 如果使用前置摄像头，水平翻转图像以获得自然的镜像效果
    if camId==1:
        frame=cv2.flip(frame,1)
        
    # 图像预处理，为手掌检测模型准备输入
    img = preprocess_image_for_tflite32(frame,128)

    # 手部检测和关键点识别流程
    if bHand==False:
        # 设置输入数据
        result = fast_interpreter.set_input_tensor(0, img.data)
        if result != 0:
            print("palm_detection model interpreter set_input_tensor() failed")
            
        # 执行手掌检测模型推理
        result = fast_interpreter.invoke()
        if result != 0:
            print("palm_detection model interpreter invoke() failed")
            
        # 获取输出数据
        raw_boxes = fast_interpreter.get_output_tensor(0)
        if raw_boxes is None:
            print("sample : palm_detection model interpreter->get_output_tensor(0) failed !")

        classificators = fast_interpreter.get_output_tensor(1)
        if classificators is None:
            print("sample : palm_detection model interpreter->get_output_tensor(1) failed !")
    
        # 解析检测结果
        detections = blazeface(raw_boxes, classificators, anchors)

        # 在图像上绘制检测框并获取边界框坐标
        x_min,y_min,x_max,y_max=plot_detections(frame, detections[0])
        
        # 如果检测到至少一只手，则设置标志为True，准备进行关键点识别
        if len(detections[0])>0 :
            bHand=True
            
    # 如果已检测到手部，进行关键点识别
    if bHand:
        hand_nums=len(detections[0])
        if hand_nums>2:
            hand_nums=2
            
        # 对每只检测到的手进行关键点识别
        for i in range(hand_nums):
            
            print(x_min,y_min,x_max,y_max)
            # 确保边界框坐标在有效范围内
            xmin=max(0,x_min[i])
            ymin=max(0,y_min[i])
            xmax=min(frame.shape[1],x_max[i])
            ymax=min(frame.shape[0],y_max[i])
    
            # 提取手部区域
            roi_ori=frame[ymin:ymax, xmin:xmax]
            # 预处理手部区域图像，为关键点检测模型准备输入
            roi =preprocess_image_for_tflite32(roi_ori,224)
               
            # 设置输入数据
            result = fast_interpreter1.set_input_tensor(0, roi.data)
            if result != 0:
                print("hand_landmark model interpreter set_input_tensor() failed")
                
            # 执行手部关键点检测模型推理
            result = fast_interpreter1.invoke()
            if result != 0:
                print("hand_landmark model interpreter invoke() failed")
                
            # 获取输出数据
            mesh = fast_interpreter1.get_output_tensor(0)
            if mesh is None:
                print("sample : hand_landmark model interpreter->get_output_tensor(0) failed !")

            # 重置手检测标志，准备下一帧的检测
            bHand=False
            
            # 处理关键点数据并在图像上绘制
            mesh = mesh.reshape(21, 3)/224
            cx, cy = calc_palm_moment(roi_ori, mesh)
            draw_landmarks(roi_ori,cx,cy,mesh)
            frame[ymin:ymax, xmin:xmax]=roi_ori
    
    # 显示处理后的图像
    cv2.imshow("", frame)