Two-Stream 3D Video Action Recognition Download

PIPI_333 2018-12-19 09:09:53

3-D convolutional neural networks (3-D-convNets) have very recently
been proposed for action recognition in videos, and promising results
have been achieved. However, existing 3-D-convNets have two
“artificial” requirements that may reduce the quality of video
analysis: 1) they require a fixed-size (e.g., 112×112) input video;
and 2) most of them require a fixed-length input (i.e., video shots
with a fixed number of frames). To tackle these issues, we propose an
end-to-end pipeline named Two-stream 3-D-convNet Fusion, which can
recognize human actions in videos of arbitrary size and length using
multiple features. Specifically, we decompose a video into spatial and
temporal shots. Each stream takes a sequence of shots as input and is
implemented using a spatial temporal pyramid pooling (STPP) convNet
with a long short-term memory (LSTM) or CNN-E model; the softmax
scores of the two streams are combined by late fusion. We devise the
STPP convNet to extract equal-dimensional descriptions for each
variable-size shot, and we adopt the LSTM/CNN-E model to learn a
global description for the input video using these time-varying
descriptions. With these advantages, our method should improve all
3-D CNN-based video analysis methods. We empirically evaluate our
method for action recognition in videos, and the experimental results
show that our method outperforms the state-of-the-art methods (both
2-D and 3-D based) on three standard benchmark datasets (UCF101,
HMDB51, and ACT).
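
To make the idea of "equal-dimensional descriptions for each variable-size shot" concrete, here is a minimal pure-Python sketch of pyramid max pooling over a single-channel feature volume, followed by a weighted average of two streams' softmax scores. It is only an illustration under stated assumptions: the pyramid levels, the helper names (`stpp_max_pool`, `late_fusion`, `make_volume`), and the fusion weight are hypothetical and are not taken from the paper, whose actual STPP configuration is not given in the abstract.

```python
from typing import List, Sequence, Tuple

Volume = List[List[List[float]]]  # one feature channel, indexed as [t][y][x]


def _bin_edges(length: int, bins: int) -> List[Tuple[int, int]]:
    """Split range(length) into `bins` non-empty intervals (they may overlap)."""
    edges = []
    for i in range(bins):
        start = (i * length) // bins
        end = -(-((i + 1) * length) // bins)  # ceiling division
        edges.append((start, max(end, start + 1)))
    return edges


def stpp_max_pool(
    volume: Volume,
    levels: Sequence[Tuple[int, int]] = ((1, 1), (1, 2), (2, 2)),
) -> List[float]:
    """Max-pool a (T, H, W) volume into a fixed-length vector.

    Each level is (temporal_bins, spatial_bins) and contributes
    temporal_bins * spatial_bins**2 values, so the output length depends
    only on `levels` (an assumed configuration), not on T, H, or W.
    """
    t_len, height, width = len(volume), len(volume[0]), len(volume[0][0])
    pooled = []
    for t_bins, s_bins in levels:
        for t0, t1 in _bin_edges(t_len, t_bins):
            for y0, y1 in _bin_edges(height, s_bins):
                for x0, x1 in _bin_edges(width, s_bins):
                    pooled.append(max(
                        volume[t][y][x]
                        for t in range(t0, t1)
                        for y in range(y0, y1)
                        for x in range(x0, x1)
                    ))
    return pooled


def late_fusion(spatial_scores: Sequence[float],
                temporal_scores: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Weighted average of per-class softmax scores (the weight is an assumption)."""
    return [weight * s + (1.0 - weight) * t
            for s, t in zip(spatial_scores, temporal_scores)]


def make_volume(t_len: int, height: int, width: int) -> Volume:
    """Build a toy volume whose values increase with the flattened index."""
    return [[[float(t * height * width + y * width + x) for x in range(width)]
             for y in range(height)] for t in range(t_len)]


small_shot = make_volume(4, 6, 8)
large_shot = make_volume(9, 11, 7)
print(len(stpp_max_pool(small_shot)), len(stpp_max_pool(large_shot)))
print(late_fusion([0.7, 0.3], [0.1, 0.9]))
```

With the assumed levels, both shots yield a descriptor of 1 + 4 + 8 = 13 values despite their different sizes, which is the property that lets a sequence model such as an LSTM consume shots of arbitrary size and length.
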
Related download link: //download.csdn.net/download/u011049137/10859509?utm_source=bbsseo


