Object detection in videos has attracted wide attention in recent years because of its potential use in high-level AI systems such as autonomous driving and home robots. Compared with object detection in still images, existing video object detection methods typically build on spatio-temporal "tubelets", i.e., detection boxes linked across time, to exploit the temporal information in videos. However, the quality and efficiency of tubelet generation in existing methods leave much to be desired: motion-based methods can only generate short tubelets, while appearance-based methods are computationally expensive and still cannot guarantee high object recall. This paper [5] extends the Faster R-CNN framework for still images into a video object detection framework with two modules, tubelet proposal generation and tubelet recognition. It proposes an efficient tubelet proposal generation method that produces long tubelets while preserving as much diversity among different tubelets as possible, thereby improving object recall. On top of these high-quality tubelet proposals, an encoder-decoder LSTM network is applied to recognize the tubelets, which effectively improves overall detection accuracy (a minimal illustrative sketch of such a tubelet classifier is given at the end of this article). The authors also provide a detailed analysis of the initialization and different settings of the Tubelet Proposal Network (TPN); TPN-based detection improves mean average precision by more than 5% over the still-image detection framework.

The authors won first place in the ImageNet video object detection task in both 2015 and 2016. This paper extends their first-place ImageNet 2016 entry into a new video object detection framework that further improves the efficiency and accuracy of object detection in videos over existing algorithms.

Paper title: Object Detection in Videos with Tubelet Proposal Networks

Paper authors: Kai Kang, Hongsheng Li, Tong Xiao, Wanli Ouyang, Junjie Yan, Xihui Liu, Xiaogang Wang

Multi-Context Attention: top accuracy on MPII, the single-person pose estimation benchmark

Human pose estimation aims to locate the body keypoints of people in images or videos and has broad application value in areas such as motion-sensing games, human-computer interaction, robotics, virtual reality devices, motion capture, and machine vision. The task is highly challenging, however, because human poses vary enormously, image and video backgrounds are cluttered, and occlusion of the body occurs frequently. Solving these problems requires a thorough understanding of image context. Traditional methods usually model multi-scale information with image patches at several different scales, but the multi-scale information obtained this way often lacks flexibility and diversity.

The visual attention mechanism is how the human brain understands natural scenes efficiently. By concentrating attention on core regions, the brain can effectively filter out distracting regions that are irrelevant to the task and focus its analysis on the key regions that matter.

The multi-context attention network proposed in this paper [1] is the first to effectively combine attention models with the human pose estimation task. It designs three different attention models (multi-resolution attention, multi-semantics attention, and hierarchical global-part attention) to learn image context, which effectively removes redundant background in pose estimation, sharpens the ability to distinguish easily confused body parts, and thereby improves the detection accuracy of body keypoints (a small illustrative sketch of spatial attention is also given at the end of this article). On MPII, the most widely used single-person pose estimation benchmark, the method achieves the highest accuracy among published works. The code has been open-sourced and is available at https://github.com/bearpaw/pose-attention

Paper title: Multi-Context Attention for Human Pose Estimation

Paper authors: Xiao Chu, Wei Yang, Wanli Ouyang, Cheng Ma, Alan L. Yuille, Xiaogang Wang

Appendix

SenseTime and the CUHK-SenseTime Joint Lab have a total of 23 papers accepted; the CVPR 2017 session times are listed below.

Multi-Context Attention for Human Pose Estimation - Saturday, July 22, 2017, 09:00–10:30
Multi-Scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation - Saturday, July 22, 2017, 09:00–10:30
Accurate Single Stage Detector Using Recurrent Rolling Convolution - Saturday, July 22, 2017, 10:30–12:30
Mimicking Very Efficient Network for Object Detection - Saturday, July 22, 2017, 10:30–12:30
Object Detection in Videos with Tubelet Proposal Networks - Saturday, July 22, 2017, 10:30–12:30
Spindle Net: Person Re-identification with Human Body Region Guided Feature Decomposition and Fusion - Saturday, July 22, 2017, 10:30–12:30
Discover and Learn New Objects from Documentaries - Saturday, July 22, 2017, 13:30–15:00
Learning Object Interactions and Descriptions for Semantic Image Segmentation - Saturday, July 22, 2017, 13:30–15:00
Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification - Saturday, July 22, 2017, 15:00–17:00
Scale-Aware Face Detection - Saturday, July 22, 2017, 15:00–17:00
Interpretable Structure-Evolving LSTM - Sunday, July 23, 2017, 08:30–10:00
Detecting Visual Relationships with Deep Relational Networks - Sunday, July 23, 2017, 13:00–14:30
Joint Detection and Identification Feature Learning for Person Search - Sunday, July 23, 2017, 13:00–14:30
Learning Cross-Modal Deep Representations for Robust Pedestrian Detection - Sunday, July 23, 2017, 14:30–16:30
PolyNet: A Pursuit of Structural Diversity in Very Deep Networks - Sunday, July 23, 2017, 14:30–16:30
Pyramid Scene Parsing Network - Sunday, July 23, 2017, 14:30–16:30
Person Search with Natural Language Description - Monday, July 24, 2017, 10:00–12:00
Quality Aware Network for Set to Set Recognition - Monday, July 24, 2017, 10:00–12:00
UntrimmedNets for Weakly Supervised Action Recognition and Detection - Tuesday, July 25, 2017, 10:00–12:00
Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade - Tuesday, July 25, 2017, 13:00–14:30
ViP-CNN: A Visual Phrase Reasoning Convolutional Neural Network for Visual Relationship Detection - Tuesday, July 25, 2017, 14:30–16:30
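As a companion to the tubelet recognition step described in the Tubelet Proposal Networks summary above, the following is a minimal PyTorch sketch of how a sequence of per-frame box features could be classified with an encoder-decoder LSTM. It is an illustration under assumptions, not the authors' implementation: the feature dimension, hidden size, class count (30 ImageNet VID classes plus background), and the reversed decoding pass are all assumed for the example.

```python
# Minimal sketch (not the authors' code) of tubelet recognition with an
# encoder-decoder LSTM: the encoder summarizes per-frame box features of a
# tubelet, and the decoder revisits the sequence to emit per-frame class
# scores. Sizes below are illustrative assumptions.
import torch
import torch.nn as nn


class TubeletClassifier(nn.Module):
    def __init__(self, feat_dim=1024, hidden_dim=512, num_classes=31):
        # num_classes assumes 30 ImageNet VID classes plus background.
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, feats):
        # feats: (batch, num_frames, feat_dim), CNN features pooled from the
        # tubelet's box at every frame.
        _, (h, c) = self.encoder(feats)      # summarize the whole tubelet
        rev = torch.flip(feats, dims=[1])    # feed the sequence again, reversed
        out, _ = self.decoder(rev, (h, c))   # temporal-context features
        out = torch.flip(out, dims=[1])      # restore frame order
        return self.classifier(out)          # per-frame class scores


# Example: one tubelet proposal spanning 20 frames.
scores = TubeletClassifier()(torch.randn(1, 20, 1024))
print(scores.shape)  # torch.Size([1, 20, 31])
```

In the full framework these per-frame scores would still need to be aggregated along each tubelet proposal; the sketch only shows the encode-then-decode pattern.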
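Similarly, for the Multi-Context Attention summary above, the sketch below illustrates the general idea of soft spatial attention applied at several resolutions. It is not the authors' released code; the channel count, the 1x1-convolution scoring, and the simple pool/upsample fusion are assumptions made purely for illustration, and the paper's multi-semantics and hierarchical global-part attention components are not shown.

```python
# Minimal sketch (not the authors' released code) of soft spatial attention
# applied at several resolutions: each scale produces an attention map that
# re-weights its features, and the attended features are fused at full size.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftSpatialAttention(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # attention scores

    def forward(self, feat):
        # feat: (batch, channels, H, W) features at one resolution.
        b, _, h, w = feat.shape
        attn = F.softmax(self.score(feat).view(b, 1, -1), dim=-1).view(b, 1, h, w)
        return feat * attn  # attended features, same shape as the input


class MultiResolutionAttention(nn.Module):
    """Attend at several resolutions and fuse the results at the finest one."""

    def __init__(self, channels=256, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.attend = nn.ModuleList(SoftSpatialAttention(channels) for _ in scales)

    def forward(self, feat):
        fused = 0
        for s, attend in zip(self.scales, self.attend):
            down = F.avg_pool2d(feat, s) if s > 1 else feat
            att = attend(down)
            fused = fused + F.interpolate(att, size=feat.shape[-2:],
                                          mode='bilinear', align_corners=False)
        return fused


# Example: a 64x64 feature map, as an hourglass-style pose network might produce.
out = MultiResolutionAttention()(torch.randn(2, 256, 64, 64))
print(out.shape)  # torch.Size([2, 256, 64, 64])
```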