The Diversity of Object Detection as Seen Through CVPR 2017

02-09

Author: BookThief
Original link: https://www.jianshu.com/p/78f614799cf2

When you have trouble with object detection, keep calm and use deep learning.

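That line is the author's own quip. If deep learning has overrun computer vision as a whole, object detection is arguably the hardest-hit area. Whether it is the region-proposal-based R-CNN family or the end-to-end YOLO family, deep learning methods have thoroughly beaten methods built on hand-crafted features.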

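Object detection is a "tough nut" that draws many PhD researchers and a large number of companies. Has the field run out of ideas, with nowhere left to start? Of course not. Judging by the trend of recent years, if you want to make your name with a single paper, or to win a Best Paper Award, object detection is the best direction to pick (from ResNet to DenseNet, and then to Kaiming He's double best paper at ICCV a few days ago).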

Meanwhile, a more obvious trait of the last two years is that the most attractive object detection methods are all:

Simple Clean But Effective

These methods rest on structurally simple network ideas. The basic building blocks are:

Skip Connection (ResNet, DenseNet)
Joint Multi Feature Map (R-FCN, FPN)
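
To make the "skip connection" idea concrete, here is a minimal pure-Python sketch contrasting the two styles named above: ResNet adds the input back onto the transformed features, while DenseNet concatenates the input with them. The `toy_layer` function is a hypothetical stand-in for a real convolution block, and feature maps are flattened to plain lists for illustration only.

```python
from typing import Callable, List

Vector = List[float]


def residual_skip(x: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """ResNet-style skip: element-wise add the input to the layer output."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("element-wise addition needs matching sizes")
    return [a + b for a, b in zip(fx, x)]


def dense_skip(x: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """DenseNet-style skip: concatenate the input with the layer output."""
    return x + f(x)


def toy_layer(x: Vector) -> Vector:
    # Hypothetical stand-in for conv + ReLU: scale by 0.5, then clamp at zero.
    return [max(0.0, 0.5 * v) for v in x]


if __name__ == "__main__":
    features = [2.0, -4.0]
    print(residual_skip(features, toy_layer))  # same length as the input
    print(dense_skip(features, toy_layer))     # length grows with each layer
```

The practical difference is visible in the output sizes: addition keeps the feature size fixed, whereas concatenation makes it grow layer by layer.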

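For the development of object detection (traditional methods, the R-CNN family, the YOLO family) and for commonly used terms (IoU, NMS, bounding-box regression, mAP), see my other post (https://www.jianshu.com/p/e6496a764b51).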

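2. Object Detection Progress as Seen Through CVPR 2016

a. Detection accuracy ("accurate")

Detection accuracy is the original and most important metric of the object detection task. Raising mAP is the most basic ground on which methods are compared, and it is also where deep methods decisively beat hand-crafted ones.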
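
As a rough illustration of what sits behind mAP, the sketch below computes average precision (AP) for one class as the area under the precision-recall curve, after making precision monotonically non-increasing. This is one common convention, assumed here and not taken from the original post; the function name and the input format (detections already sorted by descending confidence and already marked as true or false positives) are also assumptions. mAP is then the mean of AP over classes.

```python
from typing import List


def average_precision(is_tp: List[bool], num_gt: int) -> float:
    """Area under the precision-recall curve for one class.

    is_tp  : detections sorted by descending confidence, True if the
             detection matched a ground-truth box.
    num_gt : number of ground-truth boxes for this class.
    """
    if num_gt <= 0:
        return 0.0

    # Running precision and recall after each detection.
    recalls, precisions = [], []
    tp = 0
    for rank, hit in enumerate(is_tp, start=1):
        tp += int(hit)
        recalls.append(tp / num_gt)
        precisions.append(tp / rank)

    # Pad the curve, then make precision non-increasing from right to left.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum rectangle areas wherever recall increases.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(ap_per_class: List[float]) -> float:
    """mAP: the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class) if ap_per_class else 0.0
```

A detector whose confident predictions are all correct reaches an AP of 1.0; false positives ranked above true positives pull the value down.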

Representative CVPR 2016 work: ResNet, ION, HyperNet.

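b. Detection efficiency ("fast")

The time cost of the network: how to speed up detection so that it is both fast and good.

YOLO: this work has a very clear advantage in recognition efficiency.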

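c. Localization accuracy ("well localized")

How can we produce more accurate bounding boxes? How can we gradually raise the IoU threshold used for evaluation (0.5 on the VOC dataset)?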
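
For reference, IoU (intersection over union) between two axis-aligned boxes can be sketched as below. The `(x_min, y_min, x_max, y_max)` box format and the helper names are assumptions made for this example. The default threshold of 0.5 is the VOC value mentioned above, and the strict "greater than" comparison follows the VOC convention that the overlap must exceed 50%.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped to zero if disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """A prediction counts as correct when its IoU exceeds the threshold."""
    return iou(pred, gt) > threshold
```

Raising `threshold` toward 1.0 demands tighter boxes, which is exactly the pressure on localization accuracy described in this subsection.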

Representative work: LocNet, which drops bounding-box regression in favor of a probabilistic model.

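Summary: the focus moved from mAP, the original and most basic detection metric, to reducing time cost, and then to producing more accurate bounding boxes. This reflects the steady progress of object detection: accurate (detection accuracy), fast (detection efficiency), and well localized (localization accuracy).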

3. The Diversity of Object Detection as Seen Through CVPR 2017

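Judging from the CVPR 2017 paper list, new object detection papers are no longer confined to the ImageNet, VOC, and COCO datasets, nor to the detection accuracy, detection efficiency, and localization accuracy discussed above (although plenty of papers still address those). Attention is shifting toward detecting specific targets in specific environments and under specific conditions; the most telling sign is the number of papers introducing datasets for these specific targets. Object detection is flourishing in many directions at once.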

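1. Object action detection

Action Detection

Detection of specific behaviors and actions. For "a person brushing their teeth", the goal is not to detect the "person" and the "toothbrush" but the action "brushing teeth" itself.

Related CVPR 2017 papers: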

Temporal Convolutional Networks for Action Segmentation and Detection;
Predictive-Corrective Networks for Action Detection;
SCC: Semantic Context Cascade for Efficient Action Detection;
UntrimmedNets for Weakly Supervised Action Recognition and Detection

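2. Video object detection

Video-based object detection. Traditional object detection works on static images; detection in video differs in many ways, and most approaches combine detection with a tracking algorithm.

Related CVPR 2017 papers: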

Object Detection in Videos With Tubelet Proposal Networks;
YouTube-BoundingBoxes: A Large High-Precision Data Set for Object Detection in Video;
Spatio-Temporal Self-Organizing Map Deep Network for Dynamic Object Detection From Videos

3. 3D object detection

3D Object Detection

How to carry object detection in natural scenes over into 3D space.

Related CVPR 2017 papers:

Visual-Inertial-Semantic Scene Representation for 3D Object Detection;
Multi-View 3D Object Detection Network for Autonomous Driving;
Amodal Detection of 3D Objects: Inferring 3D Bounding Boxes From 2D Ones in RGB-Depth Images

4. Text detection

TEXT Detection

Detection and recognition of text in images.

Related CVPR 2017 papers:

Deep Matching Prior Network: Toward Tighter Multi-Oriented Text Detection;
End-To-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering

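5. Rain detection

Rain Detection

Detect the rain in an image and remove it to obtain a rain-free photo. (I recall an earlier paper that did the same for haze.)

Related CVPR 2017 papers: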

Deep Joint Rain Detection and Removal From a Single Image

6. Line detection

Line Detection

Detection of edge line segments in images.

Related CVPR 2017 papers:

MCMLSD: A Dynamic Programming Approach to Line Segment Detection

7. Pedestrian detection

Pedestrian Detection

Pedestrian detection has long been an important topic, so naturally it shows up here too.

Related CVPR 2017 papers:

What Can Help Pedestrian Detection?;
CityPersons: A Diverse Dataset for Pedestrian Detection;
Learning Cross-Modal Deep Representations for Robust Pedestrian Detection

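8. Moving object detection

Moving Object Detection

Moving object detection differs from video-based object detection: the moving objects in question are generally fast-moving ones (cars, motorcycles, airplanes, animals, and so on).

Related CVPR 2017 papers: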

Minimum Delay Moving Object Detection

9. Facial landmark detection

Facial Landmark Detection

Facial landmark detection is another perennially hot topic.

Related CVPR 2017 papers:

A Deep Regression Architecture With Two-Stage Re-Initialization for High Performance Facial Landmark Detection;
Simultaneous Facial Landmark Detection, Pose and Deformation Estimation Under Facial Occlusion;
Interspecies Knowledge Transfer for Facial Keypoint Detection

10. Small object detection