Dynamic Zoom-in Network: Object Detection in Large Images

08-17

From the column: Deep Learning Paper Close Readings

Dynamic Zoom-in Network for Fast Object Detection in Large Images

Mingfei Gao, Ruichi Yu, Ang Li, Vlad I. Morariu, Larry S. Davis

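The paper was published at CVPR 2018. Paper: Dynamic Zoom-in Network for Fast Object Detection in Large Images

Main content:

Problem addressed:

Problem: images in standard datasets have relatively low resolution, so large images are usually down-sampled before detection, and small objects are then missed. Detecting small objects at full resolution, on the other hand, takes a large amount of computation.

To improve accuracy on small objects, many detectors such as YOLO and SSD fuse features from multiple scales, which also costs a lot of computation. This paper targets exactly this problem: detecting small objects in large images.

Main idea:

Use reinforcement learning to repeatedly zoom in on regions of interest and detect objects there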

coarse-to-fine strategy

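Main components:

R-net: uses Faster R-CNN for coarse object detection

Q-net: a reinforcement learning network that chooses where to zoom in for fine-grained detection

Network framework:

The overall framework has two parts: the object detection network R-net and the reinforcement learning network Q-net

Detailed diagram of the framework:

Detailed network framework

The details of the network are explained below: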

R-net:

R-net

Given a down-sampled image as input, the R-net generates an initial accuracy gain (AG) map indicating the potential zoom-in accuracy gain of different regions (initial state).

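R-net uses Faster R-CNN as its detector and runs it on a low-resolution image (1/2 the size of the original). The detection results are used to generate the AG map.

CR layer (Correlation Regression (CR) layer):

Estimate the zoom-in accuracy gain of proposal k: that is, a score indicating how strongly the proposal needs to be zoomed in on

Structure: two fully connected layers. The first has 4096 units; the second has a single output unit, whose value is used to generate the AG map

Objective function:

The paper uses two detectors, one taking the low-resolution image and one taking the high-resolution image as input, which produce $p_{k}^{l}$ and $p_{k}^{h}$ respectively. The intuition behind the objective is as follows. A large object that is already detected with sufficient accuracy in the low-resolution image does not need to be zoomed in on and detected again, so the network output should be small. A small object that cannot be detected in the low-resolution image should get a high output.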
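The intuition above can be made concrete with a small sketch. It assumes a binary ground-truth label `g` for each proposal and defines the regression target as the amount by which the high-resolution score is closer to the label than the low-resolution score. The function names, the label `g`, and the squared-error form are assumptions for illustration; they are not taken from the objective formula itself.

```python
def accuracy_gain_target(g, p_low, p_high):
    """Assumed target: how much closer the high-res score is to label g
    than the low-res score (large when zooming in helps)."""
    return abs(g - p_low) - abs(g - p_high)


def cr_regression_loss(predicted_gains, targets):
    """Assumed squared-error loss between CR layer outputs and targets."""
    return sum((c - t) ** 2 for c, t in zip(predicted_gains, targets))


# A small object missed at low resolution but found at high resolution
small_obj = accuracy_gain_target(g=1, p_low=0.1, p_high=0.9)
# A large object that is already detected well at low resolution
large_obj = accuracy_gain_target(g=1, p_low=0.9, p_high=0.95)
print(small_obj > large_obj)
```

Under these assumptions the small object gets a much larger target than the large one, which matches the behaviour described in the text.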

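AG map (Accuracy Gain map):

Formula for generating the AG map:

$d_{k}^{l}$: the proposal

The numerator is the network output, the denominator is the number of pixels in the proposal, and $\alpha$ is a constant

Each proposal is passed through the CR layer to obtain the network output, and the AG map is generated from these outputs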
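A minimal sketch of this step follows, assuming that every pixel inside a proposal receives that proposal's CR output divided by the proposal's pixel count and scaled by a constant, and that overlapping proposals add up. The box format `(x0, y0, x1, y1, gain)` with exclusive upper bounds and the default `alpha=1.0` are hypothetical choices for the example.

```python
def build_ag_map(height, width, proposals, alpha=1.0):
    """Spread each proposal's predicted gain evenly over its pixels.

    proposals: list of (x0, y0, x1, y1, gain) in low-resolution pixel
    coordinates, with x1 and y1 exclusive (assumed format).
    """
    ag = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1, gain in proposals:
        area = (x1 - x0) * (y1 - y0)  # number of pixels in the proposal
        if area <= 0:
            continue
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                ag[y][x] += alpha * gain / area
    return ag


ag_map = build_ag_map(4, 4, [(0, 0, 2, 2, 0.8)])
total = sum(sum(row) for row in ag_map)
```

Dividing by the area means that the values inside one proposal sum back to its predicted gain (times `alpha`), so a large box does not dominate the map simply because it covers more pixels.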

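AG map refinement:

On top of the AG map, the authors apply a simple refinement. The refinement formula:

AG map refinement parameters

The region is shifted up, down, left, and right by $\mu$ units, and the position that gives the best result is kept. $\mu = \frac{1}{4}(W \text{ or } H)$

Refinement result

After refinement, objects are truncated noticeably less often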
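The shift-and-pick step can be sketched as below. The text does not say how "best" is scored, so this example assumes the score is the total AG value inside the window, and it also keeps the unshifted position as a candidate. Both choices, along with the function names, are assumptions.

```python
def region_sum(ag, x, y, w, h):
    """Sum of AG values inside the window, clipped to the map borders."""
    rows, cols = len(ag), len(ag[0])
    return sum(
        ag[r][c]
        for r in range(max(0, y), min(rows, y + h))
        for c in range(max(0, x), min(cols, x + w))
    )


def refine_region(ag, x, y, w, h):
    """Try shifts of a quarter of the window size and keep the best one."""
    mu_x, mu_y = w // 4, h // 4
    candidates = [
        (x, y),          # unshifted (assumed to be a valid candidate)
        (x - mu_x, y),   # left
        (x + mu_x, y),   # right
        (x, y - mu_y),   # up
        (x, y + mu_y),   # down
    ]
    best = max(candidates, key=lambda p: region_sum(ag, p[0], p[1], w, h))
    return (best[0], best[1], w, h)


# An object spanning columns 3 and 4 is cut in half by a window over columns 0..3
ag = [[0.0] * 8 for _ in range(8)]
ag[1][3] = 1.0
ag[1][4] = 1.0
refined = refine_region(ag, 0, 0, 4, 4)
```

In this toy map the window moves one column to the right, so the whole object falls inside it instead of being truncated at the border.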

Q-net:

Q-net

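Q-net takes the generated AG map as input and finds the location most likely to contain objects. The corresponding high-resolution region is then cropped from the original image and used as the next input, and the process iterates until the stopping condition is reached.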
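The iteration can be illustrated with a greedy stand-in for the learned policy. This is not Q-net: it simply picks the fixed-size window with the largest total AG value, maps it back to original-image coordinates, clears that part of the map, and repeats. The window size, the `scale=2` factor (matching the 1/2 down-sampling), `min_gain`, and `max_steps` are all hypothetical parameters.

```python
def window_sum(ag, x, y, w, h):
    """Total AG value inside a window that lies fully inside the map."""
    return sum(ag[r][c] for r in range(y, y + h) for c in range(x, x + w))


def greedy_zoom_regions(ag, w, h, scale=2, min_gain=0.1, max_steps=5):
    """Greedy illustration of zoom-in selection (not the learned Q-net)."""
    ag = [row[:] for row in ag]  # work on a copy
    rows, cols = len(ag), len(ag[0])
    regions = []
    for _ in range(max_steps):
        gain, x, y = max(
            (window_sum(ag, x, y, w, h), x, y)
            for y in range(rows - h + 1)
            for x in range(cols - w + 1)
        )
        if gain < min_gain:  # stopping condition: nothing left worth zooming
            break
        # Map the low-resolution window back to original-image coordinates
        regions.append((x * scale, y * scale, w * scale, h * scale))
        for r in range(y, y + h):  # clear the analysed area
            for c in range(x, x + w):
                ag[r][c] = 0.0
    return regions


demo = [[0.0] * 4 for _ in range(4)]
demo[0][0] = 0.5
demo[3][3] = 0.3
zoom_regions = greedy_zoom_regions(demo, w=2, h=2)
```

A learned policy differs from this greedy rule in that it can trade off the expected gain of a region against the cost of analysing it.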

Details: The notation 128×15×20:(7,10) means 128 convolution kernels with size 15×20, and stride of 7/10 in height/width. In other words, the convolutions use large kernels and large strides.
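To see how quickly such a layer shrinks its input, the output size can be computed with the usual convolution formula. The sketch assumes no padding, and the 120×160 input is a hypothetical size chosen for the example, not a value from the paper.

```python
def conv_output_size(in_h, in_w, kernel=(15, 20), stride=(7, 10)):
    """Output height/width of a convolution without padding (assumed)."""
    if in_h < kernel[0] or in_w < kernel[1]:
        raise ValueError("input is smaller than the kernel")
    out_h = (in_h - kernel[0]) // stride[0] + 1
    out_w = (in_w - kernel[1]) // stride[1] + 1
    return out_h, out_w


# Hypothetical 120x160 AG map passed through the 128x15x20:(7,10) layer
out_size = conv_output_size(120, 160)
```

With these numbers a single layer reduces the map by roughly the stride factor in each dimension, which is why large kernels and strides keep the Q-net cheap.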

Action:

(x, y, w, h) where (x, y) indicates the location, and (w, h) specifies the size of the region. An action therefore selects the corresponding region of the image.

Cost-aware reward function：

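Reward function: the first term measures detection accuracy, and the second term is a cost that balances it against the size of the selected region

$k \in a$: proposal k is included in the region selected by action a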
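A sketch of a reward with this two-part shape is given below. It assumes the accuracy term is the summed gain `|g - p_low| - |g - p_high|` over proposals that lie fully inside the selected region, and the cost term is the region area relative to the image area, weighted by `lam`. The proposal tuple format, the containment test, and the value of `lam` are hypothetical.

```python
def cost_aware_reward(proposals, region, image_area, lam=1.0):
    """Accuracy gain of covered proposals minus a size-based cost.

    proposals: list of (x0, y0, x1, y1, g, p_low, p_high) (assumed format)
    region:    (x, y, w, h) selected by the action
    """
    x, y, w, h = region
    gain = 0.0
    for x0, y0, x1, y1, g, p_low, p_high in proposals:
        inside = x0 >= x and y0 >= y and x1 <= x + w and y1 <= y + h
        if inside:  # proposal k is included in the region chosen by a
            gain += abs(g - p_low) - abs(g - p_high)
    cost = lam * (w * h) / image_area  # larger regions are more expensive
    return gain - cost


props = [
    (1, 1, 3, 3, 1, 0.1, 0.9),      # inside the region, benefits from zoom
    (10, 10, 12, 12, 1, 0.2, 0.9),  # outside the region
]
reward = cost_aware_reward(props, region=(0, 0, 4, 4), image_area=400)
```

Selecting the whole image would cover every proposal but pay the maximum cost, so the reward favours small regions that contain the proposals with the highest gain.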

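Q function (action-value function):

This function gives, for the current state, the maximum reward obtainable when choosing the next step; $\gamma$ is the discount factor, set to 0.5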
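In standard Q-learning terms, the value of an action is the immediate reward plus the discounted best value available in the next state. The sketch below assumes that standard form; the function name and the `terminal` flag are illustrative, and only the discount of 0.5 comes from the text.

```python
def q_target(reward, next_q_values, gamma=0.5, terminal=False):
    """Bellman target: reward now plus discounted best value afterwards."""
    if terminal or not next_q_values:
        return reward  # no future actions to account for
    return reward + gamma * max(next_q_values)


# Reward 0.76 now; the best follow-up action is worth 0.4
target = q_target(0.76, [0.1, 0.4, -0.2])
```

With a discount of 0.5, rewards two steps away count for only a quarter of their value, which pushes the policy to pick the most valuable regions first.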

Q-net cost function:

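Learn the Q function for candidate actions by minimizing the loss function at the i-th iteration.

Interpretation: through repeated iterations, the network learns which action maximizes the reward, so it can find small objects in the image more efficiently.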
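For reference, the standard deep Q-learning loss that matches this description can be written as follows. The notation ($\theta_i$ for the network weights at iteration $i$, $s'$ and $a'$ for the next state and action) is assumed here and may differ from the paper's own formula:

$$L_i(\theta_i) = \mathbb{E}_{(s,a,r,s')}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) - Q(s, a; \theta_i)\right)^2\right]$$

The first two terms inside the square form the target, and the last term is the current prediction, so minimizing the loss pulls the predicted value of each action toward its observed reward plus the discounted best value of the next state.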

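Result:

Comparison of results

The paper brings reinforcement learning into small object detection in images. Its mechanism is closer to the attention mechanism of the human eye, and the idea is well worth borrowing.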

References:

1. Mingfei Gao, Ruichi Yu, Ang Li, Vlad I. Morariu, Larry S. Davis. Dynamic Zoom-in Network for Fast Object Detection in Large Images. CVPR 2018.

2. Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems, pages 91–99, 2015.