CVPR2018_ZCC_Seeing Small Faces from Robust Anchor's Perspective


A CVPR 2018 face detection paper built on the anchor mechanism of Faster R-CNN and SSD. The authors argue that the main reason small faces are missed is that the IoU between anchor boxes and small faces is too low, i.e. existing anchor-based detectors do not handle scale invariance well. They propose the Expected Max Overlapping (EMO) score to explain why the IoU is low, and introduce anchor stride reduction with new network architectures, extra shifted anchors, stochastic face shifting and related strategies to improve face detection performance.

Anchor-based detectors quantize the continuous space of all possible face bounding boxes on the image plane into the discrete space of a set of pre-defined anchor boxes that serve as references. In other words: face sizes and locations in the image are continuous, whereas the pre-defined anchor sizes and locations are discrete.

During training, each face is matched with one or several close anchors. These faces are trained to output high confidence scores and then regress to ground-truth boxes. That is, during training each ground-truth face box is matched to one or more anchors.

During inference, faces in a testing image are detected by classifying and regressing anchors. That is, at test time faces are obtained by classifying and regressing the anchors.

The problem the authors see in the existing anchor scheme:

After classifying and adjusting anchor boxes, the new boxes with high confidence scores are still not highly overlapped with enough small faces. In other words, at detection time the boxes regressed from anchors still do not overlap small faces well enough.

How they analyze it:

For each face we compute its highest IoU with overlapped anchors. Then faces are divided into several scale groups. Within each scale group we compute the averaged highest IoU score. So faces are grouped by scale, and the average max IoU is computed per group.

The authors find that average IoUs across face scales are positively correlated with the recall rates, and argue that anchor boxes with low IoU overlap with small faces are harder to adjust to the ground truth, resulting in low recall of small faces.

In Fig. 1(b), the smaller the face, the lower the IoU score. Small faces already have low IoU with the anchors, and the regressed box is also harder to pull onto the ground truth, which together lead to low recall on small faces.

What Fig. 1(b) shows: compute the max IoU of each ground-truth face box with its overlapping anchors, group the boxes by scale, and average the max IoU within each group; the conclusion is that the smaller the face, the lower the IoU score (a sketch of this computation is given below).
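A minimal sketch of this Fig. 1(b)-style analysis, assuming plain (x1, y1, x2, y2) boxes and hypothetical scale bins; this is not the authors' evaluation code:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_f = (box[2] - box[0]) * (box[3] - box[1])
    area_a = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_f + area_a - inter)

def average_max_iou_by_scale(faces, anchors, bins=(16, 32, 64, 128, 256, 512)):
    """Max IoU per face, then the average of those maxima within each scale group."""
    max_ious = np.array([iou(f, anchors).max() for f in faces])
    scales = np.sqrt((faces[:, 2] - faces[:, 0]) * (faces[:, 3] - faces[:, 1]))
    groups = np.digitize(scales, bins)   # 0: <16, 1: 16-32, ..., 6: >=512
    return {int(g): float(max_ious[groups == g].mean()) for g in np.unique(groups)}
```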

Fig. 1(c) illustrates the authors' EMO-based design, which lets anchors achieve higher IoU with face boxes.

A brief overview of the EMO score:

Given a face of known size and a set of anchors, we compute the expected max IoU of the face with the anchors, assuming the face's location is drawn from a 2D distribution on the image plane. Since a face can appear anywhere in the image, this can be read as a uniform distribution over the image plane.

The EMO score theoretically explains why larger faces are easier to be highly overlapped by anchors and why densely distributed anchors are more likely to cover faces. So it explains two things: 1) the larger the face box, the higher the IoU with its matched anchor; 2) the denser the anchors, the more likely they are to match a face.

Based on the EMO score, the authors propose several strategies:

1 Reduce the anchor stride with various network architecture designs; this is somewhat similar to S3FD;

2 Add anchors shifted away from the canonical center so that the anchor distribution becomes denser; shifting the generated anchor positions makes the anchor layout denser;

3 Stochastically shift the faces in order to increase the chance of getting higher IoU overlaps, i.e. randomly shift the ground-truth face boxes to raise the IoU;

4 Match low-overlapped faces with multiple anchors, i.e. use multiple anchors to compensate faces with low IoU.

Contributions as summarized by the authors:

1 Provide an in-depth analysis of the anchor matching mechanism under different conditions with the newly proposed Expected Max Overlap (EMO) score, which theoretically characterizes anchors' ability to achieve high face IoU scores;

2 Propose several effective new anchor design techniques for higher IoU scores, especially for tiny faces, including anchor stride reduction with new network architectures, extra shifted anchors, and stochastic face shifting. In short, building on point 1, three schemes to raise the IoU between anchors and ground-truth faces.

3. Expected Max Overlapping Scores

EMO score: characterizes anchors' ability to achieve high face IoU scores. The EMO is derived by computing the expected max IoU between a face and the anchors w.r.t. the distribution of the face's location, i.e. the best match between the face box and the anchors under the face-location distribution.

(a) shows the anchor layout, the same setup as Faster R-CNN; (b) shows the anchor matching, which can be counter-intuitive: intuitively the smaller anchor at that location should match the ground-truth face box, but in practice a larger anchor is matched; (c) shows matching guided by the EMO score, which finds a closer anchor than a purely IoU-based scheme.

Section 3 analyzes how IoU matching is performed and computed in face detection and introduces how the EMO score is computed, providing theoretical guidance for the improvements proposed in Section 4.

3.1 Overview of Anchor-Based Detectors

This subsection reviews how anchors are matched in two-stage detectors.

Anchors are associated with certain feature maps, which determine the location and stride of the anchors. For Faster R-CNN this is a single feature map; for SSD, several feature maps.

A feature map of size c x h x w can be interpreted as c-dimensional representations corresponding to h x w sliding-window locations distributed regularly on the image. The distance between adjacent locations is the feature stride s_F, determined by H/h = W/w = s_F. Anchors take those locations as their centers and use the corresponding representations to compute confidence scores and bounding-box regression, so the anchor stride equals the feature stride, i.e. s_A = s_F. In the original Faster R-CNN, the anchor stride is simply the input image size divided by the feature map size.
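A tiny sketch of that correspondence, using hypothetical sizes and a single Faster R-CNN-style feature map:

```python
import numpy as np

H = W = 640      # input image size (hypothetical)
h = w = 40       # associated feature map size, so the feature stride s_F = H / h = 16
s_F = H // h
s_A = s_F        # in the vanilla setup the anchor stride equals the feature stride

# one anchor center per sliding-window location, spaced regularly by s_A
ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
centers = np.stack([(xs + 0.5) * s_A, (ys + 0.5) * s_A], axis=-1)   # shape (h, w, 2)
```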

3.2 Anchor Setup and Matching

Describes the anchor setup and matching method together with Fig. 2(a) and 2(b); it is the standard procedure.

3.3 Computing the EMO Score

Defines the EMO score:

With Fig. 3, derives how the EMO score is computed:
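From the definitions above, the EMO integral can be written roughly as follows. This is a reconstruction under stated assumptions (a square face of side $\ell$, same-size anchors on a grid of stride $s_A$, and a uniform face-location distribution, so the per-axis offset to the nearest anchor center is uniform on $[0, s_A/2]$), not copied from the paper:

$$\mathrm{EMO} \;=\; \iint p(x,y)\,\max_{a}\,\mathrm{IoU}\big(B_{\mathrm{face}}(x,y),\,B_{a}\big)\,dx\,dy \;=\; \Big(\tfrac{2}{s_A}\Big)^{2} \int_{0}^{s_A/2}\!\!\int_{0}^{s_A/2} \frac{(\ell-x)(\ell-y)}{2\ell^{2}-(\ell-x)(\ell-y)}\,dx\,dy$$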

The conclusions:

1 Under the conventional anchor setup, the larger the face, the larger the IoU term inside the integral, hence the higher the EMO score;

2 For a given face, the smaller the anchor stride (i.e. the denser the anchor sampling), the higher the EMO score;

Fig. 4 shows that the smaller the stride, the higher the EMO score (see the quick check below).
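A quick Monte Carlo sanity check of these two trends, for a square face matched to the nearest same-size anchor on a grid of stride anchor_stride, with the face center uniformly distributed; the sizes used are illustrative only:

```python
import numpy as np

def emo_monte_carlo(face_size, anchor_stride, n=100_000, rng=np.random.default_rng(0)):
    """Expected max IoU of a square face with the nearest same-size anchor,
    face center drawn uniformly over one anchor-stride cell."""
    # per-axis distance to the nearest anchor center is uniform on [0, stride / 2]
    dx = rng.uniform(0, anchor_stride / 2, n)
    dy = rng.uniform(0, anchor_stride / 2, n)
    inter = np.clip(face_size - dx, 0, None) * np.clip(face_size - dy, 0, None)
    union = 2 * face_size ** 2 - inter
    return (inter / union).mean()

print(emo_monte_carlo(16, 16), emo_monte_carlo(64, 16))   # larger face   -> higher EMO
print(emo_monte_carlo(16, 16), emo_monte_carlo(16, 8))    # smaller s_A   -> higher EMO
```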

4. Strategies of New Anchor Design

We aim at improving the average IoU, especially for tiny faces, from the view of theoretically improving the EMO score, since average IoU scores are correlated with the face recall rate. The authors treat the EMO score, average IoU, and face recall rate as positively correlated.

The idea is therefore to increase the average IoU by reducing the anchor stride as well as reducing the distance between the face center and the anchor center.

Four measures to improve the matching between small faces and anchors:

1 New network architectures (three of them) to change the stride of the feature map associated with the anchors;

2 Redefined anchor locations so that the anchor stride can be reduced further;

3 A face shift jittering method;

4 A compensation strategy for training that matches very tiny faces to multiple anchors.

4.1. Stride Reduction with Enlarged Feature Maps

One way to increase the EMO score is to reduce the anchor stride by enlarging the feature map.

Three designs that reduce the stride by enlarging the feature map; all of them are common practice (a sketch of the authors' preferred option follows the list):

a Deconvolution: upscale the map with a deconvolution filter whose weights are initialized from bilinear upsampling and then learned during training;

b A skip connection, similar to FPN: a 1 x 1 convolution aligns the channels of the fused maps, followed by a 3 x 3 convolution;

c Dilated (atrous) convolution, also a standard technique. The authors consider (c) the best option, since it introduces no new parameters.

We take out the stride-2 operation (either pooling or convolution) right after the shallower large map and dilate the filters in all the following convolution layers. Note that 1 x 1 filters are not required to be dilated. In addition to not adding any parameters, dilated convolution also preserves the receptive field size of the filters.
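A PyTorch-style sketch of option (c) as I read the quote above: drop the stride-2 operation after the shallower map and set dilation=2 on the following 3 x 3 convolutions (1 x 1 filters untouched), so the map stays large and no new parameters are introduced. Layer and channel choices here are hypothetical, not the paper's exact network:

```python
import torch.nn as nn

# Original tail: a stride-2 pooling followed by 3x3 convs (anchor stride 2x larger)
original = nn.Sequential(
    nn.MaxPool2d(kernel_size=2, stride=2),            # the stride-2 op to be removed
    nn.Conv2d(256, 256, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

# Dilated version: no stride-2 op; the following 3x3 convs use dilation=2 so their
# receptive field (and parameter count) is unchanged while the map stays large.
dilated = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=2, dilation=2),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, kernel_size=3, padding=2, dilation=2),
    nn.ReLU(inplace=True),
)
```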

4.2. Extra Shifted Anchors

This one is simple: increase the anchor sampling density so that s_A becomes smaller.

In the Sec. 4.1 schemes, s_A still equals s_F; Sec. 4.2 makes s_A < s_F possible.

These shifted anchors share the same feature representation with the anchors in the centers. That is, the newly added, densely sampled anchors reuse the feature of the center anchor at the same location.

Only small shifted anchors need to be added, since large anchors already guarantee high average IoUs. In other words, this strategy is only applied to small faces, i.e. on low-level feature maps such as conv3_3 (see the sketch below).
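A rough numpy sketch of the extra shifted anchors; the particular offsets (three extra anchors shifted by s_F/2, giving an effective s_A = s_F/2) are my illustrative assumptions:

```python
import numpy as np

def anchor_centers(h, w, s_F, add_shifted=False):
    """Anchor centers per feature-map cell; shifted anchors reuse that cell's feature."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    base = np.stack([(xs + 0.5) * s_F, (ys + 0.5) * s_F], axis=-1).reshape(-1, 2)
    if not add_shifted:
        return base
    # three extra anchors shifted by s_F / 2 -> effective anchor stride s_A = s_F / 2
    offsets = np.array([[s_F / 2, 0], [0, s_F / 2], [s_F / 2, s_F / 2]])
    shifted = (base[:, None, :] + offsets[None, :, :]).reshape(-1, 2)
    return np.concatenate([base, shifted], axis=0)

# only the smallest anchors (e.g. on a conv3_3-like map) get the extra shifted copies
small_centers = anchor_centers(160, 160, s_F=4, add_shifted=True)
```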

Fig. 7 shows that the methods of Secs. 4.1 and 4.2 are indeed effective.

4.3. Face Shift Jittering

In order to increase the probability for those faces to get high IoU overlap with anchors, they are randomly shifted in each iteration during training. In practice the whole image is randomly jittered during training, which raises the IoU between each face and its matched anchors: face positions in an image are otherwise fixed, so with the random jitter the relative positions of anchors and faces change from iteration to iteration.

The procedure is simple: in each iteration the shift step is sampled uniformly from 0 to s_A/2 (a sketch follows).
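A minimal sketch of the jitter as a training-time augmentation: the whole image and its boxes are translated by an offset sampled from [0, s_A/2); padding and boundary handling are simplified assumptions here:

```python
import numpy as np

def shift_jitter(image, boxes, s_A, rng=np.random.default_rng()):
    """Randomly translate the image and its face boxes by an offset in [0, s_A / 2)."""
    dx, dy = rng.integers(0, max(1, s_A // 2), size=2)
    shifted = np.zeros_like(image)
    shifted[dy:, dx:] = image[:image.shape[0] - dy, :image.shape[1] - dx]
    boxes = boxes.copy()
    boxes[:, [0, 2]] += dx     # x1, x2
    boxes[:, [1, 3]] += dy     # y1, y2
    return shifted, boxes
```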

4.4. Hard Face Compensation

It is because face scales and locations are continuous whereas anchor scales and locations are discrete; therefore, some faces still have scales or locations far away from any anchor, which makes some small faces hard to match.

The compensation strategy the authors propose:

1 The familiar recipe: use a threshold T_h to select positive anchors;

2 If the anchor with the max IoU for a face box still scores below T_h, that face is a hard face; its overlapping anchors are then sorted by IoU in descending order and the top-N anchors are taken as positives (see the sketch below).
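A compact sketch of this matching rule, assuming a precomputed IoU matrix of shape [num_faces, num_anchors]; the default T_h and N values here are placeholders, not the paper's settings:

```python
import numpy as np

def match_anchors(iou_matrix, t_h=0.35, top_n=5):
    """Return, for each face, the indices of anchors treated as positives."""
    positives = []
    for face_ious in iou_matrix:                       # one row per ground-truth face
        pos = np.where(face_ious >= t_h)[0]            # the familiar threshold rule
        if pos.size == 0:                              # hard face: max IoU still below T_h
            pos = np.argsort(face_ious)[::-1][:top_n]  # take the top-N anchors instead
        positives.append(pos)
    return positives
```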

5. Experiments

Summary: the authors attribute the poor detection of small faces to the low IoU between anchors and ground-truth face boxes; based on the EMO score, they propose a series of methods that raise the IoU for small faces.

This work identified low face-anchor overlap as the major reason hindering anchor-based detectors from detecting tiny faces. We proposed the new EMO score to characterize anchors' capability of getting high overlaps with faces, providing an in-depth analysis of the anchor matching mechanism.

This inspired us to come up with several simple but effective strategies of a new anchor design for higher face IoU scores.

The supplementary material explains the idea behind dilated convolution quite well, analyzing why it works better than plain upsampling or upsampling plus a skip connection.

References

1 CVPR2018_ZCC_Seeing Small Faces from Robust Anchor's Perspective
