Face Attention Network: An Effective Face Detector for the Occluded Faces: Paper Notes

03-03

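Face detection performance has improved dramatically in recent years thanks to deep learning, but occlusion remains one of the harder problems in face detection. It typically arises when people wear masks or sunglasses, or when faces are blocked by other people.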

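This paper proposes the Face Attention Network (FAN), which substantially improves recall on occluded faces. It introduces a new anchor-level attention mechanism that strengthens the features of face regions. Combined with an anchor assignment strategy and data augmentation techniques, FAN achieves state-of-the-art results on WiderFace and MAFA.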

Base Framework

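A U-shaped structure fuses the rich detail of low-level features with high-level semantic information. The base architecture borrows from RetinaNet (FPN + ResNet). RetinaNet contains two subnets, one for classification and one for regression.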
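The U-shaped fusion can be illustrated with a minimal NumPy sketch of the FPN top-down pathway: each finer backbone map gets a 1×1 lateral projection and is added to the 2× nearest-neighbour upsampled merged map from the level above. This is a simplified illustration, not FAN's code: the 1×1 conv is written as a channel-mixing matmul, and the 3×3 smoothing convs and the extra coarse levels of the real FPN are omitted. All function names are hypothetical.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def lateral(x, weight):
    """1x1 convolution as channel mixing: (C_in, H, W) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)

def top_down(features, weights):
    """FPN-style top-down fusion (simplified sketch).

    features: list of (C_i, H_i, W_i) maps ordered fine -> coarse,
              each level half the spatial size of the previous one.
    weights:  list of (C_out, C_i) lateral 1x1 conv weights.
    Returns merged maps ordered fine -> coarse, all with C_out channels.
    """
    merged = [lateral(features[-1], weights[-1])]  # coarsest level: lateral only
    for feat, w in zip(reversed(features[:-1]), reversed(weights[:-1])):
        # high-level semantics (upsampled) + low-level detail (lateral)
        merged.insert(0, lateral(feat, w) + upsample2x(merged[0]))
    return merged

rng = np.random.default_rng(0)
feats = [rng.standard_normal((c, s, s)) for c, s in [(8, 16), (16, 8), (32, 4)]]
ws = [rng.standard_normal((4, c)) for c in (8, 16, 32)]
pyramid = top_down(feats, ws)
print([p.shape for p in pyramid])  # [(4, 16, 16), (4, 8, 8), (4, 4, 4)]
```

Every output level ends up with the same channel count, which is what lets one shared classification and regression head run on all pyramid levels.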

The classification subnet applies four 3×3 conv layers (each with 256 filters), followed by a 3×3 convolution layer with KA filters, where K is the number of classes and A is the number of anchors per location.

For face detection K = 1 since we use sigmoid activation, and we use A = 6 in most experiments.

The regression subnet terminates in a convolution layer with 4A filters and linear activation.
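To make the KA / 4A channel counts concrete, here is a small NumPy sketch that turns the per-location head outputs into per-anchor predictions for K = 1 and A = 6. It assumes the channels are laid out anchor-major (A groups of K scores, A groups of 4 offsets); that layout is an assumption, and `decode_heads` is a hypothetical helper.

```python
import numpy as np

K, A = 1, 6  # one "face" class scored with a sigmoid, 6 anchors per location

def decode_heads(cls_out, reg_out, num_classes=K, num_anchors=A):
    """Flatten per-location head outputs into per-anchor predictions.

    cls_out: (H, W, K*A) logits from the classification subnet.
    reg_out: (H, W, 4*A) box offsets from the regression subnet (linear output).
    Assumes anchor-major channel layout.
    Returns (H*W*A, K) face probabilities and (H*W*A, 4) box offsets.
    """
    H, W, _ = cls_out.shape
    logits = cls_out.reshape(H * W * num_anchors, num_classes)
    scores = 1.0 / (1.0 + np.exp(-logits))  # sigmoid, so K = 1 suffices for faces
    boxes = reg_out.reshape(H * W * num_anchors, 4)
    return scores, boxes

cls = np.zeros((10, 10, K * A))
reg = np.zeros((10, 10, 4 * A))
scores, boxes = decode_heads(cls, reg)
print(scores.shape, boxes.shape)  # (600, 1) (600, 4)
```

A 10×10 feature map therefore yields 10 × 10 × 6 = 600 anchors, each with one face score and four box offsets.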

Attention Network

Anchor Assign Strategy
Attention Function
Data Augmentation

Anchor Assign Strategy

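FAN has 5 detector layers in total, each associated with anchors of a specific scale. The anchor aspect ratios are set to 1 and 1.5, because most faces have aspect ratios between 1:1 and 1:1.5. The paper also computes the distribution of face sizes (in pixels) in WiderFace and uses it to tune the anchor sizes.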
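A minimal sketch of per-layer anchor shapes under stated assumptions: the aspect ratio is read as height / width (faces taller than wide), and three scales per octave (2^0, 2^(1/3), 2^(2/3)) are borrowed from RetinaNet, which gives 3 × 2 = 6 anchors per location and matches A = 6. The base sizes below are RetinaNet's defaults and are purely hypothetical, since the paper tunes its sizes from WiderFace statistics that are not reproduced here.

```python
import math

def make_anchors(base_size, ratios=(1.0, 1.5),
                 scales=(1.0, 2 ** (1 / 3), 2 ** (2 / 3))):
    """Return (w, h) anchor shapes for one detector layer.

    ratio = h / w, and each anchor keeps area (base_size * scale) ** 2.
    The scale set is an assumption borrowed from RetinaNet.
    """
    anchors = []
    for s in scales:
        size = base_size * s
        for r in ratios:
            w = size / math.sqrt(r)
            h = size * math.sqrt(r)
            anchors.append((w, h))
    return anchors

# Hypothetical base sizes for the 5 detector layers (RetinaNet defaults, P3..P7).
BASE_SIZES = [32, 64, 128, 256, 512]
per_level = {lvl: make_anchors(b) for lvl, b in zip(range(3, 8), BASE_SIZES)}

for lvl, anchors in per_level.items():
    print(lvl, [(round(w), round(h)) for w, h in anchors])
```

Keeping the area fixed while changing the ratio means the 1:1.5 anchors cover the same number of pixels as the square ones, only stretched vertically.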

Attention Function

To address occlusion, the paper proposes a novel anchor-level attention.

The attention supervision information is obtained by filling the ground-truth box.

This can be viewed, roughly, as adding a segmentation branch.
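The "filled ground-truth box" supervision can be sketched as building a binary mask on each detector layer's feature map. This is an illustration under assumptions: the caller is assumed to have already selected which faces a layer is responsible for (for example by matching face size to that layer's anchor scale), and the floor/ceil rounding to feature-map cells is a choice made here, not taken from the paper. `attention_target` is a hypothetical helper.

```python
import numpy as np

def attention_target(gt_boxes, feat_h, feat_w, stride):
    """Binary attention target for one detector layer.

    gt_boxes: iterable of (x1, y1, x2, y2) in image pixels, assumed already
              filtered to the faces this layer is responsible for.
    Cells of the (feat_h, feat_w) map covered by a box are set to 1.
    """
    mask = np.zeros((feat_h, feat_w), dtype=np.float32)
    for x1, y1, x2, y2 in gt_boxes:
        # map pixel coordinates to feature-map cells (rounding is an assumption)
        c1, r1 = int(np.floor(x1 / stride)), int(np.floor(y1 / stride))
        c2, r2 = int(np.ceil(x2 / stride)), int(np.ceil(y2 / stride))
        mask[max(r1, 0):min(r2, feat_h), max(c1, 0):min(c2, feat_w)] = 1.0
    return mask

# A 64x64 image seen by a stride-8 layer gives an 8x8 map.
mask = attention_target([(8, 8, 24, 40)], feat_h=8, feat_w=8, stride=8)
print(int(mask.sum()))  # 8 cells: rows 1-4, columns 1-2
```

Because the whole box is filled, occluded parts of a face are still marked as "face", which is what pushes the attention map to highlight the visible face region of occluded faces as well.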

Data Augmentation

The paper proposes a random crop strategy to simulate occlusion in the training data. Besides random crop data augmentation, random flip and color jitter are also employed.
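A minimal sketch of box-aware random crop and horizontal flip, under assumptions not stated in these notes: the crop is a square patch whose side is a random fraction (here at least 0.3) of the short image edge, and a face is kept when its box centre falls inside the patch and is then clipped to the patch border, so faces cut by the border act like partially occluded samples. Color jitter is omitted. Both functions are hypothetical helpers, not the paper's code.

```python
import random
import numpy as np

def random_crop(image, boxes, rng, min_frac=0.3):
    """Crop a random square patch; keep boxes whose centre lies inside it.

    image: (H, W, C) array; boxes: (N, 4) float array of x1, y1, x2, y2.
    Kept boxes are shifted into patch coordinates and clipped to its border.
    min_frac is an assumed lower bound on the patch size.
    """
    h, w = image.shape[:2]
    side = int(rng.uniform(min_frac, 1.0) * min(h, w))
    x0 = rng.randint(0, w - side)  # randint is inclusive on both ends
    y0 = rng.randint(0, h - side)
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    keep = (cx >= x0) & (cx < x0 + side) & (cy >= y0) & (cy < y0 + side)
    out = np.clip(boxes[keep] - np.array([x0, y0, x0, y0]), 0, side)
    return image[y0:y0 + side, x0:x0 + side], out

def random_flip(image, boxes, rng, p=0.5):
    """Horizontal flip with probability p; mirrors the x coordinates."""
    if rng.random() >= p:
        return image, boxes
    w = image.shape[1]
    flipped = boxes.copy()
    flipped[:, 0] = w - boxes[:, 2]
    flipped[:, 2] = w - boxes[:, 0]
    return image[:, ::-1], flipped

img = np.zeros((100, 200, 3), dtype=np.uint8)
boxes = np.array([[10.0, 20.0, 50.0, 70.0], [150.0, 30.0, 190.0, 90.0]])
crop, crop_boxes = random_crop(img, boxes, random.Random(0))
flip_img, flip_boxes = random_flip(img, boxes, random.Random(1), p=1.0)
print(flip_boxes.tolist())  # [[150.0, 20.0, 190.0, 70.0], [10.0, 30.0, 50.0, 90.0]]
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible per worker, which is convenient when debugging data pipelines.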

The loss function consists of three parts:

classification and box regression over multiple pyramid levels, plus a pixel-wise sigmoid cross-entropy on the attention mask.
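The three-part loss can be sketched in NumPy as a sum over pyramid levels. The specific term choices are assumptions carried over from RetinaNet rather than details given in these notes: focal loss (alpha = 0.25, gamma = 2) for classification, smooth L1 on positive anchors for regression, both normalised by the number of positive anchors, and weights `lam = mu = 1` for the regression and attention terms. The dictionary keys are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_ce(logits, targets):
    """Numerically stable element-wise sigmoid cross-entropy."""
    return np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """RetinaNet-style focal loss (assumed classification loss)."""
    p = sigmoid(logits)
    p_t = np.where(targets == 1, p, 1 - p)
    alpha_t = np.where(targets == 1, alpha, 1 - alpha)
    return alpha_t * (1 - p_t) ** gamma * sigmoid_ce(logits, targets)

def smooth_l1(pred, target, beta=1.0):
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)

def fan_loss(levels, lam=1.0, mu=1.0):
    """Sum classification, regression and attention losses over pyramid levels.

    Each level is a dict with hypothetical keys:
      cls_logits, cls_targets: (N,) per-anchor logits and 0/1 labels
      reg_pred, reg_targets:   (M, 4) offsets for the M positive anchors
      att_logits, att_targets: (H, W) attention logits and filled-box mask
    """
    total = 0.0
    for lv in levels:
        n_pos = max(int((lv["cls_targets"] == 1).sum()), 1)
        total += focal_loss(lv["cls_logits"], lv["cls_targets"]).sum() / n_pos
        total += lam * smooth_l1(lv["reg_pred"], lv["reg_targets"]).sum() / n_pos
        total += mu * sigmoid_ce(lv["att_logits"], lv["att_targets"]).mean()
    return total

level = {
    "cls_logits": np.zeros(4), "cls_targets": np.array([1.0, 0.0, 0.0, 0.0]),
    "reg_pred": np.zeros((1, 4)), "reg_targets": np.zeros((1, 4)),
    "att_logits": np.zeros((2, 2)), "att_targets": np.zeros((2, 2)),
}
print(fan_loss([level]))  # 1.625 * ln 2, about 1.1264
```

With all logits at zero, each anchor and attention pixel contributes a multiple of ln 2, which makes the example easy to check by hand.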