Paper Reading Guide (Part 3)

01-30

16. SQUEEZENET: ALEXNET-LEVEL ACCURACY WITH 50X FEWER PARAMETERS AND <0.5MB MODEL SIZE

This paper proposes a new network architecture called SqueezeNet. It reaches AlexNet-level accuracy with 1/50 of the parameters. After compression, the model is under 0.5MB (510× smaller than AlexNet).

There are many model compression techniques, such as SVD, network pruning, quantization, and Huffman encoding. After proposing the architecture, the paper also explores the network design space.

The architecture is built on three main strategies:

1. Replace 3x3 filters with 1x1 filters. (A 1x1 filter has 9x fewer parameters than a 3x3 filter.)

2. Decrease the number of input channels to 3x3 filters. (This also reduces the parameter count.)

3. Downsample late in the network so that convolution layers have large activation maps. (Our intuition is that large activation maps (due to delayed downsampling) can lead to higher classification accuracy, with all else held equal.)
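The first two strategies come down to simple arithmetic on convolution weights. The sketch below counts weights for a few layer shapes. The channel counts (64 and 32) are arbitrary values chosen for illustration, not numbers from the paper, and `conv_params` is a hypothetical helper.

```python
def conv_params(in_channels, out_channels, kernel_size, bias=False):
    """Count the learnable parameters of a 2D convolution layer."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# Strategy 1: same channel counts, 3x3 kernels vs 1x1 kernels.
params_3x3 = conv_params(64, 64, 3)
params_1x1 = conv_params(64, 64, 1)

# Strategy 2: halve the input channels that feed the 3x3 layer.
params_3x3_fewer_inputs = conv_params(32, 64, 3)

if __name__ == "__main__":
    print("3x3:", params_3x3)
    print("1x1:", params_1x1)
    print("ratio:", params_3x3 // params_1x1)
    print("3x3 with half the inputs:", params_3x3_fewer_inputs)
```

The 3x3 layer has exactly nine times the weights of the 1x1 layer, and halving the input channels halves the 3x3 cost again.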

Based on these strategies, the paper describes first the Fire module and then the full network architecture.

module

In a Fire module, s1x1 is the number of filters in the squeeze layer (all 1x1), e1x1 is the number of 1x1 filters in the expand layer, and e3x3 is the number of 3x3 filters in the expand layer. When we use Fire modules we set s1x1 to be less than (e1x1 + e3x3), so the squeeze layer helps to limit the number of input channels to the 3x3 filters.
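To see what the squeeze layer buys, here is a rough weight count for one Fire module next to a plain 3x3 convolution with the same input and output channels. It is a sketch only: biases are ignored, the helper names are hypothetical, and the sizes (96 input channels, s1x1 = 16, e1x1 = e3x3 = 64) are illustrative assumptions.

```python
def fire_params(in_channels, s1x1, e1x1, e3x3):
    """Weights in a Fire module: a 1x1 squeeze layer, then 1x1 and 3x3 expand layers."""
    squeeze = in_channels * s1x1      # 1x1 filters over the module input
    expand_1x1 = s1x1 * e1x1          # 1x1 filters over the squeeze output
    expand_3x3 = s1x1 * e3x3 * 9      # 3x3 filters see only s1x1 input channels
    return squeeze + expand_1x1 + expand_3x3


def plain_3x3_params(in_channels, out_channels):
    """Weights in a single 3x3 convolution used as a reference point."""
    return in_channels * out_channels * 9


# Illustrative sizes (assumed for this example).
in_channels, s1x1, e1x1, e3x3 = 96, 16, 64, 64
assert s1x1 < e1x1 + e3x3  # the constraint stated in the text

fire = fire_params(in_channels, s1x1, e1x1, e3x3)
# The expand outputs are concatenated, so the module emits e1x1 + e3x3 channels.
plain = plain_3x3_params(in_channels, e1x1 + e3x3)

if __name__ == "__main__":
    print("Fire module weights:", fire)
    print("plain 3x3 weights:  ", plain)
```

With these sizes the 3x3 filters read 16 channels instead of 96, which is where most of the saving comes from.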

architecture

Evaluation

Finally, the paper describes the design space: beginning from a starting point, it runs search experiments and arrives at the best-performing architecture.

17. Identity Mappings in Deep Residual Networks

An early work by Kaiming He. It explains the mechanism behind ResNet's effectiveness and proposes a residual network about 1000 layers deep.

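A 1×1 convolutional shortcut introduces more parameters, so it should have more representational power than an identity shortcut. Yet it performs worse, which suggests that the degradation in these models is an optimization problem (the 1×1 convolution impedes backpropagation of the loss) rather than a lack of representational power.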
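A toy calculation gives some intuition for why a non-identity shortcut can hurt optimization. The sketch below uses a single scalar `lam` as a simplified stand-in for whatever the shortcut does to the signal; this is an assumption made for illustration and is much cruder than a real 1×1 convolution. If every shortcut multiplies the signal by `lam`, then after many units the signal (and likewise the gradient flowing back) is scaled by `lam` raised to the number of units.

```python
def shortcut_signal_scale(lam, num_units):
    """Factor applied to a signal that passes through num_units shortcuts,
    each of which scales it by lam (lam = 1.0 is the identity shortcut)."""
    return lam ** num_units


if __name__ == "__main__":
    for lam in (0.9, 1.0, 1.1):
        print(f"lam={lam}: scale after 100 units = {shortcut_signal_scale(lam, 100):.3e}")
```

Only `lam = 1.0` keeps the signal intact over 100 units; slightly smaller values make it vanish and slightly larger values make it blow up.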

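The later experiments:

The variants differ only in where the activation functions (ReLU/BN) are placed. Because the authors want f to be an identity mapping, f(y_l) = y_l, they distinguish between pre-activation and post-activation. The experiments show that putting both BN and ReLU in the pre-activation position, i.e. full pre-activation, gives the best results on both ResNet-110 and ResNet-164.

The authors attribute this to two factors. First, because f is also an identity mapping, optimization becomes easier than in the original ResNet. Second, using BN in the pre-activation improves the regularization of the model, which reduces overfitting.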
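The difference in ordering can be shown with a toy residual unit. Everything here is a simplification assumed for illustration: vectors are plain Python lists, `weight` is an elementwise scaling standing in for a convolution, and `batch_norm` normalizes a single vector with no learned parameters.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def batch_norm(v, eps=1e-5):
    """Toy normalization of one vector (no learned scale or shift)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / (var + eps) ** 0.5 for x in v]


def weight(v, w):
    """Stand-in for a conv layer: scale every element by w."""
    return [w * x for x in v]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def original_unit(x, w1, w2):
    # weight -> BN -> ReLU -> weight -> BN -> add -> ReLU (f is ReLU)
    out = relu(batch_norm(weight(x, w1)))
    out = batch_norm(weight(out, w2))
    return relu(add(x, out))


def preact_unit(x, w1, w2):
    # BN -> ReLU -> weight -> BN -> ReLU -> weight -> add (f is identity)
    out = weight(relu(batch_norm(x)), w1)
    out = weight(relu(batch_norm(out)), w2)
    return add(x, out)


if __name__ == "__main__":
    x = [-1.0, 0.5, 2.0]
    # With w2 = 0 the residual branch contributes nothing.
    print("original:      ", original_unit(x, 1.0, 0.0))
    print("pre-activation:", preact_unit(x, 1.0, 0.0))
```

When the residual branch is zero, the pre-activation unit passes `x` through unchanged, while the original unit still applies ReLU after the addition and clips the negative entry. That after-addition ReLU is what keeps f from being an identity mapping.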

18. Data-Driven Sparse Structure Selection for Deep Neural Networks

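Work on model compression from TuSimple, released in July of this year.

It proposes a new framework for model compression that is data-driven and end-to-end: it uses a modified stochastic Accelerated Proximal Gradient (APG) method to jointly optimize the weights of CNNs and scaling factors with sparsity regularizations.

Sparse Structure Selection (SSS) learns the structure of a CNN adaptively. Scaling factors are introduced to scale the outputs of specific structures, and a sparsity regularizer (the commonly used L1 norm) is applied to them, which turns training into a joint sparsity-regularized optimization problem over the weights and the scaling factors. When a scaling factor is driven to 0, the structure it belongs to can be removed. To solve this problem the authors modify the Accelerated Proximal Gradient method. Experiments are run on ResNet and ResNeXt.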
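The reason an L1 penalty can push scaling factors to exactly zero is its proximal operator, soft-thresholding. The sketch below shows one plain proximal gradient step on a few scaling factors. It is not the modified APG method from the paper (there is no acceleration or momentum term), and the function names, gradients, learning rate, and regularization weight `gamma` are made-up values for illustration.

```python
def soft_threshold(z, threshold):
    """Proximal operator of threshold * |z|: shrink z toward 0, clamp small values to 0."""
    if z > threshold:
        return z - threshold
    if z < -threshold:
        return z + threshold
    return 0.0


def proximal_step(scales, grads, lr, gamma):
    """One plain proximal gradient step on the scaling factors with an L1 penalty."""
    return [soft_threshold(s - lr * g, lr * gamma) for s, g in zip(scales, grads)]


def kept_structures(scales):
    """Indices of structures whose scaling factor is non-zero."""
    return [i for i, s in enumerate(scales) if s != 0.0]


if __name__ == "__main__":
    scales = [0.8, 0.05, -0.02, 0.5]   # one factor per structure (assumed values)
    grads = [0.1, 0.2, -0.1, 0.0]      # made-up gradients of the data loss
    new_scales = proximal_step(scales, grads, lr=0.1, gamma=0.5)
    print("updated factors:", new_scales)
    print("structures kept:", kept_structures(new_scales))
```

Factors that end up inside the threshold band become exactly 0.0, so the matching structures can be dropped without changing the network output.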

19. Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

This paper addresses the Vision-and-Language Navigation problem and introduces the Matterport3D Simulator, a large-scale reinforcement learning environment based on real imagery. Using this simulator, the authors build the Room-to-Room dataset, the first benchmark dataset for visually-grounded natural language navigation in real buildings, and implement a baseline with a sequence-to-sequence model.

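20. End-to-end Learning of Driving Models from Large-scale Video Datasets

This paper proposes an FCN-LSTM architecture that learns a driving model from large-scale crowd-sourced driving data and adds an image segmentation task to improve performance. The authors also open-source the dataset, described as the largest in the autonomous driving field to date.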

We formulate the problem as learning a generic driving model/policy; our learned model is generic in that it learns a predictive future motion path given the present agent state.

The model takes raw pixels and the current and previous vehicle state signals as input, and predicts the most likely future motion.

Our model is able to jointly train motion prediction and pixel-level supervised tasks.