What is the difference between deconvolution and upsample in Caffe?

01-23

Some people online say that deconvolution is just upsample. Is that right? Any guidance is appreciated.

Deconvolution:

In the various deep learning frameworks, this is implemented as Transposed Convolution. The process can be visualized as follows:

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf

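Input pixel * filter = output window. Where different output windows overlap, the overlapping values are summed.

This explanation is consistent with Caffe's definition, which says in effect: "DeconvolutionLayer multiplies each input value by a filter elementwise and sums over the resulting output windows."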

http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1DeconvolutionLayer.html

Convolve the input with a bank of learned filters, and (optionally) add biases, treating filters and convolution parameters in the opposite sense as ConvolutionLayer.
ConvolutionLayer computes each output value by dotting an input window with a filter; DeconvolutionLayer multiplies each input value by a filter elementwise, and sums over the resulting output windows. In other words, DeconvolutionLayer is ConvolutionLayer with the forward and backward passes reversed. DeconvolutionLayer reuses ConvolutionParameter for its parameters, but they take the opposite sense as in ConvolutionLayer (so padding is removed from the output rather than added to the input, and stride results in upsampling rather than downsampling).
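
To make the "input value * filter = output window, overlaps are summed" picture concrete, here is a minimal NumPy sketch for a single channel. The function name `deconv2d_scatter` and its arguments are made up for illustration; this is not Caffe's implementation, only the arithmetic described in the quoted documentation (stride spreads the windows apart, padding is cropped from the output).

```python
import numpy as np

def deconv2d_scatter(x, kernel, stride=1, pad=0):
    """Single-channel transposed convolution written as scatter-add.

    Each input value scales the whole kernel, and the scaled kernel is
    added into the output window starting at (i * stride, j * stride).
    `pad` rows/columns are then cropped from every border of the output.
    """
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros((stride * (h - 1) + kh, stride * (w - 1) + kw))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += x[i, j] * kernel
    if pad > 0:
        out = out[pad:-pad, pad:-pad]
    return out

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
k = np.ones((3, 3))
y = deconv2d_scatter(x, k, stride=2)
print(y.shape)  # (5, 5)
print(y[2, 2])  # 10.0, the one position covered by all four windows
```

With stride 2 and a 3x3 kernel, neighbouring windows share one row or column, which is where the summation happens.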

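Upsample:

This layer is also implemented via ConvTranspose. What needs attention is its weight initialization and learning rate:

1. The layer's weights are initialized by BilinearFiller, so when the learning rate is 0 the weights keep their initial values throughout training and the layer always acts as a bilinear resize.

The bilinear filter Initializer as implemented in MXNet: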

    # Imports needed to run this snippet outside mxnet/initializer.py
    import numpy as np
    from mxnet.initializer import Initializer

    class Bilinear(Initializer):
        """Initialize weight for upsampling layers."""
        def __init__(self):
            super(Bilinear, self).__init__()

        def _init_weight(self, _, arr):
            weight = np.zeros(np.prod(arr.shape), dtype="float32")
            shape = arr.shape
            f = np.ceil(shape[3] / 2.)
            c = (2 * f - 1 - f % 2) / (2. * f)
            for i in range(np.prod(shape)):
                x = i % shape[3]
                # Row index needs integer division (`/` only did this in Python 2).
                y = (i // shape[3]) % shape[2]
                weight[i] = (1 - abs(x / f - c)) * (1 - abs(y / f - c))
            arr[:] = weight.reshape(shape)

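I have not found the exact source of the formula weight[i] = (1 - abs(x / f - c)) * (1 - abs(y / f - c)), but my guess is that it comes from the bilinear interpolation formula (it has the same product structure).

For the derivation of that formula, see: Bilinear interpolation.

I tried setting "arr = np.zeros((1, 1, 3, 3))", and the resulting arr looks like this: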

    [[[[ 0.0625 0.1875 0.1875]
       [ 0.1875 0.5625 0.5625]
       [ 0.1875 0.5625 0.5625]]]]
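
If you want to check these numbers without installing MXNet, the same formula can be evaluated with plain NumPy. This is only a sketch: `bilinear_kernel` is a made-up helper name, and it assumes a square kernel (the initializer uses shape[3] for both axes). Because the weight is a product of an x term and a y term, it reduces to an outer product of one 1D profile with itself.

```python
import numpy as np

def bilinear_kernel(size):
    """Evaluate the bilinear weight formula for a (size, size) kernel."""
    f = np.ceil(size / 2.0)
    c = (2 * f - 1 - f % 2) / (2.0 * f)
    # 1D profile 1 - |x / f - c|; the 2D kernel is its outer product.
    w1d = 1 - np.abs(np.arange(size) / f - c)
    return np.outer(w1d, w1d)

print(bilinear_kernel(3))
# [[0.0625 0.1875 0.1875]
#  [0.1875 0.5625 0.5625]
#  [0.1875 0.5625 0.5625]]
```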

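2. When the layer's learning rate is nonzero, the weights no longer stay fixed at the BilinearFiller initialization. As the network trains, the layer becomes functionally equivalent to Deconvolution, i.e. ConvTranspose.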

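I recommend a somewhat more advanced paper: [1707.05847v1] The Devil is in the Decoder

It explains the differences and connections between deconv, bilinear, and other upsampling methods in some detail, and is worth a read.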

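A few more words.

Upsampling enlarges a feature map $F_{W,H}$ of size [W,H] into $\hat{F}_{nW,nH}$ of size [nW,nH], where n is the upsampling factor. An obvious first step is to place the values of F at every n-th position of the enlarged feature map $\hat{F}$. But what about the remaining positions?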

The deconv operation fills the remaining positions with 0 and then passes the enlarged feature map through a conv. Enlarge + zero fill + conv = deconv.
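
The "enlarge + zero fill + conv" view can be checked against the "scatter and sum" view with a small 1D sketch. Both function names are made up for illustration, and the sketch assumes no padding and uses `np.convolve` in "full" mode (a true convolution, i.e. the kernel is flipped relative to cross-correlation).

```python
import numpy as np

def deconv1d_scatter(x, kernel, stride):
    """Each input value adds a scaled copy of the kernel to the output."""
    out = np.zeros(stride * (len(x) - 1) + len(kernel))
    for i, v in enumerate(x):
        out[i * stride:i * stride + len(kernel)] += v * kernel
    return out

def deconv1d_zero_insert(x, kernel, stride):
    """Insert stride - 1 zeros between samples, then run a full convolution."""
    z = np.zeros(stride * (len(x) - 1) + 1)
    z[::stride] = x
    return np.convolve(z, kernel, mode="full")

x = np.array([1.0, 2.0, 3.0])
k = np.array([0.5, 1.0, 0.25])
a = deconv1d_scatter(x, k, 2)
b = deconv1d_zero_insert(x, k, 2)
print(a)                  # [0.5  1.   1.25 2.   2.   3.   0.75]
print(np.allclose(a, b))  # True
```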

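Interpolation-based upsampling is similar: enlarge + interpolate = interpolation upsampling.

There is also the unpooling operation. Max unpooling takes, in addition to the [W,H] feature map, the pooling indices, which say where each F[w,h] goes in $\hat{F}$. Max unpooling generally has to be paired with a max pooling. Max pooling + max unpooling is equivalent to sifting through F once and keeping only the value at the max position of each pooling window.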
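
A minimal NumPy sketch of that pooling/unpooling pair, assuming non-overlapping k x k windows and a single channel. The helper names `max_pool_with_index` and `max_unpool` are hypothetical and do not correspond to any particular framework's API.

```python
import numpy as np

def max_pool_with_index(x, k=2):
    """Max pooling that also records the flat index of each maximum."""
    h, w = x.shape
    pooled = np.zeros((h // k, w // k))
    index = np.zeros((h // k, w // k), dtype=int)
    for i in range(h // k):
        for j in range(w // k):
            window = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            r, c = divmod(int(np.argmax(window)), k)
            pooled[i, j] = window[r, c]
            index[i, j] = (i * k + r) * w + (j * k + c)
    return pooled, index

def max_unpool(pooled, index, out_shape):
    """Write each pooled value back to its recorded position, zeros elsewhere."""
    flat = np.zeros(out_shape[0] * out_shape[1])
    flat[index.ravel()] = pooled.ravel()
    return flat.reshape(out_shape)

x = np.array([[1., 5., 2., 0.],
              [3., 4., 7., 1.],
              [0., 2., 1., 1.],
              [6., 1., 3., 8.]])
pooled, index = max_pool_with_index(x)
restored = max_unpool(pooled, index, x.shape)
print(pooled)    # [[5. 7.] [6. 8.]]
print(restored)  # only 5, 7, 6 and 8 survive, at their original positions
```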

Here is the documentation:

A common use case is with the DeconvolutionLayer acting as upsampling. You can upsample a feature map with shape of (B, C, H, W) by any integer factor using the following proto.

    layer {
      name: "upsample"
      type: "Deconvolution"
      bottom: "{{bottom_name}}"
      top: "{{top_name}}"
      convolution_param {
        kernel_size: {{2 * factor - factor % 2}}
        stride: {{factor}}
        num_output: {{C}}
        group: {{C}}
        pad: {{ceil((factor - 1) / 2.)}}
        weight_filler: { type: "bilinear" }
        bias_term: false
      }
      param { lr_mult: 0 decay_mult: 0 }
    }
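
The template expressions in that proto are simple arithmetic on the upsampling factor. The sketch below evaluates them and checks the resulting output size with the usual transposed-convolution size relation, out = stride * (in - 1) + kernel_size - 2 * pad (dilation 1 assumed). Both helper names are made up for illustration.

```python
import math

def bilinear_upsample_params(factor):
    """Evaluate the template expressions from the proto for a given factor."""
    return {
        "kernel_size": 2 * factor - factor % 2,
        "stride": factor,
        "pad": int(math.ceil((factor - 1) / 2.0)),
    }

def deconv_output_size(in_size, kernel_size, stride, pad):
    """Spatial output size of a transposed convolution with dilation 1."""
    return stride * (in_size - 1) + kernel_size - 2 * pad

for factor in (2, 3, 4):
    p = bilinear_upsample_params(factor)
    print(factor, p, deconv_output_size(16, **p))
# 2 {'kernel_size': 4, 'stride': 2, 'pad': 1} 32
# 3 {'kernel_size': 5, 'stride': 3, 'pad': 1} 48
# 4 {'kernel_size': 8, 'stride': 4, 'pad': 2} 64
```

For these factors the output is exactly factor times the input size, which is what the kernel_size and pad choices are designed to achieve.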

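In practice, the deconv layer's kernel is set to bilinear interpolation and its learning rate to 0, because many papers report that whether or not these weights are learned makes no difference in performance.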
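
To see that a frozen bilinear kernel really does behave like linear interpolation, here is a 1D sketch with factor 2 (kernel size 4, stride 2, pad 1). The helper names are hypothetical, and the 1D kernel is just one axis of the bilinear weight formula used by the filler.

```python
import numpy as np

def bilinear_kernel_1d(factor):
    """1D profile of the bilinear weights for the given upsampling factor."""
    size = 2 * factor - factor % 2
    f = np.ceil(size / 2.0)
    c = (2 * f - 1 - f % 2) / (2.0 * f)
    return 1 - np.abs(np.arange(size) / f - c)

def deconv1d(x, kernel, stride, pad):
    """Transposed convolution as scatter-add, then crop `pad` from each end."""
    full = np.zeros(stride * (len(x) - 1) + len(kernel))
    for i, v in enumerate(x):
        full[i * stride:i * stride + len(kernel)] += v * kernel
    return full[pad:len(full) - pad]

x = np.array([0.0, 1.0, 2.0, 3.0])
y = deconv1d(x, bilinear_kernel_1d(2), stride=2, pad=1)
print(y)  # [0.   0.25 0.75 1.25 1.75 2.25 2.75 2.25]
```

The interior samples rise in even steps of 0.5, i.e. they lie on the straight line through the input ramp. The last sample drops to 2.25 because windows that would fall beyond the input contribute nothing, so the borders are not a clean interpolation.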

Two quick points.

1. Caffe's deconvolution is also called transposed convolution, for example in TensorFlow.

It helps to add that keyword when searching for material.

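2. Upsampling can be implemented as a deconvolution with a specific kernel, with the learning rate set to 0.

Nowadays the kernel can be set through weight_filler.

In older versions this was done with net surgery from the Python interface.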

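3. With upsampling in the literal sense, conv + maxpooling + upsampling inevitably loses information and cannot give an exact point-to-point correspondence, whereas deconvolution does not necessarily have this limitation.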

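As I recall, there are three ways to upsample, and deconvolution is one of them.

Upsampling is the general process of increasing resolution, and the deconvolution implemented in Caffe is one way of doing it. Judging from the source code, it uses bilinear.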

Does a complete upsample consist of

unpool

rectify (ReLU)

filter (Deconv)?

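But I originally shrank the image with conv rather than pool, so there are no max locations. How should I upsample in that case?

Today I read an object detection paper, Feature Pyramid Networks for Object Detection, which says that upsampling the deep feature maps and combining them with the shallow feature maps can improve mAP. In SSD the feature maps are shrunk by conv, so how should the deep feature maps be upsampled? With just a simple deconv?

These are completely different concepts.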