Classic Models 2: A Hands-On Walkthrough of the ResNet Source Code

02-09

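ResNet Open-Source Code Analysis

Goal

The goal of this article is to explore the open-source code of the Residual Network (ResNet) and work out how it operates.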

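Preparation

This is not an introductory article. Before reading on, we recommend that you:

1. Study the material on residual networks in Andrew Ng's Deep Learning course on Coursera.

2. Read the original paper for this model, Deep Residual Learning for Image Recognition. If the paper is hard going, look for a translation online; the author's own translation is available for reference.

3. Register a GitHub account (sign-up link) so that you can view and download the open-source ResNet code.

4. Copy the source code to your local machine. The source code is here.

Other notes

The author's operating system is macOS Sierra, version 10.12.6.
The Python version is Python 2.7.13.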

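Testing the model first

Before testing the code, read the repository's README. It is located here.

According to the README, the repo contains the following models:

- VGG16

- VGG19

- ResNet50

- Inception v3

- CRNN for music tagging

We only care about the ResNet model, which here means ResNet50.

Further down, the README gives sample code for image classification, and that sample uses resnet. This is the code we want to test: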

import numpy as np
from resnet50 import ResNet50
from keras.preprocessing import image
from imagenet_utils import preprocess_input, decode_predictions

model = ResNet50(weights='imagenet')

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

preds = model.predict(x)
print('Predicted:', decode_predictions(preds))
# print: [[u'n02504458', u'African_elephant']]

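The first line of the sample imports ResNet50 from resnet50, so we create a file named resnet50.py and copy the ResNet50 code into it.

The executable code at the bottom of resnet50.py matches the sample above, which means we can run that file directly.

Open a Terminal in the local folder that contains resnet50.py and run it:

> python resnet50.py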

Unfortunately, it fails with an error!

Traceback (most recent call last):
  File "resnet50.py", line 289, in <module>
    model = ResNet50(include_top=True, weights='imagenet')
  File "resnet50.py", line 193, in ResNet50
    include_top=include_top)
TypeError: _obtain_input_shape() got an unexpected keyword argument 'include_top'

No problem. Let's see what went wrong.

First, locate the failing statement: line 193 of "resnet50.py".

input_shape = _obtain_input_shape(input_shape,
                                  default_size=224,
                                  min_size=197,
                                  data_format=K.image_data_format(),
                                  include_top=include_top)

This is the call to _obtain_input_shape() named in the error message. The message says that the function was given a keyword argument it does not accept: include_top.

So which parameters does _obtain_input_shape() accept? Here is its definition:

def _obtain_input_shape(input_shape, default_size, min_size, data_format, require_flatten, weights=None):

There is no include_top parameter, but there is a require_flatten parameter in its place. Let's rename include_top to require_flatten in the call and run the script again.
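If a script has to work with both the older and the newer helper signature, one option is to check which keyword the installed function accepts before calling it. The following is only a sketch of that idea: old_api and new_api are hypothetical stand-ins for the two signatures, not the real Keras functions, and the code is written for Python 3 (inspect.signature is not available on the Python 2.7 used in this article).

```python
import inspect


def call_with_top_flag(func, *args, include_top, **kwargs):
    """Call func, passing include_top under whichever keyword func accepts."""
    params = inspect.signature(func).parameters
    if "require_flatten" in params:
        kwargs["require_flatten"] = include_top
    elif "include_top" in params:
        kwargs["include_top"] = include_top
    else:
        raise TypeError("func accepts neither require_flatten nor include_top")
    return func(*args, **kwargs)


# Hypothetical stand-ins for the old and the new helper signature.
def old_api(input_shape, include_top):
    return ("old", include_top)


def new_api(input_shape, require_flatten):
    return ("new", require_flatten)


print(call_with_top_flag(old_api, None, include_top=True))
print(call_with_top_flag(new_api, None, include_top=False))
```

For a one-off experiment like the one in this article, simply renaming the keyword in resnet50.py is the simpler fix.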

Good news and bad news: the _obtain_input_shape() error is gone, but a new error appears:

Traceback (most recent call last):
  File "resnet50.py", line 292, in <module>
    img = image.load_img(img_path, target_size=(224, 224))
  File "/Users/freefrog/anaconda2/lib/python2.7/site-packages/keras/preprocessing/image.py", line 322, in load_img
    img = pil_image.open(path)
  File "/Users/freefrog/anaconda2/lib/python2.7/site-packages/PIL/Image.py", line 2410, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: 'elephant.jpg'

Look closely at the last line of the error: we have not provided an image named elephant.jpg. Download a picture from the web, name it elephant.jpg, put it in the same folder as resnet50.py, and run the code again.

If you see output like the following, congratulations: the script ran successfully!

Input image shape: (1, 224, 224, 3)
Predicted: [[(u'n02504458', u'African_elephant', 0.53912073), (u'n01871265', u'tusker', 0.26061574), (u'n02504013', u'Indian_elephant', 0.13235191), (u'n02437312', u'Arabian_camel', 0.021120256), (u'n02109047', u'Great_Dane', 0.0058048805)]]

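What does the returned result mean, roughly?

- (u'n02504458', u'African_elephant', 0.53912073) should mean that the predicted probability of "African elephant" is 0.53912073.

- (u'n01871265', u'tusker', 0.26061574) means that the probability of "tusker" (an animal with long tusks) is 0.26061574.

The remaining entries give the probabilities for Indian elephant, Arabian camel, and Great Dane.

African elephant has the highest predicted probability, so the prediction is correct! Below is the elephant.jpg image we used.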

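A look at the code structure

resnet50.py defines three functions: identity_block, conv_block, and ResNet50. Judging by the names, these are the identity block, the convolutional block, and the ResNet model. The two blocks are the basic building units of the model, which the sample code also suggests (it only calls the ResNet50 function).

Why define two kinds of block?

In the Residual Network paragraph of Section 3.3 of the paper, the authors explain that when the input and output have the same dimensions, the identity shortcut can be used directly (this corresponds to identity_block). When the dimensions differ, something extra is needed, such as a projection shortcut (this corresponds to conv_block in the code).

Since identity_block and conv_block are used to build ResNet50, we start from the parameters and return values of the functions and try to answer the following questions:

1. Which parameters are needed to build the ResNet, and what do they do?

2. Which parameters do the basic blocks need, and what do they do?

First, the ResNet50 function

The function is defined as follows:

def ResNet50(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000):

The function has 6 parameters. Let's go through them one by one (the English original is in the function's docstring in the source code):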

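include_top: boolean, whether to include the fully connected layer at the top (that is, the end) of the network.
weights: one of None (random initialization) or "imagenet" (load weights pre-trained on ImageNet).
input_tensor: optional Keras tensor (i.e. the output of layers.Input()) to use as the image input for the model.
input_shape: optional shape tuple, to be specified only if include_top is False. Otherwise the input shape has to be (224, 224, 3) (channels_last format) or (3, 224, 224) (channels_first format). There must be exactly 3 channels, and the width and height must be no smaller than 197.
pooling: optional pooling mode for feature extraction, to be specified only when include_top is False. The options are None (no pooling), avg (average pooling), and max (max pooling).
classes: optional number of classes to classify images into, to be specified only if include_top is True and no weights argument is given.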

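Many of the other parameters depend on how include_top is set. So what does the fully connected layer controlled by include_top actually do?

Put simply, the fully connected layer maps the learned "distributed feature representation" onto the sample label space. So if we leave the fully connected layer out, we can choose a pooling mode to produce the output; if we keep it, we can specify the number of output classes.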


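The function returns an instance of a Keras model.

Now that the inputs and output of ResNet50 are clear, let's go back to the statements in our sample code that use it. There are two:

model = ResNet50(weights='imagenet')
preds = model.predict(x)

The first passes weights='imagenet', meaning that the pre-trained weights are loaded. The returned model is an instance of a Keras model, so its .predict method can be used. x is the preprocessed image, so the second statement runs a prediction on x and returns the predicted values.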

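Next, the identity_block function

identity_block is defined as follows:

def identity_block(input_tensor, kernel_size, filters, stage, block):

The function has 5 parameters:

input_tensor: the input tensor.
kernel_size: default 3 (according to the docstring), the kernel size of the middle convolution layer on the main path.
filters: list of integers, the number of filters in each of the 3 convolution layers on the main path.
stage: integer, the current stage label, used to generate layer names.
block: 'a', 'b', and so on, the current block label, used to generate layer names.

The function returns a tensor.

From the parameters and return value, our understanding is that an identity block consists of three convolution layers plus an identity shortcut. As the paper says, the identity shortcut needs no parameters at all. What we do have to specify is the kernel size, the three filter counts, and the labels from which each layer's name is built.

A tensor goes into the block, passes through the three convolution layers and the identity shortcut, and comes out as the output tensor.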

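Finally, the conv_block function

The function is defined as follows:

def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):

The function has 6 parameters:

- input_tensor: the input tensor.

- kernel_size: default 3 (according to the docstring), the kernel size of the middle convolution layer on the main path.

- filters: list of integers, the number of filters in each of the 3 convolution layers on the main path.

- stage: integer, the current stage label, used to generate layer names.

- block: 'a', 'b', and so on, the current block label, used to generate layer names.

- strides=(2, 2): from stage 3 onward, both the first convolution layer on the main path and the shortcut use strides of (2, 2).

The function returns a tensor.

Compared with identity_block, conv_block has one extra parameter, strides, which defaults to (2, 2). Why? The next section explores in more detail how these three functions work.

From the parameters and return value, our understanding is that a convolutional block consists of three convolution layers plus a convolutional shortcut, whose job is to match the input and output dimensions. How exactly does it do that? We will find out as we go on.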

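Exploring further

The main section

With a basic understanding of the three functions, we can dig deeper. This time we start from the main section of the script, which looks like this:

if __name__ == '__main__':
    model = ResNet50(include_top=True, weights='imagenet')

    img_path = 'elephant.jpg'
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    print('Input image shape:', x.shape)

    preds = model.predict(x)
    print('Predicted:', decode_predictions(preds))

The source code uses blank lines to split this code into three parts. Let's analyze them one at a time.

(Note: we will not go into what if __name__ == '__main__': does. If it is unfamiliar, look it up, or think of it this way: the code under this line runs only when the script is executed directly, so the file can both be run on its own and be imported by other scripts. See here.)

The first part is a single line and easy to understand: it calls the ResNet50 function. We have already analyzed this function; it returns an instance of a Keras model. The arguments passed here are include_top=True, weights='imagenet', meaning that the model has the fully connected layer and loads the weights pre-trained on ImageNet.

model = ResNet50(include_top=True, weights='imagenet')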

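The second part has 6 lines of code.

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
print('Input image shape:', x.shape)

Let's read them in order.

The first two lines are simple. The first sets the image path, img_path = 'elephant.jpg', which means we need an image named "elephant.jpg" in the local folder. The second reads the image into the variable img. Its arguments are easy to understand: the image path we just set and the image size. Why restrict the image size? As mentioned when we analyzed the parameters of ResNet50, if include_top is True, the image size must be (224, 224).

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))

One more thing to note: where does image.load_img() come from? The answer is keras.preprocessing. Looking at the imports at the top of the script, we find the line: from keras.preprocessing import image. Could we do without this function? Let's run an experiment: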

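First import matplotlib and scipy, which we use to read and resize the image. (Note that misc.imresize has been removed from recent SciPy releases, so this experiment needs an older SciPy.)

import matplotlib.image as mpimg
from scipy import misc

Then replace img = image.load_img(img_path, target_size=(224, 224)) with the following code:

img = mpimg.imread(img_path)
img = misc.imresize(img, [224,224])

Running the script again also produces a prediction, but the numbers differ from before. Why?

Because misc.imresize() and image.load_img() use different interpolation methods when they force the image to the new size!

We will not go into interpolation methods here. Interested readers can display the images produced by the two approaches and compare them.

The following test code can be used:

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from scipy import misc
from keras.preprocessing import image

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
#img = mpimg.imread(img_path)
#img = misc.imresize(img, [224,224])
x = image.img_to_array(img)
plt.imshow(img)
plt.show()

So when working with the keras library, we recommend sticking to its own keras.preprocessing.image.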
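To see why the choice of interpolation changes the pixel values (and therefore the predictions), here is a toy sketch that resizes a one-dimensional signal in two ways. The two functions are simplified illustrations written for this article in Python 3, not the algorithms used inside PIL or SciPy.

```python
def resize_nearest(values, new_len):
    """Resize by copying one source sample per output position (no blending)."""
    scale = len(values) / new_len
    return [values[min(int(i * scale), len(values) - 1)] for i in range(new_len)]


def resize_linear(values, new_len):
    """Resize by blending the two neighbouring source samples linearly."""
    if new_len == 1:
        return [values[0]]
    step = (len(values) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(values) - 1)
        frac = pos - left
        out.append(values[left] * (1 - frac) + values[right] * frac)
    return out


signal = [0, 10]
print(resize_nearest(signal, 4))  # only the original values appear
print(resize_linear(signal, 4))   # intermediate values appear
```

The first call only repeats the original samples, while the second produces in-between values, so a network fed the two results sees slightly different inputs.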

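On to the next line.

x = image.img_to_array(img)

As the name suggests, this function converts the image into an array, which makes the later processing easier.

x = np.expand_dims(x, axis=0)

Next the array gains a dimension, going from (224,224,3) to (1,224,224,3). Why? The added dimension indexes the samples. If we had 100 images, a 4-dimensional tensor like this would let us feed them to the model in one go rather than one at a time.

Now the last statement:

x = preprocess_input(x)

This preprocesses x. What does the preprocessing involve? Mainly normalizing the image. The official code for preprocess_input is here.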
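The article does not list the individual preprocessing steps. The sketch below assumes the common ImageNet convention for this family of pre-trained weights: reorder the channels from RGB to BGR and subtract a fixed per-channel mean. Both the steps and the mean values are assumptions made for illustration and should be checked against the actual preprocess_input source; the helper names are hypothetical.

```python
# Assumed ImageNet channel means in BGR order (not stated in the article).
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)


def preprocess_pixel(rgb):
    """Reorder one RGB pixel to BGR and subtract the per-channel mean."""
    r, g, b = rgb
    bgr = (b, g, r)
    return tuple(value - mean for value, mean in zip(bgr, IMAGENET_MEAN_BGR))


def preprocess_image(image):
    """Apply preprocess_pixel to every pixel of a height x width nested list."""
    return [[preprocess_pixel(pixel) for pixel in row] for row in image]


# A pixel whose colour equals the assumed mean maps to zero in every channel.
print(preprocess_pixel((123.68, 116.779, 103.939)))
print(preprocess_image([[(255, 255, 255), (0, 0, 0)]]))
```

Under this assumption the preprocessing centres the data around zero rather than scaling it into a fixed range.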

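With the preparation done, we reach the third part of the main section.

preds = model.predict(x)
print('Predicted:', decode_predictions(preds))

Our code needs no training because it uses pre-trained weights, so we only have to pass the input image to predict to get the result.

Next, let's explore the ResNet50 function further.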

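ResNet50

To keep things simple, and to stay in step with the structure of the code, we analyze the code following the divisions marked by its comments.

There are 5 comments, which divide the code into 5 parts. Let's take them in order.

Part 1: parameter constraints

The first comment block explains what the function does and defines its parameters. We have already covered that, so we go straight to the last part of the comment and the code that follows.

    """
    # Raises
        ValueError: in case of invalid argument for `weights`,
            or invalid input shape.
    """
    if weights not in {'imagenet', None}:
        raise ValueError('The `weights` argument should be either '
                         '`None` (random initialization) or `imagenet` '
                         '(pre-training on ImageNet).')

    if weights == 'imagenet' and include_top and classes != 1000:
        raise ValueError('If using `weights` as imagenet with `include_top`'
                         ' as true, `classes` should be 1000')

Two situations are defined here in which the function raises an error:

weights is neither 'imagenet' nor None
weights is 'imagenet', include_top is true, and classes is not 1000

For why these two situations are invalid, see our earlier analysis of the function's parameters.

It is also worth remembering how raise ValueError() is used here: combined with conditional statements, it is a common way to check that arguments are valid.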
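From the caller's side, a ValueError raised by such a check can be caught and reported instead of letting the script crash. The sketch below repeats the two conditions in a small stand-alone function so that it can run without Keras; check_resnet_args and describe are hypothetical helper names, not part of resnet50.py.

```python
def check_resnet_args(weights, include_top, classes):
    """Raise ValueError for the two invalid argument combinations."""
    if weights not in {"imagenet", None}:
        raise ValueError("weights should be None or 'imagenet'")
    if weights == "imagenet" and include_top and classes != 1000:
        raise ValueError("classes should be 1000 with ImageNet weights and include_top")


def describe(weights, include_top, classes):
    """Return 'ok' or the rejection message instead of crashing."""
    try:
        check_resnet_args(weights, include_top, classes)
    except ValueError as error:
        return "rejected: %s" % error
    return "ok"


print(describe("imagenet", True, 1000))
print(describe("imagenet", True, 10))
print(describe("cifar10", True, 10))
print(describe(None, True, 10))
```

Failing early with a clear message is usually easier to debug than letting an invalid combination surface later as a shape mismatch when the weights are loaded.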

Part 2: building the architecture

This part of the code is as follows:

    # Determine proper input shape
    input_shape = _obtain_input_shape(input_shape,
                                      default_size=224,
                                      min_size=197,
                                      data_format=K.image_data_format(),
                                      require_flatten=include_top)

    if input_tensor is None:
        img_input = Input(shape=input_shape)
    else:
        if not K.is_keras_tensor(input_tensor):
            img_input = Input(tensor=input_tensor, shape=input_shape)
        else:
            img_input = input_tensor
    if K.image_data_format() == 'channels_last':
        bn_axis = 3
    else:
        bn_axis = 1

    x = ZeroPadding2D((3, 3))(img_input)
    x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)
    x = BatchNormalization(axis=bn_axis, name='bn_conv1')(x)
    x = Activation('relu')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)

    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')

    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')

    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')

    x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a')
    x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b')
    x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c')

    x = AveragePooling2D((7, 7), name='avg_pool')(x)

    if include_top:
        x = Flatten()(x)
        x = Dense(classes, activation='softmax', name='fc1000')(x)
    else:
        if pooling == 'avg':
            x = GlobalAveragePooling2D()(x)
        elif pooling == 'max':
            x = GlobalMaxPooling2D()(x)

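First comes the constraint on the dimensions of the input image:

As mentioned when we analyzed the function's parameters, if we do not use the default image size of (224,224), we can specify the input_shape parameter. That parameter is passed to _obtain_input_shape(). What does this function do? Let's run a test.

The test code is below. We can change the value of input_shape to see what _obtain_input_shape() returns.

import keras.backend as K
from keras.applications.imagenet_utils import _obtain_input_shape

include_top = True
input_shape = (224, 200, 3)
input_shape = _obtain_input_shape(input_shape,
                                  default_size=224,
                                  min_size=197,
                                  data_format=K.image_data_format(),
                                  require_flatten=include_top)
print(input_shape)

If input_shape = None, the function returns the default value (224,224,3). If either of the first two dimensions of input_shape is smaller than 197, for example (224,120,3), it raises an error. The format of input_shape must match the image_data_format() setting. In our case the setting is channels_last, so the 3 that stands for the channels must be the last dimension: (224,224,3).

To sum up, _obtain_input_shape plays the same role as the error checks above: it constrains the input_shape argument so that it holds a legal value.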
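The behaviour observed in that experiment can be summarized in a few lines. The function below is a deliberately simplified sketch for the channels_last case only; it is not the Keras implementation, and check_input_shape is a hypothetical name.

```python
def check_input_shape(input_shape, default_size=224, min_size=197):
    """Simplified channels_last shape check (illustration, not the Keras code)."""
    if input_shape is None:
        # No shape given: fall back to the default square RGB image.
        return (default_size, default_size, 3)
    if len(input_shape) != 3:
        raise ValueError("input_shape must have three dimensions")
    height, width, channels = input_shape
    if channels != 3:
        raise ValueError("the last dimension must be the 3 colour channels")
    if height < min_size or width < min_size:
        raise ValueError("height and width must be at least %d" % min_size)
    return tuple(input_shape)


print(check_input_shape(None))
print(check_input_shape((224, 200, 3)))
try:
    check_input_shape((224, 120, 3))
except ValueError as error:
    print("rejected:", error)
```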

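That concludes the "parameter constraints" portion of the code. In summary, it constrains the function's input arguments so that their values are legal, using two methods: raising errors and a dedicated "parameter constraint function".

We can summarize Part 1 together with the code of Part 2 covered so far as "Step 1: parameter constraints".

Reading on, we find two if/else statements.

Let's look at the first one:

    if input_tensor is None:
        img_input = Input(shape=input_shape)
    else:
        if not K.is_keras_tensor(input_tensor):
            img_input = Input(tensor=input_tensor, shape=input_shape)
        else:
            img_input = input_tensor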

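The condition depends on the input_tensor parameter, and its purpose is to obtain a valid img_input to serve as the model's input (we are finally about to give the model an input, so building the model is not far off).

First, the input_tensor parameter: it is a parameter of ResNet50 and defaults to None.

When input_tensor is None, img_input is produced by the Input function, called with shape=input_shape.

What does Input do?

Let's go back to the top of the script and see where Input comes from.

The answer is: from keras.layers import Input.

As for what it does, neither the Keras source code nor its documentation describes this function in detail, but there is an answer on Stack Overflow about its purpose.

Simply put, the function turns the input specification into a Tensor. Concretely, our input_shape is (224,224,3), and passing it to Input gives a Tensor of shape (?,224,224,3), where ? stands for the batch size, which is not yet known.

The test code is as follows:

from keras.layers import Input

input_shape = (224,224,3)
img_input = Input(shape=input_shape)
print(input_shape)
print(img_input)

Output of the test code:

Using TensorFlow backend.
(224, 224, 3)
Tensor("input_1:0", shape=(?, 224, 224, 3), dtype=float32)

Once the role of Input is clear, so is the role of this conditional: if we did not specify input_tensor, a tensor is created from input_shape.

If we did specify input_tensor, the code checks whether it is already a Keras tensor. If so, it is used as is; if not, Input is used to wrap it into one.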

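OK, on to the next conditional.

    if K.image_data_format() == 'channels_last':
        bn_axis = 3
    else:
        bn_axis = 1

The condition here is K.image_data_format() == 'channels_last'. Based on our earlier analysis, it checks whether the Keras image format is set to channels_last. If so, image dimensions are laid out as (224,224,3) and bn_axis = 3; otherwise the layout is (3,224,224) and bn_axis = 1. We do not yet know what bn_axis is for, but we can at least guess that it is the axis on which the channels sit. What is this variable used for? Let's keep exploring.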
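The guess that bn_axis points at the channel axis can be checked on the batched shapes, where axis 0 is the batch dimension. This is a small stand-alone sketch; channel_axis is a hypothetical helper that repeats the conditional above without needing Keras.

```python
def channel_axis(data_format):
    """Return the index of the channel axis in a batched image tensor."""
    if data_format == "channels_last":
        return 3
    if data_format == "channels_first":
        return 1
    raise ValueError("unknown data format: %r" % (data_format,))


# Batched shapes; None stands for the unknown batch size.
batch_shapes = {
    "channels_last": (None, 224, 224, 3),
    "channels_first": (None, 3, 224, 224),
}

for data_format, shape in batch_shapes.items():
    axis = channel_axis(data_format)
    print(data_format, "-> axis", axis, "has size", shape[axis])
```

In both layouts the selected axis has size 3, the number of colour channels.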

Next come 6 groups of code that all have the same form, x = somefunction(). We start with the first group.

    x = ZeroPadding2D((3, 3))(img_input)
    x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)
    x = BatchNormalization(axis=bn_axis, name='bn_conv1')(x)
    x = Activation('relu')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)

Taken literally, the first line, x = ZeroPadding2D((3, 3))(img_input), extends the height and width of img_input (by 3 on each of the top, bottom, left, and right), turning (?,224,224,3) into (?,230,230,3), since 230 = 224 + 3 + 3. Below is an illustration.

Is that what really happens? We can test it:

from keras.layers import Input
from keras.layers import ZeroPadding2D

input_shape = (224,224,3)
img_input = Input(shape=input_shape)
x = ZeroPadding2D((3, 3))(img_input)
print(img_input)
print(x)

The code above gives the following result:

Using TensorFlow backend.
Tensor("input_1:0", shape=(?, 224, 224, 3), dtype=float32)
Tensor("zero_padding2d_1/Pad:0", shape=(?, 230, 230, 3), dtype=float32)

What fills the added border? As the function name says: zeros.
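To make the padding concrete, here is a pure-Python sketch that zero-pads a single 2-D channel. zero_pad_2d is a hypothetical helper written for illustration; it mimics the effect on one channel and is not how Keras implements the layer.

```python
def zero_pad_2d(grid, pad):
    """Surround a 2-D list with `pad` rows and columns of zeros on every side."""
    width = len(grid[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]          # top border
    for row in grid:
        padded.append([0] * pad + list(row) + [0] * pad)  # left and right border
    padded.extend([0] * width for _ in range(pad))      # bottom border
    return padded


for row in zero_pad_2d([[1, 2], [3, 4]], 1):
    print(row)

# A 224 x 224 channel padded by 3 on each side becomes 230 x 230.
big = zero_pad_2d([[1] * 224 for _ in range(224)], 3)
print(len(big), len(big[0]))
```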

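Now that we know what this line does, the more important question is why. The answer lies in the next step: the convolution.

x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)

According to the official Conv2D documentation, Conv2D convolves x: 64 is the number of filters, (7, 7) is the kernel size, and strides=(2, 2) is the stride of the convolution in each direction. With these settings we can work out the dimensions of the output when a tensor of shape (?,230,230,3) is convolved. The formula is (see: A Beginner's Guide To Understanding Convolutional Neural Networks Part 2):

$$O = \frac{W - K + 2P}{S} + 1$$

Here O is the height and width of the output, W is the height and width of the input, K is the kernel size, P is the padding size, and S is the stride.

If P = (K-1)/2 in this formula, then with S = 1 we get O = W, which keeps the input and output the same size.

Plugging in our values (W = 230, K = 7, S = 2, and P = 0 because the padding has already been applied by ZeroPadding2D):

$$O = \frac{230 - 7 + 2 \times 0}{2} + 1 = 112.5$$

A fraction appears, which is odd. Let's test what actually happens with the following code:

from keras.layers import Input
from keras.layers import ZeroPadding2D
from keras.layers import Conv2D

input_shape = (224,224,3)
img_input = Input(shape=input_shape)
x = ZeroPadding2D((3, 3))(img_input)
x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)
print(img_input)
print(x)

The result is (?, 112, 112, 64), so the output height and width are 112. That does not match what we computed.

Why? After testing many parameter combinations, the author found a rule: if the numerator in the formula is odd, 1 is subtracted from it, so our numerator is 222 rather than 223. Substituting that value gives the correct result, 222/2 + 1 = 112. In other words, the division is rounded down.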
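The rule found by experiment amounts to using floor division in the formula. A minimal sketch of the computation, where conv_output_size is a hypothetical helper name and the symbols follow the formula above (W, K, S, P):

```python
def conv_output_size(w, k, s, p=0):
    """Output height/width of a convolution: floor((W - K + 2P) / S) + 1."""
    return (w - k + 2 * p) // s + 1


# The padded 230 x 230 input with a 7 x 7 kernel and stride 2.
print(conv_output_size(230, 7, 2))
# The same result when the padding of 3 is counted inside the formula.
print(conv_output_size(224, 7, 2, p=3))
# With P = (K - 1) / 2 and S = 1 the size is preserved.
print(conv_output_size(224, 3, 1, p=1))
```

The first two calls describe the same situation in two ways (padding applied beforehand versus padding counted in the formula), and both agree with the 112 reported by Keras.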

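Let's continue.

x = BatchNormalization(axis=bn_axis, name='bn_conv1')(x)

This call takes two arguments. The axis argument comes from the earlier conditional, so let's recall it:

    if K.image_data_format() == 'channels_last':
        bn_axis = 3
    else:
        bn_axis = 1

Clearly this argument tells BatchNormalization() which axis to operate on. The other argument, name, is easy to understand: it names the layer instance that is created.

The important thing is to understand what BatchNormalization() does. Going back to the top of the script, we find its import:

from keras.layers import BatchNormalization

The official documentation describes the function as follows:

Normalize the activations of the previous layer at each batch, i.e. applies a transformation that maintains the mean activation close to 0 and the activation standard deviation close to 1.

In other words, for each batch of data the output of the previous layer is normalized so that its mean is close to 0 and its standard deviation is close to 1.

The figure below shows how normalization turns the input distribution in panel a into the one in panel d, which speeds up finding the purple regression line in panel b.

Interested readers can read the Batch Normalization paper for more details.

Paper: https://arxiv.org/pdf/1502.03167.pdf

According to the paper, normalizing the inputs of each layer makes it possible to use higher learning rates, which speeds up training:

Batch Normalization allows us to use much higher learning rates and be less careful about initialization.

At this point we also understand why BatchNormalization is given an axis as an argument: we want the normalization to be applied along the correct dimension.

Note also that normalization has no effect on the size of the input, so the output shape is still (?, 112, 112, 64).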
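The core of the transformation described in the documentation quote can be sketched on a plain list of activations. This is a simplified illustration only: the real layer works per channel, learns an additional scale and shift, and keeps running statistics for inference, none of which is shown here. batch_normalize and its epsilon default are choices made for this sketch.

```python
import math


def batch_normalize(values, epsilon=1e-3):
    """Shift and scale values to mean ~0 and standard deviation ~1."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # epsilon keeps the division stable when the variance is tiny.
    return [(v - mean) / math.sqrt(variance + epsilon) for v in values]


activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_normalize(activations)

new_mean = sum(normalized) / len(normalized)
new_std = math.sqrt(sum((v - new_mean) ** 2 for v in normalized) / len(normalized))
print(normalized)
print(new_mean, new_std)
```

The number of values is unchanged, which matches the observation that the output shape stays the same.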

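On to the next line:

x = Activation('relu')(x)

This one is simple and needs little explanation: it uses the "relu" function as the activation for this layer. It does not change the size of the input either.

The figure below is a plot of the relu function.

Here is the last statement of the first group:

x = MaxPooling2D((3, 3), strides=(2, 2))(x)

This applies a 2-D pooling operation to x. The argument (3,3) sets the size of the pooling window and strides=(2, 2) is the stride of the pooling. MaxPooling simply returns the maximum value inside each 3 x 3 window, so the 9 pixels of each 3 x 3 window are replaced by the single largest value in that window.

The figure below illustrates this with a (2,2) window and a stride of 2:

After max pooling, the size of x changes! The formula is the same as for the convolution layer, with the fractional part dropped as before:

$$O = \frac{112 - 3}{2} + 1 = 55.5 \;\rightarrow\; 55$$

The final output shape is (?, 55, 55, 64).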
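Max pooling on a single 2-D channel can be sketched in a few lines of pure Python. max_pool_2d is a hypothetical helper for illustration; it assumes no padding, which is consistent with the 55 obtained above.

```python
def max_pool_2d(grid, size, stride):
    """Slide a size x size window over grid and keep the maximum of each window."""
    rows = (len(grid) - size) // stride + 1
    cols = (len(grid[0]) - size) // stride + 1
    return [
        [
            max(
                grid[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(max_pool_2d(grid, size=2, stride=2))

# A 112 x 112 channel pooled with a 3 x 3 window and stride 2.
pooled = max_pool_2d([[0] * 112 for _ in range(112)], size=3, stride=2)
print(len(pooled), len(pooled[0]))
```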

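That completes our analysis of the first group of code. To summarize:

The first group performs the following actions:

1. Zero-pad the input image
2. Apply the convolution
3. Apply batch normalization
4. Apply the activation function
5. Apply max pooling

The purpose of each action:

1. Ensure that the output of the following convolution is half the size of the input.
2. The convolution extracts shallow features of the image.
3. Normalization speeds up training.
4. The activation function introduces non-linearity.
5. Pooling reduces the dimensions.

And the change in shape at each step:

1. Input (?,224,224,3), output (?,230,230,3)
2. Input (?,230,230,3), output (?,112,112,64)
3. Input (?,112,112,64), output (?,112,112,64)
4. Input (?,112,112,64), output (?,112,112,64)
5. Input (?,112,112,64), output (?,55,55,64)

One more question for the reader: how many layers of the neural network does the first group of code amount to?

The answer is one! Zero padding, normalization, and pooling do not count as a "layer" of the network; only a convolution followed by an activation function counts as one layer.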

Next, look at the second through fifth groups of code. The pattern is simple: each is a combination of conv_block() and identity_block() calls, in different numbers and with different arguments. So once we have analyzed one group, the others will be clear as well.

Here is the second group:

    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')

Here x first enters the convolutional block. Following our earlier analysis of the block's parameters, x is the input tensor, 3 is the kernel size, [64, 64, 256] gives the number of filters in each of the three convolution layers, stage=2 marks the current stage, block='a' names this block, and strides=(1, 1) is the stride of the convolution.

From the arguments alone we can already guess at the internal structure of conv_block(). Since the filter counts are a list of three elements, the block should contain 3 convolution layers, each with kernel size 3 and stride 1.

Let's see whether the inside of conv_block() matches our guess.

def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):
    """conv_block is the block that has a conv layer at shortcut

    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names

    # Returns
        Output tensor for the block.

    Note that from stage 3, the first conv layer at main path is with strides=(2,2)
    And the shortcut should have strides=(2,2) as well
    """
    filters1, filters2, filters3 = filters
    if K.image_data_format() == 'channels_last':
        bn_axis = 3
    else:
        bn_axis = 1
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = Conv2D(filters1, (1, 1), strides=strides,
               name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters2, kernel_size, padding='same',
               name=conv_name_base + '2b')(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2c')(x)

    shortcut = Conv2D(filters3, (1, 1), strides=strides,
                      name=conv_name_base + '1')(input_tensor)
    shortcut = BatchNormalization(axis=bn_axis, name=bn_name_base + '1')(shortcut)

    x = layers.add([x, shortcut])
    x = Activation('relu')(x)
    return x

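With the experience gained from the first group of code, the internal structure of conv_block() is easy to follow. Recall that one "layer" of a convolutional neural network consists of a convolution plus an activation; by that rule, conv_block() boils down to the input passing through 3 layers. Our guess about the kernel sizes needs one correction, though: only the middle convolution uses kernel_size (3 here), while the first and third convolutions are 1 x 1, and the strides argument applies to the first convolution and to the shortcut.

What sets this block apart is the shortcut.

After the last convolution layer, the result does not go straight into the activation function. It is first added to the output of the shortcut, and only then passed to the activation function, as shown here:

    x = layers.add([x, shortcut])
    x = Activation('relu')(x)

So the secret lies in this shortcut.

Let's see what the shortcut goes through:

    shortcut = Conv2D(filters3, (1, 1), strides=strides,
                      name=conv_name_base + '1')(input_tensor)
    shortcut = BatchNormalization(axis=bn_axis, name=bn_name_base + '1')(shortcut)

The shortcut goes through a convolution and a BatchNormalization. Note that the input of the shortcut is input_tensor. In other words, conv_block() sends its input down two routes: one through the regular 3-layer convolutional network (also called the main path) and the other through the shortcut, as in the figure below.

In the same way, let's look at the internal structure of identity_block:

def identity_block(input_tensor, kernel_size, filters, stage, block):
    """The identity block is the block that has no conv layer at shortcut.

    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names

    # Returns
        Output tensor for the block.
    """
    filters1, filters2, filters3 = filters
    if K.image_data_format() == 'channels_last':
        bn_axis = 3
    else:
        bn_axis = 1
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = Conv2D(filters1, (1, 1), name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters2, kernel_size,
               padding='same', name=conv_name_base + '2b')(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2c')(x)

    x = layers.add([x, input_tensor])
    x = Activation('relu')(x)
    return x

The difference from the convolutional block is that there is no convolution layer on the shortcut. Everything else is the same!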
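The shared pattern of both blocks, "add the shortcut to the main path, then apply relu", can be sketched on plain lists. This is a toy model for illustration only: residual_block and the lambda functions are hypothetical stand-ins for the stacks of Keras layers, and real tensors have height, width, and channels rather than a flat list.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def residual_block(x, main_path, shortcut=None):
    """Return relu(main_path(x) + shortcut(x)); the shortcut defaults to identity."""
    skipped = x if shortcut is None else shortcut(x)
    transformed = main_path(x)
    if len(transformed) != len(skipped):
        raise ValueError("main path and shortcut must produce the same size")
    return relu([a + b for a, b in zip(transformed, skipped)])


x = [1.0, -2.0, 3.0]

# Identity shortcut: if the main path outputs zeros, the block passes relu(x) through.
print(residual_block(x, main_path=lambda v: [0.0] * len(v)))

# Toy main path that changes the size from 3 to 4 values, so a plain identity
# shortcut no longer fits and a projecting shortcut is needed instead.
grow = lambda v: [2.0 * a for a in v] + [1.0]
project = lambda v: list(v) + [0.0]
print(residual_block(x, main_path=grow, shortcut=project))
```

Calling residual_block(x, main_path=grow) without the projecting shortcut raises ValueError, which mirrors why a block that changes the dimensions cannot use the identity shortcut.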

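So here is the question: why are there two different kinds of block?

conv_block turns an input of shape (55, 55, 64) into (55, 55, 256), whereas identity_block has the same shape, (55, 55, 256), for both input and output. So the roles of the two blocks are as follows: use conv_block when the output dimensions need to change (here, more channels), and use identity_block when the dimensions should stay the same.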
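The shapes through all four stages can be traced with a little arithmetic. The sketch below is an illustration written in Python 3, not part of resnet50.py: the helper names are hypothetical, the stage table is read off the ResNet50 code shown earlier, and it assumes that only the first 1 x 1 convolution of conv_block (with its stride) changes the height and width. The final count uses the usual convention of counting the first convolution, the three main-path convolutions of every block, and the final Dense layer, but not the shortcut convolutions.

```python
def conv_size(size, kernel, stride):
    """Output size of an unpadded convolution (floor division)."""
    return (size - kernel) // stride + 1


def conv_block_shape(shape, filters, stride):
    """Shape after conv_block: the strided 1x1 conv sets H and W, filters[2] the channels."""
    height, width, _ = shape
    return (conv_size(height, 1, stride), conv_size(width, 1, stride), filters[2])


def identity_block_shape(shape, filters):
    """identity_block keeps the shape; the channel count must already match."""
    if shape[2] != filters[2]:
        raise ValueError("identity shortcut needs matching channel counts")
    return shape


# (filters, stride of the conv_block, number of identity blocks) for stages 2 to 5.
STAGES = [
    ([64, 64, 256], 1, 2),
    ([128, 128, 512], 2, 3),
    ([256, 256, 1024], 2, 5),
    ([512, 512, 2048], 2, 2),
]


def trace_stages(shape):
    """Return the output shape of every stage, starting from the given input shape."""
    shapes = []
    for filters, stride, n_identity in STAGES:
        shape = conv_block_shape(shape, filters, stride)
        for _ in range(n_identity):
            shape = identity_block_shape(shape, filters)
        shapes.append(shape)
    return shapes


stage_shapes = trace_stages((55, 55, 64))
print(stage_shapes)

# conv1 + three main-path convolutions per block + the final Dense layer.
weighted_layers = 1 + 3 * sum(1 + n for _, _, n in STAGES) + 1
print(weighted_layers)
```

Under these assumptions the last stage produces a 7 x 7 x 2048 tensor, which fits the (7, 7) average pooling that follows in the code.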

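We will not repeat the exploration for the similar blocks that follow.

Let's move on to the last group of code in the architecture:

    x = AveragePooling2D((7, 7), name='avg_pool')(x)

    if include_top:
        x = Flatten()(x)
        x = Dense(classes, activation='softmax', name='fc1000')(x)
    else:
        if pooling == 'avg':
            x = GlobalAveragePooling2D()(x)
        elif pooling == 'max':
            x = GlobalMaxPooling2D()(x)

First comes AveragePooling2D which, as the name says, performs average pooling.

Last is the fully connected layer, which requires the final x to be flattened first. The activation function used is softmax. The result is a 2-dimensional tensor whose second dimension has the size of the fully connected layer, 1000.

If there is no fully connected layer, pooling continues according to the pooling parameter. Note that global pooling is used here, so the output is again a 2-dimensional tensor.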
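Both endings can be illustrated on tiny hand-made data. The sketch below shows global average pooling collapsing a height x width x channels feature map to one value per channel, and softmax turning a list of scores into probabilities. Both functions are simplified pure-Python stand-ins written for illustration, not the Keras layers.

```python
import math


def global_average_pool(feature_map):
    """Average an H x W x C nested list over H and W, leaving C values."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    totals = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    return [total / (height * width) for total in totals]


def softmax(scores):
    """Turn scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# A 2 x 2 feature map with 2 channels.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
pooled = global_average_pool(feature_map)
print(pooled)

probabilities = softmax(pooled)
print(probabilities, sum(probabilities))
```

For one image, global pooling leaves a vector with one entry per channel; stacking such vectors over a batch gives the 2-dimensional tensor mentioned above.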

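    # Ensure that the model takes into account
    # any potential predecessors of `input_tensor`.
    if input_tensor is not None:
        inputs = get_source_inputs(input_tensor)
    else:
        inputs = img_input

This part takes the input_tensor parameter into account.

    model = Model(inputs, x, name='resnet50')

Finally, the model instance is created.

The part that loads the model weights is simple, and readers are encouraged to explore it on their own. In short, it selects the correct pre-trained weights according to the parameters.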