[mask_rcnn] ResNet and the ResNet network structure used in Mask R-CNN

A note up front: these are my own learning notes; please contact me if anything here infringes.

The figure above (the standard ResNet architecture table) lists several common ResNet depths: ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152. All of these networks are composed of conv1, conv2_x, conv3_x, conv4_x, and conv5_x. The stages conv2_x through conv5_x are built from residual building blocks, so they are often abbreviated with the prefix res. ResNet50 can therefore be written as [res3 res4 res6 res3] (3, 4, 6, and 3 blocks per stage), and ResNet101 as [res3 res4 res23 res3]. Each building block in turn consists of several convolutional layers; the per-stage counts are spelled out in the sketch below.
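As a small sketch in plain Python, the per-stage block counts implied by that notation (the counts are the standard ones from the ResNet paper; the layer-count check at the end is simply 3 convolutions per bottleneck block, plus conv1 and the final fully connected layer):

# residual building blocks per stage (conv2_x .. conv5_x)
resnet_blocks = {
    "resnet18":  [2, 2, 2, 2],   # basic blocks, 2 convs each
    "resnet34":  [3, 4, 6, 3],   # basic blocks
    "resnet50":  [3, 4, 6, 3],   # bottleneck blocks, 3 convs each
    "resnet101": [3, 4, 23, 3],  # bottleneck blocks
    "resnet152": [3, 8, 36, 3],  # bottleneck blocks
}
# e.g. ResNet101: conv1 + 3 * (3 + 4 + 23 + 3) convs + fc = 101 layers
print(1 + 3 * sum(resnet_blocks["resnet101"]) + 1)  # 101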

Taking ResNet101 as an example, its network structure is as follows:

Here the first res3 denotes the three building blocks of stage 2: res2a, res2b, and res2c. Each of these blocks contains three convolutional layers, (1×1, 64), (3×3, 64), and (1×1, 256), so res2a = res2a_branch2a + res2a_branch2b + res2a_branch2c (the first block of a stage additionally has a shortcut convolution, res2a_branch1). The naming scheme is illustrated in the sketch below.
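A small sketch of that naming scheme in plain Python, mirroring the name-building lines in the code further down:

stage, block = 2, 'a'
conv_name_base = 'res' + str(stage) + block + '_branch'   # 'res2a_branch'
bn_name_base = 'bn' + str(stage) + block + '_branch'      # 'bn2a_branch'
main_path_convs = [conv_name_base + suffix for suffix in ('2a', '2b', '2c')]
shortcut_conv = conv_name_base + '1'  # only conv_block creates this layer; identity_block has no shortcut conv
print(main_path_convs)  # ['res2a_branch2a', 'res2a_branch2b', 'res2a_branch2c']
print(shortcut_conv)    # res2a_branch1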

Taking an input image of size 1024×1024×3 as an example, the feature map after ResNet101 (through stage 5) is 32×32×2048. See the code and the sketch below for details.
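A quick sanity check of that spatial size, as a minimal sketch: the padded 7×7/2 conv, the 3×3/2 max pooling, and the stride-2 conv_blocks of stages 3 to 5 each halve the resolution, while stage 2 keeps it (its conv_block uses strides=(1, 1)):

size = 1024
for name, stride in [('conv1', 2), ('maxpool', 2), ('stage2', 1),
                     ('stage3', 2), ('stage4', 2), ('stage5', 2)]:
    size //= stride
    print(name, size)
# conv1 512, maxpool 256, stage2 256, stage3 128, stage4 64, stage5 32
# so with stage5=True the final feature map is 32 x 32 x 2048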

Note: in TensorFlow/Keras, Conv2D uses padding='valid' by default, i.e. no zero padding, so the output size is computed as floor((W − F)/S) + 1 rather than (W − F + 2P)/S + 1; with padding='same', the output size is simply ceil(W/S).
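A minimal helper for checking the sizes printed in the code below; it only assumes the standard Keras/TensorFlow 'valid' and 'same' padding rules (pure Python, no TensorFlow needed):

import math

def conv_output_size(w, f, s, padding='valid'):
    """Spatial output size of a conv/pooling layer for the two Keras padding modes."""
    if padding == 'valid':   # no zero padding: floor((W - F) / S) + 1
        return (w - f) // s + 1
    if padding == 'same':    # enough zero padding that the output is ceil(W / S)
        return math.ceil(w / s)
    raise ValueError(padding)

print(conv_output_size(1030, 7, 2))          # 512 -> conv1, applied after ZeroPadding2D((3, 3))
print(conv_output_size(512, 3, 2, 'same'))   # 256 -> the 3x3, stride-2 max pooling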

The ResNet-related code in Mask R-CNN is as follows:

def resnet_graph(input_image, architecture, stage5=False):
    #### Function 5: the residual network (backbone)
    assert architecture in ["resnet50", "resnet101"]
    # Stage 1 -- the input end of the residual network
    x = KL.ZeroPadding2D((3, 3))(input_image)  # ZeroPadding2D pads the border of a 2D input (e.g. an image) with zeros, to control the size of the feature map after the convolution
    print('Stage_1_x1', x)  # Tensor("zero_padding2d_1/Pad:0", shape=(?, 1030, 1030, 3), dtype=float32, device=/device:CPU:0)
    x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
    # 2D conv layer: 64 filters of size 7x7, stride 2; use_bias=True adds a bias term
    # With the default padding='valid' there is no zero padding, so the size is floor((W-F)/S)+1, not (W-F+2P)/S+1
    print('Stage_1_x2', x)  # Tensor("conv1/BiasAdd:0", shape=(?, 512, 512, 64), ...)
    x = BatchNorm(axis=3, name='bn_conv1')(x)  # batch normalization; axis is the axis to normalize, usually the feature (channel) axis
    print('Stage_1_x3', x)  # Tensor("bn_conv1/batchnorm/add_1:0", shape=(?, 512, 512, 64), ...)
    x = KL.Activation('relu')(x)  # apply the relu activation to the layer's output
    print('Stage_1_x4', x)  # Tensor("activation_1/Relu:0", shape=(?, 512, 512, 64), ...)
    C1 = x = KL.MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)  # max pooling, window (3, 3), stride (2, 2), 'same' padding
    print('Stage_1_x5', x)  # Tensor("max_pooling2d_1/MaxPool:0", shape=(?, 256, 256, 64), ...)
    ############# Stages 2-5 are the core of the residual network; the residual shows up in the shortcut of each block
    # Stage 2
    ### conv_block main path: x --> (1x1, 64) --> (3x3, 64) --> (1x1, 256); shortcut: x --> (1x1, 256); then add(x, shortcut) --> relu
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
    print('Stage_2_x1', x)  # Tensor("res2a_out/Relu:0", shape=(?, 256, 256, 256), ...)
    ### identity_block main path: (1x1, 64) --> (3x3, 64) --> (1x1, 256) --> add input --> relu
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
    print('Stage_2_x2', x)  # Tensor("res2b_out/Relu:0", shape=(?, 256, 256, 256), ...)
    C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')
    print('Stage_2_x3', x)  # Tensor("res2c_out/Relu:0", shape=(?, 256, 256, 256), ...)
    # Stage 3
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
    print('Stage_3_x1', x)  # Tensor("res3a_out/Relu:0", shape=(?, 128, 128, 512), ...)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
    print('Stage_3_x2', x)  # Tensor("res3b_out/Relu:0", shape=(?, 128, 128, 512), ...)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
    print('Stage_3_x3', x)  # Tensor("res3c_out/Relu:0", shape=(?, 128, 128, 512), ...)
    C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')
    print('Stage_3_x4', x)  # Tensor("res3d_out/Relu:0", shape=(?, 128, 128, 512), ...)
    # Stage 4
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
    print('Stage_4_x1', x)  # Tensor("res4a_out/Relu:0", shape=(?, 64, 64, 1024), ...)
    block_count = {"resnet50": 5, "resnet101": 22}[architecture]  # number of identity blocks in stage 4
    for i in range(block_count):
        x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i))
        # Why chr(98 + i)? chr(98) is 'b', so the blocks are named 'b', 'c', ... ('b' through 'w' for resnet101)
    C4 = x
    print('Stage_4_x2', x)  # Tensor("res4w_out/Relu:0", shape=(?, 64, 64, 1024), ...)
    # Stage 5
    if stage5:
        x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a')
        print('Stage_5_x1', x)  # Tensor("res5a_out/Relu:0", shape=(?, 32, 32, 2048), ...)
        x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b')
        print('Stage_5_x2', x)  # Tensor("res5b_out/Relu:0", shape=(?, 32, 32, 2048), ...)
        C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c')
        print('Stage_5_x3', x)  # Tensor("res5c_out/Relu:0", shape=(?, 32, 32, 2048), ...)
    else:
        C5 = None
    return [C1, C2, C3, C4, C5]


### Function 4: the block whose shortcut contains a conv layer; the output is x + shortcut
def conv_block(input_tensor, kernel_size, filters, stage, block,
               strides=(2, 2), use_bias=True):
    """conv_block is the block that has a conv layer at shortcut
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the nb_filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a', 'b'..., current block label, used for generating layer names
    Note that from stage 3, the first conv layer at main path is with subsample=(2,2)
    And the shortcut should have subsample=(2,2) as well
    """
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Why the '2a', '2b', '2c' suffixes? They name the three convs on the main path (branch 2); the shortcut conv is branch 1
    x = KL.Conv2D(nb_filter1, (1, 1), strides=strides,
                  name=conv_name_base + '2a', use_bias=use_bias)(input_tensor)
    x = BatchNorm(axis=3, name=bn_name_base + '2a')(x)
    x = KL.Activation('relu')(x)
    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(axis=3, name=bn_name_base + '2b')(x)
    x = KL.Activation('relu')(x)
    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c', use_bias=use_bias)(x)
    x = BatchNorm(axis=3, name=bn_name_base + '2c')(x)

    # The 1x1 conv on the shortcut gives it the same shape (channels and stride) as x after '2c', so the two can be added
    shortcut = KL.Conv2D(nb_filter3, (1, 1), strides=strides,
                         name=conv_name_base + '1', use_bias=use_bias)(input_tensor)
    shortcut = BatchNorm(axis=3, name=bn_name_base + '1')(shortcut)

    x = KL.Add()([x, shortcut])  # Add() takes same-shaped tensors and returns their element-wise sum
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x


## Function 3: the block whose shortcut has no conv layer. The difference from conv_block is that
## conv_block puts a 1x1 conv (plus BN) on the shortcut, while identity_block adds the input directly
def identity_block(input_tensor, kernel_size, filters, stage, block,
                   use_bias=True):
    """The identity_block is the block that has no conv layer at shortcut
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the nb_filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a', 'b'..., current block label, used for generating layer names
    """
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a', use_bias=use_bias)(input_tensor)
    x = BatchNorm(axis=3, name=bn_name_base + '2a')(x)  # axis is the axis to normalize, usually the feature (channel) axis
    x = KL.Activation('relu')(x)
    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(axis=3, name=bn_name_base + '2b')(x)
    x = KL.Activation('relu')(x)  # Activation applies the activation function
    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c', use_bias=use_bias)(x)
    x = BatchNorm(axis=3, name=bn_name_base + '2c')(x)
    x = KL.Add()([x, input_tensor])  # add the main-path output to the block input (the identity shortcut)
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x
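A minimal usage sketch, assuming the three functions above are already defined in the current module, that KL is keras.layers, and that BatchNorm can be stood in for by keras.layers.BatchNormalization (the Mask R-CNN repository actually defines its own BatchNorm subclass of it):

import keras.layers as KL
import keras.models as KM

BatchNorm = KL.BatchNormalization  # assumption: stand-in for the repo's custom BatchNorm

input_image = KL.Input(shape=(1024, 1024, 3))
C1, C2, C3, C4, C5 = resnet_graph(input_image, "resnet101", stage5=True)
model = KM.Model(inputs=input_image, outputs=[C1, C2, C3, C4, C5])
# Expected spatial sizes for a 1024x1024 input:
#   C1: 256x256x64, C2: 256x256x256, C3: 128x128x512, C4: 64x64x1024, C5: 32x32x2048
model.summary()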

Summarized as flow charts, the conv_block and identity_block structures are as follows:

The ResNet50 and ResNet101 structures, respectively, are as follows:

Tags: ResNet | Deep Learning | Computer Vision