






Microsoft's deep residual network, ResNet, originates from the CVPR 2016 best paper, Deep Residual Learning for Image Recognition (leiphone.com/news/20160); paper source (link.jianshu.com/?); translation (tower.im/users/sign_in).

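The 152-layer ResNet is exceptionally deep. Beyond setting a record for depth, its error rate is strikingly low: 3.6% (top-5 error on ImageNet), compared with roughly 5%~10% for humans. At the time of publication this was the best-performing architecture on that benchmark, and it can be seen as another milestone in the field of artificial neural networks.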





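The ResNet paper addresses this degradation problem (deeper plain networks ending up with higher training error) by introducing a deep residual learning framework. Instead of expecting each stack of layers to fit a desired mapping directly, it explicitly lets these layers fit a residual mapping. Formally, let H(X) denote the desired underlying mapping; we let the stacked nonlinear layers fit another mapping F(X) := H(X) - X, so the original mapping can be rewritten as F(X) + X. The hypothesis is that the residual mapping is easier to optimize than the original one. In the extreme case, if an identity mapping were optimal, it would be much easier to push the residual to 0 than to fit an identity mapping with a stack of nonlinear layers.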

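The formulation F(X) + X can be realized in a feedforward network with "shortcut connections" (see Figure 2), which skip one or more layers. In the paper's setting, the shortcut connections simply perform an identity mapping, and their outputs are added to the outputs of the stacked layers. Identity shortcut connections add neither extra parameters nor computational complexity. The whole network can still be trained end to end with SGD and backpropagation.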
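To make the F(X) + X formulation concrete, here is a minimal sketch in plain Python rather than TensorFlow. The names `residual_block`, `matvec`, `w1` and `w2` are hypothetical and chosen only for illustration; F is assumed to be two small fully connected layers instead of the convolutions used in the paper.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F(x) = w2 * relu(w1 * x).

    The identity shortcut is the `+ x` term: it has no parameters of its own.
    """
    f = matvec(w2, relu(matvec(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# With all weights at zero the residual F(x) is zero, so the block
# passes a non-negative input through unchanged.
print(residual_block(x, zeros, zeros))        # [1.0, 2.0, 3.0]
# With identity weights F(x) = x, so the output is 2 * x.
print(residual_block(x, identity, identity))  # [2.0, 4.0, 6.0]
```

The first call illustrates the "extreme case" above: driving the weights of F to zero is enough for the block to behave as an identity mapping (for non-negative inputs, because of the final ReLU).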







# Assumes TensorFlow 1.x, with `tf`, `slim` (tf.contrib.slim) and
# `resnet_utils` already imported as in the TF-Slim ResNet code.
# The decorator is needed so that slim.arg_scope can target this function.
@slim.add_arg_scope
def bottleneck(inputs, depth, depth_bottleneck, stride, rate=1,
               outputs_collections=None, scope=None):
    with tf.variable_scope(scope, 'bottleneck_v1', [inputs]) as sc:
        depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
        # Shortcut branch: identity (subsampled if stride > 1) when the depth
        # is unchanged, otherwise a 1x1 projection to the new depth.
        if depth == depth_in:
            shortcut = resnet_utils.subsample(inputs, stride, 'shortcut')
        else:
            shortcut = slim.conv2d(inputs, depth, [1, 1], stride=stride,
                                   activation_fn=None, scope='shortcut')

        # Residual branch: 1x1 reduce, 3x3, 1x1 restore.
        residual = slim.conv2d(inputs, depth_bottleneck, [1, 1], stride=1,
                               scope='conv1')
        residual = resnet_utils.conv2d_same(residual, depth_bottleneck, 3, stride,
                                            rate=rate, scope='conv2')
        residual = slim.conv2d(residual, depth, [1, 1], stride=1,
                               activation_fn=None, scope='conv3')

        output = tf.nn.relu(shortcut + residual)

        return slim.utils.collect_named_outputs(outputs_collections,
                                                sc.original_name_scope,
                                                output)









def resnet_v1_50(inputs,
                 num_classes=None,
                 is_training=True,
                 global_pool=True,
                 output_stride=None,
                 reuse=None,
                 scope='resnet_v1_50'):
    """ResNet-50 model of [1]. See resnet_v1() for arg and return description."""
    # Each tuple is (depth, depth_bottleneck, stride) for one bottleneck unit.
    blocks = [
        resnet_utils.Block(
            'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        resnet_utils.Block(
            'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
        resnet_utils.Block(
            'block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
        resnet_utils.Block(
            'block4', bottleneck, [(2048, 512, 1)] * 3)
    ]
    return resnet_v1(inputs, blocks, num_classes, is_training,
                     global_pool=global_pool, output_stride=output_stride,
                     include_root_block=True, reuse=reuse, scope=scope)


import collections


class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
    """A named tuple describing a ResNet block.

    Its parts are:
      scope: The scope of the `Block`.
      unit_fn: The ResNet unit function which takes as input a `Tensor` and
        returns another `Tensor` with the output of the ResNet unit.
      args: A list of length equal to the number of units in the `Block`. The list
        contains one (depth, depth_bottleneck, stride) tuple for each unit in the
        block to serve as argument to unit_fn.
    """


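The function that turns a list of Block elements into a network is resnet_v1. This function is the core of ResNet: a ResNet of a different depth only needs a different number of units in each Block of the blocks list in the function above.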
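As a quick check of how the unit counts map to the advertised depths, the arithmetic can be sketched as below. The helper `resnet_depth` is hypothetical; it assumes the usual counting convention (the root 7x7 convolution, three convolutions per bottleneck unit, and the final classification layer, with shortcut projections not counted). The unit counts for ResNet-101 and ResNet-152 come from the paper, not from the code shown here.

```python
def resnet_depth(units_per_block, convs_per_unit=3):
    """Count weighted layers: root conv + bottleneck convs + classifier."""
    return 1 + convs_per_unit * sum(units_per_block) + 1


# Number of bottleneck units in block1..block4.
print(resnet_depth([3, 4, 6, 3]))    # 50
print(resnet_depth([3, 4, 23, 3]))   # 101
print(resnet_depth([3, 8, 36, 3]))   # 152
```

For ResNet-50 the counts 3, 4, 6, 3 match the lengths of the args lists passed to the four Block entries in resnet_v1_50.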

def resnet_v1(inputs,
              blocks,
              num_classes=None,
              is_training=True,
              global_pool=True,
              output_stride=None,
              include_root_block=True,
              reuse=None,
              scope=None):
    with tf.variable_scope(scope, 'resnet_v1', [inputs], reuse=reuse) as sc:
        end_points_collection = sc.name + '_end_points'
        with slim.arg_scope([slim.conv2d, bottleneck,
                             resnet_utils.stack_blocks_dense],
                            outputs_collections=end_points_collection):
            with slim.arg_scope([slim.batch_norm], is_training=is_training):
                net = inputs
                if include_root_block:
                    if output_stride is not None:
                        if output_stride % 4 != 0:
                            raise ValueError(
                                'The output_stride needs to be a multiple of 4.')
                        # Integer division: the root block below already
                        # contributes a stride of 4 (conv1 and pool1).
                        output_stride //= 4
                    net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')
                    net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
                net = resnet_utils.stack_blocks_dense(net, blocks, output_stride)
                if global_pool:
                    # Global average pooling.
                    net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
                if num_classes is not None:
                    net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                                      normalizer_fn=None, scope='logits')
                # Convert end_points_collection into a dictionary of end_points.
                end_points = slim.utils.convert_collection_to_dict(
                    end_points_collection)
                if num_classes is not None:
                    end_points['predictions'] = slim.softmax(net, scope='predictions')
                return net, end_points


net = resnet_utils.stack_blocks_dense(net, blocks, output_stride)


# Note: resnet_v1 passes this function to slim.arg_scope, so in the
# TensorFlow source it is decorated with add_arg_scope.
def stack_blocks_dense(net,
                       blocks,
                       output_stride=None,
                       outputs_collections=None):
    # The current_stride variable keeps track of the effective stride of the
    # activations. This allows us to invoke atrous convolution whenever applying
    # the next residual unit would result in the activations having stride larger
    # than the target output_stride.
    current_stride = 1

    # The atrous convolution rate parameter.
    rate = 1

    for block in blocks:
        with variable_scope.variable_scope(block.scope, 'block', [net]) as sc:
            for i, unit in enumerate(block.args):
                if output_stride is not None and current_stride > output_stride:
                    raise ValueError('The target output_stride cannot be reached.')

                with variable_scope.variable_scope('unit_%d' % (i + 1), values=[net]):
                    unit_depth, unit_depth_bottleneck, unit_stride = unit

                    # If we have reached the target output_stride, then we need to employ
                    # atrous convolution with stride=1 and multiply the atrous rate by the
                    # current unit's stride for use in subsequent layers.
                    if output_stride is not None and current_stride == output_stride:
                        net = block.unit_fn(
                            net,
                            depth=unit_depth,
                            depth_bottleneck=unit_depth_bottleneck,
                            stride=1,
                            rate=rate)
                        rate *= unit_stride
                    else:
                        net = block.unit_fn(
                            net,
                            depth=unit_depth,
                            depth_bottleneck=unit_depth_bottleneck,
                            stride=unit_stride,
                            rate=1)
                        current_stride *= unit_stride
            net = utils.collect_named_outputs(outputs_collections, sc.name, net)

    if output_stride is not None and current_stride != output_stride:
        raise ValueError('The target output_stride cannot be reached.')

    return net
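The stride and rate bookkeeping in stack_blocks_dense can be easier to follow without any tensors involved. The sketch below replays only that bookkeeping in plain Python; `plan_strides` is a hypothetical helper, and the list of unit strides is a made-up example rather than the ResNet-50 configuration.

```python
def plan_strides(unit_strides, output_stride=None):
    """Return the (stride, rate) pair each unit would be called with."""
    current_stride = 1  # effective stride accumulated so far
    rate = 1            # atrous rate to use once the target is reached
    plan = []
    for unit_stride in unit_strides:
        if output_stride is not None and current_stride > output_stride:
            raise ValueError('The target output_stride cannot be reached.')
        if output_stride is not None and current_stride == output_stride:
            # Target reached: stop striding and dilate instead.
            plan.append((1, rate))
            rate *= unit_stride
        else:
            plan.append((unit_stride, 1))
            current_stride *= unit_stride
    if output_stride is not None and current_stride != output_stride:
        raise ValueError('The target output_stride cannot be reached.')
    return plan


strides = [2, 2, 2, 1]  # hypothetical per-unit strides
print(plan_strides(strides))                   # [(2, 1), (2, 1), (2, 1), (1, 1)]
print(plan_strides(strides, output_stride=2))  # [(2, 1), (1, 1), (1, 2), (1, 4)]
```

In the second call, once the effective stride reaches 2 every later unit runs with stride 1, and each stride that would have been applied is folded into the atrous rate instead (1, then 2, then 4).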

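The code here refers to atrous convolution. Put simply, it is the structure shown in Figure 6(b): after the stride of a pooling layer has been set to 1, it lets the following layers still keep the same receptive field as in the original structure.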

Figure 6. Atrous convolution
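As a rough illustration of the idea, the following one-dimensional sketch in plain Python applies a kernel whose taps are spaced `rate` samples apart. The function `atrous_conv1d` is hypothetical and is not the TensorFlow op; it uses no padding and, like most deep learning libraries, does not flip the kernel.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """1-D convolution without padding, with kernel taps `rate` samples apart."""
    # Number of input samples covered by one output value.
    span = (len(kernel) - 1) * rate + 1
    return [
        sum(k * signal[start + j * rate] for j, k in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]

print(atrous_conv1d(signal, kernel, rate=1))  # [6, 9, 12, 15, 18]
print(atrous_conv1d(signal, kernel, rate=2))  # [9, 12, 15]
```

With rate=2 the same three weights cover five input samples instead of three, which is how the receptive field is preserved when an earlier stride of 2 is replaced by a stride of 1.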


[1] K. He, X. Zhang, S. Ren, J. Sun. Deep Residual Learning for Image Recognition. CVPR 2016.






