Tech Insights | CNN MATLAB Study Notes (4)

06-07

CNN MATLAB Study Notes (4): Train a Deep Neural Network from Scratch

The content of this article comes from the MATLAB R2017a documentation.

This example requires Computer Vision System Toolbox™, Image Processing Toolbox™, Neural Network Toolbox™, and Statistics and Machine Learning Toolbox™.

Using a CUDA-capable NVIDIA™ GPU with compute capability 3.0 or higher is highly recommended for running this example. Use of a GPU requires Parallel Computing Toolbox™.

Step 1: Download CIFAR-10 Image Data

Download the CIFAR-10 data set [3]. This data set contains 50,000 training images that will be used to train a CNN.

The code is as follows:

% Download CIFAR-10 data to a temporary directory
cifar10Data = tempdir;
url = 'https://www.cs.toronto.edu/~kriz/cifar-10-matlab.tar.gz';
helperCIFAR10Data.download(url, cifar10Data);

% Load the CIFAR-10 training and test data.
[trainingImages, trainingLabels, testImages, testLabels] = helperCIFAR10Data.load(cifar10Data);

Each image is a 32x32 RGB image, and there are 50,000 training samples.

The code is as follows:

size(trainingImages)

Output:

ans = 32 32 3 50000

CIFAR-10 has 10 image categories. List the image categories:

The code is as follows:

numImageCategories = 10;
categories(trainingLabels)

Output:

ans =
  10×1 cell array
    airplane
    automobile
    bird
    cat
    deer
    dog
    frog
    horse
    ship
    truck

Display some of the training images, resized to 64x64 for easier viewing.

The code is as follows:

% Display a few of the training images, resizing them for display.
figure
thumbnails = trainingImages(:,:,:,1:100);
thumbnails = imresize(thumbnails, [64 64]);
montage(thumbnails)

Output: a montage of the first 100 training images.

Step 2: Create a Convolutional Neural Network (CNN)

A CNN is composed of a series of layers, where each layer defines a specific computation. Neural Network Toolbox™ provides functionality to easily design a CNN layer by layer. In this example, the following layers are used to create a CNN:

· imageInputLayer - Image input layer

· convolution2dLayer - 2D convolution layer for Convolutional Neural Networks

· reluLayer - Rectified linear unit (ReLU) layer

· maxPooling2dLayer - Max pooling layer

· fullyConnectedLayer - Fully connected layer

· softmaxLayer - Softmax layer

· classificationLayer - Classification output layer for a neural network

Step 2-1:

The network defined here is similar to the one described in [4] (http://code.google.com/p/cuda-convnet/) and starts with an imageInputLayer. The input layer defines the type and size of data the CNN can process. In this example, the CNN is used to process CIFAR-10 images, which are 32x32 RGB images.

The code to create the image input layer is as follows:

% Create the image input layer for 32x32x3 CIFAR-10 images
[height, width, numChannels, ~] = size(trainingImages);
imageSize = [height width numChannels];
inputLayer = imageInputLayer(imageSize)

Output:

inputLayer =
  ImageInputLayer with properties:
                Name:
           InputSize: [32 32 3]
   Hyperparameters
    DataAugmentation: none
       Normalization: zerocenter
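Normalization: zerocenter means the input layer subtracts a mean image, computed from the training data, from every input. The pure-Python sketch below (outside MATLAB) illustrates that idea. It assumes images are nested lists indexed [row][column][channel] and that the mean is taken separately for each pixel and channel. The helper names mean_image and zero_center are hypothetical and not part of Neural Network Toolbox.

```python
def mean_image(images):
    """Per-pixel, per-channel mean over a list of H x W x C images (nested lists)."""
    n = len(images)
    h, w, c = len(images[0]), len(images[0][0]), len(images[0][0][0])
    return [[[sum(img[i][j][k] for img in images) / n
              for k in range(c)]
             for j in range(w)]
            for i in range(h)]


def zero_center(image, mean):
    """Subtract the mean image element-wise, as 'zerocenter' normalization does."""
    return [[[px - m for px, m in zip(pixel, mpixel)]
             for pixel, mpixel in zip(row, mrow)]
            for row, mrow in zip(image, mean)]


# Two tiny 1x2x3 "images" standing in for 32x32x3 CIFAR-10 images.
train = [
    [[[10, 20, 30], [40, 50, 60]]],
    [[[30, 40, 50], [60, 70, 80]]],
]
mu = mean_image(train)
print(mu)                         # [[[20.0, 30.0, 40.0], [50.0, 60.0, 70.0]]]
print(zero_center(train[0], mu))  # [[[-10.0, -10.0, -10.0], [-10.0, -10.0, -10.0]]]
```

After this step, the zero-centered training images average to zero at every pixel and channel.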

Step 2-2:

Next, define the middle layers of the network. The middle layers are made up of repeated blocks of convolutional, ReLU (rectified linear unit), and pooling layers. These three layers form the core building blocks of convolutional neural networks. The convolutional layers define sets of filter weights, which are updated during network training. The ReLU layer adds non-linearity to the network, which allows the network to approximate non-linear functions that map image pixels to the semantic content of the image. The pooling layers downsample data as it flows through the network. In a network with many layers, pooling layers should be used sparingly to avoid downsampling the data too early in the network.
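The small pure-Python sketch below makes the ReLU and pooling layers concrete on a single-channel feature map. ReLU is applied element-wise. Max pooling with a 3x3 window and stride 2, the setting used in this network, keeps the largest value in each window and shrinks the map. It is illustrative only: the helper names are hypothetical, and it ignores batches and multiple channels.

```python
def relu(fmap):
    """Element-wise max(0, x) over a 2-D list."""
    return [[max(0, x) for x in row] for row in fmap]


def max_pool(fmap, size=3, stride=2):
    """Max pooling without padding; windows that would run off the edge are dropped."""
    h, w = len(fmap), len(fmap[0])
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    return [[max(fmap[r * stride + i][c * stride + j]
                 for i in range(size) for j in range(size))
             for c in range(out_w)]
            for r in range(out_h)]


# A 5x5 feature map whose first rows are negative responses.
fmap = [[(r * 5 + c) - 12 for c in range(5)] for r in range(5)]
activated = relu(fmap)
pooled = max_pool(activated)
print(pooled)  # [[0, 2], [10, 12]]
```

ReLU zeroes the negative responses, and pooling reduces the 5x5 map to 2x2 while keeping the strongest activation in each window.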

The code is as follows:

% Convolutional layer parameters
filterSize = [5 5];
numFilters = 32;

middleLayers = [

    % The first convolutional layer has a bank of 32 5x5x3 filters. A
    % symmetric padding of 2 pixels is added to ensure that image borders
    % are included in the processing. This is important to avoid
    % information at the borders being washed away too early in the
    % network.
    convolution2dLayer(filterSize, numFilters, 'Padding', 2)

    % Note that the third dimension of the filter can be omitted because it
    % is automatically deduced based on the connectivity of the network. In
    % this case because this layer follows the image layer, the third
    % dimension must be 3 to match the number of channels in the input
    % image.

    % Next add the ReLU layer:
    reluLayer()

    % Follow it with a max pooling layer that has a 3x3 spatial pooling area
    % and a stride of 2 pixels. This down-samples the data dimensions from
    % 32x32 to 15x15.
    maxPooling2dLayer(3, 'Stride', 2)

    % Repeat the 3 core layers to complete the middle of the network.
    convolution2dLayer(filterSize, numFilters, 'Padding', 2)
    reluLayer()
    maxPooling2dLayer(3, 'Stride', 2)

    convolution2dLayer(filterSize, 2 * numFilters, 'Padding', 2)
    reluLayer()
    maxPooling2dLayer(3, 'Stride', 2)

    ]

Output:

middleLayers =
  9x1 Layer array with layers:
     1   Convolution   32 5x5 convolutions with stride [1 1] and padding [2 2]
     2   ReLU          ReLU
     3   Max Pooling   3x3 max pooling with stride [2 2] and padding [0 0]
     4   Convolution   32 5x5 convolutions with stride [1 1] and padding [2 2]
     5   ReLU          ReLU
     6   Max Pooling   3x3 max pooling with stride [2 2] and padding [0 0]
     7   Convolution   64 5x5 convolutions with stride [1 1] and padding [2 2]
     8   ReLU          ReLU
     9   Max Pooling   3x3 max pooling with stride [2 2] and padding [0 0]
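The code comment above states that the first pooling layer reduces 32x32 to 15x15. The standard output-size formula floor((n + 2p - k) / s) + 1, for kernel size k, padding p and stride s, lets a short pure-Python sketch trace the spatial size through the whole middle section. It assumes square inputs and the stride 1 shown for the convolutions in the printed layer array. ReLU layers do not change the size, so they are omitted.

```python
def out_size(n, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window over an n x n input."""
    return (n + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, padding) for the size-changing middle layers.
middle_layers = [
    ("conv1", 5, 1, 2), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 5, 1, 2), ("pool3", 3, 2, 0),
]

n = 32
sizes = [n]
for name, k, s, p in middle_layers:
    n = out_size(n, k, s, p)
    sizes.append(n)
    print(f"{name}: {n}x{n}")

# conv3 has 2 * numFilters = 64 filters, so the next layer receives n*n*64 values.
flattened = n * n * 64
print("values into the first fully connected layer:", flattened)  # 576
```

The padding of 2 keeps every 5x5 convolution size-preserving, so only the pooling layers shrink the map: 32 to 15, 15 to 7, and 7 to 3.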

A deeper network may be created by repeating these three basic layers. However, the number of pooling layers should be reduced to avoid downsampling the data prematurely. Downsampling early in the network discards image information that is useful for learning.
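The warning about pooling too early can be made concrete. Each 3x3, stride-2 pooling layer roughly halves the spatial size. The pure-Python sketch below assumes the same pooling settings as the middle layers and size-preserving convolutions. It counts how many such pooling layers a 32x32 input can pass through before the map is smaller than the pooling window. The helper name pool_chain is hypothetical.

```python
def pool_chain(n, kernel=3, stride=2):
    """Spatial sizes produced by repeated pooling until the window no longer fits."""
    sizes = [n]
    while n >= kernel:
        n = (n - kernel) // stride + 1
        sizes.append(n)
    return sizes


chain = pool_chain(32)
print(chain)                                     # [32, 15, 7, 3, 1]
print("pooling layers possible:", len(chain) - 1)  # 4
```

Under these assumptions, only one more pooling layer fits after the three already used. A deeper version of this network would therefore need extra convolution/ReLU blocks without a matching pooling layer.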

Step 2-3:

The final layers of a CNN are typically composed of fully connected layers and a softmax loss layer.
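The softmax layer turns the 10 outputs of the last fully connected layer into a probability distribution over the classes. The classification layer, listed as crossentropyex in the layer output, scores that distribution with cross-entropy. The pure-Python sketch below shows both computations for a single example. The scores are made up for illustration and do not come from the trained network.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Cross-entropy loss for one example with a one-hot target."""
    return -math.log(probs[true_index])


# Hypothetical scores for the 10 CIFAR-10 classes (index 3 = 'cat').
scores = [0.1, -1.2, 0.3, 2.5, 0.0, 1.1, -0.4, 0.2, -2.0, 0.5]
probs = softmax(scores)
print(round(sum(probs), 6))                    # 1.0
print(max(range(10), key=lambda i: probs[i]))  # 3
print(cross_entropy(probs, 3))
```

The loss is small when the true class receives a high probability and grows as that probability falls. Training adjusts all network weights to reduce this quantity.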

The code is as follows:

finalLayers = [

    % Add a fully connected layer with 64 output neurons. The output size of
    % this layer will be an array with a length of 64.
    fullyConnectedLayer(64)

    % Add an ReLU non-linearity.
    reluLayer

    % Add the last fully connected layer. At this point, the network must
    % produce 10 signals that can be used to measure whether the input image
    % belongs to one category or another. This measurement is made using the
    % subsequent loss layers.
    fullyConnectedLayer(numImageCategories)

    % Add the softmax loss layer and classification layer. The final layers use
    % the output of the fully connected layer to compute the categorical
    % probability distribution over the image classes. During the training
    % process, all the network weights are tuned to minimize the loss over this
    % categorical distribution.
    softmaxLayer
    classificationLayer

    ]

Output:

finalLayers =
  5x1 Layer array with layers:
     1   Fully Connected         64 fully connected layer
     2   ReLU                    ReLU
     3   Fully Connected         10 fully connected layer
     4   Softmax                 softmax
     5   Classification Output   crossentropyex

Step 2-4:

Combine the input, middle, and final layers.

The code is as follows:

layers = [ inputLayer middleLayers finalLayers ]

Output:

layers =
  15x1 Layer array with layers:
     1   Image Input             32x32x3 images with zerocenter normalization
     2   Convolution             32 5x5 convolutions with stride [1 1] and padding [2 2]
     3   ReLU                    ReLU
     4   Max Pooling             3x3 max pooling with stride [2 2] and padding [0 0]
     5   Convolution             32 5x5 convolutions with stride [1 1] and padding [2 2]
     6   ReLU                    ReLU
     7   Max Pooling             3x3 max pooling with stride [2 2] and padding [0 0]
     8   Convolution             64 5x5 convolutions with stride [1 1] and padding [2 2]
     9   ReLU                    ReLU
    10   Max Pooling             3x3 max pooling with stride [2 2] and padding [0 0]
    11   Fully Connected         64 fully connected layer
    12   ReLU                    ReLU
    13   Fully Connected         10 fully connected layer
    14   Softmax                 softmax
    15   Classification Output   crossentropyex
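As a sanity check on the 15-layer array, the learnable parameters can be counted by hand. A convolution with k x k filters, c_in input channels and c_out filters has k*k*c_in*c_out weights plus c_out biases. A fully connected layer has n_in*n_out weights plus n_out biases. The pure-Python sketch below assumes that the three size-preserving convolutions and three 3x3, stride-2 poolings leave a 3x3x64 map (32 to 15 to 7 to 3). That gives 576 inputs to the first fully connected layer. ReLU, pooling, softmax and the output layer have no learnable parameters.

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution layer."""
    return k * k * c_in * c_out + c_out


def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


params = {
    "conv1": conv_params(5, 3, 32),
    "conv2": conv_params(5, 32, 32),
    "conv3": conv_params(5, 32, 64),
    "fc1": fc_params(3 * 3 * 64, 64),
    "fc2": fc_params(64, 10),
}
for name, count in params.items():
    print(f"{name}: {count}")
print("total:", sum(params.values()))  # 116906
```

The network is small: most of its roughly 117 thousand parameters sit in the third convolution and the first fully connected layer.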

Step 3: Train CNN Using CIFAR-10 Data

Step 3-1:

Initialize the first convolutional layer weights using normally distributed random numbers with a standard deviation of 0.0001. This helps improve the convergence of training.
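The MATLAB initialization line in this step draws a 5x5x3x32 array from a standard normal distribution and scales it by 0.0001. The pure-Python sketch below is an equivalent illustration. It uses the standard random module instead of MATLAB's randn, so the actual numbers differ, but the count and the resulting spread match.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

filter_size = (5, 5)
num_channels = 3
num_filters = 32
count = filter_size[0] * filter_size[1] * num_channels * num_filters

# Scaling a standard normal draw by 0.0001 gives a zero-mean normal
# distribution with standard deviation 0.0001, like 0.0001 * randn(...).
weights = [0.0001 * random.gauss(0.0, 1.0) for _ in range(count)]

print(count)                       # 2400
print(statistics.pstdev(weights))  # close to 0.0001
```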

The code is as follows:

layers(2).Weights = 0.0001 * randn([filterSize numChannels numFilters]);

Step 3-2:

Pre-training notes and an explanation of the training options

Now that the network architecture is defined, it can be trained using the CIFAR-10 training data.

First, set up the network training algorithm using the trainingOptions function. The network training algorithm uses Stochastic Gradient Descent with Momentum (SGDM) with an initial learning rate of 0.001. During training, the learning rate is reduced every 8 epochs (1 epoch is defined as one complete pass through the entire training data set). The training algorithm is run for 40 epochs.

Note that the training algorithm uses a mini-batch size of 128 images. If you train on a GPU, this size may need to be lowered due to memory constraints on the GPU.
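With LearnRateSchedule set to 'piecewise', the learning rate is multiplied by LearnRateDropFactor every LearnRateDropPeriod epochs. The pure-Python sketch below shows the resulting schedule for the values used in this example: initial rate 0.001, factor 0.1, period 8, and 40 epochs. It assumes epochs are counted from 1. The function name piecewise_lr is hypothetical.

```python
def piecewise_lr(epoch, initial=0.001, drop_factor=0.1, drop_period=8):
    """Learning rate in effect during a 1-based epoch under a piecewise schedule."""
    return initial * drop_factor ** ((epoch - 1) // drop_period)


for epoch in (1, 8, 9, 16, 17, 40):
    print(epoch, piecewise_lr(epoch))
```

Over 40 epochs the rate drops four times, from 1e-3 through 1e-4, 1e-5 and 1e-6 to 1e-7. The last 8 epochs therefore make only very small weight updates.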

The code to set the training options is as follows:

% Set the network training options
opts = trainingOptions('sgdm', ...
    'Momentum', 0.9, ...
    'InitialLearnRate', 0.001, ...
    'LearnRateSchedule', 'piecewise', ...
    'LearnRateDropFactor', 0.1, ...
    'LearnRateDropPeriod', 8, ...
    'L2Regularization', 0.004, ...
    'MaxEpochs', 40, ...
    'MiniBatchSize', 128, ...
    'Verbose', true);
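These options combine SGDM (Momentum 0.9) with L2 regularization (0.004). One common way to write a single update for a weight w with gradient g is v = momentum*v - lr*(g + l2*w), followed by w = w + v. This is an algebraically equivalent form of the SGDM rule in the MATLAB documentation. Treat the scalar pure-Python sketch below as an illustration, not as the toolbox's exact implementation. The function name sgdm_step and the constant gradient are assumptions for the example.

```python
def sgdm_step(w, v, grad, lr=0.001, momentum=0.9, l2=0.004):
    """One SGDM update with an L2 penalty folded into the gradient."""
    g = grad + l2 * w          # gradient of loss + (l2 / 2) * w**2
    v = momentum * v - lr * g  # velocity accumulates past update directions
    return w + v, v


w, v = 1.0, 0.0
for step in range(3):
    w, v = sgdm_step(w, v, grad=0.5)
    print(step, w, v)
```

With a constant gradient, each step is larger than the previous one because the momentum term keeps part of the earlier velocity. The L2 term adds a small pull of the weight toward zero.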

Step 3-3: Train the Network

Train the network using the trainNetwork function. This is a computationally intensive process that takes 20-30 minutes to complete.

To save time while running this example, a pre-trained network is loaded from disk. If you wish to train the network yourself, set the doTraining variable shown below to true. Note that a CUDA-capable NVIDIA™ GPU with compute capability 3.0 or higher is highly recommended for training.

The code is as follows:

% A trained network is loaded from disk to save time when running the
% example. Set this flag to true to train the network.
doTraining = false;

if doTraining
    % Train a network.
    cifar10Net = trainNetwork(trainingImages, trainingLabels, layers, opts);
else
    % Load pre-trained detector for the example.
    load('rcnnStopSigns.mat', 'cifar10Net')
end

Step 4: Validate CIFAR-10 Network Training

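Step 4-1:

After the network is trained, it should be validated to ensure that training was successful. First, a quick visualization of the first convolutional layer's filter weights can help identify any immediate issues with training.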

The code is as follows:

% Extract the first convolutional layer weights
w = cifar10Net.Layers(2).Weights;

% rescale and resize the weights for better visualization
w = mat2gray(w);
w = imresize(w, [100 100]);

figure
montage(w)

The first layer weights should have some well-defined structure. If the weights still look random, that indicates the network may require additional training. In this case, the first layer filters have learned edge-like features from the CIFAR-10 training data.

Result: a montage of the first convolutional layer's filter weights.
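In the visualization code, mat2gray linearly rescales the weights so that their minimum maps to 0 and their maximum to 1, which lets montage display them as images. The pure-Python sketch below applies the same min-max scaling to a flat list of weights. The function name min_max_scale is hypothetical. Returning zeros for a constant input is this sketch's own choice and does not describe mat2gray's behavior.

```python
def min_max_scale(values):
    """Rescale values linearly to [0, 1]: the minimum maps to 0, the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: this sketch's own choice
    return [(x - lo) / (hi - lo) for x in values]


weights = [-0.02, 0.0, 0.01, 0.03]
scaled = min_max_scale(weights)
print(scaled)
```

The relative ordering and spacing of the weights are preserved. Only the range changes, so tiny values such as the 0.0001-scale initial weights become visible.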

Step 4-2: Test the accuracy of the trained network.

To completely validate the training results, use the CIFAR-10 test data to measure the classification accuracy of the network. A low accuracy score indicates that additional training or additional training data is required. The goal of this example is not necessarily to achieve 100% accuracy on the test set, but to sufficiently train a network for use in training an object detector.

The code is as follows:

% Run the network on the test set.
YTest = classify(cifar10Net, testImages);

% Calculate the accuracy.
accuracy = sum(YTest == testLabels)/numel(testLabels)

Output:

accuracy =
    0.7456
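The accuracy line compares predicted and true labels element-wise and divides the number of matches by the number of test images. The pure-Python sketch below does the same computation and adds a per-class breakdown, which can show which categories pull the overall score down. The labels are a made-up toy example, not CIFAR-10 results, and the function names are hypothetical.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def per_class_accuracy(predicted, actual):
    """Accuracy restricted to the examples of each true class."""
    totals = Counter(actual)
    hits = Counter(a for p, a in zip(predicted, actual) if p == a)
    return {label: hits[label] / totals[label] for label in totals}


y_true = ["cat", "cat", "dog", "ship", "ship", "dog"]
y_pred = ["cat", "dog", "dog", "ship", "truck", "dog"]
print(accuracy(y_pred, y_true))            # 0.6666666666666666
print(per_class_accuracy(y_pred, y_true))  # {'cat': 0.5, 'dog': 1.0, 'ship': 0.5}
```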


More notes in this CNN series will follow; stay tuned.

(1) Installing and testing the toolbox

(2) Feature extraction using CNN

(3) Perform Transfer Learning to fine-tune a network with your data

(4) Train a Deep Neural Network from Scratch

(5) Object Detection Using Deep Learning

(6) Explanation and role of each AlexNet layer

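If you have any questions, please contact us.

You can send an email to

dataintellagr@126.com

or leave a message on our Weibo, Zhihu, or WeChat account.

We look forward to your questions!

Weibo: 數據智農

Zhihu: 數據智農

Email: dataintellagr@126.com

Editor | 楊揚

Follow 數據智農 for the "CNN MATLAB Learning Series"

Author | 吳秋峰

Sina Weibo | neauqfwu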