Caffe2 Tutorial: 5. A Toy Regression

04-01

tags: [Deep Learning]

Caffe2 Tutorial

This tutorial comes from the official Caffe2 website.

Caffe2 website: https://caffe2.ai/

Caffe2 GitHub: https://github.com/caffe2/caffe2

Translated and compiled by: 張天亮

Email: tianliangjay@gmail.com

Blog: https://xingkongliang.github.io/blog/

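This tutorial series has six parts:

1. Common Caffe2 functions (workspace, operators, nets)
2. Image loading and preprocessing
3. Loading pre-trained models
4. Python Op tutorial
5. A toy regression model
6. LeNet on the MNIST dataset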

5. A Toy Regression

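This is a simple regression tutorial.

The problem we are dealing with is very simple: a two-dimensional input x and a one-dimensional output y, with weight vector w=[2.0, 1.5] and bias b=0.5. The following equation generates the ground truth: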

$y=wx+b$
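As a minimal sketch of what this equation does to a batch of inputs, the NumPy snippet below applies it to three hand-picked samples. NumPy stands in for Caffe2 here, and the names `w_true`, `b_true`, and `x_batch` (and the sample values) are illustrative assumptions, not part of the tutorial.

```python
import numpy as np

# Ground-truth parameters from the text above.
w_true = np.array([[2.0, 1.5]])   # shape (1, 2)
b_true = np.array([0.5])          # shape (1,)

# Three hand-picked 2-D inputs (hypothetical values, one sample per row).
x_batch = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 2.0]])

# y = w x + b, applied to every row at once.
y_batch = x_batch @ w_true.T + b_true   # shape (3, 1)
print(y_batch.ravel())                  # values: 2.5, 2.0, 5.5
```

Each row of `x_batch` produces one scalar output, which is the same batch layout the training net uses later with 64 rows.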

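We will write out every mathematical operation as a Caffe2 operator. If your algorithm is relatively standard, such as a CNN model, this is usually overkill. In the MNIST tutorial, we will show how to use the CNN model helper to build models more easily.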

    from caffe2.python import core, cnn, net_drawer, workspace, visualize
    import numpy as np
    from IPython import display
    from matplotlib import pyplot

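Declaring the computation graphs

We declare two computation graphs: one initializes the parameters and constants used in the computation, and the other is the main graph that runs stochastic gradient descent.

First, we create the init net. Note that its name does not matter; we simply want to put the initialization code into one net so that we can call RunNetOnce() to execute it. We keep a separate init_net because these operators do not need to run more than once during the whole training procedure.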

    init_net = core.Net("init")
    # The ground truth parameters.
    W_gt = init_net.GivenTensorFill(
        [], "W_gt", shape=[1, 2], values=[2.0, 1.5])
    B_gt = init_net.GivenTensorFill([], "B_gt", shape=[1], values=[0.5])
    # Constant value ONE is used in weighted sum when updating parameters.
    ONE = init_net.ConstantFill([], "ONE", shape=[1], value=1.)
    # ITER is the iterator count.
    ITER = init_net.ConstantFill([], "ITER", shape=[1], value=0, dtype=core.DataType.INT32)
    # For the parameters to be learned: we randomly initialize weight
    # from [-1, 1] and init bias with 0.0.
    W = init_net.UniformFill([], "W", shape=[1, 2], min=-1., max=1.)
    B = init_net.ConstantFill([], "B", shape=[1], value=0.0)
    print("Created init net.")

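The main training network is defined as follows. We will show what gets created in several steps:

- The forward pass that produces the loss

- The backward pass, generated by automatic differentiation

- The parameter update part, which is standard SGD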

    train_net = core.Net("train")
    # First, we generate random samples of X and create the ground truth.
    X = train_net.GaussianFill([], "X", shape=[64, 2], mean=0.0, std=1.0, run_once=0)
    Y_gt = X.FC([W_gt, B_gt], "Y_gt")
    # We add Gaussian noise to the ground truth
    noise = train_net.GaussianFill([], "noise", shape=[64, 1], mean=0.0, std=1.0, run_once=0)
    Y_noise = Y_gt.Add(noise, "Y_noise")
    # Note that we do not need to propagate the gradients back through Y_noise,
    # so we mark StopGradient to notify the auto differentiating algorithm
    # to ignore this path.
    Y_noise = Y_noise.StopGradient([], "Y_noise")
    # Now, for the normal linear regression prediction, this is all we need.
    Y_pred = X.FC([W, B], "Y_pred")
    # The loss function is computed by a squared L2 distance, and then averaged
    # over all items in the minibatch.
    dist = train_net.SquaredL2Distance([Y_noise, Y_pred], "dist")
    loss = dist.AveragedLoss([], ["loss"])
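To make the forward pass concrete, here is a rough NumPy equivalent of the operators above. It is a sketch under assumptions rather than Caffe2 itself: the helper `fc` and the fixed seed are illustrative, and the per-sample distance is assumed to be half the squared L2 norm of the difference, which is how SquaredL2Distance is commonly described.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability

W_gt, B_gt = np.array([[2.0, 1.5]]), np.array([0.5])
W, B = rng.uniform(-1.0, 1.0, size=(1, 2)), np.zeros(1)

def fc(x, w, b):
    """Fully connected layer: x has shape (N, 2), w has shape (1, 2)."""
    return x @ w.T + b

# One minibatch of 64 random inputs and its noisy ground truth.
X = rng.normal(0.0, 1.0, size=(64, 2))
Y_noise = fc(X, W_gt, B_gt) + rng.normal(0.0, 1.0, size=(64, 1))

# Prediction with the current (untrained) parameters.
Y_pred = fc(X, W, B)

# Assumption: per-sample distance is half the squared L2 norm of the difference.
dist = 0.5 * np.sum((Y_noise - Y_pred) ** 2, axis=1)   # shape (64,)
loss = dist.mean()                                     # scalar, averaged over the batch
print(loss)
```

Nothing here needs a StopGradient counterpart, because NumPy does not build a gradient graph; in the Caffe2 net it only tells the autodiff pass not to descend into the ground-truth branch.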

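Visualizing the network

Now let's take a look at the whole network. From the graph below, you can see that it consists of four main parts:

- Randomly generating X for this batch (GaussianFill generates X)

- Using W_gt, B_gt, and the FC operator to generate the ground truth Y_gt

- Making predictions with the current parameters W and B

- Comparing the outputs and computing the loss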

    graph = net_drawer.GetPydotGraph(train_net.Proto().op, "train", rankdir="LR")
    display.Image(graph.create_png(), width=800)

Now, like all other frameworks, Caffe2 lets us generate the gradient operators automatically. Let's visualize the result.

    # Get gradients for all the computations above.
    gradient_map = train_net.AddGradientOperators([loss])
    graph = net_drawer.GetPydotGraph(train_net.Proto().op, "train", rankdir="LR")
    display.Image(graph.create_png(), width=800)
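For intuition about what the generated gradient operators compute, the gradients of this particular loss can also be written by hand. The NumPy sketch below does that and checks the result against central finite differences. The function names are hypothetical, and the factor of 1/2 in the loss is an assumption about SquaredL2Distance.

```python
import numpy as np

def loss_fn(W, B, X, Y):
    """Mean over the batch of half the squared error (the 1/2 is an assumption)."""
    return np.mean(0.5 * np.sum((X @ W.T + B - Y) ** 2, axis=1))

def analytic_grads(W, B, X, Y):
    """Gradients of loss_fn with respect to W, shape (1, 2), and B, shape (1,)."""
    residual = (X @ W.T + B - Y) / X.shape[0]   # dLoss/dY_pred, shape (N, 1)
    return residual.T @ X, residual.sum(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
Y = X @ np.array([[2.0, 1.5]]).T + 0.5
W, B = rng.uniform(-1, 1, size=(1, 2)), np.zeros(1)

grad_W, grad_B = analytic_grads(W, B, X, Y)

# Central finite differences as an independent check on grad_W.
eps = 1e-6
numeric_W = np.zeros_like(W)
for j in range(W.shape[1]):
    step = np.zeros_like(W)
    step[0, j] = eps
    numeric_W[0, j] = (loss_fn(W + step, B, X, Y) - loss_fn(W - step, B, X, Y)) / (2 * eps)

print(np.allclose(grad_W, numeric_W, atol=1e-6))
```

In the Caffe2 net, `gradient_map[W]` and `gradient_map[B]` play the role of `grad_W` and `grad_B`.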

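Once we have the gradients for the parameters, we add the SGD part of the graph: get the learning rate for the current step, then update the parameters. In this example we are not doing anything fancy: just plain SGD.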

    # Increment the iteration by one.
    train_net.Iter(ITER, ITER)
    # Compute the learning rate that corresponds to the iteration.
    LR = train_net.LearningRate(ITER, "LR", base_lr=-0.1,
                                policy="step", stepsize=20, gamma=0.9)
    # Weighted sum
    train_net.WeightedSum([W, ONE, gradient_map[W], LR], W)
    train_net.WeightedSum([B, ONE, gradient_map[B], LR], B)
    # Let's show the graph again.
    graph = net_drawer.GetPydotGraph(train_net.Proto().op, "train", rankdir="LR")
    display.Image(graph.create_png(), width=800)
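The update rule may be easier to follow in plain arithmetic. The sketch below assumes the "step" policy multiplies base_lr by gamma once every stepsize iterations, and that WeightedSum with these four inputs computes W * ONE + grad * LR. The helper names and the gradient values are made up for illustration.

```python
import numpy as np

def step_lr(iteration, base_lr=-0.1, stepsize=20, gamma=0.9):
    """Step policy: scale base_lr by gamma once every `stepsize` iterations."""
    return base_lr * gamma ** (iteration // stepsize)

def weighted_sum(x0, w0, x1, w1):
    """Two-input weighted sum: x0 * w0 + x1 * w1."""
    return x0 * w0 + x1 * w1

W = np.array([[0.0, 0.0]])
grad_W = np.array([[-2.0, -1.5]])   # hypothetical gradient values
ONE = 1.0

lr = step_lr(iteration=0)                  # -0.1
W_new = weighted_sum(W, ONE, grad_W, lr)   # W + lr * grad_W
print(W_new)                               # values: 0.2, 0.15

# The rate stays flat for 20 iterations, then shrinks by 10%.
print([round(step_lr(i), 4) for i in (0, 19, 20, 40)])
```

Because the weighted sum adds LR times the gradient, base_lr has to be negative for the step to go downhill, which is why the net uses base_lr=-0.1.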

Creating the network

Now that we have created the networks, let's run them.

    workspace.RunNetOnce(init_net)
    workspace.CreateNet(train_net)

Before we start any training iterations, let's take a look at the parameters.

    print("Before training, W is: {}".format(workspace.FetchBlob("W")))
    print("Before training, B is: {}".format(workspace.FetchBlob("B")))

Output:

    Before training, W is: [[-0.905963 -0.21433014]]
    Before training, B is: [0.]

    for i in range(100):
        workspace.RunNet(train_net.Proto().name)

Now, let's look at the parameters after training.

    print("After training, W is: {}".format(workspace.FetchBlob("W")))
    print("After training, B is: {}".format(workspace.FetchBlob("B")))
    print("Ground truth W is: {}".format(workspace.FetchBlob("W_gt")))
    print("Ground truth B is: {}".format(workspace.FetchBlob("B_gt")))

Output:

    After training, W is: [[2.011532 1.4848436]]
    After training, B is: [0.49105117]
    Ground truth W is: [[2. 1.5]]
    Ground truth B is: [0.5]
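For readers without a Caffe2 installation, the whole experiment can be approximated in a few lines of NumPy. This is a sketch under assumptions, not the tutorial's code: it hand-codes the gradients, assumes the loss is the batch mean of half the squared error, and uses an arbitrary seed, so the exact numbers will differ from the output above.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen only for repeatability
W_gt, B_gt = np.array([[2.0, 1.5]]), np.array([0.5])
W, B = rng.uniform(-1.0, 1.0, size=(1, 2)), np.zeros(1)

for it in range(100):
    # Fresh random minibatch and noisy targets on every iteration.
    X = rng.normal(size=(64, 2))
    Y = X @ W_gt.T + B_gt + rng.normal(size=(64, 1))

    # Gradient of the batch-mean half squared error with respect to Y_pred.
    residual = (X @ W.T + B - Y) / 64

    # Step-decayed learning rate, then a plain SGD update.
    lr = 0.1 * 0.9 ** (it // 20)
    W = W - lr * (residual.T @ X)
    B = B - lr * residual.sum(axis=0)

print("W:", W, "B:", B)
```

After 100 iterations the parameters should land close to the ground truth of [2.0, 1.5] and 0.5, with a small residual error caused by the label noise.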

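Visualizing how the weights change during training

Looks simple enough, right? Let's take a closer look at how the parameter updates progress over the training steps. To do this, we re-initialize the parameters and watch how they change from iteration to iteration. Remember, we can fetch blobs from the workspace at any time.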

    workspace.RunNetOnce(init_net)
    w_history = []
    b_history = []
    for i in range(50):
        workspace.RunNet(train_net.Proto().name)
        w_history.append(workspace.FetchBlob("W"))
        b_history.append(workspace.FetchBlob("B"))
    w_history = np.vstack(w_history)
    b_history = np.vstack(b_history)
    pyplot.plot(w_history[:, 0], w_history[:, 1], 'r')
    pyplot.axis('equal')
    pyplot.xlabel('w_0')
    pyplot.ylabel('w_1')
    pyplot.grid(True)
    pyplot.figure()
    pyplot.plot(b_history)
    pyplot.xlabel('iter')
    pyplot.ylabel('b')
    pyplot.grid(True)
    pyplot.show()

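You can observe the very typical behavior of stochastic gradient descent: because of the noise, the parameters fluctuate quite a bit throughout training.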
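One way to see that the fluctuation really comes from the noise is to rerun the same procedure with the label noise switched off and compare how much W still moves late in training. The NumPy sketch below does this. The function `run_sgd`, the 200-step horizon, and the 50-step window are illustrative choices rather than part of the tutorial, and the gradients are hand-coded under the same half squared error assumption.

```python
import numpy as np

def run_sgd(noise_std, steps=200, batch=64, seed=0):
    """Run plain SGD on the toy problem and return the history of W, shape (steps, 2)."""
    rng = np.random.default_rng(seed)
    W_gt, B_gt = np.array([[2.0, 1.5]]), np.array([0.5])
    W, B = rng.uniform(-1.0, 1.0, size=(1, 2)), np.zeros(1)
    history = []
    for it in range(steps):
        X = rng.normal(size=(batch, 2))
        Y = X @ W_gt.T + B_gt + rng.normal(0.0, noise_std, size=(batch, 1))
        residual = (X @ W.T + B - Y) / batch
        lr = 0.1 * 0.9 ** (it // 20)
        W = W - lr * (residual.T @ X)
        B = B - lr * residual.sum(axis=0)
        history.append(W.ravel().copy())
    return np.array(history)

# Spread of W over the last 50 steps, with and without label noise.
noisy_spread = run_sgd(noise_std=1.0)[-50:].std(axis=0)
clean_spread = run_sgd(noise_std=0.0)[-50:].std(axis=0)
print("noisy:", noisy_spread, "clean:", clean_spread)
```

With noise-free targets the parameters settle almost exactly on the ground truth, so the late-training spread should be far smaller than in the noisy run.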