Back to Basics - Neural Networks and Backpropagation

Artificial Neural Networks

The code and Jupyter notebook accompanying this article are available at:

Ceruleanacg/Descent


Problem Setup

Consider a four-class classification problem. We first generate the data and plot it as shown below:

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

data_count = 25

# Class 1: points around the line x2 = 2 * x1, with noise.
x1_points = np.linspace(0, 10, data_count).reshape((-1, 1))
x2_points = np.multiply(2, x1_points) + np.random.randint(-10, 10, size=(data_count,)).reshape((-1, 1))
x1 = np.concatenate((x1_points, x2_points), axis=1)
y1 = np.array([[1, 0, 0, 0]] * data_count)

# Class 2: points around the line x2 = -2 * x1, with noise.
x1_points = np.linspace(1, 10, data_count).reshape((-1, 1))
x2_points = np.multiply(-2, x1_points) + np.random.randint(-10, 10, size=(data_count,)).reshape((-1, 1))
x2 = np.concatenate((x1_points, x2_points), axis=1)
y2 = np.array([[0, 1, 0, 0]] * data_count)

# Class 3: mirror of class 1 on the negative x1 axis.
x1_points = np.linspace(-1, -10, data_count).reshape((-1, 1))
x2_points = np.multiply(2, x1_points) + np.random.randint(-10, 10, size=(data_count,)).reshape((-1, 1))
x3 = np.concatenate((x1_points, x2_points), axis=1)
y3 = np.array([[0, 0, 1, 0]] * data_count)

# Class 4: mirror of class 2 on the negative x1 axis.
x1_points = np.linspace(-1, -10, data_count).reshape((-1, 1))
x2_points = np.multiply(-2, x1_points) + np.random.randint(-10, 10, size=(data_count,)).reshape((-1, 1))
x4 = np.concatenate((x1_points, x2_points), axis=1)
y4 = np.array([[0, 0, 0, 1]] * data_count)

x_data = np.concatenate((x1, x2, x3, x4))
y_data = np.concatenate((y1, y2, y3, y4))

plt.figure(figsize=(16, 9))
plt.scatter(x1[:, 0], x1[:, 1], marker='x')
plt.scatter(x2[:, 0], x2[:, 1], marker='o')
plt.scatter(x3[:, 0], x3[:, 1], marker='*')
plt.scatter(x4[:, 0], x4[:, 1], marker='p')
plt.show()

For the dataset X, Y: X is an (N, 2) array and Y is a one-hot encoded (N, 4) array. A single label y_i looks like:

y_i = [0, 1, 0, 0]

We want to design a fully connected artificial neural network with multiple hidden layers, each containing several neurons. For this problem we use two hidden layers with 6 neurons each.
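To make the sizes concrete, here is a minimal sketch of the weight and bias shapes for such a 2-6-6-4 network. The layout below (one weight matrix of shape (n_out, n_in) per layer) is my own illustrative assumption, not necessarily how the repository's code stores its parameters:

import numpy as np

# Hypothetical parameter shapes for a 2 -> 6 -> 6 -> 4 fully connected network.
layer_sizes = [2, 6, 6, 4]  # input, hidden 1, hidden 2, output

rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

for l, (W, b) in enumerate(zip(weights, biases), start=1):
    print(f"layer {l}: W shape {W.shape}, b shape {b.shape}")
# layer 1: W shape (6, 2), b shape (6,)
# layer 2: W shape (6, 6), b shape (6,)
# layer 3: W shape (4, 6), b shape (4,)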

Softmax Cross Entropy

For a four-class problem, the output layer of the network has 4 output neurons. Passing these 4 outputs through a Softmax function and then computing the cross entropy, we can write the loss function as:

\text{SoftmaxCrossEntropy}\left( w^3_{i,j}, w^2_{i,j}, w^1_{i,j} \right) = \sum^{N}_{i=0} \left( \sum^{4}_{j=1} y_j \cdot \left( -\ln p(z_j) \right) \right)

where p(z_j) is the Softmax function:

p(z_j) = \frac{e^{a^3_j}}{\sum^{4}_{k=1} e^{a^3_k}}
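As a quick illustration, here is a minimal NumPy sketch of this loss. The function names are my own (not the repository's API), and the max-subtraction in the softmax is the usual numerical-stability trick:

import numpy as np

def softmax(a):
    """Row-wise softmax p(z_j) = exp(a_j) / sum_k exp(a_k)."""
    a = a - np.max(a, axis=1, keepdims=True)   # stabilise the exponentials
    e = np.exp(a)
    return e / np.sum(e, axis=1, keepdims=True)

def softmax_cross_entropy(a, y):
    """Sum over samples of -sum_j y_j * ln p(z_j), with one-hot y."""
    p = softmax(a)
    return -np.sum(y * np.log(p + 1e-12))      # small epsilon avoids log(0)

# Example: 3 samples, 4 classes.
a = np.array([[ 2.0, 0.5, -1.0, 0.0],
              [ 0.1, 0.2,  0.3, 0.4],
              [-1.0, 2.0,  0.0, 0.5]])
y = np.array([[1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 1, 0, 0]])
print(softmax_cross_entropy(a, y))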

Unlike linear regression and logistic regression, where the gradient of each weight is computed directly, in a neural network the gradients are computed with the backpropagation algorithm.

Backpropagation

The network we design has one input layer, two hidden layers, and one output layer. Consider the gradient of the weight w^3_{1,1} of the first neuron in the output layer, writing the Softmax as p(z):

\begin{aligned}
\frac{\partial l(W)}{\partial w^3_{1,1}} &= - \sum^{N}_{i=0} \left( \sum^{4}_{j=1} \frac{\partial l(W)}{\partial \ln p(z_j)} \cdot \frac{\partial \ln p(z_j)}{\partial a^3_1} \cdot \frac{\partial a^3_1}{\partial w^3_{1,1}} \right) \\
&= - \sum^{N}_{i=0} \left( \frac{y_1 \cdot \sum^{4}_{k=1} e^{a^3_k}}{e^{a^3_1}} \cdot \frac{e^{a^3_1} \cdot \sum^{4}_{k=1} e^{a^3_k} - \left( e^{a^3_1} \right)^2}{\left( \sum^{4}_{k=1} e^{a^3_k} \right)^2} + \sum^{4}_{j=2} \frac{y_j \cdot \sum^{4}_{k=1} e^{a^3_k}}{e^{a^3_j}} \cdot \frac{0 - e^{a^3_j} \cdot e^{a^3_1}}{\left( \sum^{4}_{k=1} e^{a^3_k} \right)^2} \right) \cdot z^3_1 \\
&= - \sum^{N}_{i=0} \left( y_1 \cdot \left( 1 - \frac{e^{a^3_1}}{\sum^{4}_{k=1} e^{a^3_k}} \right) - \sum^{4}_{j=2} y_j \cdot \frac{e^{a^3_1}}{\sum^{4}_{k=1} e^{a^3_k}} \right) \cdot z^3_1 \\
&= - \sum^{N}_{i=0} \left( y_1 - \left( y_1 + y_2 + y_3 + y_4 \right) \cdot \frac{e^{a^3_1}}{\sum^{4}_{k=1} e^{a^3_k}} \right) \cdot z^3_1 \\
&= - \sum^{N}_{i=0} \left( y_1 - \frac{e^{a^3_1}}{\sum^{4}_{k=1} e^{a^3_k}} \right) \cdot z^3_1
\end{aligned}

The last step uses the fact that y is one-hot, so y_1 + y_2 + y_3 + y_4 = 1; the gradient is simply p(z_1) - y_1 multiplied by the incoming activation z^3_1.
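This closed form is easy to sanity-check numerically. The sketch below (my own verification code, not from the repository) compares the analytic gradient p(z) - y of the loss with respect to the output pre-activations against a central finite difference:

import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

def loss(a, y):
    return -np.sum(y * np.log(softmax(a)))

a = np.array([1.0, -0.5, 0.3, 2.0])   # output-layer pre-activations a^3
y = np.array([0.0, 1.0, 0.0, 0.0])    # one-hot label

analytic = softmax(a) - y              # dL/da_j = p(z_j) - y_j

numeric = np.zeros_like(a)
eps = 1e-6
for j in range(len(a)):
    a_plus, a_minus = a.copy(), a.copy()
    a_plus[j] += eps
    a_minus[j] -= eps
    numeric[j] = (loss(a_plus, y) - loss(a_minus, y)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))   # ~1e-10: the two gradients agree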

Next, consider the gradient of the weight w^2_{1,1} of the first neuron in the second hidden layer:

\sum^{4}_{i=1} \left( \frac{\partial l(W)}{\partial \ln p} \cdot \frac{\partial \ln p}{\partial z^3_i} \cdot \frac{\partial z^3_i}{\partial a^2_1} \cdot \frac{\partial a^2_1}{\partial z^2_1} \right) \cdot \frac{\partial z^2_1}{\partial w^2_{1,1}}

By inspection, the following factor can be computed and cached during the forward pass that produces the outputs:

\frac{\partial a^2_1}{\partial z^2_1} \cdot \frac{\partial z^2_1}{\partial w^2_{1,1}}

The weights are updated in reverse layer order, so by the time the gradient for w^2_{1,1} is computed, the computation for w^3_{i,j} has already finished, and the factor:

\sum^{4}_{i=1} \left( \frac{\partial l(W)}{\partial \ln p} \cdot \frac{\partial \ln p}{\partial z^3_i} \cdot \frac{\partial z^3_i}{\partial a^2_1} \right)

is exactly the backpropagated error term. It has already been computed and cached while computing the gradients of the layer above, so when updating the weights of the current layer we can simply read it from the cache.
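Below is a minimal NumPy sketch of this idea for a 2-6-6-4 network with ReLU hidden layers and a softmax cross-entropy output: each layer's error term ("delta") is computed once and reused by the layer below. It follows the logic described above, but all names and shapes are my own illustrative assumptions, not the repository's implementation:

import numpy as np

rng = np.random.default_rng(0)
sizes = [2, 6, 6, 4]
W = [rng.normal(scale=0.1, size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
b = [np.zeros((n, 1)) for n in sizes[1:]]

def relu(x):
    return np.maximum(0, x)

def softmax(a):
    e = np.exp(a - a.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def backprop(x, y, lr=0.01):
    # Forward pass: cache each layer's input z and pre-activation a.
    zs, a_s = [x], []
    for l in range(3):
        a = W[l] @ zs[-1] + b[l]
        a_s.append(a)
        zs.append(relu(a) if l < 2 else softmax(a))
    # Output-layer error: p(z) - y, the result derived above.
    delta = zs[-1] - y
    # Backward pass: each layer reuses the delta cached by the layer above.
    for l in reversed(range(3)):
        grad_W = delta @ zs[l].T                         # dL/dW^l
        grad_b = delta.sum(axis=1, keepdims=True)        # dL/db^l
        if l > 0:                                        # propagate the error downwards
            delta = (W[l].T @ delta) * (a_s[l - 1] > 0)  # ReLU derivative
        W[l] -= lr * grad_W
        b[l] -= lr * grad_b

# One toy update with a single sample (column vectors).
x = np.array([[1.0], [2.0]])
y = np.array([[0.0], [1.0], [0.0], [0.0]])
backprop(x, y)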

A model that supports:

- a configurable number of hidden layers
- a configurable number of neurons per layer
- a configurable activation function for each layer (Sigmoid, tanh, ReLU, Linear)
- a configurable loss function (Softmax Cross Entropy, MSE)
- configurable hyperparameters (learning rate, number of epochs, batch size)
- saving the model locally

is implemented (somewhat inefficiently) in the repository:

Artificial Neural Network (ANN)


We will use this model directly to reproduce the experiment.

import sys
sys.path.append('../')

from sklearn.preprocessing import StandardScaler

from utility import function
from nn.dense import Dense

x_train = StandardScaler().fit_transform(x_data)
y_train = y_data

activation_funcs = [function.relu] * 2
# activation_funcs = [function.sigmoid] * 1
activation_funcs.append(function.linear)

dense = Dense(x_space=2, y_space=4, hidden_units_list=[6, 6], **{
    "loss_func": function.softmax_cross_entropy,
    "activation_funcs": activation_funcs,
    "learning_rate": 0.003,
    "enable_logger": True,
    "model_name": "base",
    "batch_size": 100,
    "max_epoch": 1000,
    "model": "train",
})

dense.train(x_data, y_data)

Model saved. Accuracy: 0.390 Epoch: 0 | loss: 138.629275
Model saved. Accuracy: 0.610 Epoch: 100 | loss: 127.327139
Model saved. Accuracy: 0.860 Epoch: 200 | loss: 103.538487
Model saved. Accuracy: 0.710 Epoch: 300 | loss: 96.939174
Model saved. Accuracy: 0.920 Epoch: 400 | loss: 92.262520
Model saved. Accuracy: 0.910 Epoch: 500 | loss: 92.282685
Model saved. Accuracy: 0.910 Epoch: 600 | loss: 92.112034
Model saved. Accuracy: 0.910 Epoch: 700 | loss: 91.908351
Model saved. Accuracy: 0.910 Epoch: 800 | loss: 91.895093
Model saved. Accuracy: 0.910 Epoch: 900 | loss: 91.658254

We plot the classifier's decision regions:

dense.evaluate(x_data, y_data)

x1_test = np.linspace(-20, 20, 300)
x2_test = np.linspace(-30, 30, 300)
x1_mesh, x2_mesh = np.meshgrid(x1_test, x2_test)
x_test = np.array([x1_mesh.ravel(), x2_mesh.ravel()]).T
y_test = np.argmax(dense.predict(x_test), axis=1)

plt.figure(figsize=(16, 9))
plt.pcolormesh(x1_mesh, x2_mesh, y_test.reshape(x1_mesh.shape))
plt.scatter(x1[:, 0], x1[:, 1], marker='x')
plt.scatter(x2[:, 0], x2[:, 1], marker='o')
plt.scatter(x3[:, 0], x3[:, 1], marker='*')
plt.scatter(x4[:, 0], x4[:, 1], marker='p')
plt.show()

Accuracy: 0.920

What's Next

  • The promised improvements to gradient descent
