A Neural Network Model for the Iris Data Set in Pure Python

08-16

From the column 我是程序員 (I Am a Programmer)

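Abstract: This article carries out the whole iris classification task in Python code without calling any machine-learning library (only NumPy, pandas, and matplotlib are used). It is suitable for beginners who want to read along, try it hands-on, and get a feel for the general workflow of a machine-learning method.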

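Have you tried the plant-identification apps released by the big companies, such as 微軟識花 (Microsoft's flower recognition app) or 花伴侶? When you come across a flower whose name you do not know, you open the app, take a photo of the plant, and upload it. The app then identifies the species and shows a detailed description, as if you carried a knowledgeable biologist in your phone. The principle behind it is simple: it is an image classification process. The uploaded image is matched against a data set stored on the phone or online and assigned to the corresponding class. With deep learning, the accuracy of image classification keeps improving, and on some data sets it has already surpassed human performance.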

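Compared with traditional neural networks, deep learning methods generally place higher demands on data set size and hardware. If you only want to understand the basic workflow of a classification task, a small data set and a traditional neural network are a better starting point. This article uses the Iris Data Set to build a classifier. It is one of the oldest data sets in machine learning, much older than the now-common MNIST handwritten digit data set, and it originates from the famous British statistician and biologist Ronald Fisher. Apart from NumPy, pandas, and matplotlib, no machine-learning library is used: the neural network for the Iris data is built from scratch, trained, and produces good results.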

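The Iris data set is one of the most commonly used data sets for testing machine-learning algorithms. It contains four features (sepal length, sepal width, petal length, and petal width) for three species of iris: versicolor, virginica, and setosa. Each species has 50 instances (rows), 150 in total. The scatter plot produced in Step 1 shows how the samples used in this article are distributed.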

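We will build a classification model on this data set with a neural network. To keep things simple, only petal length and petal width are used as features, and only two species are kept: versicolor and virginica. The following steps train a neural network on this subset in Python: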

Step 1: Prepare the Iris data set

Load the Iris data set into Python and subset it to keep only the rows of interest (the versicolor and virginica samples):

# Import libraries
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Set the working directory and load the data
os.chdir(r"C:\Users\rohan\Documents\Analytics\Data")
iris = pd.read_csv("iris.csv")

# Create numeric classes for species (0, 1, 2)
iris.loc[iris["Name"] == "virginica", "species"] = 0
iris.loc[iris["Name"] == "versicolor", "species"] = 1
iris.loc[iris["Name"] == "setosa", "species"] = 2
iris = iris[iris["species"] != 2]

# Create input and output arrays (one column per example)
X = iris[["PetalLength", "PetalWidth"]].values.T
Y = iris[["species"]].values.T
Y = Y.astype("uint8")

# Make a scatter plot
plt.scatter(X[0, :], X[1, :], c=Y[0, :], s=40, cmap=plt.cm.Spectral)
plt.title("IRIS DATA | Blue - Versicolor, Red - Virginica")
plt.xlabel("Petal Length")
plt.ylabel("Petal Width")
plt.show()
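The transposes in the listing above put one example in each column, which is the layout every later function relies on (for instance m = X.shape[1]). As a minimal sketch of that convention, the block below uses three made-up measurements; the values and the names rows, labels, X_toy, and Y_toy are illustrative assumptions, not rows from iris.csv.

```python
import numpy as np

# Hypothetical measurements: one row per flower (petal length, petal width).
rows = np.array([[4.7, 1.4],
                 [5.1, 1.9],
                 [6.0, 2.5]])
labels = np.array([[1], [0], [0]])  # 1 = versicolor, 0 = virginica

# Transpose so that each column is one example.
X_toy = rows.T                      # shape (2, m): features x examples
Y_toy = labels.T.astype("uint8")    # shape (1, m): one label per example

m = X_toy.shape[1]                  # number of examples
print(X_toy.shape, Y_toy.shape, m)
```

With this layout, a weight matrix of shape (n_h, 2) can be multiplied directly with X_toy to process all examples at once.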

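Blue points represent the Versicolor species and red points represent the Virginica species. The neural network built in this article is trained on these data, with the aim of classifying the two species correctly.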

Step 2: Initialize the parameters (weights and biases)

Next, build a neural network with a single hidden layer. The size of the hidden layer is set to 6:

def initialize_parameters(n_x, n_h, n_y):
    # Fix the seed so that results are reproducible even though the
    # initialization is random.
    np.random.seed(2)

    W1 = np.random.randn(n_h, n_x) * 0.01  # weight matrix of shape (n_h, n_x)
    b1 = np.zeros(shape=(n_h, 1))          # bias vector of shape (n_h, 1)
    W2 = np.random.randn(n_y, n_h) * 0.01  # weight matrix of shape (n_y, n_h)
    b2 = np.zeros(shape=(n_y, 1))          # bias vector of shape (n_y, 1)

    # Store the parameters in a dictionary
    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
    return parameters
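For the network used in this article (2 input features, 6 hidden units, 1 output unit), the shapes work out to 25 trainable values in total. The sketch below repeats the initialization in compact form and prints the shapes; the names params, shapes, and n_trainable are illustrative only.

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y):
    """Small random weights and zero biases, as in the listing above."""
    np.random.seed(2)
    return {
        "W1": np.random.randn(n_h, n_x) * 0.01,
        "b1": np.zeros((n_h, 1)),
        "W2": np.random.randn(n_y, n_h) * 0.01,
        "b2": np.zeros((n_y, 1)),
    }

params = initialize_parameters(n_x=2, n_h=6, n_y=1)
shapes = {name: value.shape for name, value in params.items()}
n_trainable = sum(value.size for value in params.values())

print(shapes)
print("trainable parameters:", n_trainable)
```

The count is 6*2 weights plus 6 biases in the hidden layer, and 1*6 weights plus 1 bias in the output layer.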

Step 3: Forward propagation

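In forward propagation, tanh is used as the activation function of the first (hidden) layer and sigmoid as the activation function of the second (output) layer: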

def forward_propagation(X, parameters):
    # Retrieve the parameters from the dictionary
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Forward propagation to calculate A2 (the predicted probability)
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)              # tanh activation function
    Z2 = np.dot(W2, A1) + b2
    A2 = 1 / (1 + np.exp(-Z2))    # sigmoid activation function

    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache
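One consequence of the small initial weights is that the untrained network outputs a probability close to 0.5 for every example. The sketch below illustrates this with a compact forward pass; the three petal measurements in X_demo are made-up values, and the random generator and seed are assumptions chosen only for the demonstration.

```python
import numpy as np

def forward_propagation(X, parameters):
    """tanh hidden layer followed by a sigmoid output layer."""
    Z1 = np.dot(parameters["W1"], X) + parameters["b1"]
    A1 = np.tanh(Z1)
    Z2 = np.dot(parameters["W2"], A1) + parameters["b2"]
    A2 = 1 / (1 + np.exp(-Z2))
    return A2, {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}

rng = np.random.default_rng(0)
parameters = {
    "W1": rng.standard_normal((6, 2)) * 0.01,
    "b1": np.zeros((6, 1)),
    "W2": rng.standard_normal((1, 6)) * 0.01,
    "b2": np.zeros((1, 1)),
}

# Hypothetical petal measurements (length, width) for three flowers.
X_demo = np.array([[4.7, 5.1, 6.0],
                   [1.4, 1.9, 2.5]])

A2, cache = forward_propagation(X_demo, parameters)
print(A2.shape)   # one probability per example
print(A2)         # all values close to 0.5 before training
```

This is also why the cost reported at the first iteration of training is close to ln 2.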

Step 4: Compute the cost function

The goal is to minimize the cost function. This article uses cross-entropy as the cost:

def compute_cost(A2, Y, parameters):
    # parameters is not used in the computation; it is kept so that the
    # call in nn_model stays unchanged.
    m = Y.shape[1]  # number of training examples

    # Compute the cross-entropy cost
    logprobs = np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2))
    cost = -np.sum(logprobs) / m
    return cost
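To get a feel for the numbers, the sketch below evaluates the cross-entropy for three sets of predictions on two examples. It is a variant rather than the article's function: it clips the probabilities away from 0 and 1, because an output of exactly 0 or 1 would otherwise produce log(0). The function name compute_cost_clipped and the eps value are assumptions made for this illustration.

```python
import numpy as np

def compute_cost_clipped(A2, Y, eps=1e-12):
    """Cross-entropy cost with probabilities clipped away from 0 and 1."""
    m = Y.shape[1]
    A2 = np.clip(A2, eps, 1 - eps)
    logprobs = Y * np.log(A2) + (1 - Y) * np.log(1 - A2)
    return float(-np.sum(logprobs) / m)

Y_demo = np.array([[0, 1]])

# Undecided predictions, confident correct predictions, saturated predictions.
uninformed = compute_cost_clipped(np.array([[0.5, 0.5]]), Y_demo)
confident = compute_cost_clipped(np.array([[0.1, 0.9]]), Y_demo)
saturated = compute_cost_clipped(np.array([[0.0, 1.0]]), Y_demo)

print(uninformed, confident, saturated)
```

Undecided predictions of 0.5 give a cost of ln 2 (about 0.693), and the cost falls toward 0 as the predictions become both confident and correct.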

Step 5: Backpropagation

Backpropagation computes the derivatives (gradients) of the cost function with respect to the weights and biases:

def backward_propagation(parameters, cache, X, Y):
    # Number of training examples
    m = X.shape[1]

    # Retrieve W2 from the dictionary "parameters" (W1 is not needed here)
    W2 = parameters["W2"]

    # Retrieve A1 and A2 from the dictionary "cache"
    A1 = cache["A1"]
    A2 = cache["A2"]

    # Backward propagation: calculate dW1, db1, dW2, db2
    dZ2 = A2 - Y
    dW2 = (1 / m) * np.dot(dZ2, A1.T)
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.multiply(np.dot(W2.T, dZ2), 1 - np.power(A1, 2))
    dW1 = (1 / m) * np.dot(dZ1, X.T)
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)

    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
    return grads
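In these formulas, dZ2 = A2 - Y is the derivative of the cross-entropy cost through the sigmoid output, and the factor 1 - A1^2 is the derivative of tanh. A common way to gain confidence in such hand-derived gradients is to compare them with finite differences. The sketch below does that on a small random problem; it is an illustrative check under assumed sizes (3 hidden units, 5 examples) with compact re-implementations of the forward pass and cost, not part of the original article.

```python
import numpy as np

def forward(X, p):
    A1 = np.tanh(p["W1"] @ X + p["b1"])
    A2 = 1 / (1 + np.exp(-(p["W2"] @ A1 + p["b2"])))
    return A1, A2

def cost(X, Y, p):
    _, A2 = forward(X, p)
    return float(-np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)))

def analytic_grads(X, Y, p):
    """Same formulas as backward_propagation, keyed by parameter name."""
    m = X.shape[1]
    A1, A2 = forward(X, p)
    dZ2 = A2 - Y
    dZ1 = (p["W2"].T @ dZ2) * (1 - A1 ** 2)
    return {
        "W1": dZ1 @ X.T / m,
        "b1": dZ1.sum(axis=1, keepdims=True) / m,
        "W2": dZ2 @ A1.T / m,
        "b2": dZ2.sum(axis=1, keepdims=True) / m,
    }

def numeric_grads(X, Y, p, h=1e-6):
    """Central finite differences, one parameter entry at a time."""
    grads = {}
    for name, value in p.items():
        grad = np.zeros_like(value)
        for idx in np.ndindex(value.shape):
            original = value[idx]
            value[idx] = original + h
            cost_plus = cost(X, Y, p)
            value[idx] = original - h
            cost_minus = cost(X, Y, p)
            value[idx] = original  # restore the entry
            grad[idx] = (cost_plus - cost_minus) / (2 * h)
        grads[name] = grad
    return grads

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))
Y = np.array([[0, 1, 1, 0, 1]])
p = {
    "W1": rng.standard_normal((3, 2)) * 0.5,
    "b1": np.zeros((3, 1)),
    "W2": rng.standard_normal((1, 3)) * 0.5,
    "b2": np.zeros((1, 1)),
}

analytic = analytic_grads(X, Y, p)
numeric = numeric_grads(X, Y, p)
max_error = max(np.max(np.abs(analytic[k] - numeric[k])) for k in p)
print("largest gradient difference:", max_error)
```

If the two sets of gradients disagree by much more than the finite-difference error, the backward pass most likely contains a mistake.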

Step 6: Update the parameters

Use the gradients computed during backpropagation to update the weights and biases:

def update_parameters(parameters, grads, learning_rate=1.2):
    # Retrieve each parameter from the dictionary "parameters"
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Retrieve each gradient from the dictionary "grads"
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]

    # Update rule for each parameter
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2

    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
    return parameters
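The update rule is the same for every parameter: subtract the learning rate times the gradient. As a small worked example, the sketch below applies one step to a toy parameter set. It uses a dictionary comprehension instead of the explicit assignments above, relying on the "d" + name key convention of the grads dictionary; the numbers are made up for illustration.

```python
import numpy as np

def update_parameters(parameters, grads, learning_rate=1.2):
    """Return new parameters after one gradient-descent step."""
    return {
        name: value - learning_rate * grads["d" + name]
        for name, value in parameters.items()
    }

# Toy parameters and gradients (hypothetical values).
parameters = {"W1": np.array([[0.5, -0.5]]), "b1": np.array([[0.0]])}
grads = {"dW1": np.array([[0.1, -0.2]]), "db1": np.array([[0.05]])}

updated = update_parameters(parameters, grads)
print(updated["W1"])  # 0.5 - 1.2*0.1 and -0.5 - 1.2*(-0.2)
print(updated["b1"])  # 0.0 - 1.2*0.05
```

Each entry moves against its gradient, and the input dictionary is left unchanged because new arrays are returned.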

Step 7: Build the neural network

Combining all of the functions above gives the neural network model. In summary, the model function performs the following steps in order:

Initialize the parameters
Forward propagation
Compute the cost function
Backpropagation
Update the parameters

def nn_model(X, Y, n_h, num_iterations=10000, print_cost=False):
    np.random.seed(3)
    # The original listing called a layer_sizes(X, Y) helper that is not
    # defined in this article; the sizes are read from the array shapes here.
    n_x = X.shape[0]  # number of input features
    n_y = Y.shape[0]  # number of output units

    # Initialize parameters. Inputs: n_x, n_h, n_y. Outputs: parameters.
    parameters = initialize_parameters(n_x, n_h, n_y)

    # Loop (gradient descent)
    for i in range(num_iterations):
        # Forward propagation. Inputs: X, parameters. Outputs: A2, cache.
        A2, cache = forward_propagation(X, parameters)

        # Cost function. Inputs: A2, Y, parameters. Outputs: cost.
        cost = compute_cost(A2, Y, parameters)

        # Backpropagation. Inputs: parameters, cache, X, Y. Outputs: grads.
        grads = backward_propagation(parameters, cache, X, Y)

        # Gradient descent parameter update. Inputs: parameters, grads.
        parameters = update_parameters(parameters, grads)

        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    return parameters, n_h

Step 8: Run the model

Set the hidden layer to 6 units and the maximum number of iterations to 10,000, and print the training cost every 1,000 iterations:

# nn_model returns a (parameters, n_h) tuple, so unpack both values
parameters, n_h = nn_model(X, Y, n_h=6, num_iterations=10000, print_cost=True)
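For readers who do not have iris.csv at hand, the whole loop can be tried on synthetic data. The sketch below is a compressed re-implementation of the steps above, not the article's nn_model: it trains on two made-up, well-separated 2-D clusters rather than on Iris, clips the probabilities inside the cost, and uses 1,000 iterations. The function names train and predict, the cluster centres, and the seeds are all assumptions for this illustration.

```python
import numpy as np

def train(X, Y, n_h=6, num_iterations=1000, learning_rate=1.2, seed=2):
    """Full-batch gradient descent for a tanh/sigmoid two-layer network."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    W1 = rng.standard_normal((n_h, X.shape[0])) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = rng.standard_normal((Y.shape[0], n_h)) * 0.01
    b2 = np.zeros((Y.shape[0], 1))
    costs = []
    for _ in range(num_iterations):
        # Forward pass
        A1 = np.tanh(W1 @ X + b1)
        A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))
        # Cross-entropy cost (clipped to avoid log(0))
        P = np.clip(A2, 1e-12, 1 - 1e-12)
        costs.append(float(-np.mean(Y * np.log(P) + (1 - Y) * np.log(1 - P))))
        # Backward pass (uses the weights from before the update)
        dZ2 = A2 - Y
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
        # Parameter update
        W2 = W2 - learning_rate * (dZ2 @ A1.T) / m
        b2 = b2 - learning_rate * dZ2.sum(axis=1, keepdims=True) / m
        W1 = W1 - learning_rate * (dZ1 @ X.T) / m
        b1 = b1 - learning_rate * dZ1.sum(axis=1, keepdims=True) / m
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}, costs

def predict(parameters, X):
    """Label 1 where the predicted probability exceeds 0.5, else 0."""
    A1 = np.tanh(parameters["W1"] @ X + parameters["b1"])
    A2 = 1 / (1 + np.exp(-(parameters["W2"] @ A1 + parameters["b2"])))
    return (A2 > 0.5).astype(int)

# Synthetic stand-in data: two well-separated clusters (not the Iris data).
rng = np.random.default_rng(0)
X0 = rng.normal(loc=-1.0, scale=0.3, size=(2, 50))
X1 = rng.normal(loc=1.0, scale=0.3, size=(2, 50))
X = np.hstack([X0, X1])
Y = np.hstack([np.zeros((1, 50)), np.ones((1, 50))])

parameters, costs = train(X, Y)
accuracy = float(np.mean(predict(parameters, X) == Y))
print("first cost:", costs[0], "last cost:", costs[-1], "accuracy:", accuracy)
```

The first recorded cost is close to ln 2 because the untrained network predicts roughly 0.5 everywhere, and it should fall as training separates the two clusters.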

Step 9: Plot the decision boundary

def predict(parameters, X):
    # predict is not defined in the original listing; this minimal version
    # is an assumption that thresholds the network output at 0.5.
    A2, cache = forward_propagation(X, parameters)
    return (A2 > 0.5).astype(int)

def plot_decision_boundary(model, X, y):
    # Set min and max values and give it some padding
    x_min, x_max = X[0, :].min() - 0.25, X[0, :].max() + 0.25
    y_min, y_max = X[1, :].min() - 0.25, X[1, :].max() + 0.25
    h = 0.01

    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

    # Predict the function value for the whole grid
    Z = model(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.ylabel("x2")
    plt.xlabel("x1")
    plt.scatter(X[0, :], X[1, :], c=y, cmap=plt.cm.Spectral)

plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y[0, :])
plt.title("Decision Boundary for hidden layer size " + str(6))
plt.xlabel("Petal Length")
plt.ylabel("Petal Width")
plt.show()

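The plot shows that only four points are misclassified. The model could be tuned further to push the training accuracy higher, but doing so would likely lead to overfitting.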
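The number of misclassified points can also be counted directly instead of being read off the plot. The sketch below shows one way to do so on made-up labels and predictions for ten examples; the helper name count_misclassified and the arrays are hypothetical. On the 100 versicolor and virginica rows used in this article, four errors would correspond to a training accuracy of 96%.

```python
import numpy as np

def count_misclassified(predictions, Y):
    """Number of examples whose predicted label differs from the true one."""
    return int(np.sum(predictions != Y))

# Hypothetical labels and predictions for ten examples, shape (1, m).
Y_true = np.array([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1]])
Y_pred = np.array([[0, 0, 1, 0, 0, 1, 1, 0, 1, 1]])

errors = count_misclassified(Y_pred, Y_true)
accuracy = 1 - errors / Y_true.shape[1]
print(errors, accuracy)
```

Because this figure is computed on the training data itself, it says nothing about how the model behaves on unseen flowers.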

Resources

https://www.coursera.org/specializations/deep-learning

The above is a translation organized by the Alibaba Cloud Yunqi Community (阿里雲雲棲社區).


Original article title: "Neural network on iris data"

Translator: 海棠. Reviewer: Uncle_LLD.

This article is an abridged translation; see the original article for more detail.


This article is original content of the Yunqi Community and may not be reproduced without permission.