Coursera deeplearning.ai Notes: Shallow Neural Networks


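Abstractly, every neuron can be summarized as z = W^{T}X + b, a = σ(z). It takes the outputs of the previous layer as input, applies a linear transformation, passes the result through an activation function, and the resulting output then feeds the neurons it connects to in the next layer.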
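A minimal NumPy sketch of that formula for one layer, vectorized over examples. The helper `neuron_layer` and the numbers are illustrative assumptions. W is stored as (n_in, n_out) so the code literally computes W^T X + b. The course's own assignment code stores W1 as (n_h, n_x) and computes np.dot(W1, X), which is the same thing transposed.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron_layer(W, b, X, activation=sigmoid):
    """One layer of neurons (hypothetical helper), vectorized over examples.

    W: (n_in, n_out) weights, so W.T @ X matches z = W^T X + b
    b: (n_out, 1) biases, broadcast across the m examples
    X: (n_in, m) inputs, one column per example
    """
    Z = W.T @ X + b      # linear step
    A = activation(Z)    # nonlinear step (activation function)
    return Z, A

W = np.array([[0.5], [-1.0]])   # 2 inputs -> 1 neuron (illustrative values)
b = np.array([[0.25]])
X = np.array([[1.0, 0.0],
              [0.5, 2.0]])      # 2 features x 2 examples
Z, A = neuron_layer(W, b, X)
# z for the two examples: 0.5*1 - 1*0.5 + 0.25 = 0.25 and 0.5*0 - 1*2 + 0.25 = -1.75
print(Z, A)
```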



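Here, though, I want to add something of my own. I had actually studied neural networks before in Andrew's Machine Learning course, but I had always understood backpropagation as "the error propagating backward and adjusting each parameter." It is not that simple; that was just my earlier, shallow understanding.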

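In fact, careful readers (I wasn't one) will notice that a neural network's prediction, i.e., forward propagation, is the evaluation of a composite function nested layer after layer, where each layer carries its own W, b parameters. Backpropagation is what lets us optimize those W, b. It computes the partial derivatives of the loss function with respect to W, b, and so on, and gradient descent uses them to move the parameters toward a combination where those derivatives vanish, i.e., where the gap between predicted and actual values is smallest. Computing those partial derivatives naturally means peeling the composite function open one layer at a time. So backpropagation is essentially the chain rule applied to a composite function. Unlike in calculus class, where we usually expand all the way down, we only peel back as far as the layer that holds the W, b we need.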
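To check that backpropagation really is just the chain rule, here is a minimal sketch. It assumes the course's two-layer network (tanh hidden layer, sigmoid output, cross-entropy loss), random toy data, and hypothetical helper names. It computes the gradients layer by layer, then compares them with finite-difference estimates of the same partial derivatives.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    # The nested composition: sigmoid(W2 @ tanh(W1 @ X + b1) + b2)
    W1, b1, W2, b2 = params
    A1 = np.tanh(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    return A1, A2

def loss(params, X, Y):
    # Mean cross-entropy over the m examples
    _, A2 = forward(params, X)
    m = X.shape[1]
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m

def backprop(params, X, Y):
    # Chain rule, peeled one layer at a time and stopped at each layer's W, b
    W1, b1, W2, b2 = params
    m = X.shape[1]
    A1, A2 = forward(params, X)
    dZ2 = A2 - Y                          # per-example dLoss/dZ2 (the 1/m is applied below)
    dW2 = dZ2 @ A1.T / m                  # times dZ2/dW2 = A1
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)    # times dZ2/dA1 = W2 and dA1/dZ1 = 1 - tanh^2
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return [dW1, db1, dW2, db2]

def numeric_grad(params, X, Y, eps=1e-6):
    # Central finite differences: nudge each parameter entry and watch the loss
    grads = []
    for P in params:
        G = np.zeros_like(P)
        for idx in np.ndindex(P.shape):
            old = P[idx]
            P[idx] = old + eps
            plus = loss(params, X, Y)
            P[idx] = old - eps
            minus = loss(params, X, Y)
            P[idx] = old                  # restore the original value
            G[idx] = (plus - minus) / (2 * eps)
        grads.append(G)
    return grads

rng = np.random.default_rng(1)
X = rng.standard_normal((2, 5))                 # 2 features x 5 examples (random)
Y = (rng.random((1, 5)) > 0.5).astype(float)    # random 0/1 labels
params = [rng.standard_normal((3, 2)), np.zeros((3, 1)),
          rng.standard_normal((1, 3)), np.zeros((1, 1))]

for name, a, n in zip(["dW1", "db1", "dW2", "db2"],
                      backprop(params, X, Y), numeric_grad(params, X, Y)):
    print(name, np.max(np.abs(a - n)))          # expected to be tiny
```

If the chain-rule gradients were wrong anywhere, the printed differences would be of the same order as the gradients themselves instead of near zero.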


All right. After that, there are two more important things.



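Second: how to choose the activation function. The course's conclusion, in short: "Unless your output is a 0/1 classification, never use sigmoid again; use tanh. The default choice is ReLU, and leaky ReLU works well too."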
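For reference, here is a minimal NumPy sketch of the four activations and their derivatives. The leaky-ReLU slope `alpha=0.01` is an assumed, commonly used value. The derivatives show the reasoning behind the advice. Sigmoid's slope never exceeds 0.25 and tanh's never exceeds 1, and both flatten toward 0 for large |z|, while ReLU keeps a slope of 1 for every positive z. The course also prefers tanh over sigmoid in hidden layers because tanh's output is centered around zero. Sigmoid's (0, 1) output is what makes it a natural fit for a binary output layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # alpha: slope for negative inputs (0.01 is an assumed, common choice)
    return np.where(z > 0, z, alpha * z)

# Derivatives, which backpropagation multiplies into the gradients
def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1 - s)               # maximum 0.25, at z = 0

def d_tanh(z):
    return 1 - np.tanh(z) ** 2       # maximum 1, at z = 0

def d_relu(z):
    return (z > 0).astype(float)     # 0 for z <= 0 (the value at exactly 0 is a convention)

def d_leaky_relu(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)

z = np.array([-5.0, 0.0, 5.0])
for name, f in [("sigmoid", d_sigmoid), ("tanh", d_tanh),
                ("relu", d_relu), ("leaky_relu", d_leaky_relu)]:
    print(name, f(z))
```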

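Third important thing: initialization.
1. Never initialize the weights to zero. If all the weights start at the same value, every hidden unit computes the same function and receives the same gradient, so the units stay identical no matter how long you train (the biases b, on the other hand, can start at zero).
2. Initialize the weights to small values, because near zero the slope of the activation function (tanh or sigmoid) is large, so both forward and backward propagation work well; with large values the activations saturate and the gradients become tiny. That said, larger initial values are sometimes considered too; that comes in a later lesson.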
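A small experiment checking both rules, under assumed toy settings: random 2-D inputs with XOR-like labels, 4 hidden units, learning rate 1.2, 200 steps. `train` is a hypothetical helper using the same tanh/sigmoid gradient equations as the course. With all-zero weights the rows of W1 (one per hidden unit) stay identical. Small random weights break that symmetry. The mean tanh slope stays near 1 at a 0.01 scale but collapses toward 0 at a large scale.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(W1, b1, W2, b2, X, Y, lr=1.2, steps=200):
    """Hypothetical helper: plain gradient descent for a tanh/sigmoid two-layer net."""
    m = X.shape[1]
    for _ in range(steps):
        A1 = np.tanh(W1 @ X + b1)
        A2 = sigmoid(W2 @ A1 + b2)
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.sum(axis=1, keepdims=True) / m
        W1, b1 = W1 - lr * dW1, b1 - lr * db1
        W2, b2 = W2 - lr * dW2, b2 - lr * db2
    return W1, b1, W2, b2

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 200))              # toy inputs (assumption)
Y = (X[0:1] * X[1:2] > 0).astype(float)        # toy XOR-like labels (assumption)

# Rule 1: with all-zero weights every hidden unit gets the same gradient.
# Here W1 and W2 even stay exactly zero, since dW1 and dW2 are both zero.
W1z, _, _, _ = train(np.zeros((4, 2)), np.zeros((4, 1)),
                     np.zeros((1, 4)), np.zeros((1, 1)), X, Y)
zero_rows_same = np.allclose(W1z, W1z[0])

# Small random weights break the symmetry, so hidden units can learn different things
W1r, _, _, _ = train(rng.standard_normal((4, 2)) * 0.01, np.zeros((4, 1)),
                     rng.standard_normal((1, 4)) * 0.01, np.zeros((1, 1)), X, Y)
random_rows_same = np.allclose(W1r, W1r[0])
print("W1 rows identical? zero init:", zero_rows_same, "| random init:", random_rows_same)

# Rule 2: tanh'(z) = 1 - tanh(z)^2 is near 1 for small z and near 0 for large |z|
slopes = {}
for scale in (0.01, 10.0):
    Z1 = (rng.standard_normal((4, 2)) * scale) @ X
    slopes[scale] = float(np.mean(1 - np.tanh(Z1) ** 2))
    print(f"weight scale {scale}: mean tanh slope = {slopes[scale]:.3f}")
```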



import numpy as np

def sigmoid(z):
    # Logistic function (the assignment imports its own sigmoid helper)
    return 1 / (1 + np.exp(-z))

# Dataset dimensions. X (shape (n_x, m)) and Y (shape (1, m)) are the
# assignment's dataset, loaded before this point.
shape_X = X.shape
shape_Y = Y.shape
m = shape_X[1]  # training set size

def layer_sizes(X, Y):
    n_x = X.shape[0]  # size of input layer
    n_h = 4           # size of hidden layer (fixed at 4 in this exercise)
    n_y = Y.shape[0]  # size of output layer
    return (n_x, n_h, n_y)

def initialize_parameters(n_x, n_h, n_y):
    # Small random weights break symmetry; biases can safely start at zero
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
    return parameters

def forward_propagation(X, parameters):
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    # Implement forward propagation to calculate A2 (probabilities)
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)        # hidden layer: tanh
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)        # output layer: sigmoid
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache

def compute_cost(A2, Y, parameters):
    m = Y.shape[1]  # number of examples
    # Cross-entropy cost: -(1/m) * sum(Y*log(A2) + (1-Y)*log(1-A2))
    cost = -(np.dot(np.log(A2), Y.T) + np.dot(np.log(1 - A2), (1 - Y).T)) / m
    cost = float(np.squeeze(cost))  # (1, 1) array -> Python float
    return cost

def backward_propagation(parameters, cache, X, Y):
    m = X.shape[1]
    # Retrieve W2 from "parameters" and A1, A2 from "cache"
    W2 = parameters["W2"]
    A1 = cache["A1"]
    A2 = cache["A2"]
    # Backward propagation: calculate dW1, db1, dW2, db2 (1 - A1**2 is tanh'(Z1))
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
    return grads

def update_parameters(parameters, grads, learning_rate=1.2):
    # Retrieve each parameter and each gradient
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]
    # Gradient descent update rule for each parameter
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
    return parameters

def nn_model(X, Y, n_h, num_iterations=10000, print_cost=False):
    """
    Arguments:
    X -- dataset of shape (2, number of examples)
    Y -- labels of shape (1, number of examples)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    print_cost -- if True, print the cost every 1000 iterations

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    parameters = initialize_parameters(n_x, n_h, n_y)
    # Loop (gradient descent)
    for i in range(0, num_iterations):
        A2, cache = forward_propagation(X, parameters)         # forward propagation
        cost = compute_cost(A2, Y, parameters)                 # cost function
        grads = backward_propagation(parameters, cache, X, Y)  # backpropagation
        parameters = update_parameters(parameters, grads)      # parameter update
        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    return parameters

def predict(parameters, X):
    # Compute probabilities with forward propagation, then classify as 0/1 with a 0.5 threshold
    A2, cache = forward_propagation(X, parameters)
    predictions = (A2 > 0.5)
    return predictions


