Coursera deeplearning.ai Notes — Deep Neural Networks
Course 1 finally comes to an end. At this point we have learned to build a simple deep neural network and walked through the whole pipeline once. In practice you would normally call an existing library rather than implement everything yourself (a hand-rolled pure-Python implementation is too slow to actually use), but going through the process once not only makes the mechanics of neural networks much clearer, it also helps with tuning later on!!
As Andrew said, "Applying deep learning is a very empirical process." Let's see what this week covered.
A deep neural network means adding more layers, not more nodes per layer. Put simply, once the depth goes up, the network can filter, combine, and separate the input features better. Andrew notes that logistic regression is about the shallowest neural network you can get, whereas you can hardly call a model "deep" without at least five or so layers.
So why go for depth (more layers) rather than width (more nodes)?
Circuit theory gives a good answer: there are functions a small deep network can compute for which a shallow, wide network would need exponentially many more units, which makes it far harder to optimize. For example, computing the parity (XOR) of n inputs takes a small network of roughly O(log n) depth, while a single-hidden-layer network needs exponentially many units. Personally, I also feel depth filters and amplifies features more effectively. So, question answered.
Forward propagation and backward propagation in a deep network are the same as in a shallow one, just nested over more layers; backpropagation works the same way too, so you can write down a general per-layer formula.
However, one thing deserves special attention!!!
Backpropagation constantly needs the A and Z values from forward propagation, so during the forward pass you store them in a cache; this comes up a lot in the programming assignment. Haha, you may notice I'm not following the order of the lectures, but I can't help it, I think this ordering works better.
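To make the cache idea concrete, here is a minimal numpy sketch of the two per-layer building blocks (my own illustration with made-up function names, not the helpers the assignment provides): the forward step computes Z[l] = W[l] A[l-1] + b[l] and stores everything backprop will need, and the backward step reads it back to get dW[l], db[l], and dA[l-1].

import numpy as np

def linear_forward_sketch(A_prev, W, b):
    # Z[l] = W[l] A[l-1] + b[l]; keep everything backprop will need later.
    Z = np.dot(W, A_prev) + b
    cache = (A_prev, W, b, Z)
    return Z, cache

def linear_backward_sketch(dZ, cache):
    # Reuse the cached values instead of recomputing the forward pass.
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T) / m               # dW[l] = (1/m) dZ[l] A[l-1]^T
    db = np.sum(dZ, axis=1, keepdims=True) / m  # db[l] = (1/m) sum over examples of dZ[l]
    dA_prev = np.dot(W.T, dZ)                   # dA[l-1] = W[l]^T dZ[l]
    return dA_prev, dW, db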
Oh, and vectorize, vectorize, vectorize has been drummed in all along!! But in a deep network, the layer-by-layer propagation is the one place where Andrew says an explicit for loop is acceptable.
For the matrix dimensions I chant a little mnemonic to myself (inputs come in on the right, outputs go out on the left), so W has shape (units in this layer, units in the previous layer) and b has shape (units in this layer, 1).
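A tiny sketch to check those shapes (the layer sizes and example count below are made up for illustration). It also shows the point from above: the only explicit for loop runs over the layers, while all m examples are handled at once as the columns of A.

import numpy as np

layer_dims = [5, 4, 3, 1]                 # hypothetical sizes: input plus 3 layers
m = 10                                    # hypothetical number of examples
A = np.random.randn(layer_dims[0], m)     # examples stacked as columns

for l in range(1, len(layer_dims)):       # the only explicit for loop: over layers
    W = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01  # (this layer, previous layer)
    b = np.zeros((layer_dims[l], 1))                              # (this layer, 1)
    A = np.maximum(0, np.dot(W, A) + b)   # one matrix product covers all m examples
    print("layer", l, "W:", W.shape, "b:", b.shape, "A:", A.shape)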
Finally, Andrew gave a strong preview of the next course: hyperparameter tuning.
Hyperparameters: the parameters that govern the other parameters, such as the number of iterations, the number of hidden layers, the number of hidden units, and the choice of activation function.
To wrap up, Andrew ended on a lighter note: what does any of this have to do with the human brain? He said he is increasingly reluctant to draw analogies between deep learning and the brain. Of course, the earliest neural networks, like the perceptron, did take some inspiration from neurons, but he clearly pushed back against the media outlets that overhype the connection. Personally, I'm also rather tired of certain so-called bio-inspired algorithms: a few simple equations dressed up with a catchy, mystical-sounding name and suddenly it's a brand-new algorithm. If you're interested, like and comment and I'll expand on my views, haha.
---------------- Assignment ----------------
This assignment has quite a few small details to watch out for; I explain the parts I found tricky in comments in the code, but I won't show the parts that repeat last week's work~
# Programming assignment 1: Step by step

def initialize_parameters_deep(layer_dims):
    parameters = {}
    L = len(layer_dims)            # number of layers, including the input layer
    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
        ### END CODE HERE ###
        assert(parameters["W" + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters["b" + str(l)].shape == (layer_dims[l], 1))
    return parameters

def linear_activation_forward(A_prev, W, b, activation):
    if activation == "sigmoid":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
        ### END CODE HERE ###
    elif activation == "relu":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
        ### END CODE HERE ###
    assert(A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)
    return A, cache

def L_model_forward(X, parameters):
    caches = []
    A = X
    L = len(parameters) // 2       # number of layers in the neural network
    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):          # if you're new to Python like me, note that range(1, L) only goes up to L-1!!
        A_prev = A
        ### START CODE HERE ### (≈ 2 lines of code)
        A, cache = linear_activation_forward(A_prev, parameters["W" + str(l)], parameters["b" + str(l)], "relu")
        caches.append(cache)
        ### END CODE HERE ###
    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    ### START CODE HERE ### (≈ 2 lines of code)
    AL, cache = linear_activation_forward(A, parameters["W" + str(L)], parameters["b" + str(L)], "sigmoid")
    caches.append(cache)
    ### END CODE HERE ###
    assert(AL.shape == (1, X.shape[1]))
    return AL, caches

def linear_activation_backward(dA, cache, activation):
    linear_cache, activation_cache = cache   # unpack what the forward pass stored
    if activation == "relu":
        ### START CODE HERE ### (≈ 2 lines of code)
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        ### END CODE HERE ###
    elif activation == "sigmoid":
        ### START CODE HERE ### (≈ 2 lines of code)
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        ### END CODE HERE ###
    return dA_prev, dW, db

def L_model_backward(AL, Y, caches):
    grads = {}
    L = len(caches)                # the number of layers
    Y = Y.reshape(AL.shape)        # after this line, Y has the same shape as AL
    # Initializing the backpropagation
    ### START CODE HERE ### (1 line of code)
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    ### END CODE HERE ###
    # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "AL, Y, caches". Outputs: grads["dAL"], grads["dWL"], grads["dbL"]
    ### START CODE HERE ### (approx. 2 lines)
    current_cache = caches[L-1]
    grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, activation="sigmoid")
    ### END CODE HERE ###
    for l in reversed(range(L-1)):
        # lth layer: (RELU -> LINEAR) gradients.
        # Inputs: grads["dA" + str(l + 2)], caches. Outputs: grads["dA" + str(l + 1)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)]
        ### START CODE HERE ### (approx. 5 lines)
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l + 2)], current_cache, activation="relu")
        grads["dA" + str(l + 1)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp
        ### END CODE HERE ###
    return grads

def update_parameters(parameters, grads, learning_rate):
    L = len(parameters) // 2       # number of layers in the neural network
    # Update rule for each parameter. Use a for loop.
    ### START CODE HERE ### (≈ 3 lines of code)
    for l in range(1, L + 1):
        parameters["W" + str(l)] = parameters["W" + str(l)] - learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] = parameters["b" + str(l)] - learning_rate * grads["db" + str(l)]
    ### END CODE HERE ###
    return parameters
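A quick sanity check of my own, not part of the assignment: run a single forward/backward/update pass on random data and confirm the shapes line up. It assumes the notebook's provided helpers (sigmoid, relu, sigmoid_backward, relu_backward, linear_forward, linear_backward) are already defined; the data and layer sizes here are made up.

np.random.seed(1)
X_check = np.random.randn(4, 7)                 # 4 features, 7 made-up examples
Y_check = (np.random.rand(1, 7) > 0.5) * 1.0    # made-up binary labels

params = initialize_parameters_deep([4, 3, 1])  # tiny 2-layer network
AL, caches = L_model_forward(X_check, params)
grads = L_model_backward(AL, Y_check, caches)
params = update_parameters(params, grads, learning_rate=0.01)

print(AL.shape)               # expect (1, 7)
print(params["W1"].shape)     # expect (3, 4)
print(grads["dW1"].shape)     # expect (3, 4)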
The second assignment simply calls the functions written in the first one, very easy~~
# Programming assignment 2: Application

def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):  # lr was 0.009
    """
    Implements an L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.

    Arguments:
    X -- data, numpy array of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples)
    layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
    learning_rate -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- if True, it prints the cost every 100 steps

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    np.random.seed(1)
    costs = []                     # keep track of cost

    # Parameters initialization.
    ### START CODE HERE ###
    parameters = initialize_parameters_deep(layers_dims)
    ### END CODE HERE ###

    # Loop (gradient descent)
    for i in range(0, num_iterations):
        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID.
        ### START CODE HERE ### (≈ 1 line of code)
        AL, caches = L_model_forward(X, parameters)
        ### END CODE HERE ###

        # Compute cost.
        ### START CODE HERE ### (≈ 1 line of code)
        cost = compute_cost(AL, Y)
        ### END CODE HERE ###

        # Backward propagation.
        ### START CODE HERE ### (≈ 1 line of code)
        grads = L_model_backward(AL, Y, caches)
        ### END CODE HERE ###

        # Update parameters.
        ### START CODE HERE ### (≈ 1 line of code)
        parameters = update_parameters(parameters, grads, learning_rate)
        ### END CODE HERE ###

        # Print the cost every 100 training iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
        if print_cost and i % 100 == 0:
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters
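For reference, this is roughly how the notebook then drives it; the layer sizes and the names train_x, train_y, test_x, test_y, and predict come from the assignment, so treat this as a sketch of the driver cell rather than the exact code.

layers_dims = [12288, 20, 7, 5, 1]   # 4-layer model: 12288 = 64 * 64 * 3 input features

parameters = L_layer_model(train_x, train_y, layers_dims,
                           num_iterations=2500, print_cost=True)

pred_train = predict(train_x, train_y, parameters)   # training accuracy
pred_test = predict(test_x, test_y, parameters)      # test accuracy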
That wraps up the notes for the first course. More and more of you are following along; if you've read this far, give it a like~~ haha, thanks for the support.