Coursera deeplearning.ai Notes — Deep Neural Networks

The first course has finally reached its end. By now I can build a simple deep neural network and have walked through the full pipeline once. In practice you would normally call a library rather than implement it yourself (a hand-rolled Python version is too slow to use anyway), but going through the process clarifies how neural networks actually work and will help with tuning later on!

As Andrew said, "Applying deep learning is a very empirical process." Here is what this week covered.

A deep neural network means adding more layers, not more nodes per layer. Put plainly, with more layers the network can filter, combine, and refine the input features better. Andrew points out that logistic regression is about the shallowest neural network you can get, and that without at least five or so layers you would be embarrassed to call your model a deep neural network.

So why go deep (more layers) rather than wide (more nodes)?

Circuit theory gives a good answer: there are functions a deep network can compute with a modest number of units that a shallow, wide network can only compute with exponentially many units, which makes the shallow version far harder to optimize. (Andrew's example is computing the parity/XOR of n inputs: a tree of small layers needs depth on the order of log n, while a single hidden layer needs exponentially many units.) Personally, I also feel depth is more effective at filtering and amplifying features. So, question answered.

Forward propagation and backward propagation for a deep network are the same as for a shallow one; you just nest a few more layers of Z[l] = W[l] A[l-1] + b[l], and backpropagation follows the same pattern, so you can write down a general recipe (shown below).

Forward propagation
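For reference, the layer-by-layer recursion in the course's notation, with $A^{[0]} = X$ and $\hat{Y} = A^{[L]}$:

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, \qquad A^{[l]} = g^{[l]}\big(Z^{[l]}\big), \qquad l = 1, \dots, L$$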

Backward propagation
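And the corresponding gradients, given $dA^{[l]}$ coming back from layer $l+1$ (the standard formulas from the lecture; $m$ is the number of examples and $*$ is element-wise multiplication):

$$dZ^{[l]} = dA^{[l]} * g^{[l]\prime}\big(Z^{[l]}\big), \qquad dW^{[l]} = \tfrac{1}{m}\, dZ^{[l]} A^{[l-1]T}, \qquad db^{[l]} = \tfrac{1}{m} \textstyle\sum_{i=1}^{m} dZ^{[l](i)}, \qquad dA^{[l-1]} = W^{[l]T} dZ^{[l]}$$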

However, there is one thing that needs special attention!!!

Backward propagation constantly reuses the A and Z values from forward propagation, so during the forward pass you store them in a cache. This comes up a lot in the programming assignment. You may notice I am not following the course's exact order here, but I think this arrangement reads better.
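The assignment's linear_forward / linear_backward (which the code below calls but which I do not show in this post) follow exactly this cache pattern. A minimal sketch, not the graded notebook code:

import numpy as np

def linear_forward(A_prev, W, b):
    # Linear part of one layer: Z = W·A_prev + b.
    Z = np.dot(W, A_prev) + b
    cache = (A_prev, W, b)        # saved here so the backward pass can reuse it
    return Z, cache

def linear_backward(dZ, cache):
    # Recover the values cached during the forward pass and compute the gradients.
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)
    return dA_prev, dW, db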

Oh, and Andrew keeps stressing vectorization, vectorization, vectorization!!! But for a deep network, the pass over the layers is, in his view, the one place where an explicit for loop is acceptable.

For the matrix dimensions I keep a little mnemonic in my head ("in on the right, out on the left; whatever comes in goes on the right"), so W[l] has shape (units in this layer, units in the previous layer), i.e. (n[l], n[l-1]), and b[l] has shape (n[l], 1).
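A quick sanity check of that mnemonic with a made-up layers_dims (the numbers are just for illustration):

layer_dims = [5, 4, 3]   # hypothetical: 5 input features, a 4-unit hidden layer, 3 output units
for l in range(1, len(layer_dims)):
    print("W" + str(l), "shape:", (layer_dims[l], layer_dims[l - 1]))   # (this layer, previous layer)
    print("b" + str(l), "shape:", (layer_dims[l], 1))                   # (this layer, 1)
# W1 shape: (4, 5)   b1 shape: (4, 1)
# W2 shape: (3, 4)   b2 shape: (3, 1)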

Finally, Andrew gives a strong teaser for the next course: hyperparameter tuning.

Hyperparameters: the parameters that control your actual parameters (W and b), such as the number of iterations, the number of hidden layers, the number of hidden units, and the choice of activation function.

To wrap up, Andrew lightens things with the question: what does any of this have to do with the human brain? He says he is increasingly reluctant to draw analogies between deep learning and the brain, even though early models such as the perceptron were indeed loosely inspired by neurons. It is a pointed rebuttal to the media outlets that overstate the analogy. Personally I am also tired of certain so-called bio-inspired algorithms: a few simple formulas plus a catchy, mystical-sounding name and suddenly it is a "new algorithm". If you are interested, like and comment and I will expand on this view, haha.

---------------- Programming assignments ----------------

This time the assignments have quite a few small details to watch out for. The code below includes comments on the parts that tripped me up; anything that repeats last week's assignment I will not show again.

# Programming assignment 1: Step by step

def initialize_parameters_deep(layer_dims):
    parameters = {}
    L = len(layer_dims)            # number of layers, including the input layer
    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
        ### END CODE HERE ###
        assert(parameters["W" + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters["b" + str(l)].shape == (layer_dims[l], 1))
    return parameters

def linear_activation_forward(A_prev, W, b, activation):
    if activation == "sigmoid":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
        ### END CODE HERE ###
    elif activation == "relu":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
        ### END CODE HERE ###
    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)
    return A, cache

def L_model_forward(X, parameters):
    caches = []
    A = X
    L = len(parameters) // 2       # number of layers in the neural network
    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):          # Note for Python beginners like me: range(1, L) stops at L-1, not L!
        A_prev = A
        ### START CODE HERE ### (≈ 2 lines of code)
        A, cache = linear_activation_forward(A_prev, parameters["W" + str(l)], parameters["b" + str(l)], "relu")
        caches.append(cache)
        ### END CODE HERE ###
    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    ### START CODE HERE ### (≈ 2 lines of code)
    AL, cache = linear_activation_forward(A, parameters["W" + str(L)], parameters["b" + str(L)], "sigmoid")
    caches.append(cache)
    ### END CODE HERE ###
    assert(AL.shape == (1, X.shape[1]))
    return AL, caches

def linear_activation_backward(dA, cache, activation):
    linear_cache, activation_cache = cache   # unpack the caches stored during the forward pass
    if activation == "relu":
        ### START CODE HERE ### (≈ 2 lines of code)
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        ### END CODE HERE ###
    elif activation == "sigmoid":
        ### START CODE HERE ### (≈ 2 lines of code)
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        ### END CODE HERE ###
    return dA_prev, dW, db

def L_model_backward(AL, Y, caches):
    grads = {}
    L = len(caches)                # number of layers
    Y = Y.reshape(AL.shape)        # make sure Y has the same shape as AL
    # Initializing the backpropagation
    ### START CODE HERE ### (1 line of code)
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    ### END CODE HERE ###
    # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "AL, Y, caches". Outputs: grads["dAL"], grads["dWL"], grads["dbL"]
    ### START CODE HERE ### (approx. 2 lines)
    current_cache = caches[L-1]
    grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, activation="sigmoid")
    ### END CODE HERE ###
    for l in reversed(range(L-1)):
        # lth layer: (RELU -> LINEAR) gradients.
        # Inputs: grads["dA" + str(l + 2)], caches. Outputs: grads["dA" + str(l + 1)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)]
        ### START CODE HERE ### (approx. 5 lines)
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l + 2)], current_cache, activation="relu")
        grads["dA" + str(l + 1)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp
        ### END CODE HERE ###
    return grads

def update_parameters(parameters, grads, learning_rate):
    L = len(parameters) // 2       # number of layers in the neural network
    # Update rule for each parameter. Use a for loop.
    ### START CODE HERE ### (≈ 3 lines of code)
    for l in range(1, L + 1):
        parameters["W" + str(l)] = parameters["W" + str(l)] - learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] = parameters["b" + str(l)] - learning_rate * grads["db" + str(l)]
    ### END CODE HERE ###
    return parameters
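One function the Application notebook below calls but that I have not shown (it repeats last week's material) is compute_cost, the cross-entropy cost. A sketch, assuming the same interface as the notebook:

def compute_cost(AL, Y):
    # Cross-entropy cost: J = -(1/m) * sum( Y*log(AL) + (1-Y)*log(1-AL) )
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    cost = np.squeeze(cost)        # return a scalar rather than a 1x1 array
    return cost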

The second assignment simply calls the functions written in the first one, so it is pretty easy.

# Programming assignment 2: Application

def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):  # lr was 0.009
    """
    Implements an L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.

    Arguments:
    X -- data, numpy array of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples)
    layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
    learning_rate -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- if True, it prints the cost every 100 steps

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    np.random.seed(1)
    costs = []                     # keep track of cost

    # Parameters initialization.
    ### START CODE HERE ###
    parameters = initialize_parameters_deep(layers_dims)
    ### END CODE HERE ###

    # Loop (gradient descent)
    for i in range(0, num_iterations):
        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID.
        ### START CODE HERE ### (≈ 1 line of code)
        AL, caches = L_model_forward(X, parameters)
        ### END CODE HERE ###

        # Compute cost.
        ### START CODE HERE ### (≈ 1 line of code)
        cost = compute_cost(AL, Y)
        ### END CODE HERE ###

        # Backward propagation.
        ### START CODE HERE ### (≈ 1 line of code)
        grads = L_model_backward(AL, Y, caches)
        ### END CODE HERE ###

        # Update parameters.
        ### START CODE HERE ### (≈ 1 line of code)
        parameters = update_parameters(parameters, grads, learning_rate)
        ### END CODE HERE ###

        # Print and record the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
        if print_cost and i % 100 == 0:
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate = " + str(learning_rate))
    plt.show()

    return parameters
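Training is then kicked off roughly like this. The 4-layer sizes below are the ones I remember from the assignment, so treat them as illustrative, and predict() is the helper the notebook provides:

layers_dims = [12288, 20, 7, 5, 1]   # 64*64*3 flattened pixels in, one sigmoid unit out
parameters = L_layer_model(train_x, train_y, layers_dims, num_iterations=2500, print_cost=True)
pred_train = predict(train_x, train_y, parameters)   # training accuracy
pred_test = predict(test_x, test_y, parameters)      # test accuracy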

That wraps up the notes for the first course, and more and more of you are following along. If you have read this far, how about a like? Haha, thanks for the support.

