Coursera deeplearning.ai notes: hyperparameter tuning, batch normalization, multi-class classification, deep learning frameworks

02-17

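I dragged my feet on this week's lectures for far too long, so on the first day of the Lunar New Year I sat down and wrote up the notes in one go. The week covers hyperparameter tuning, Batch Normalization, multi-class classification, and an introduction to programming frameworks.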

Hyperparameter tuning

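Deep learning involves a lot of hyperparameters: the learning rate, the number of layers, the number of hidden units, the mini-batch size, and, once you use something like RMSprop, a β to tune as well, and so on. Andrew gives a color-coded ranking of how much each one matters, shown in the figure below: red, then yellow, then orange, from most important to less important.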

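It does not end there, though. With many hyperparameters, a brute-force grid search has to train as many models as the product of the number of candidate values for every hyperparameter, as the figure below shows. Random search within a preset range, followed by a closer look at the promising region, is therefore the better fit for now. A personal aside: when applying deep learning models you do not have to land on the optimal hyperparameters; close enough is fine, because the amount of data is what really moves the result. Put differently, in production, if ten rounds of tuning hyperparameter A could raise the hit rate by 3% but you are already at 90%, the gain is hardly worth it. So once you have found a region you are reasonably confident in, sample densely inside it.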
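The grid-size arithmetic and the coarse-to-fine random search described above can be sketched roughly as follows. This is only an illustration: the `score` function is a made-up stand-in for "train a model and measure validation accuracy", and the two hyperparameters, their ranges, and the trial counts are assumptions chosen for the example, not values from the course.

```python
import random

# Grid search cost: the product of the number of values tried per hyperparameter.
values_per_param = [5, 5, 5, 5]
grid_models = 1
for n in values_per_param:
    grid_models *= n  # 5 * 5 * 5 * 5 = 625 models to train


def score(lr_exp, hidden_units):
    # Hypothetical stand-in for training a model and returning a validation score.
    # It peaks at lr_exp = -3 (learning rate 1e-3) and hidden_units = 60.
    return -((lr_exp + 3.0) ** 2) - ((hidden_units - 60) / 50.0) ** 2


def random_search(n_trials, exp_range, unit_range, rng):
    """Sample n_trials random points and return the best (score, lr_exp, units)."""
    best = None
    for _ in range(n_trials):
        lr_exp = rng.uniform(*exp_range)
        units = rng.randint(*unit_range)
        trial = (score(lr_exp, units), lr_exp, units)
        if best is None or trial[0] > best[0]:
            best = trial
    return best


rng = random.Random(0)
# Coarse pass over a wide range.
coarse = random_search(25, (-5.0, 0.0), (10, 200), rng)
# Fine pass: sample densely in a narrower box around the coarse winner.
_, e0, u0 = coarse
fine = random_search(25, (e0 - 0.5, e0 + 0.5), (max(10, u0 - 20), u0 + 20), rng)
best = max(coarse, fine, key=lambda t: t[0])
```

With the same budget of 50 trials, a grid would only cover about seven values per axis, while random search tries 50 distinct values of each hyperparameter.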

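For hyperparameters such as the learning rate and β, which usually take fractional values, do not step through 0.9, 0.899, ...; tune by order of magnitude instead, since nudging the value in the second decimal place will not make much difference. Even for the number of layers and hidden units, do not try 4, 5, 6; try 5, 10, 20, 30. Use random search to find a good region, then search densely inside it.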
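Tuning "by order of magnitude" usually means sampling the exponent uniformly instead of the value itself. A minimal sketch, assuming a learning rate between 1e-4 and 1 and a β between 0.9 and 0.999 (both ranges and the function names are illustrative choices, not taken from the notes above):

```python
import random


def sample_learning_rate(rng, low_exp=-4.0, high_exp=0.0):
    """Sample uniformly on a log scale between 10**low_exp and 10**high_exp."""
    return 10 ** rng.uniform(low_exp, high_exp)


def sample_beta(rng, low_exp=-3.0, high_exp=-1.0):
    """Sample 1 - beta on a log scale, so beta lies between 0.9 and 0.999."""
    return 1.0 - 10 ** rng.uniform(low_exp, high_exp)


rng = random.Random(1)
learning_rates = [sample_learning_rate(rng) for _ in range(5)]
betas = [sample_beta(rng) for _ in range(5)]
```

Sampling `1 - beta` rather than `beta` spends as many trials between 0.99 and 0.999 as between 0.9 and 0.99, which matters because the averaging window of roughly 1 / (1 - β) steps changes quickly as β approaches 1.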

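Tuning currently comes in two styles, panda and caviar. In the first you babysit a single model around the clock and watch how it does; in the second you launch many models in parallel. Frankly it comes down to how deep your boss's pockets are: with plenty of compute you can of course go with the latter.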

Batch Normalization

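Normalization is not limited to the input data (each mini-batch normalized on its own); it is applied in the computation of every layer. Whether to normalize a or z is still debated in academia, but normalizing z is the more common choice today. Because it works one mini-batch at a time, it is just as effective with other optimizers, not only plain gradient descent. As for why it works: just as the input layer receives normalized features instead of raw ones, each hidden layer now sees normalized values, which makes the distribution it has to learn from less unstable. The mean of each layer is tracked with an exponentially weighted average (updated in mini-batch order), and in practice this is verified by experiment.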
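To make the "normalize z" step concrete, here is a minimal pure-Python sketch for a single hidden unit over one mini-batch. The learnable scale `gamma` and shift `beta` and the small constant `eps` are assumptions added for the example (they are part of standard batch normalization but not spelled out in these notes), and this `beta` is unrelated to the β used by momentum or RMSprop.

```python
import math


def batch_norm(z_batch, gamma=1.0, beta=0.0, eps=1e-8):
    """Normalize one unit's pre-activations z over a mini-batch, then scale and shift."""
    m = len(z_batch)
    mu = sum(z_batch) / m                           # mini-batch mean
    var = sum((z - mu) ** 2 for z in z_batch) / m   # mini-batch variance
    z_norm = [(z - mu) / math.sqrt(var + eps) for z in z_batch]
    z_tilde = [gamma * zn + beta for zn in z_norm]  # learnable scale and shift
    return z_tilde, mu, var


z_tilde, mu, var = batch_norm([1.0, 2.0, 3.0, 4.0])
```

With `gamma=1` and `beta=0` the output has mean 0 and variance close to 1; training would adjust `gamma` and `beta` so that each layer can pick whatever mean and variance suits it.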

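At test time, how do we normalize when we have very few test examples, or even a single one? We reuse the statistics gathered during training: for each layer, keep track of the mean and variance of every mini-batch and combine them with an exponentially weighted average, so that by the time the model has matured there is a stable estimate to use at test time. The approach is very robust, so Andrew is not worried about exactly how we implement it.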
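The bookkeeping for test time can be sketched like this, again for a single unit in pure Python. The averaging factor `momentum=0.9`, the starting values, the toy mini-batches, and the function names are all assumptions made for the illustration; frameworks handle this internally and may differ in detail.

```python
import math


def update_running(running, batch_value, momentum=0.9):
    """Exponentially weighted average: keep `momentum` of the old estimate."""
    return momentum * running + (1.0 - momentum) * batch_value


# During training: update the running statistics once per mini-batch, in order.
running_mu, running_var = 0.0, 1.0
mini_batches = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
for batch in mini_batches:
    m = len(batch)
    mu = sum(batch) / m
    var = sum((z - mu) ** 2 for z in batch) / m
    running_mu = update_running(running_mu, mu)
    running_var = update_running(running_var, var)


def batch_norm_inference(z, gamma=1.0, beta=0.0, eps=1e-8):
    """Normalize a single test example with the stored running statistics."""
    return gamma * (z - running_mu) / math.sqrt(running_var + eps) + beta


z_test = batch_norm_inference(3.0)
```

Because the test example is normalized with stored statistics instead of its own batch, the result does not depend on how many examples are evaluated together.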

Multi-class classification

Use softmax. There are plenty of explanations of it online already.
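For readers who want the short version, softmax exponentiates each of the C logits and divides by the sum, so the outputs are positive and add up to 1. A minimal pure-Python sketch (the example logits are arbitrary, and subtracting the maximum is a common numerical-stability trick that is an addition here, not something from the notes):

```python
import math


def softmax(logits):
    """Turn a list of C logits into C probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([5.0, 2.0, -1.0, 3.0])
predicted_class = probs.index(max(probs))
```

The predicted class is simply the index of the largest probability, which is also the index of the largest logit.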

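For deep learning frameworks, Andrew uses TensorFlow for the introduction; his reasons are in the figure below. The programming assignment does walk through the actual operations, but it is a bit basic. Looking forward to the next course~~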

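This week's programming assignment has quite a few problems. Below I have pasted the parts that gave me trouble; have a look at the discussion forum for what exactly goes wrong, or leave a comment and I will answer.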

There was a lot of material this week and these notes are rough, so feel free to comment if anything is unclear.

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.placeholder)


def linear_function():
    """
    Implements a linear function:
        Initializes W to be a random tensor of shape (4,3)
        Initializes X to be a random tensor of shape (3,1)
        Initializes b to be a random tensor of shape (4,1)
    Returns:
    result -- runs the session for Y = WX + b
    """
    np.random.seed(1)

    ### START CODE HERE ### (4 lines of code)
    X = tf.constant(np.random.randn(3,1), name = "X")
    W = tf.constant(np.random.randn(4,3), name = "W")
    b = tf.constant(np.random.randn(4,1), name = "b")
    Y = tf.add(tf.matmul(W, X), b)  # Y = WX + b
    ### END CODE HERE ###

    # Create the session using tf.Session() and run it with sess.run(...) on the variable you want to calculate
    ### START CODE HERE ###
    sess = tf.Session()
    result = sess.run(Y)
    ### END CODE HERE ###

    # close the session
    sess.close()

    return result


def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy

    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0)

    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels"
    in the TensorFlow documentation. So logits will feed into z, and labels into y.

    Returns:
    cost -- runs the session of the cost (formula (2))
    """
    ### START CODE HERE ###
    # Create the placeholders for "logits" (z) and "labels" (y) (approx. 2 lines)
    z = tf.placeholder(tf.float32, name = "logits")
    y = tf.placeholder(tf.float32, name = "labels")

    # Use the loss function (approx. 1 line)
    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits = z, labels = y)

    # Create a session (approx. 1 line). See method 1 above.
    sess = tf.Session()

    # Run the session (approx. 1 line).
    cost = sess.run(cost, feed_dict={z: logits, y: labels})

    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    ### END CODE HERE ###

    return cost


def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the ith class number and the jth column
    corresponds to the jth training example. So if example j had a label i. Then entry (i,j)
    will be 1.

    Arguments:
    labels -- vector containing the labels
    C -- number of classes, the depth of the one hot dimension

    Returns:
    one_hot -- one hot matrix
    """
    ### START CODE HERE ###
    # Create a tf.constant equal to C (depth), name it C. (approx. 1 line)
    C = tf.constant(C, name = "C")

    # Use tf.one_hot, be careful with the axis (approx. 1 line)
    one_hot_matrix = tf.one_hot(labels, C, axis = 0)

    # Create the session (approx. 1 line)
    sess = tf.Session()

    # Run the session (approx. 1 line)
    one_hot = sess.run(one_hot_matrix)

    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    ### END CODE HERE ###

    return one_hot


def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]

    ### START CODE HERE ### (approx. 5 lines)   # Numpy Equivalents:
    Z1 = tf.add(tf.matmul(W1, X), b1)           # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)                         # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)          # Z2 = np.dot(W2, A1) + b2
    A2 = tf.nn.relu(Z2)                         # A2 = relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)          # Z3 = np.dot(W3, A2) + b3
    ### END CODE HERE ###

    return Z3