Learning / Training Process of a Neural Network
01-25
Below are some of my study notes.
For the concrete implementation code, see: https://github.com/wrymax/machine-learning-assignments/tree/master/week5/machine-learning-ex4/ex4
- Cost Function
- Important Parameters:
- L => Total number of layers in network
- Sl => Number of units ( not counting bias unit ) in layer l
- For example, in a 4-layer network: L = 4, S1 = 3, S2 = 5, S4 = SL = 4
- Two Classification Methods
- Binary Classification
- y = 0 or 1
- SL = K = 1 ( One output unit )
- Multi-class Classification
- y is a logical (one-hot) vector, with a single 1 marking the class (see the sketch below)
- SL = K, K >= 3 ( K output units )
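As a quick illustration of the one-hot label encoding mentioned above, here is a minimal NumPy sketch. The labels `y` and class count `K` are made-up example values; the actual ex4 assignment does the equivalent in Octave/MATLAB.

```python
import numpy as np

# Hypothetical integer labels for m = 5 examples and K = 3 classes
y = np.array([0, 2, 1, 2, 0])
K = 3

# One-hot encoding: each row is a logical vector with a single 1
Y = np.eye(K)[y]
print(Y)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]]
```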
- Binary Classification
- The Cost Function
- J(theta) sums the logistic-regression cost over all K output units and all m training examples (see the formula below).
- The regularisation term sums the squares of all Theta elements between each pair of adjacent layers (bias terms are excluded).
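For reference, the full cost function from the lecture, in the standard notation (m training examples, K output units, L layers, S_l units in layer l), is:

```latex
J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}
\left[ y_k^{(i)} \log\big(h_\Theta(x^{(i)})\big)_k
     + \big(1 - y_k^{(i)}\big) \log\Big(1 - \big(h_\Theta(x^{(i)})\big)_k\Big) \right]
+ \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{S_l}\sum_{j=1}^{S_{l+1}}
\big(\Theta_{ji}^{(l)}\big)^2
```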
- Back Propagation
- Compute Gradient: used to compute the gradient, i.e. the partial derivatives of the cost function with respect to Theta
- Algorithm (summarised in the equations below)
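For reference, the error terms and accumulated gradients computed by the algorithm are as follows (sigmoid activation, so g'(z) = a ⊙ (1 − a), where ⊙ denotes element-wise multiplication):

```latex
\delta^{(L)} = a^{(L)} - y, \qquad
\delta^{(l)} = \big(\Theta^{(l)}\big)^{T}\delta^{(l+1)} \odot g'\big(z^{(l)}\big)
\quad \text{for } l = L-1,\dots,2
```

```latex
\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)}\big(a^{(l)}\big)^{T}, \qquad
\frac{\partial J(\Theta)}{\partial \Theta_{ij}^{(l)}} = D_{ij}^{(l)} =
\begin{cases}
\frac{1}{m}\,\Delta_{ij}^{(l)} + \frac{\lambda}{m}\,\Theta_{ij}^{(l)} & \text{if } j \ge 1\\[4pt]
\frac{1}{m}\,\Delta_{ij}^{(l)} & \text{if } j = 0
\end{cases}
```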
- Back Propagation in Practice
- Learning Algorithm
- initialTheta
- costFunction
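These two items are exactly what an advanced optimiser needs: an initial (unrolled) parameter vector and a handle returning the cost and gradient. A minimal sketch of the pattern using SciPy is below; `toy_cost_function` is a deliberately trivial stand-in so the example actually runs, whereas the course assignment passes the real nnCostFunction to Octave's fmincg/fminunc.

```python
import numpy as np
from scipy.optimize import minimize

def toy_cost_function(theta_vec):
    """Stand-in for the real cost function: returns (cost, gradient).
    A simple quadratic is used here purely so the example is runnable."""
    cost = 0.5 * np.sum(theta_vec ** 2)
    grad = theta_vec
    return cost, grad

# initialTheta: a small random, unrolled parameter vector (see Random Initialisation below)
initial_theta = np.random.uniform(-0.12, 0.12, size=10)

# costFunction: a handle returning (J, grad); jac=True tells SciPy the
# gradient is returned together with the cost.
result = minimize(toy_cost_function, initial_theta, method="CG", jac=True)
print(result.x)  # parameters that minimise the cost
```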
- Unrolling Parameters
- Change matrices into vectors
- Change vectors back into matrices (see the sketch below)
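A minimal NumPy sketch of unrolling and re-rolling; the Theta1/Theta2 shapes are illustrative (25x401 and 10x26, as in ex4), and the assignment does the same thing with Octave's `(:)` and `reshape`.

```python
import numpy as np

# Example weight matrices (illustrative shapes)
Theta1 = np.random.rand(25, 401)
Theta2 = np.random.rand(10, 26)

# Matrices -> single vector, so the optimiser can treat all weights as one vector
theta_vec = np.concatenate([Theta1.ravel(), Theta2.ravel()])

# Vector -> matrices, reshaped back before forward/back propagation
T1 = theta_vec[:25 * 401].reshape(25, 401)
T2 = theta_vec[25 * 401:].reshape(10, 26)

assert np.array_equal(T1, Theta1) and np.array_equal(T2, Theta2)
```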
- Gradient Checking
- Use a numerical estimate to approximate the derivatives
- Pros:
- It can check whether the back-propagation derivatives are correct
- Cons:
- It is super slow.
- Once you have confirmed that back propagation gives values close to the numerical gradient, turn gradient checking off.
- Be sure to disable the gradient-checking code before training your classifier; otherwise training will be extremely slow.
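A minimal sketch of the numerical check: a two-sided finite difference with epsilon = 1e-4, as in the lecture. `cost_fn` is any function returning the scalar cost; the commented usage assumes hypothetical `nn_cost` and `backprop_grad` from your own implementation.

```python
import numpy as np

def numerical_gradient(cost_fn, theta_vec, epsilon=1e-4):
    """Two-sided finite-difference estimate of dJ/dtheta for each parameter."""
    grad = np.zeros_like(theta_vec)
    perturb = np.zeros_like(theta_vec)
    for i in range(theta_vec.size):
        perturb[i] = epsilon
        grad[i] = (cost_fn(theta_vec + perturb) - cost_fn(theta_vec - perturb)) / (2 * epsilon)
        perturb[i] = 0.0
    return grad

# Usage (hypothetical): compare against the back-propagation gradient;
# the relative difference should be tiny (e.g. < 1e-9).
# num_grad = numerical_gradient(nn_cost, theta_vec)
# diff = np.linalg.norm(num_grad - backprop_grad) / np.linalg.norm(num_grad + backprop_grad)
```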
- Random Initialisation
- "Zero initialisation" does not work in a neural network: if all weights start equal, every hidden unit computes the same function and they stay identical after each update.
- Random initialisation breaks this symmetry ("symmetry breaking"); see the sketch below.
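A minimal sketch of random initialisation in NumPy. The value epsilon_init = 0.12 is the one suggested in the ex4 assignment, and the layer sizes (400 inputs, 25 hidden units, 10 classes) are illustrative.

```python
import numpy as np

def rand_initialise_weights(l_in, l_out, epsilon_init=0.12):
    """Weights for a layer with l_in inputs (+1 bias unit) and l_out outputs,
    drawn uniformly from [-epsilon_init, epsilon_init] to break symmetry."""
    return np.random.uniform(-epsilon_init, epsilon_init, size=(l_out, l_in + 1))

Theta1 = rand_initialise_weights(400, 25)   # input layer -> hidden layer
Theta2 = rand_initialise_weights(25, 10)    # hidden layer -> output layer
```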
- Putting it together
- Training a neural network
- Pick a network architecture
- Number of input units: Dimension of features x(i)
- Number of output units: Number of Classes
- Layers:
- Number of layers
- Units in each layer
- Usually the same number of units in each hidden layer
- Usually the more units the better
- Randomly initialise weights
- Small values near zero
- Implement forward propagation to get the prediction for any x(i)
- Implement code to compute the cost function J(theta)
- Implement backprop to compute partial derivatives of J(theta)
- for i = 1:m
- Perform forward propagation and back-propagation using example (x(i), y(i))
- Get the activations a(l) and error terms delta(l) for l = 2, …, L (see the sketch after this list)
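To tie the steps above together, here is a minimal vectorised NumPy sketch of forward and back propagation for a 3-layer network, returning the cost and the gradients. The layer sizes, lambda and data layout are illustrative; the actual assignment implements this in Octave in nnCostFunction.m.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost_and_grad(Theta1, Theta2, X, Y, lambda_):
    """Cost J and gradients for a 3-layer network (input, hidden, output).
    X is (m, n), Y is (m, K) one-hot, Theta1 is (S2, n+1), Theta2 is (K, S2+1)."""
    m = X.shape[0]

    # Forward propagation
    a1 = np.hstack([np.ones((m, 1)), X])          # add bias unit
    z2 = a1 @ Theta1.T
    a2 = np.hstack([np.ones((m, 1)), sigmoid(z2)])
    a3 = sigmoid(a2 @ Theta2.T)                   # h_theta(x), shape (m, K)

    # Cost: logistic cost over all K output units, plus regularisation
    J = -np.sum(Y * np.log(a3) + (1 - Y) * np.log(1 - a3)) / m
    J += lambda_ / (2 * m) * (np.sum(Theta1[:, 1:] ** 2) + np.sum(Theta2[:, 1:] ** 2))

    # Back propagation: error terms and accumulated gradients
    delta3 = a3 - Y                               # (m, K)
    delta2 = (delta3 @ Theta2[:, 1:]) * sigmoid(z2) * (1 - sigmoid(z2))

    Theta1_grad = delta2.T @ a1 / m
    Theta2_grad = delta3.T @ a2 / m

    # Regularise every column except the bias column
    Theta1_grad[:, 1:] += lambda_ / m * Theta1[:, 1:]
    Theta2_grad[:, 1:] += lambda_ / m * Theta2[:, 1:]

    return J, Theta1_grad, Theta2_grad
```

This vectorised form processes all m examples at once; it is equivalent to the per-example `for i = 1:m` loop in the checklist above.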