Learning / Training Process of a Neural Network
01-25
Below are some of my study notes.
For the concrete implementation code, see: https://github.com/wrymax/machine-learning-assignments/tree/master/week5/machine-learning-ex4/ex4
- Cost Function
- Important Parameters:
- L => Total number of layers in network
- Sl => Number of units ( not counting bias unit ) in layer l
- For example, in a 4-layer network: L = 4, S1 = 3, S2 = 5, S4 = SL = 4
- Two Classification Methods
- Binary Classification
- y = 0 or 1
- SL = K = 1 ( One output unit )
- Multi-class Classification
- y is a logical (one-hot) vector, with a single 1 marking the class (see the sketch below)
- SL = K, K >= 3 ( K output units )
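As a quick illustration of the one-hot label encoding mentioned above, here is a minimal NumPy sketch. The labels `y` and class count `K` are made-up example values; the actual ex4 assignment does the equivalent in Octave/MATLAB.

```python
import numpy as np

# Hypothetical integer labels for m = 5 examples and K = 3 classes
y = np.array([0, 2, 1, 2, 0])
K = 3

# One-hot encoding: each row is a logical vector with a single 1
Y = np.eye(K)[y]
print(Y)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]]
```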
- Binary Classification
- The Cost Function
- J(theta) sums the logistic-regression cost over all K output units and all m training examples (see the formula below).
- The regularisation term sums the squares of all Theta elements between each pair of adjacent layers (bias terms are excluded).
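For reference, the full cost function from the lecture, in the standard notation (m training examples, K output units, L layers, S_l units in layer l), is:

```latex
J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}
\left[ y_k^{(i)} \log\big(h_\Theta(x^{(i)})\big)_k
     + \big(1 - y_k^{(i)}\big) \log\Big(1 - \big(h_\Theta(x^{(i)})\big)_k\Big) \right]
+ \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{S_l}\sum_{j=1}^{S_{l+1}}
\big(\Theta_{ji}^{(l)}\big)^2
```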
- Back Propagation
- Compute Gradient: used to compute the gradient, i.e. the partial derivatives of the cost function with respect to Theta
- Algorithm (summarised in the equations below)
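For reference, the error terms and accumulated gradients computed by the algorithm are as follows (sigmoid activation, so g'(z) = a ⊙ (1 − a), where ⊙ denotes element-wise multiplication):

```latex
\delta^{(L)} = a^{(L)} - y, \qquad
\delta^{(l)} = \big(\Theta^{(l)}\big)^{T}\delta^{(l+1)} \odot g'\big(z^{(l)}\big)
\quad \text{for } l = L-1,\dots,2
```

```latex
\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)}\big(a^{(l)}\big)^{T}, \qquad
\frac{\partial J(\Theta)}{\partial \Theta_{ij}^{(l)}} = D_{ij}^{(l)} =
\begin{cases}
\frac{1}{m}\,\Delta_{ij}^{(l)} + \frac{\lambda}{m}\,\Theta_{ij}^{(l)} & \text{if } j \ge 1\\[4pt]
\frac{1}{m}\,\Delta_{ij}^{(l)} & \text{if } j = 0
\end{cases}
```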
- Back Propagation in Practice
- Learning Algorithm
- initialTheta
- costFunction
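These two items are exactly what an advanced optimiser needs: an initial (unrolled) parameter vector and a handle returning the cost and gradient. A minimal sketch of the pattern using SciPy is below; `toy_cost_function` is a deliberately trivial stand-in so the example actually runs, whereas the course assignment passes the real nnCostFunction to Octave's fmincg/fminunc.

```python
import numpy as np
from scipy.optimize import minimize

def toy_cost_function(theta_vec):
    """Stand-in for the real cost function: returns (cost, gradient).
    A simple quadratic is used here purely so the example is runnable."""
    cost = 0.5 * np.sum(theta_vec ** 2)
    grad = theta_vec
    return cost, grad

# initialTheta: a small random, unrolled parameter vector (see Random Initialisation below)
initial_theta = np.random.uniform(-0.12, 0.12, size=10)

# costFunction: a handle returning (J, grad); jac=True tells SciPy the
# gradient is returned together with the cost.
result = minimize(toy_cost_function, initial_theta, method="CG", jac=True)
print(result.x)  # parameters that minimise the cost
```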
- Unrolling Parameters
- Change matrices into vectors
- Change vectors back into matrices (see the sketch below)
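A minimal NumPy sketch of unrolling and re-rolling; the Theta1/Theta2 shapes are illustrative (25x401 and 10x26, as in ex4), and the assignment does the same thing with Octave's `(:)` and `reshape`.

```python
import numpy as np

# Example weight matrices (illustrative shapes)
Theta1 = np.random.rand(25, 401)
Theta2 = np.random.rand(10, 26)

# Matrices -> single vector, so the optimiser can treat all weights as one vector
theta_vec = np.concatenate([Theta1.ravel(), Theta2.ravel()])

# Vector -> matrices, reshaped back before forward/back propagation
T1 = theta_vec[:25 * 401].reshape(25, 401)
T2 = theta_vec[25 * 401:].reshape(10, 26)

assert np.array_equal(T1, Theta1) and np.array_equal(T2, Theta2)
```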
- Gradient Checking
- Use a numerical estimate to approximate the derivatives
- Pros:
- It can check whether the back-propagation derivatives are correct
- Cons:
- It is super slow.
- Once you have confirmed that back propagation gives values close to the numerical gradient, turn gradient checking off.
- Be sure to disable the gradient-checking code before training your classifier; otherwise training will be extremely slow.
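A minimal sketch of the numerical check: a two-sided finite difference with epsilon = 1e-4, as in the lecture. `cost_fn` is any function returning the scalar cost; the commented usage assumes hypothetical `nn_cost` and `backprop_grad` from your own implementation.

```python
import numpy as np

def numerical_gradient(cost_fn, theta_vec, epsilon=1e-4):
    """Two-sided finite-difference estimate of dJ/dtheta for each parameter."""
    grad = np.zeros_like(theta_vec)
    perturb = np.zeros_like(theta_vec)
    for i in range(theta_vec.size):
        perturb[i] = epsilon
        grad[i] = (cost_fn(theta_vec + perturb) - cost_fn(theta_vec - perturb)) / (2 * epsilon)
        perturb[i] = 0.0
    return grad

# Usage (hypothetical): compare against the back-propagation gradient;
# the relative difference should be tiny (e.g. < 1e-9).
# num_grad = numerical_gradient(nn_cost, theta_vec)
# diff = np.linalg.norm(num_grad - backprop_grad) / np.linalg.norm(num_grad + backprop_grad)
```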
- Random Initialisation
- "Zero initialisation" does not work in a neural network: if all weights start equal, every hidden unit computes the same function and they stay identical after each update.
- Random initialisation breaks this symmetry ("symmetry breaking"); see the sketch below.
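A minimal sketch of random initialisation in NumPy. The value epsilon_init = 0.12 is the one suggested in the ex4 assignment, and the layer sizes (400 inputs, 25 hidden units, 10 classes) are illustrative.

```python
import numpy as np

def rand_initialise_weights(l_in, l_out, epsilon_init=0.12):
    """Weights for a layer with l_in inputs (+1 bias unit) and l_out outputs,
    drawn uniformly from [-epsilon_init, epsilon_init] to break symmetry."""
    return np.random.uniform(-epsilon_init, epsilon_init, size=(l_out, l_in + 1))

Theta1 = rand_initialise_weights(400, 25)   # input layer -> hidden layer
Theta2 = rand_initialise_weights(25, 10)    # hidden layer -> output layer
```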
- Putting it together
- Training a neural network
- Pick a network architecture
- Number of input units: Dimension of features x(i)
- Number of output units: Number of Classes
- Layers:
- Number of layers
- Units in each layer
- Usually the same number of units in each hidden layer
- Usually the more units the better
- Randomly initialise weights
- Small values near zero
- Implement forward propagation to get the prediction for any x(i)
- Implement code to compute the cost function J(theta)
- Implement backprop to compute partial derivatives of J(theta)
- for i = 1:m
- Perform forward propagation and back-propagation using example (x(i), y(i))
- Get the activations a(l) and error terms delta(l) for l = 2, …, L (see the sketch after this list)
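To tie the steps above together, here is a minimal vectorised NumPy sketch of forward and back propagation for a 3-layer network, returning the cost and the gradients. The layer sizes, lambda and data layout are illustrative; the actual assignment implements this in Octave in nnCostFunction.m.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost_and_grad(Theta1, Theta2, X, Y, lambda_):
    """Cost J and gradients for a 3-layer network (input, hidden, output).
    X is (m, n), Y is (m, K) one-hot, Theta1 is (S2, n+1), Theta2 is (K, S2+1)."""
    m = X.shape[0]

    # Forward propagation
    a1 = np.hstack([np.ones((m, 1)), X])          # add bias unit
    z2 = a1 @ Theta1.T
    a2 = np.hstack([np.ones((m, 1)), sigmoid(z2)])
    a3 = sigmoid(a2 @ Theta2.T)                   # h_theta(x), shape (m, K)

    # Cost: logistic cost over all K output units, plus regularisation
    J = -np.sum(Y * np.log(a3) + (1 - Y) * np.log(1 - a3)) / m
    J += lambda_ / (2 * m) * (np.sum(Theta1[:, 1:] ** 2) + np.sum(Theta2[:, 1:] ** 2))

    # Back propagation: error terms and accumulated gradients
    delta3 = a3 - Y                               # (m, K)
    delta2 = (delta3 @ Theta2[:, 1:]) * sigmoid(z2) * (1 - sigmoid(z2))

    Theta1_grad = delta2.T @ a1 / m
    Theta2_grad = delta3.T @ a2 / m

    # Regularise every column except the bias column
    Theta1_grad[:, 1:] += lambda_ / m * Theta1[:, 1:]
    Theta2_grad[:, 1:] += lambda_ / m * Theta2[:, 1:]

    return J, Theta1_grad, Theta2_grad
```

This vectorised form processes all m examples at once; it is equivalent to the per-example `for i = 1:m` loop in the checklist above.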