BP Neural Network Algorithm: Unrolling Parameter Matrices into Vectors

04-28

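The previous post, "Machine Learning: The Neural Network Cost Function and the Backpropagation Algorithm", described how to use backpropagation to compute the derivatives of the cost function. One detail there is that the parameter matrices have to be unrolled into vector form so that advanced optimization algorithms can be used later. Prof. Ng covers this part, Unrolling Parameters, only briefly in the lectures. I filled in what the video mentions and am summarizing it here.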
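To make "unrolling" concrete, here is a minimal sketch. The matrix names, sizes, and values (Theta1 as 3x4, Theta2 as 1x4) are assumptions chosen only for illustration and do not come from the lectures.

```matlab
% Hypothetical parameter matrices; sizes and values are illustrative only.
Theta1 = reshape(1:12, 3, 4);     % 3x4 weight matrix
Theta2 = reshape(13:16, 1, 4);    % 1x4 weight matrix

% Unroll: (:) stacks each matrix column by column, and the results
% are concatenated into one long column vector.
thetaVec = [Theta1(:); Theta2(:)];            % 16x1

% Recover the original matrices from the vector with reshape.
Theta1Back = reshape(thetaVec(1:12), 3, 4);
Theta2Back = reshape(thetaVec(13:16), 1, 4);
```

The optimizer only ever sees thetaVec. The matrices are rebuilt with reshape wherever the matrix form is needed.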

Implementing unrolling parameters in Matlab starts from the following two lines:

function [jVal, gradient] = costFunction(theta)

optTheta = fminunc(@costFunction, initialTheta, options)

Here fminunc() is a function that ships with Matlab (in the Optimization Toolbox). Its official description reads:

Find minimum of unconstrained multivariable function
Nonlinear programming solver.
Finds the minimum of a problem specified by $\min_x f(x)$,

where f(x) is a function that returns a scalar.
x is a vector or a matrix.

fminunc() solves unconstrained nonlinear optimization problems: it finds the minimum of a multivariable function. One of its calling forms is:

x = fminunc(fun,x0,options)

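Now look back at optTheta = fminunc(@costFunction, initialTheta, options). The first input argument, fun, is @costFunction (in Matlab, @ creates a function handle, a way of calling a function indirectly). The second argument is initialTheta, a vector that the user supplies and must initialize before the call. The third argument, options, is a struct that can be set with optimset. Here it carries two fields, GradObj and MaxIter: GradObj indicates whether to use the user-supplied gradient (the second output of costFunction), and MaxIter sets the maximum number of iterations.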

A small example of setting options:

options = optimset('GradObj', 'on', 'MaxIter', 100);
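To see all three arguments working together, here is a minimal sketch of a complete call. The quadratic cost is a toy problem assumed purely for illustration, not a neural network cost function. The script uses a local function, which requires MATLAB R2016b or later. In older versions or in Octave, save costFunction in its own file, costFunction.m, instead.

```matlab
% Toy problem (an assumption for illustration): minimize
% J(theta) = (theta(1) - 5)^2 + (theta(2) - 5)^2, whose minimum is at [5; 5].

% 'GradObj', 'on' tells fminunc that costFunction also returns the gradient.
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);   % starting point, must be a vector

[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);

% Local function: in a script it must come after the commands above.
function [jVal, gradient] = costFunction(theta)
  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;   % scalar cost
  gradient = zeros(2, 1);                        % same shape as theta
  gradient(1) = 2 * (theta(1) - 5);              % dJ/dtheta(1)
  gradient(2) = 2 * (theta(2) - 5);              % dJ/dtheta(2)
end
```

For this toy cost, optTheta should come out close to [5; 5] and functionVal close to 0. A positive exitFlag indicates that fminunc reported convergence.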

With fminunc() covered, look again at these two expressions: