Machine Learning Notes 16 — Programming Assignment 5: Evaluating a Linear Regression Algorithm

The data we will be working with:

Since this is linear regression, we first complete the regularized cost function and its gradient. Open linearRegCostFunction.m.

Cost function:
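For reference, this is the regularized linear regression cost from the lectures (note that the bias term $\theta_0$ is not regularized):

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$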

Gradient:
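And its partial derivatives, which the function returns in grad:

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \qquad (j \ge 1)$$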

```matlab
function [J, grad] = linearRegCostFunction(X, y, theta, lambda)
%LINEARREGCOSTFUNCTION Compute cost and gradient for regularized linear
%regression with multiple variables
%   [J, grad] = LINEARREGCOSTFUNCTION(X, y, theta, lambda) computes the
%   cost of using theta as the parameter for linear regression to fit the
%   data points in X and y. Returns the cost in J and the gradient in grad

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost and gradient of regularized linear
%               regression for a particular choice of theta.
%
%               You should set J to the cost and grad to the gradient.
%
h = X * theta;

% Regularized cost; the bias term theta(1) is not regularized
J = sum((h - y).^2) / (2*m) + (lambda/(2*m)) * sum(theta(2:end).^2);

% Gradient; again, no regularization term for the bias
grad(1)     = (X(:, 1)' * (h - y)) / m;
grad(2:end) = (X(:, 2:end)' * (h - y)) / m + (lambda/m) * theta(2:end);

% =========================================================================

grad = grad(:);

end
```

Running the test:
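A sketch of how ex5.m exercises the function (the expected values in the comments are from the assignment hand-out, quoted from memory, so treat them as approximate):

```matlab
theta = [1; 1];
[J, grad] = linearRegCostFunction([ones(m, 1) X], y, theta, 1);

% The assignment script reports an expected cost of about 303.993
fprintf('Cost at theta = [1; 1]: %f\n', J);

% ...and an expected gradient of roughly [-15.30; 598.25]
fprintf('Gradient at theta = [1; 1]: [%f; %f]\n', grad(1), grad(2));
```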


Since trainLinearReg.m is already written for us, we can compute theta and plot the resulting hypothesis:
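For context, the provided trainLinearReg.m is essentially a thin wrapper around fmincg; roughly (reproduced from memory of the starter code):

```matlab
function [theta] = trainLinearReg(X, y, lambda)
% Initialize theta and wrap our cost function so fmincg can minimize it
initial_theta = zeros(size(X, 2), 1);
costFunction = @(t) linearRegCostFunction(X, y, t, lambda);

% fmincg (supplied with the assignment) uses the returned gradient
options = optimset('MaxIter', 200, 'GradObj', 'on');
theta = fmincg(costFunction, initial_theta, options);

end
```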

The script sets lambda = 0 here. The plot shows a clear case of underfitting.


Next we plot the learning curve to diagnose how the algorithm is doing. Open learningCurve.m:

```matlab
function [error_train, error_val] = ...
    learningCurve(X, y, Xval, yval, lambda)
%LEARNINGCURVE Generates the train and cross validation set errors needed
%to plot a learning curve
%   [error_train, error_val] = ...
%       LEARNINGCURVE(X, y, Xval, yval, lambda) returns the train and
%       cross validation set errors for a learning curve. In particular,
%       it returns two vectors of the same length - error_train and
%       error_val. Then, error_train(i) contains the training error for
%       i examples (and similarly for error_val(i)).
%
%   In this function, you will compute the train and test errors for
%   dataset sizes from 1 up to m. In practice, when working with larger
%   datasets, you might want to do this in larger intervals.
%

% Number of training examples
m = size(X, 1);

% You need to return these values correctly
error_train = zeros(m, 1);
error_val   = zeros(m, 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Fill in this function to return training errors in
%               error_train and the cross validation errors in error_val,
%               i.e., error_train(i) and error_val(i) should give you the
%               errors obtained after training on i examples.
%
% Note: You should evaluate the training error on the first i training
%       examples (i.e., X(1:i, :) and y(1:i)).
%
%       For the cross-validation error, you should instead evaluate on
%       the _entire_ cross validation set (Xval and yval).
%
% Note: If you are using your cost function (linearRegCostFunction)
%       to compute the training and cross validation error, you should
%       call the function with the lambda argument set to 0.
%       Do note that you will still need to use lambda when running
%       the training to obtain the theta parameters.
%

% ---------------------- Sample Solution ----------------------

for i = 1:m
    % Train on the first i examples (with regularization), then measure
    % the unregularized error on those i examples and on the full CV set
    theta = trainLinearReg(X(1:i, :), y(1:i), lambda);
    error_train(i) = linearRegCostFunction(X(1:i, :), y(1:i), theta, 0);
    error_val(i)   = linearRegCostFunction(Xval, yval, theta, 0);
end

% -------------------------------------------------------------

% =========================================================================

end
```
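A sketch of how ex5.m calls this and draws the two curves (a bias column is prepended to both sets first; names follow the assignment's variables):

```matlab
lambda = 0;
[error_train, error_val] = learningCurve([ones(m, 1) X], y, ...
    [ones(size(Xval, 1), 1) Xval], yval, lambda);

plot(1:m, error_train, 1:m, error_val);
title('Learning curve for linear regression');
legend('Train', 'Cross Validation');
xlabel('Number of training examples');
ylabel('Error');
```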

By computing the error at each training-set size as examples are added, we obtain the following learning curve:

Clearly this matches the general shape of the underfitting (high-bias) learning curve we studied earlier.


Since we know the model is underfitting, we need a remedy. One option is to add polynomial features: as the assignment requires, we map the features to higher powers. Open polyFeatures.m:

```matlab
function [X_poly] = polyFeatures(X, p)
%POLYFEATURES Maps X (1D vector) into the p-th power
%   [X_poly] = POLYFEATURES(X, p) takes a data matrix X (size m x 1) and
%   maps each example into its polynomial features where
%   X_poly(i, :) = [X(i) X(i).^2 X(i).^3 ... X(i).^p];
%

% You need to return the following variables correctly.
X_poly = zeros(numel(X), p);

% ====================== YOUR CODE HERE ======================
% Instructions: Given a vector X, return a matrix X_poly where the p-th
%               column of X contains the values of X to the p-th power.
%
for i = 1:p
    X_poly(:, i) = X.^i;   % i-th column holds the i-th power of X
end

% =========================================================================

end
```
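Since the mapped features span wildly different scales (x versus x^8), ex5.m normalizes them before training; a sketch, assuming the provided featureNormalize helper:

```matlab
p = 8;

% Map X onto polynomial features, normalize, and add the bias column
X_poly = polyFeatures(X, p);
[X_poly, mu, sigma] = featureNormalize(X_poly);  % zero mean, unit variance
X_poly = [ones(m, 1), X_poly];
```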

After this change, we can see how the hypothesis now fits the data:

Here is the curve showing $J_{train}(\theta)$ against $J_{cv}(\theta)$ (the blue $J_{train}(\theta)$ curve stays at 0 throughout). From this we can tell that the underfitting problem is solved, but the model now looks somewhat overfit.


Since the model is overfitting, and we used lambda = 0 before, the natural next step is to apply regularization. Open validationCurve.m:

```matlab
function [lambda_vec, error_train, error_val] = ...
    validationCurve(X, y, Xval, yval)
%VALIDATIONCURVE Generate the train and validation errors needed to
%plot a validation curve that we can use to select lambda
%   [lambda_vec, error_train, error_val] = ...
%       VALIDATIONCURVE(X, y, Xval, yval) returns the train
%       and validation errors (in error_train, error_val)
%       for different values of lambda. You are given the training set (X,
%       y) and validation set (Xval, yval).
%

% Selected values of lambda (you should not change this)
lambda_vec = [0 0.001 0.003 0.01 0.03 0.1 0.3 1 3 10];

% You need to return these variables correctly.
error_train = zeros(length(lambda_vec), 1);
error_val   = zeros(length(lambda_vec), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Fill in this function to return training errors in
%               error_train and the validation errors in error_val. The
%               vector lambda_vec contains the different lambda parameters
%               to use for each calculation of the errors, i.e.,
%               error_train(i) and error_val(i) should give you the errors
%               obtained after training with lambda = lambda_vec(i)
%

for i = 1:length(lambda_vec)
    lambda = lambda_vec(i);
    % Train with regularization, but report unregularized errors
    theta = trainLinearReg(X, y, lambda);
    error_train(i) = linearRegCostFunction(X, y, theta, 0);
    error_val(i)   = linearRegCostFunction(Xval, yval, theta, 0);
end

% =========================================================================

end
```
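With the two error vectors back, selecting lambda is just a matter of plotting them and taking the minimizer of the cross-validation error. A minimal sketch, assuming X_poly and X_poly_val hold the normalized polynomial training and validation sets:

```matlab
[lambda_vec, error_train, error_val] = ...
    validationCurve(X_poly, y, X_poly_val, yval);

plot(lambda_vec, error_train, lambda_vec, error_val);
legend('Train', 'Cross Validation');
xlabel('lambda');
ylabel('Error');

% Pick the lambda with the lowest cross-validation error
[~, best_i] = min(error_val);
best_lambda = lambda_vec(best_i);
```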

Finally, the $J_{train}(\theta)$ and $J_{cv}(\theta)$ curves as $\lambda$ increases also confirm the relationship between $\lambda$ and overfitting/underfitting that we learned earlier:

Finally, submit the assignment:


These notes are compiled from Andrew Ng's Machine Learning course on Coursera.

To keep the notes from becoming bloated and hard to search through, I have split them into several parts. Interested readers can check out the other notes:

Machine Learning Notes 1 — Definition of Machine Learning, Supervised and Unsupervised Learning

Machine Learning Notes 2 — Linear Models, Cost Function, and Gradient Descent

Machine Learning Notes 3 — Linear Algebra Basics

Machine Learning Notes 4 — Multivariate Linear Regression

Machine Learning Notes 5 — The Normal Equation

Machine Learning Notes 6 — Matlab Programming Basics

Machine Learning Notes 7 — Programming Assignment 1

Machine Learning Notes 8 — Cost Function and Gradient Descent for Logistic Regression

Machine Learning Notes 9 — Overfitting and Regularization

Machine Learning Notes 10 — Programming Assignment 2

Machine Learning Notes 11 — Neural Networks

Machine Learning Notes 12 — Programming Assignment 3

Machine Learning Notes 13 — Neural Network Cost Function and Backpropagation (BP)

Machine Learning Notes 14 — Backpropagation Programming and Programming Assignment 4

Machine Learning Notes 15 — Evaluating Algorithm Performance

