Machine Learning Notes 7: Programming Assignment 1, the Cost Function and Gradient Descent

A note before we start: I recommend doing the assignment yourself before reading this walkthrough. What you only read is quickly forgotten, but what you work through yourself sticks much better.


First, download all the assignment files.

Then open ex1.m.

The first exercise just gets you familiar with the submission workflow: write a function that returns a 5 x 5 identity matrix.

%% Initialization
clear ; close all; clc

%% ==================== Part 1: Basic Function ====================
% Complete warmUpExercise.m
fprintf('Running warmUpExercise ... \n');
fprintf('5x5 Identity Matrix: \n');
warmUpExercise()

fprintf('Program paused. Press enter to continue.\n');
pause;

Right-click to open the warmUpExercise() function:

function A = warmUpExercise()
%WARMUPEXERCISE Example function in octave
%   A = WARMUPEXERCISE() is an example function that returns the 5x5 identity matrix

A = [];
% ============= YOUR CODE HERE ==============
% Instructions: Return the 5x5 identity matrix
%               In octave, we return values by defining which variables
%               represent the return values (at the top of the file)
%               and then set them accordingly.

A = eye(5); % this is the line we write ourselves
% ===========================================
end

Then just run it.

>> warmUpExercise

ans =

     1     0     0     0     0
     0     1     0     0     0
     0     0     1     0     0
     0     0     0     1     0
     0     0     0     0     1

Type submit(), enter your email and password on the submission screen, and you will see the result of your assignment. Pretty slick.


The second exercise has a long description, but it is really just getting you familiar with plot:

% ======================= Part 2: Plotting =======================
fprintf('Plotting Data ...\n')
data = load('ex1data1.txt');
X = data(:, 1); y = data(:, 2);
m = length(y); % number of training examples

% Plot Data
% Note: You have to complete the code in plotData.m
plotData(X, y);

fprintf('Program paused. Press enter to continue.\n');
pause;

Open plotData.m:

function plotData(x, y)
%PLOTDATA Plots the data points x and y into a new figure
%   PLOTDATA(x,y) plots the data points and gives the figure axes labels of
%   population and profit.

figure; % open a new figure window

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the training data into a figure using the
%               "figure" and "plot" commands. Set the axes labels using
%               the "xlabel" and "ylabel" commands. Assume the
%               population and revenue data have been passed in
%               as the x and y arguments of this function.
%
% Hint: You can use the 'rx' option with plot to have the markers
%       appear as red crosses. Furthermore, you can make the
%       markers larger by using plot(..., 'rx', 'MarkerSize', 10);

plot(x, y, 'rx', 'MarkerSize', 10);                             % this is the line we write ourselves
xlabel('the population of a city in 10,000s');                  % this is the line we write ourselves
ylabel('the profit of a food truck in that city in $10,000s');  % this is the line we write ourselves
% ============================================================
end

Press Enter in your command window:

Let's look at this line in more detail:

plot(x, y, 'rx', 'MarkerSize', 10);

x is of course the x-axis variable and y the y-axis variable, but what is 'rx'?

In the figure above each data point is drawn as a red cross, and that is exactly what 'rx' means: r for red, x for a cross marker.

'MarkerSize' and the 10 that follows set the size of the marker, i.e. the size of the red crosses here.
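To make this concrete, here is a tiny standalone example (my own toy data, not part of the assignment) showing how the option string and the name-value pairs change the markers:

% Toy example (not part of the assignment): the third argument of plot
% is a colour/marker specification, and the name-value pairs that follow
% tune the marker's appearance.
x = 1:5;
y = [2 4 3 5 6];
figure;
plot(x, y, 'rx', 'MarkerSize', 10);      % red crosses, size 10
hold on;
plot(x, y + 1, 'bo', 'MarkerSize', 6);   % blue circles, size 6
hold off;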


The third exercise: the cost function and gradient descent.

%% =================== Part 3: Cost and Gradient descent ===================

X = [ones(m, 1), data(:,1)]; % Add a column of ones to x
theta = zeros(2, 1); % initialize fitting parameters

% Some gradient descent settings
iterations = 1500;
alpha = 0.01;

fprintf('\nTesting the cost function ...\n')
% compute and display initial cost
J = computeCost(X, y, theta);
fprintf('With theta = [0 ; 0]\nCost computed = %f\n', J);
fprintf('Expected cost value (approx) 32.07\n');

% further testing of the cost function
J = computeCost(X, y, [-1 ; 2]);
fprintf('\nWith theta = [-1 ; 2]\nCost computed = %f\n', J);
fprintf('Expected cost value (approx) 54.24\n');

fprintf('Program paused. Press enter to continue.\n');
pause;

fprintf('\nRunning Gradient Descent ...\n')
% run gradient descent
theta = gradientDescent(X, y, theta, alpha, iterations);

% print theta to screen
fprintf('Theta found by gradient descent:\n');
fprintf('%f\n', theta);
fprintf('Expected theta values (approx)\n');
fprintf(' -3.6303\n  1.1664\n\n');

% Plot the linear fit
hold on; % keep previous plot visible
plot(X(:,2), X*theta, '-')
legend('Training data', 'Linear regression')
hold off % don't overlay any more plots on this figure

% Predict values for population sizes of 35,000 and 70,000
predict1 = [1, 3.5] * theta;
fprintf('For population = 35,000, we predict a profit of %f\n',...
    predict1*10000);
predict2 = [1, 7] * theta;
fprintf('For population = 70,000, we predict a profit of %f\n',...
    predict2*10000);

fprintf('Program paused. Press enter to continue.\n');
pause;

The script says a lot here, but none of it needs to be touched by us.

Open computeCost.m. The task is simply to express the cost function in code; the cost function was covered in Note 2, and the corresponding code ideas in Note 6:

The cost function:

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

Since we are fitting a straight line here, the linear model only needs two parameters, i.e. h_\theta(x) = \theta_0 + \theta_1 x.

function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

J = sum((X * theta - y).^2) / (2 * m); % this is the line we write ourselves

% =========================================================================

end

Running ex1.m, the computed cost matches the expected values printed by the script (approximately 32.07 for theta = [0; 0] and 54.24 for theta = [-1; 2]).

With the cost function written, it is time for the gradient descent function. The update rule is:

\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} ,

and here there are only \theta_0 and \theta_1:

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta

m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %

    theta = theta - (alpha / m) * (X' * (X * theta - y)); % this is the line we write ourselves

    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);

end

end

The result: theta comes out close to the expected values of -3.6303 and 1.1664 printed by ex1.m.

With the cost function and gradient descent function in place, we can now fit the regression line to the data:

It may not be obvious why the transpose of X has to be written in front, so let's work through a simple example with two training examples:

\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}, \quad X = \begin{bmatrix} 1 & x_0 \\ 1 & x_1 \end{bmatrix}, \quad y = \begin{bmatrix} y_0 \\ y_1 \end{bmatrix} .

First we compute h_\theta(x) - y, which in the program is X * theta - y:

X\theta - y = \begin{bmatrix} (\theta_0 + x_0\theta_1) - y_0 \\ (\theta_0 + x_1\theta_1) - y_1 \end{bmatrix} ,

which is a 2 x 1 matrix, while X is a 2 x 2 matrix, so we transpose X and multiply it on the left:

X^T = \begin{bmatrix} 1 & 1 \\ x_0 & x_1 \end{bmatrix}

Multiplying them gives:

X^T (X\theta - y) = \begin{bmatrix} \bigl((\theta_0 + x_0\theta_1) - y_0\bigr) + \bigl((\theta_0 + x_1\theta_1) - y_1\bigr) \\ \bigl((\theta_0 + x_0\theta_1) - y_0\bigr)x_0 + \bigl((\theta_0 + x_1\theta_1) - y_1\bigr)x_1 \end{bmatrix}

The first element is exactly the sum \sum_i (h_\theta(x^{(i)}) - y^{(i)}) needed to update \theta_0, and the second is the sum \sum_i (h_\theta(x^{(i)}) - y^{(i)})\,x^{(i)} needed to update \theta_1. If this is still unclear, have a look at Note 4.
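If you want to convince yourself of this numerically, here is a small sanity check (my own snippet, not part of the assignment) that compares the vectorized product X' * (X * theta - y) with the two sums written out element by element:

% Sanity check (not part of the assignment): compare the vectorized
% gradient term with the element-wise sums on a tiny 2-example dataset.
X = [1 3; 1 5];            % two examples; columns are [1, x]
y = [4; 8];
theta = [0.5; 1.2];

err   = X * theta - y;                     % h_theta(x) - y for each example
g_vec = X' * err;                          % vectorized: [sum(err); sum(err .* x)]
g_sum = [sum(err); sum(err .* X(:,2))];    % the same two sums written out

disp([g_vec g_sum]);                       % the two columns are identical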


The fourth exercise: to understand the cost function better, we visualize it:

%% ============= Part 4: Visualizing J(theta_0, theta_1) =============
fprintf('Visualizing J(theta_0, theta_1) ...\n')

% Grid over which we will calculate J
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);

% initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));

% Fill out J_vals
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i,j) = computeCost(X, y, t);
    end
end


% Because of the way meshgrids work in the surf command, we need to
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';
% Surface plot
figure;
surf(theta0_vals, theta1_vals, J_vals)
xlabel('\theta_0'); ylabel('\theta_1');

% Contour plot
figure;
% Plot J_vals as 15 contours spaced logarithmically between 0.01 and 100
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);

The code for this part is already written for us; just press Enter:


Finally, run submit() and check your score:


Everything above uses a single variable. As discussed in Note 4, we sometimes have more than one feature, so the optional exercises give us practice with multiple variables.

First open ex1_multi.m:

Do you still remember feature scaling? When two features differ greatly in magnitude, we need to scale them appropriately:

%% ================ Part 1: Feature Normalization ================

%% Clear and Close Figures
clear ; close all; clc

fprintf('Loading data ...\n');

%% Load Data
data = load('ex1data2.txt');
X = data(:, 1:2);
y = data(:, 3);
m = length(y);

% Print out some data points
fprintf('First 10 examples from the dataset: \n');
fprintf(' x = [%.0f %.0f], y = %.0f \n', [X(1:10,:) y(1:10,:)]');

fprintf('Program paused. Press enter to continue.\n');
pause;

% Scale features and set them to zero mean
fprintf('Normalizing Features ...\n');

[X mu sigma] = featureNormalize(X);

% Add intercept term to X
X = [ones(m, 1) X];

Open featureNormalize.m:

function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.

% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================
% Instructions: First, for each feature dimension, compute the mean
%               of the feature and subtract it from the dataset,
%               storing the mean value in mu. Next, compute the
%               standard deviation of each feature and divide
%               each feature by its standard deviation, storing
%               the standard deviation in sigma.
%
%               Note that X is a matrix where each column is a
%               feature and each row is an example. You need
%               to perform the normalization separately for
%               each feature.
%
% Hint: You might find the 'mean' and 'std' functions useful.
%
%===================== these are the lines we write ourselves =====================%
mu = mean(X);    % mean of each feature
sigma = std(X);  % standard deviation of each feature

for iter = 1:size(X, 2)
    X_norm(:,iter) = (X(:,iter) - mu(iter)) / sigma(iter);
end
%===================== these are the lines we write ourselves =====================%
% ============================================================
end
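As a side note (my own variation, not required by the assignment), the for loop can be replaced by a single vectorized expression. In Octave, and in MATLAB R2016b or newer, implicit broadcasting expands the 1 x n row vectors mu and sigma across all rows of X:

% Vectorized alternative (my own variation, not required by the assignment):
mu     = mean(X);             % 1 x n row vector of feature means
sigma  = std(X);              % 1 x n row vector of feature standard deviations
X_norm = (X - mu) ./ sigma;   % broadcasting normalizes every row at once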


Next come our cost function and gradient descent function:

%% ================ Part 2: Gradient Descent ================

% ====================== YOUR CODE HERE ======================
% Instructions: We have provided you with the following starter
%               code that runs gradient descent with a particular
%               learning rate (alpha).
%
%               Your task is to first make sure that your functions -
%               computeCost and gradientDescent already work with
%               this starter code and support multiple variables.
%
%               After that, try running gradient descent with
%               different values of alpha and see which one gives
%               you the best result.
%
%               Finally, you should complete the code at the end
%               to predict the price of a 1650 sq-ft, 3 br house.
%
% Hint: By using the 'hold on' command, you can plot multiple
%       graphs on the same figure.
%
% Hint: At prediction, make sure you do the same feature normalization.
%

fprintf('Running gradient descent ...\n');

% Choose some alpha value
alpha = 0.01;
num_iters = 400;

% Init Theta and Run Gradient Descent
theta = zeros(3, 1);
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);

% Plot the convergence graph
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');

% Display gradient descent's result
fprintf('Theta computed from gradient descent: \n');
fprintf(' %f \n', theta);
fprintf('\n');

% Estimate the price of a 1650 sq-ft, 3 br house
% ====================== YOUR CODE HERE ======================
% Recall that the first column of X is all-ones. Thus, it does
% not need to be normalized.
price = 0; % You should change this


% ============================================================

fprintf(['Predicted price of a 1650 sq-ft, 3 br house ' ...
         '(using gradient descent):\n $%f\n'], price);

fprintf('Program paused. Press enter to continue.\n');
pause;

First open computeCostMulti.m:

function J = computeCostMulti(X, y, theta)
%COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
%   J = COMPUTECOSTMULTI(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

J = 1 / (2 * m) * sum(((X * theta) - y).^2); % this is the line we write ourselves

% =========================================================================

end

There is really no difference from the single-variable version. Next, open gradientDescentMulti.m:

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
%GRADIENTDESCENTMULTI Performs gradient descent to learn theta
%   theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCostMulti) and gradient here.
    %
    delta = (1 / m) * X' * (X * theta - y); % this is the line we write ourselves
    theta = theta - alpha * delta;          % this is the line we write ourselves
    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCostMulti(X, y, theta);

end
end

It is essentially the same as the single-variable program; the main difference is simply in how things were initialized earlier (theta is now a 3 x 1 vector).

You can see that the cost decreases as the number of iterations grows, which indicates the learning rate is appropriate.
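The assignment also suggests experimenting with different learning rates. A minimal sketch of how one might compare them (my own snippet; it assumes the normalized X, y and gradientDescentMulti from above are already in the workspace):

% Compare convergence for several learning rates (my own sketch).
alphas    = [0.01 0.03 0.1 0.3];
num_iters = 400;
figure; hold on;
for k = 1:length(alphas)
    theta0 = zeros(3, 1);
    [~, J_hist] = gradientDescentMulti(X, y, theta0, alphas(k), num_iters);
    plot(1:num_iters, J_hist, 'LineWidth', 2);
end
hold off;
xlabel('Number of iterations');
ylabel('Cost J');
legend('alpha = 0.01', 'alpha = 0.03', 'alpha = 0.1', 'alpha = 0.3');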

The output shows the three theta values, which are quite large, and then predicts the price of a 1650 sq-ft house with 3 bedrooms.
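The script leaves price = 0 for us to fill in. One possible way to complete it (my own sketch): normalize the new example with the same mu and sigma returned by featureNormalize, add the intercept term, then multiply by theta:

% Possible completion of the prediction step (my own sketch):
x_raw  = [1650 3];                  % 1650 sq-ft, 3 bedrooms
x_norm = (x_raw - mu) ./ sigma;     % the same normalization as the training data
price  = [1 x_norm] * theta;        % predicted price in dollars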

Do you still remember the normal equation from Note 5? It is exactly what lets us avoid the many iterations of gradient descent.


The normal equation:

\theta = (X^T X)^{-1} X^T y

%% ================ Part 3: Normal Equations ================

fprintf('Solving with normal equations...\n');

% ====================== YOUR CODE HERE ======================
% Instructions: The following code computes the closed form
%               solution for linear regression using the normal
%               equations. You should complete the code in
%               normalEqn.m
%
%               After doing so, you should complete this code
%               to predict the price of a 1650 sq-ft, 3 br house.
%

%% Load Data
data = csvread('ex1data2.txt');
X = data(:, 1:2);
y = data(:, 3);
m = length(y);

% Add intercept term to X
X = [ones(m, 1) X];

% Calculate the parameters from the normal equation
theta = normalEqn(X, y);

% Display normal equation's result
fprintf('Theta computed from the normal equations: \n');
fprintf(' %f \n', theta);
fprintf('\n');


% Estimate the price of a 1650 sq-ft, 3 br house
% ====================== YOUR CODE HERE ======================
price = 0; % You should change this


% ============================================================

fprintf(['Predicted price of a 1650 sq-ft, 3 br house ' ...
         '(using normal equations):\n $%f\n'], price);

Open normalEqn.m:

function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression
%   NORMALEQN(X,y) computes the closed-form solution to linear
%   regression using the normal equations.

theta = zeros(size(X, 2), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the code to compute the closed form solution
%               to linear regression and put the result in theta.
%
% ---------------------- Sample Solution ----------------------

theta = pinv(X' * X) * X' * y; % this is the line we write ourselves

% -------------------------------------------------------------

% ============================================================
end

Now let's look at the result:

The predicted price is essentially the same as the one from gradient descent above.
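Here too the script leaves price = 0 to be filled in. Since no feature scaling was applied on this path, a possible completion (my own sketch) uses the raw features directly:

% Possible completion for the normal-equation prediction (my own sketch):
price = [1 1650 3] * theta;   % predicted price of a 1650 sq-ft, 3 br house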


Then run submit() one more time:

The optional exercises carry no points, but you do get a "Nice work" at the end.

That's it for the content of Chapter 2.


These notes are based on Andrew Ng's Machine Learning course on Coursera.

To keep the notes from becoming unwieldy and hard to navigate, they are split into several parts; if you are interested, check out the other notes:

Machine Learning Notes 1: What Machine Learning Is, Supervised and Unsupervised Learning

Machine Learning Notes 2: Linear Models, the Cost Function, and Gradient Descent

Machine Learning Notes 3: Linear Algebra Basics

Machine Learning Notes 4: Linear Regression with Multiple Features

Machine Learning Notes 5: The Normal Equation

Machine Learning Notes 6: Matlab Programming Basics
