Machine Learning Notes 25: Programming Assignment 8, Anomaly Detection and Recommender Systems

This programming assignment covers the two topics from this chapter: the anomaly detection algorithm and the recommender system.

The first part is the anomaly detection algorithm. We fit a Gaussian to each feature of the data; for feature $i$, the maximum likelihood estimates of the mean and variance are:
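$$\mu_i = \frac{1}{m}\sum_{j=1}^{m} x_i^{(j)}, \qquad \sigma_i^2 = \frac{1}{m}\sum_{j=1}^{m}\bigl(x_i^{(j)} - \mu_i\bigr)^2$$

Open estimateGaussian.m and fill these in (note that var(X, 1) divides by m rather than m - 1, matching the formula above):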

function [mu sigma2] = estimateGaussian(X)
%ESTIMATEGAUSSIAN This function estimates the parameters of a
%Gaussian distribution using the data in X
%   [mu sigma2] = estimateGaussian(X),
%   The input X is the dataset with each n-dimensional data point in one row
%   The output is an n-dimensional vector mu, the mean of the data set
%   and the variances sigma^2, an n x 1 vector
%

% Useful variables
[m, n] = size(X);

% You should return these values correctly
mu = zeros(n, 1);
sigma2 = zeros(n, 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the mean of the data and the variances
%               In particular, mu(i) should contain the mean of
%               the data for the i-th feature and sigma2(i)
%               should contain variance of the i-th feature.
%

mu = mean(X);        % mean of each feature
sigma2 = var(X, 1);  % variance of each feature, normalized by m

% =============================================================

end
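With the estimates in hand, the rest of the detection pipeline in ex8.m looks roughly like this (multivariateGaussian.m and the validation data Xval/yval are provided with the exercise; this is a sketch, not the full script):

% Fit per-feature Gaussians and evaluate the density of each example
[mu, sigma2] = estimateGaussian(X);
p    = multivariateGaussian(X, mu, sigma2);     % density on the training set
pval = multivariateGaussian(Xval, mu, sigma2);  % density on the validation set
[epsilon, F1] = selectThreshold(yval, pval);    % pick the threshold (next step)
outliers = find(p < epsilon);                   % flag low-density points as anomalies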

Running result:

Gaussian distribution plot:

Next we choose the threshold $\varepsilon$ used to classify examples as anomalous; open selectThreshold.m:

function [bestEpsilon bestF1] = selectThreshold(yval, pval)
%SELECTTHRESHOLD Find the best threshold (epsilon) to use for selecting
%outliers
%   [bestEpsilon bestF1] = SELECTTHRESHOLD(yval, pval) finds the best
%   threshold to use for selecting outliers based on the results from a
%   validation set (pval) and the ground truth (yval).
%

bestEpsilon = 0;
bestF1 = 0;
F1 = 0;

stepsize = (max(pval) - min(pval)) / 1000;
for epsilon = min(pval):stepsize:max(pval)

    % ====================== YOUR CODE HERE ======================
    % Instructions: Compute the F1 score of choosing epsilon as the
    %               threshold and place the value in F1. The code at the
    %               end of the loop will compare the F1 score for this
    %               choice of epsilon and set it to be the best epsilon if
    %               it is better than the current choice of epsilon.
    %
    % Note: You can use predictions = (pval < epsilon) to get a binary
    %       vector of 0s and 1s of the outlier predictions

    cvPredictions = pval < epsilon;   % predict an anomaly when p(x) < epsilon

    truePositives  = sum((cvPredictions == 1) & (yval == 1));
    falsePositives = sum((cvPredictions == 1) & (yval == 0));
    trueNegatives  = sum((cvPredictions == 0) & (yval == 0));
    falseNegatives = sum((cvPredictions == 0) & (yval == 1));

    prec = truePositives / (truePositives + falsePositives);
    rec  = truePositives / (truePositives + falseNegatives);
    F1 = 2 * prec * rec / (prec + rec);   % F1 score for this epsilon

    % =============================================================

    if F1 > bestF1
        bestF1 = F1;
        bestEpsilon = epsilon;
    end
end

end
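For reference, the F1 score computed above combines precision and recall on the validation set:

$$\mathrm{prec} = \frac{tp}{tp + fp}, \qquad \mathrm{rec} = \frac{tp}{tp + fn}, \qquad F_1 = \frac{2\,\mathrm{prec}\cdot\mathrm{rec}}{\mathrm{prec} + \mathrm{rec}}$$

If no example is predicted positive, prec evaluates to NaN; since NaN > bestF1 is false in Matlab, the loop simply skips that epsilon.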

Running result:


The second part is the recommender system. Open cofiCostFunc.m and fill in the cost function and its gradients; the regularized collaborative filtering cost we are implementing is:
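$$J = \frac{1}{2}\sum_{(i,j):R(i,j)=1}\bigl((\theta^{(j)})^{T}x^{(i)} - y^{(i,j)}\bigr)^{2} + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\bigl(\theta_{k}^{(j)}\bigr)^{2} + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\bigl(x_{k}^{(i)}\bigr)^{2}$$

Its gradients with respect to $x^{(i)}$ and $\theta^{(j)}$ sum the same error terms over the rated entries only, plus the regularization terms; vectorized, this gives the code below: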

function [J, grad] = cofiCostFunc(params, Y, R, num_users, num_movies, ...
                                  num_features, lambda)
%COFICOSTFUNC Collaborative filtering cost function
%   [J, grad] = COFICOSTFUNC(params, Y, R, num_users, num_movies, ...
%   num_features, lambda) returns the cost and gradient for the
%   collaborative filtering problem.
%

% Unfold the U and W matrices from params
X = reshape(params(1:num_movies*num_features), num_movies, num_features);
Theta = reshape(params(num_movies*num_features+1:end), ...
                num_users, num_features);

% You need to return the following values correctly
J = 0;
X_grad = zeros(size(X));
Theta_grad = zeros(size(Theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost function and gradient for collaborative
%               filtering. Concretely, you should first implement the cost
%               function (without regularization) and make sure it
%               matches our costs. After that, you should implement the
%               gradient and use the checkCostFunction routine to check
%               that the gradient is correct. Finally, you should implement
%               regularization.
%
% Notes: X - num_movies x num_features matrix of movie features
%        Theta - num_users x num_features matrix of user features
%        Y - num_movies x num_users matrix of user ratings of movies
%        R - num_movies x num_users matrix, where R(i, j) = 1 if the
%            i-th movie was rated by the j-th user
%
% You should set the following variables correctly:
%
%        X_grad - num_movies x num_features matrix, containing the
%                 partial derivatives w.r.t. to each element of X
%        Theta_grad - num_users x num_features matrix, containing the
%                     partial derivatives w.r.t. to each element of Theta
%

OtX = (X * Theta') - Y;          % prediction error (note the transpose on Theta)
J_temp = OtX .^ 2;
J = sum(J_temp(R == 1)) / 2;     % unregularized cost, summed over rated entries only

X_grad = (OtX .* R) * Theta + lambda * X;        % regularized gradient w.r.t. X
Theta_grad = (OtX .* R)' * X + lambda * Theta;   % regularized gradient w.r.t. Theta

% with regularization
J = J + lambda/2 * sum(sum(Theta.^2)) + lambda/2 * sum(sum(X.^2));   % total cost J

% =============================================================

grad = [X_grad(:); Theta_grad(:)];

end
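With the cost function in place, ex8_cofi.m trains the model by minimizing it with the provided fmincg.m. Roughly, following the script's variable names:

% Fold randomly initialized X and Theta into one parameter vector
X = randn(num_movies, num_features);
Theta = randn(num_users, num_features);
initial_parameters = [X(:); Theta(:)];

options = optimset('GradObj', 'on', 'MaxIter', 100);
lambda = 10;

% Minimize cofiCostFunc over the folded parameters
theta = fmincg(@(t) cofiCostFunc(t, Ynorm, R, num_users, num_movies, ...
                                 num_features, lambda), ...
               initial_parameters, options);

% Unfold the result back into X and Theta
X = reshape(theta(1:num_movies*num_features), num_movies, num_features);
Theta = reshape(theta(num_movies*num_features+1:end), num_users, num_features);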

Running result:

Mean normalization:
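Mean normalization is handled by the provided normalizeRatings.m, which subtracts each movie's mean rating computed over its rated entries only. A minimal sketch of the idea:

function [Ynorm, Ymean] = normalizeRatings(Y, R)
%NORMALIZERATINGS Subtract each movie's mean rating over its rated entries
[m, n] = size(Y);
Ymean = zeros(m, 1);
Ynorm = zeros(size(Y));
for i = 1:m
    idx = find(R(i, :) == 1);              % users who rated movie i
    Ymean(i) = mean(Y(i, idx));            % mean over those ratings only
    Ynorm(i, idx) = Y(i, idx) - Ymean(i);  % center the rated entries
end
end

The mean Ymean(i) is added back when making predictions, so a user with no ratings is simply predicted each movie's average rating.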

Cost function check:

Gradient check (without regularization):

Relatedness and predicted ratings:

Results computed by the collaborative filtering algorithm:
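The predictions come from the learned factors; roughly how ex8_cofi.m computes them, adding the per-movie mean back after mean normalization (variable names follow the script):

p = X * Theta';                    % predicted rating matrix (num_movies x num_users)
my_predictions = p(:, 1) + Ymean;  % ratings predicted for the first user, means restored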

Finally, submit the assignment:


These notes are based on Andrew Ng's Machine Learning course on Coursera.

To keep the notes from becoming too long and hard to browse, I have split them into several parts; interested readers can check out the other notes in the series:

Machine Learning Notes 1: Definition of Machine Learning, Supervised and Unsupervised Learning

Machine Learning Notes 2: Linear Models, the Cost Function, and Gradient Descent

Machine Learning Notes 3: Linear Algebra Basics

Machine Learning Notes 4: Linear Regression with Multiple Features

Machine Learning Notes 5: The Normal Equation

Machine Learning Notes 6: Matlab Programming Basics

Machine Learning Notes 7: Programming Assignment 1

Machine Learning Notes 8: Cost Function and Gradient Descent for Logistic Regression

Machine Learning Notes 9: Overfitting and Regularization

Machine Learning Notes 10: Programming Assignment 2

Machine Learning Notes 11: Neural Networks

Machine Learning Notes 12: Programming Assignment 3

Machine Learning Notes 13: Neural Network Cost Function and Backpropagation (BP Algorithm)

Machine Learning Notes 14: Backpropagation Programming and Programming Assignment 4

Machine Learning Notes 15: Evaluating Algorithm Performance

Machine Learning Notes 16: Programming Assignment 5, Evaluating Linear Regression

Machine Learning Notes 17: Spam Classifiers, Precision and Recall

Machine Learning Notes 18: Support Vector Machines and Kernel Functions

Machine Learning Notes 19: SVM Exercises and Programming Assignment 6, SVMs and Spam Classification

Machine Learning Notes 20: The K-Means Clustering Algorithm

Machine Learning Notes 21: PCA for Dimensionality Reduction

Machine Learning Notes 22: Programming Assignment 7, K-Means and PCA

Machine Learning Notes 23: Anomaly Detection

Machine Learning Notes 24: Recommender Systems

