
I need help with a question!!! Experts, please help me!!! Coursera Machine Learning Questions and Answers - Part 3 - Week 5 Assignments

From the column Excalibur

Alright, on to Week 5, where we start on neural networks and the forward/backward propagation algorithms. Honestly, after going through this week I feel there are some details I still have not fully understood; I roughly get why forward and backward propagation are done and what they mean. Studying the lecture notes carefully really pays off; at least now I know the order of the computation steps in a neural network.

The first task is to implement the cost function of the two-layer neural network. Without the regularization term, the formula is as follows.
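For reference, this is the unregularized cost as given in the ex4 notes, where $(h_\Theta(x^{(i)}))_k$ is the $k$-th output of the network and $K$ = num_labels:

$$
J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
\left[ y_k^{(i)} \log\!\big( (h_\Theta(x^{(i)}))_k \big)
     + \big(1 - y_k^{(i)}\big) \log\!\big( 1 - (h_\Theta(x^{(i)}))_k \big) \right]
$$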

function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTION(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters are "unrolled" into the vector nn_params and need to be
%   converted back into the weight matrices. The returned grad is the
%   unrolled vector of partial derivatives.

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight
% matrices for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================
% Part 1: feedforward the network and return the cost in J.
% Part 2: backpropagation for Theta1_grad and Theta2_grad (check with checkNNGradients).
% Part 3: add regularization to the cost and gradients.
% Note: y contains labels 1..K and must be recoded into 0/1 vectors.

% Feedforward. Note the transposes: Theta1 is (hidden x (input+1)) and
% Theta2 is (num_labels x (hidden+1)), while examples are rows of X.
X = [ones(m, 1), X];            % add the bias unit to the input layer
a_2 = sigmoid(X * Theta1');     % second layer activation units
a_2 = [ones(m, 1), a_2];        % add the bias unit to the second layer
h = sigmoid(a_2 * Theta2');     % output layer activation units

% Recode the labels as vectors containing only values 0 or 1.
% Y is an m x num_labels matrix: row i is the one-hot encoding of y(i), e.g.
%   0 0 0 0 0 0 0 0 0 1
%   0 1 0 0 0 0 0 0 0 0
%   1 0 0 0 0 0 0 0 0 0
% It carries the same information as y, just as m rows of K-dimensional vectors.
I = eye(num_labels);            % identity matrix, one row per class
Y = zeros(m, num_labels);
for i = 1:m
  Y(i, :) = I(y(i), :);         % pick the identity row for label y(i)
end

% Unregularized cost
J = - (1 / m) * sum( sum( Y .* log(h) + (1 - Y) .* log(1 - h)));

% Copies of the weights with the bias column zeroed out
% (unused here; they become relevant once the regularization term is added)
temp_theta_1 = Theta1;
temp_theta_2 = Theta2;
temp_theta_1(:, 1) = 0;
temp_theta_2(:, 1) = 0;

% -------------------------------------------------------------
% =========================================================================

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end

When doing the computation from the figure above, remember to include the bias unit. My main mistake was that I completely missed that y has to be converted; that was the part that confused me, and only after looking at the answer did I understand that it needs to be vectorized, recoded into a matrix whose rows are one-hot vectors (the pattern sketched in the comments of the listing above). The lecture notes do remind you to recode the labels, but it is still worth checking, while running the code, what the dimensions of the recoded Y actually are. After that, writing the cost function without the regularization term is not hard.
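For example, here is a quick way to sanity-check the recoding at the Octave prompt (a minimal sketch; it assumes y holds integer labels 1..num_labels, as in ex4):

I = eye(num_labels);   % num_labels x num_labels identity matrix
Y = I(y, :);           % m x num_labels: row i is the one-hot encoding of y(i)
size(Y)                % should print m by num_labels, e.g. 5000 x 10 in ex4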

  • Next, implement the cost function with the regularization term

But pay attention to this sentence: "Note that you should not be regularizing the terms that correspond to the bias." In other words, the bias terms in the Theta matrices (the weights attached to the always-1 units) must not be regularized. Apart from that, just write it as before.
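Written out, the term added to the cost only sums over the non-bias weights, i.e. the first column of each Theta (column index $k = 0$) is skipped:

$$
\frac{\lambda}{2m}\left[\;\sum_{j}\sum_{k \ge 1}\big(\Theta^{(1)}_{j,k}\big)^2
\;+\;\sum_{j}\sum_{k \ge 1}\big(\Theta^{(2)}_{j,k}\big)^2\right]
$$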

The function is identical to the previous listing up to the computation of the unregularized cost J (including the transposes in the feedforward step); the only new lines build and add the regularization term, using the copies of the weights whose bias columns were zeroed out:

% zero out the bias columns so they are not regularized
temp_theta_1 = Theta1;
temp_theta_2 = Theta2;
temp_theta_1(:, 1) = 0;
temp_theta_2(:, 1) = 0;

regularization_term = (lambda / (2 * m)) * ...
    (sum(sum(temp_theta_1 .* temp_theta_1)) + sum(sum(temp_theta_2 .* temp_theta_2)));
J = J + regularization_term;

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end

One thing here still puzzles me, though: J = - (1 / m) * sum( sum( Y .* log(h) + (1 - Y) .* log(1 - h))); should, in principle, be equivalent to J = - (1 / m) * sum( sum( Y * log(h) + (1 - Y) * log(1 - h))); but it is not, and only the former is correct. I already raised this in an earlier post. Not being clear about this feels like a real problem, because then I do not know which form to use in which situation. I have asked quite a few people and nobody has given me a good explanation, especially a mathematical one, so I have decided to ask on the forum and see whether someone can explain it properly...
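For what it's worth, here is a tiny Octave sketch (illustrative values only) showing what the two operators actually do: .* multiplies element by element, while * is the matrix product, which sums over an inner dimension and is only defined when the inner dimensions match.

A = [1 2; 3 4];
B = [10 20; 30 40];
A .* B   % element-wise product:  [10  40;  90 160]
A * B    % matrix product:        [70 100; 150 220]
% In the cost function, Y and log(h) are both m x K, so Y .* log(h) pairs each
% label with its own prediction, whereas Y * log(h) would mix terms across
% different examples (and is only defined at all when K == m).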

  • Next, implement sigmoidGradient; this one is fairly simple.

The formula, and how to test your implementation once it is written, are both given in the exercise; the code is as follows.
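The formula in question is just the derivative of the sigmoid:

$$
g'(z) = \frac{d}{dz}\,g(z) = g(z)\,\big(1 - g(z)\big),
\qquad g(z) = \frac{1}{1 + e^{-z}}
$$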

function g = sigmoidGradient(z)
%SIGMOIDGRADIENT returns the gradient of the sigmoid function
%evaluated at z
%   g = SIGMOIDGRADIENT(z) computes the gradient of the sigmoid function
%   evaluated at z. This should work regardless if z is a matrix or a
%   vector. In particular, if z is a vector or matrix, you should return
%   the gradient for each element.

g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the gradient of the sigmoid function evaluated at
%               each value of z (z can be a matrix, vector or scalar).

g = sigmoid(z) .* (1 - sigmoid(z));

% =============================================================

end

  • Next comes the key step: implementing the backpropagation algorithm. It took me quite a while before I felt I understood it even a little...

The two figures in the notes are the key here; the order is to run forward propagation first and then backpropagation. The lecture notes explain this better than I could, so I will simply quote them, haha.

In other words, forward propagation computes the activation values of every node, including the output of the hypothesis; then, for the nodes in each layer, we compute an error term that measures how much that node is responsible for the error in the output.

For the output nodes we can measure the error between the network's activation and the true target value directly; for a hidden unit in layer l, the error term is computed from a weighted average of the error terms of the nodes in layer l+1.
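In symbols, this is the compact form of the update from the lecture notes for the three-layer network of ex4 (with $\odot$ the element-wise product, $g'$ the sigmoid gradient, and the bias component of $\delta^{(2)}$ dropped):

$$
\delta^{(3)} = a^{(3)} - y, \qquad
\delta^{(2)} = \big(\Theta^{(2)}\big)^{T} \delta^{(3)} \odot g'\!\big(z^{(2)}\big)
$$

$$
\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} \big(a^{(l)}\big)^{T}, \qquad
\frac{\partial J}{\partial \Theta^{(l)}} = \frac{1}{m}\,\Delta^{(l)} + \frac{\lambda}{m}\,\Theta^{(l)}\ \ \text{(non-bias entries only)}
$$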

The notes then spell the procedure out in five concrete steps. Here is the code; there are two ways to write it.

function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   Computes the cost J and the unrolled gradient grad for the unrolled
%   parameter vector nn_params. Fully vectorized version.

% Reshape nn_params back into Theta1 and Theta2
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% recode y to Y (one-hot rows)
I = eye(num_labels);
Y = zeros(m, num_labels);
for i = 1:m
  Y(i, :) = I(y(i), :);
end

% feedforward (note the transposes: examples are rows of X)
a1 = [ones(m, 1) X];
z2 = a1 * Theta1';
a2 = [ones(size(z2, 1), 1) sigmoid(z2)];
z3 = a2 * Theta2';
a3 = sigmoid(z3);
h = a3;

% calculate the regularization penalty (bias columns excluded)
p = sum(sum(Theta1(:, 2:end).^2, 2)) + sum(sum(Theta2(:, 2:end).^2, 2));

% calculate the regularized cost J
J = sum(sum((-Y) .* log(h) - (1 - Y) .* log(1 - h), 2)) / m + lambda * p / (2 * m);

% calculate the error terms ("sigmas") for the output and hidden layers
sigma3 = a3 - Y;
sigma2 = (sigma3 * Theta2) .* sigmoidGradient([ones(size(z2, 1), 1) z2]);
sigma2 = sigma2(:, 2:end);      % drop the bias-unit error term

% accumulate gradients
delta_1 = sigma2' * a1;
delta_2 = sigma3' * a2;

% calculate regularized gradients (bias columns are not regularized)
p1 = (lambda / m) * [zeros(size(Theta1, 1), 1) Theta1(:, 2:end)];
p2 = (lambda / m) * [zeros(size(Theta2, 1), 1) Theta2(:, 2:end)];
Theta1_grad = delta_1 ./ m + p1;
Theta2_grad = delta_2 ./ m + p2;

% -------------------------------------------------------------
% =========================================================================

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end

This listing is written directly from the vectorized formulas in the figures from the notes: each block of the code corresponds step by step to one of those formulas, except that the last formula shown there does not include the regularization term. The version above is the fully vectorized way of writing it; the other way is to follow the five steps described in words, which gives the listing below.

function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   Same interface as above; this version follows the five written steps and
%   loops over the training examples.

% Reshape nn_params back into Theta1 and Theta2
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

% Part 1: feedforward and the (regularized) cost

% recode y into an m x num_labels matrix of 0s and 1s
y_matrix = [];
for i = 1:max(y)
  y_matrix = [y_matrix y == (size(y_matrix, 2) + 1)];
end

a1 = [ones(size(X, 1), 1) X];               % m x (input+1)
z2 = Theta1 * a1';                          % hidden x m (examples as columns)
a2 = [ones(1, size(z2, 2)); sigmoid(z2)];   % add bias unit for hidden layer
z3 = Theta2 * a2;                           % num_labels x m
h3 = sigmoid(z3);

J_unreg = (1 / m) * sum(sum( (-y_matrix .* log(h3')) - ((1 - y_matrix) .* log(1 - h3')) ));

Theta1_reg = Theta1;
Theta1_reg(:, 1) = 0;                       % do not regularize the bias column
Theta2_reg = Theta2;
Theta2_reg(:, 1) = 0;

J = J_unreg + (lambda / (2 * m)) * ...
    ( sum(sum(Theta1_reg .* Theta1_reg)) + sum(sum(Theta2_reg .* Theta2_reg)) );

% Part 2: backpropagation, looping over the training examples
%         (check the result with checkNNGradients)

delta3 = zeros(size(y_matrix, 2), 1);
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

for t = 1:m
  a1 = [1; X(t, :)'];                       % (input+1) x 1 column vector
  z2 = Theta1 * a1;
  a2 = [ones(1, size(z2, 2)); sigmoid(z2)];
  z3 = Theta2 * a2;
  a3 = sigmoid(z3);

  for k = 1:size(y_matrix, 2)
    delta3(k, :) = a3(k) - y_matrix(t, k);  % output-layer error term
  end

  delta2 = Theta2(:, 2:end)' * delta3 .* sigmoidGradient(z2);
  %delta2 = [0; delta2(2:end)];

  Theta1_grad = Theta1_grad + delta2 * a1';
  Theta2_grad = Theta2_grad + delta3 * a2';
end

Theta1_grad = (1 / m) * Theta1_grad + (lambda / m) * Theta1_reg;
Theta2_grad = (1 / m) * Theta2_grad + (lambda / m) * Theta2_reg;

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

% -------------------------------------------------------------
% =========================================================================

end

  • Next is gradient checking, which you do not have to implement yourself.

It is used to verify that backpropagation is correct, i.e. that the gradients are right, because the numerical approximation of the gradient should be very close to the true gradient.
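Concretely, the check perturbs one parameter at a time and uses the two-sided difference quotient from the notes (with $e_i$ the $i$-th unit vector and $\varepsilon \approx 10^{-4}$):

$$
\frac{\partial}{\partial \theta_i} J(\theta) \;\approx\;
\frac{J(\theta + \varepsilon\, e_i) - J(\theta - \varepsilon\, e_i)}{2\,\varepsilon}
$$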

(Please forgive my crude hand-drawn sketch of this...)

Two small tips here: first, gradient checking is very slow to run, so once you have verified that the backpropagation gradients are correct, be sure to turn it off; second, gradient checking can be used with any machine learning algorithm that computes a cost function and its gradients, for example the logistic regression cost function from earlier weeks.

  • As for the regularization term of the neural network discussed above, the requirement is as follows.

Just keep in mind that MATLAB indices start at 1; after this many assignments, that should be familiar by now. To be honest, compared with the 0-based indexing of other programming languages, I still find it hard to get used to...

  • Then there is fmincg, the optimization routine that iterates to find the optimal parameters, which was also introduced earlier.

The main knobs are MaxIter, the number of iterations, and the regularization parameter lambda, which guards against overfitting; a typical call is sketched below.
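Roughly what the call looks like (sketched from memory of ex4.m; the MaxIter and lambda values shown are just the ones used in the exercise and are exactly the knobs you would tune):

options = optimset('MaxIter', 50);
lambda = 1;
% short-hand cost function of the unrolled parameter vector only
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                   num_labels, X, y, lambda);
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);
% reshape the optimized unrolled vector back into Theta1 and Theta2
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));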

Because a neural network can learn to carve out very complex decision boundaries, you must include the regularization term to prevent overfitting. Overfitting means the model performs very well on the training data but poorly on new examples.

