CS231n: Assignment 1 (Q1-Q3, with gradient diagrams)
Assignment 1 (Q1-Q3) mainly draws on the following topics:
1. Distance metrics

Commonly used metrics:

L1 norm (Manhattan distance): $d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|$

L2 norm (Euclidean distance): $d_2(I_1, I_2) = \sqrt{\sum_p \left( I_1^p - I_2^p \right)^2}$
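As a quick illustration, here is a minimal NumPy sketch of both distances between two flattened images (the pixel values are made up):

```python
import numpy as np

# two flattened "images" (toy values, just for illustration)
I1 = np.array([56., 32., 10., 18.])
I2 = np.array([10., 20., 24., 17.])

d_L1 = np.sum(np.abs(I1 - I2))           # L1 / Manhattan distance
d_L2 = np.sqrt(np.sum((I1 - I2) ** 2))   # L2 / Euclidean distance
print(d_L1, d_L2)
```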
2. The k-Nearest Neighbor (KNN) algorithm
3. Splitting data into training, validation, and test sets, and tuning hyperparameters
Hyperparameters are usually tuned with cross-validation; for example, the k in KNN is chosen this way.

When the dataset is large, cross-validation is computationally expensive, so in practice the training data is often simply split once, with 50%-90% of it used for training and the rest for validation (a simple hold-out split is sketched below).

But when the dataset is small, or there are many hyperparameters, cross-validation is still the better choice.
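A minimal sketch of that hold-out split, assuming X_train and y_train are already loaded as in the assignment (the 20% fraction and the underscore-suffixed names are my own choices):

```python
import numpy as np

num_train = X_train.shape[0]
num_val = int(num_train * 0.2)              # hold out 20% for validation

idx = np.random.permutation(num_train)      # shuffle before splitting
X_val_, y_val_ = X_train[idx[:num_val]], y_train[idx[:num_val]]
X_train_, y_train_ = X_train[idx[num_val:]], y_train[idx[num_val:]]
```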
4. Score function
Linear classifier: $f(x_i, W, b) = W x_i + b$

After merging the weights and the bias (the bias trick): $f(x_i, W) = W x_i$, where $x_i$ is extended with a constant 1 and $b$ is absorbed into $W$.
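A small sketch of the bias trick with toy dimensions (D = 4, C = 3); note that the assignment code uses the row-vector convention, scores = x.dot(W) + b, which is what this follows:

```python
import numpy as np

D, C = 4, 3
np.random.seed(0)
W = np.random.randn(D, C)        # one weight column per class
b = np.random.randn(C)           # one bias per class
x = np.random.randn(D)           # one flattened image

scores = x.dot(W) + b            # separate weights and bias

x_ext = np.hstack([x, 1.0])      # append a constant 1 to the input
W_ext = np.vstack([W, b])        # absorb the biases as an extra row of W
print(np.allclose(scores, x_ext.dot(W_ext)))   # True: identical scores
```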
5. Loss function
5.1 Multiclass Support Vector Machine Loss
For a single image $x_i$ with label $y_i$, the corresponding loss is:

$$L_i = \sum_{j \neq y_i} \max\left(0,\; w_j^T x_i - w_{y_i}^T x_i + \Delta\right)$$

The function $\max(0, -)$ is usually called the hinge loss.
Regularization
The most common regularization penalty is the L2 norm, which discourages large weights through an element-wise quadratic penalty on all parameters:

$$R(W) = \sum_k \sum_l W_{k,l}^2$$

With the regularization term, the total loss becomes:

$$L = \frac{1}{N} \sum_i L_i + \lambda R(W)$$
Both $\Delta$ and $\lambda$ are hyperparameters that trade off the data loss against the regularization loss. Since changing the magnitude of the weights (which $\lambda$ controls) also changes the differences between class scores, the two hyperparameters control the same trade-off; $\Delta$ is therefore usually fixed to 1, and $\lambda$ is found by cross-validation.
The point of the regularization term: by penalizing large weights, the final classifier is discouraged from relying on just one or two input dimensions; the weights spread out more evenly across all dimensions, which improves generalization.
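For reference, this is the form the regularization term takes in the assignment code later in this post; the extra factor of 0.5 is just a convention so that its gradient is simply reg * W (the shapes here assume the assignment's CIFAR-10 setup, D+1 = 3073 and C = 10):

```python
import numpy as np

W = np.random.randn(3073, 10)            # (D+1, C) weights as in the assignment
reg = 2.5e4                              # regularization strength lambda

reg_loss = 0.5 * reg * np.sum(W * W)     # 0.5 * lambda * R(W)
dW_reg = reg * W                         # gradient of the regularization term
```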
(In the figures that follow, the matrix dimensions are kept consistent with those in the programming assignment; for convenience only 5 classes are shown, i.e. C = 5.)
Gradient derivation
The hardest part of the assignment is computing the gradient in vectorized form. The lecture notes do give the gradient formula, but not very clearly, so here is my own understanding of it; I hope it helps.
Suppose that for image $x_i$ the correct class is class $k$ (i.e. $y_i = k$). Then the loss $L_i$ for $x_i$ can be written out row by row as:

$$
\begin{aligned}
L_i = {} & \max\left(0,\; w_1^T x_i - w_k^T x_i + \Delta\right) \\
{} + {} & \max\left(0,\; w_2^T x_i - w_k^T x_i + \Delta\right) \\
{} + {} & \cdots \qquad (\text{no row for } j = k) \\
{} + {} & \max\left(0,\; w_C^T x_i - w_k^T x_i + \Delta\right)
\end{aligned} \tag{1}
$$

Each row above is the contribution of one class's score to $L_i$ (the first row is the contribution of class 1, the second row that of class 2, and so on), while the correct class's score contributes nothing to the final $L_i$: there is no row for $j = k$.

Let $M_j = w_j^T x_i - w_k^T x_i + \Delta$ (the margin for class $j$).

Then the term for class $j$ survives in $L_i$ only when $M_j > 0$.
Now take the partial derivatives of $L_i$ with respect to each weight vector. For a class $j \neq k$, $w_j$ appears only in its own row of (1), and that row equals $w_j^T x_i - w_k^T x_i + \Delta$ whenever $M_j > 0$; the correct class's $w_k$, in contrast, appears (with a minus sign) in every surviving row. The gradient of the SVM loss is therefore:

When $j \neq k$:

$$\frac{\partial L_i}{\partial w_j} = \mathbb{1}\left(M_j > 0\right) x_i$$

When $j = k$:

$$\frac{\partial L_i}{\partial w_k} = -\left(\sum_{j \neq k} \mathbb{1}\left(M_j > 0\right)\right) x_i$$

where:
- $\mathbb{1}(\cdot)$ is the indicator function: it equals 1 when the expression inside the parentheses is true and 0 otherwise. You can simply think of it as an integer that is either 0 or 1.
- The partial derivative with respect to $w_k$ differs from the others: from equation (1), $w_k$ appears in every row of the sum, whereas each $w_j$ ($j \neq k$) appears only in row $j$.
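To make the formulas concrete, here is a tiny single-example sketch (toy numbers, with C = 5 classes as in the note above) that evaluates $L_i$ and the per-column gradient exactly as written:

```python
import numpy as np

D, C, delta = 4, 5, 1.0
np.random.seed(0)
W = np.random.randn(D, C)                # columns are w_1 ... w_C
x_i = np.random.randn(D)
k = 2                                    # index of the correct class

scores = x_i.dot(W)                      # s_j = w_j^T x_i
margins = scores - scores[k] + delta     # M_j for every class
margins[k] = 0.0                         # no term for the correct class
L_i = np.sum(np.maximum(0, margins))

dW_i = np.zeros_like(W)
for j in range(C):
    if j == k:
        continue
    if margins[j] > 0:                   # indicator 1(M_j > 0)
        dW_i[:, j] += x_i                # dL_i/dw_j =  1(M_j > 0) * x_i
        dW_i[:, k] -= x_i                # dL_i/dw_k = -(sum of indicators) * x_i
```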
5.2 Softmax Loss
For image $x_i$ with label $y_i$ and scores $s = f(x_i, W) = W x_i$, the corresponding loss is:

$$L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$$

With the regularization term, the total loss is again:

$$L = \frac{1}{N}\sum_i L_i + \lambda R(W)$$
A few points are worth understanding about the Softmax loss:
First: Softmax itself is just a squashing function:

$$\text{softmax}(s)_j = \frac{e^{s_j}}{\sum_c e^{s_c}}$$

Given an input vector $s$, the output of the Softmax function is another vector whose entries all lie between 0 and 1 and sum to 1.
Second: the Softmax loss interprets the scores $s = f(x_i, W)$ as unnormalized log probabilities, so that:

$e^{s_j}$ → unnormalized probabilities

$\dfrac{e^{s_j}}{\sum_c e^{s_c}}$ → normalized probabilities

$-\log\dfrac{e^{s_{y_i}}}{\sum_c e^{s_c}}$ → negative log probability of the correct class, i.e. the loss $L_i$
Third: in practice, $L_i$ involves exponentials of the scores. When a score is large (say 999), $e^{999}$ overflows and cannot be computed. The lecture notes give the standard fix:

$$\frac{e^{s_{y_i}}}{\sum_j e^{s_j}} = \frac{C\, e^{s_{y_i}}}{C \sum_j e^{s_j}} = \frac{e^{s_{y_i} + \log C}}{\sum_j e^{s_j + \log C}}, \qquad \log C = -\max_j s_j$$

For example, shift the input vector by subtracting its maximum entry; every shifted score is then $\leq 0$, so every exponential is at most 1 and nothing overflows, while the ratio is unchanged.
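A tiny sketch of this shift trick on some toy scores:

```python
import numpy as np

scores = np.array([123.0, 456.0, 789.0])    # np.exp(789.0) would overflow to inf

shifted = scores - np.max(scores)           # largest entry becomes 0
probs = np.exp(shifted) / np.sum(np.exp(shifted))
print(probs, probs.sum())                   # well-behaved, sums to 1
```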
Gradient derivation
For image $x_i$ with correct class $k$, the loss $L_i$ is:

$$L_i = -\log\frac{e^{w_k^T x_i}}{\sum_j e^{w_j^T x_i}} = -w_k^T x_i + \log\sum_j e^{w_j^T x_i}$$

Differentiating with respect to $w_j$, the $\log\sum$ term contributes $p_j\, x_i$ for every class, while the first term contributes $-x_i$ only when $j = k$, where

$$p_j = \frac{e^{w_j^T x_i}}{\sum_c e^{w_c^T x_i}}$$

Therefore the gradient of the Softmax loss is:

$$\frac{\partial L_i}{\partial w_j} = \left(p_j - \mathbb{1}(j = k)\right) x_i$$
6. Gradient descent

This includes batch gradient descent, mini-batch gradient descent, and stochastic gradient descent (SGD, also called online gradient descent); they differ only in how many examples are used to compute each gradient step.
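A minimal sketch of a mini-batch loop, in the same shape as the train() method of linear_classifier.py shown later; loss_fn here is a placeholder for any function returning (loss, gradient):

```python
import numpy as np

def minibatch_gd(W, X, y, loss_fn, learning_rate=1e-3, batch_size=200, num_iters=100):
    """batch_size = X.shape[0] gives batch GD; batch_size = 1 gives SGD proper."""
    num_train = X.shape[0]
    for it in range(num_iters):
        idx = np.random.choice(num_train, batch_size, replace=True)  # sample a mini-batch
        loss, grad = loss_fn(W, X[idx], y[idx])
        W -= learning_rate * grad                                    # vanilla update
    return W
```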
7. Miscellaneous

Image data preprocessing: images are usually zero-centered, i.e. the mean image is subtracted.

Note: the mean here is computed from the training set only, and then subtracted from every split.
```python
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image
```
Q1 assignment code
k_nearest_neighbor.py
```python
import numpy as np
from scipy.stats import mode


class KNearestNeighbor(object):

    def __init__(self):
        pass

    def train(self, X, y):
        self.X_train = X
        self.y_train = y

    def predict(self, X, k=1, num_loops=0):
        if num_loops == 0:
            dists = self.compute_distances_no_loops(X)
        elif num_loops == 1:
            dists = self.compute_distances_one_loop(X)
        elif num_loops == 2:
            dists = self.compute_distances_two_loops(X)
        else:
            raise ValueError('Invalid value %d for num_loops' % num_loops)
        return self.predict_labels(dists, k=k)

    def compute_distances_two_loops(self, X):
        num_test = X.shape[0]
        num_train = self.X_train.shape[0]
        dists = np.zeros((num_test, num_train))
        for i in range(num_test):
            for j in range(num_train):
                dists[i][j] = np.sqrt(np.sum((X[i] - self.X_train[j]) ** 2))
        return dists

    def compute_distances_one_loop(self, X):
        num_test = X.shape[0]
        num_train = self.X_train.shape[0]
        dists = np.zeros((num_test, num_train))
        for i in range(num_test):
            dists[i] = np.sqrt(np.sum((X[i, :] - self.X_train) ** 2, axis=1))
        return dists

    def compute_distances_no_loops(self, X):
        num_test = X.shape[0]
        num_train = self.X_train.shape[0]
        dists = np.zeros((num_test, num_train))
        # expand (x - y)^2 = x^2 - 2xy + y^2 for every test/train pair at once
        dists = np.dot(X, self.X_train.T) * -2
        sq1 = np.sum(np.square(X), axis=1, keepdims=True)
        sq2 = np.sum(np.square(self.X_train), axis=1)
        dists = np.add(dists, sq1)
        dists = np.add(dists, sq2)
        dists = np.sqrt(dists)
        return dists

    def predict_labels(self, dists, k=1):
        # take the labels of the k nearest training points and vote by majority
        y_pred = mode(self.y_train[np.argsort(dists)[:, :k]], axis=1)[0]
        return y_pred.ravel()
```
There are two main ways to compute the distances between test and training images in vectorized form (the compute_distances_no_loops function):
```python
u = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([[1, 0, 1], [4, 6, 9]])

# way 1: expand (u - v)^2 = u^2 - 2uv + v^2
dists = np.multiply(np.dot(u, v.T), -2)
sq1 = np.sum(np.square(u), axis=1, keepdims=True)
sq2 = np.sum(np.square(v), axis=1)
dists = np.add(dists, sq1)
dists = np.add(dists, sq2)
dists = np.sqrt(dists)

# way 2: broadcast to a 3-D array of pairwise differences
diff = u[np.newaxis, :, :] - v[:, np.newaxis, :]
distance = np.sqrt(np.sum(diff ** 2, axis=-1))
ind = np.argmin(distance, axis=0)
```
The second approach expands the 2-D arrays into three dimensions; it is more concise and easier to understand (see the reference link), but it is very memory-hungry. When I used it in the assignment, it immediately ran out of memory. Call it a dimensional strike, I suppose.
The cross-validation code in knn.ipynb is as follows:
```python
num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]

X_train_folds = np.array_split(X_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)

k_to_accuracies = {}
classifier = KNearestNeighbor()
for k in k_choices:
    accuracy = []
    for n in range(num_folds):
        temp_X = X_train_folds[:]
        temp_y = y_train_folds[:]
        X_val = temp_X.pop(n)
        y_val = temp_y.pop(n)
        X_train_ = np.vstack(temp_X)
        y_train_ = np.vstack(temp_y).ravel()
        # re-train on the remaining folds each time
        classifier.train(X_train_, y_train_)
        y_test_pred = classifier.predict(X_val, k=k)
        # Compute and print the fraction of correctly predicted examples
        num_correct = np.sum(y_test_pred == y_val)
        accuracy.append(float(num_correct) / len(y_val))
    k_to_accuracies[k] = accuracy

# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))
```
Q2 assignment code
linear_svm.py
```python
import numpy as np


def svm_loss_naive(W, X, y, reg):
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            # see equation (1): y[i] is the correct class k,
            # and the correct class contributes nothing to the loss
            if j == y[i]:
                continue
            # this is M_j
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            # only margins > 0 contribute to the loss
            if margin > 0:
                # compare with equation (1): the two loops differentiate each row
                # of L_i w.r.t. w_j and w_k; every row involves w_k,
                # so dW[:, y[i]] is updated on every positive margin
                dW[:, y[i]] += -X[i].T
                dW[:, j] += X[i].T
                loss += margin

    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)
    dW /= num_train
    dW += reg * W
    return loss, dW


def svm_loss_vectorized(W, X, y, reg):
    num_train = X.shape[0]
    # all scores
    scores = X.dot(W)
    # scores of the correct classes
    correct_class_score = scores[range(num_train), y].reshape(-1, 1)
    # compute the loss
    margins = scores - correct_class_score + 1
    margins[margins < 0] = 0
    # the correct class must not contribute to the loss (its raw margin is delta = 1)
    margins[range(num_train), y] = 0
    loss = np.sum(margins) / num_train + 0.5 * reg * np.sum(W * W)
    # compute the gradient: build the coefficient matrix
    margins[margins > 0] = 1.0
    row_sum = np.sum(margins, axis=1)                  # shape (N,)
    margins[range(num_train), y] -= row_sum            # negated count at the correct class
    dW = np.dot(X.T, margins) / num_train + reg * W    # D by C
    return loss, dW
```
The gradient formula for the SVM loss given above is enough for writing svm_loss_naive, but far from enough for the vectorized version. I spent a long time on the vectorized gradient; the figures below give a rough explanation that I hope is helpful.

The matrix in Figure 2 is not the same as the score matrix in Figure 1; it has gone through the following transformation:
```python
# all scores
scores = X.dot(W)
# scores of the correct classes
correct_class_score = scores[range(num_train), y].reshape(-1, 1)
# compute the margins
margins = scores - correct_class_score + 1
margins[margins < 0] = 0
```
We already know the formula for $\frac{\partial L_i}{\partial w_j}$; the awkward part of vectorizing it is that the formula for $j = k$ differs from the other cases.

As shown in the figure above, suppose the correct class is class 3, i.e. $k = 3$. Then:

when $j \neq 3$, $\dfrac{\partial L_i}{\partial w_j} = \mathbb{1}(M_j > 0)\, x_i$;

while when $j = 3$, $\dfrac{\partial L_i}{\partial w_3} = -\left(\sum_{j \neq 3} \mathbb{1}(M_j > 0)\right) x_i$.

But the overall form does not change: the gradient of $L_i$ with respect to any $w_j$ is some coefficient multiplied by $x_i$.

The key is to obtain this coefficient matrix.
Think about what the coefficients mean: for $j \neq k$ the coefficient should be 1 (provided $M_j > 0$, otherwise 0); for $j = k$ it should be the number of positive entries of $M$ for that example, with a minus sign.

Once that is understood, the coefficient matrix can be built in three steps: (1) set every positive margin to 1 and every non-positive margin to 0; (2) sum each row to count the positive margins of that example; (3) write the negated row sum into the correct-class column of each row.

Multiplying the data matrix $X$ with this coefficient matrix (minding the dimensions: $X^T$ is D×N and the coefficient matrix is N×C) then gives the final gradient.
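The same three steps in isolation, on a toy clamped-margins matrix (this is essentially what svm_loss_vectorized above does on the real data):

```python
import numpy as np

# clamped margins of shape (N, C); y holds the correct class of each row
margins = np.array([[0.0, 2.1, 0.0, 0.7, 0.3],
                    [1.2, 0.0, 0.0, 0.0, 0.4]])
y = np.array([0, 1])
N = margins.shape[0]

coeff = np.zeros_like(margins)
coeff[margins > 0] = 1.0                 # step 1: 1 wherever M_j > 0 (j != k)
row_sum = np.sum(coeff, axis=1)          # step 2: number of positive margins per row
coeff[range(N), y] = -row_sum            # step 3: negated count in the correct-class column

# dW = X.T.dot(coeff) / N + reg * W would then give the full gradient
print(coeff)
```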
The cross-validation code in svm.ipynb:
```python
learning_rates = [1e-7, 5e-5]
regularization_strengths = [2.5e4, 5e4]

results = {}
best_val = -1    # The highest validation accuracy that we have seen so far.
best_svm = None  # The LinearSVM object that achieved the highest validation rate.

for lr in np.linspace(learning_rates[0], learning_rates[1], 5):
    for reg in np.linspace(regularization_strengths[0], regularization_strengths[1], 5):
        svm = LinearSVM()
        loss_hist = svm.train(X_train, y_train, learning_rate=lr, reg=reg,
                              num_iters=1500, verbose=False)
        y_train_pred = svm.predict(X_train)
        y_val_pred = svm.predict(X_val)
        accuracy_train = np.mean(y_train.ravel() == y_train_pred)
        accuracy_val = np.mean(y_val.ravel() == y_val_pred)
        results[(lr, reg)] = [accuracy_train, accuracy_val]
        if best_val < accuracy_val:
            best_val = accuracy_val
            best_svm = svm

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy))

print('best validation accuracy achieved during cross-validation: %f' % best_val)
```
Q3 assignment code
softmax.py
```python
import numpy as np


def softmax_loss_naive(W, X, y, reg):
    loss = 0.0
    dW = np.zeros_like(W)
    num_classes = W.shape[1]
    num_train = X.shape[0]
    for i in range(num_train):
        scores = X[i].dot(W)
        scores -= np.max(scores)  # numeric stability: shift by the max score
        loss += -np.log(np.exp(scores[y[i]]) / np.sum(np.exp(scores)))
        for j in range(num_classes):
            # dL_i/dw_j = (p_j - 1(j == y[i])) * x_i
            if j == y[i]:
                dW[:, j] += X[i].T * np.exp(scores[j]) / np.sum(np.exp(scores)) - X[i].T
            else:
                dW[:, j] += X[i].T * np.exp(scores[j]) / np.sum(np.exp(scores))

    loss /= num_train
    loss += 0.5 * reg * np.sum(W ** 2)
    dW /= num_train
    dW += reg * W
    return loss, dW


def softmax_loss_vectorized(W, X, y, reg):
    num_train = X.shape[0]
    # compute the loss
    scores = X.dot(W)
    scores -= np.max(scores, axis=1).reshape(-1, 1)
    loss = -np.sum(np.log(np.exp(scores[range(num_train), y]) /
                          np.sum(np.exp(scores), axis=1)))
    loss /= num_train
    loss += 0.5 * reg * np.sum(W ** 2)
    # compute the gradient
    p = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)
    t = np.zeros_like(p)
    t[range(num_train), y] = 1
    dW = X.T.dot(p - t)
    dW /= num_train
    dW += reg * W
    return loss, dW
```
Vectorizing the Softmax loss gradient is much simpler than for the SVM loss; the key is again the coefficient matrix, so I will not repeat the full argument here.
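For completeness, a short sketch of that coefficient matrix: it is just the softmax probability matrix with 1 subtracted at the correct class of each row (the p - t in softmax_loss_vectorized above), here shown on toy scores:

```python
import numpy as np

scores = np.array([[1.0, 2.0, 0.5],       # toy scores, shape (N, C)
                   [0.3, 0.1, 2.2]])
y = np.array([1, 2])                      # correct class of each row
N = scores.shape[0]

shifted = scores - np.max(scores, axis=1, keepdims=True)
p = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)

coeff = p.copy()
coeff[range(N), y] -= 1.0                 # p_j - 1(j = k), row by row
# dW = X.T.dot(coeff) / N + reg * W
print(coeff)
```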
The cross-validation code in softmax.ipynb:
```python
from cs231n.classifiers import Softmax

results = {}
best_val = -1
best_softmax = None
learning_rates = [1e-7, 5e-7]
regularization_strengths = [2.5e4, 5e4]

for lr in np.linspace(learning_rates[0], learning_rates[1], 5):
    for reg in np.linspace(regularization_strengths[0], regularization_strengths[1], 5):
        softmax = Softmax()
        loss_hist = softmax.train(X_train, y_train, learning_rate=lr, reg=reg,
                                  num_iters=1500, verbose=True)
        y_train_pred = softmax.predict(X_train)
        y_val_pred = softmax.predict(X_val)
        accuracy_train = np.mean(y_train.ravel() == y_train_pred)
        accuracy_val = np.mean(y_val.ravel() == y_val_pred)
        results[(lr, reg)] = [accuracy_train, accuracy_val]
        if best_val < accuracy_val:
            best_val = accuracy_val
            best_softmax = softmax

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy))

print('best validation accuracy achieved during cross-validation: %f' % best_val)
```
linear_classifier.py
```python
import numpy as np
from cs231n.classifiers.linear_svm import *
from cs231n.classifiers.softmax import *


class LinearClassifier(object):

    def __init__(self):
        self.W = None

    def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
        num_train, dim = X.shape
        num_classes = np.max(y) + 1  # assume y takes values 0...K-1 where K is number of classes
        if self.W is None:
            # lazily initialize W
            self.W = 0.001 * np.random.randn(dim, num_classes)

        # Run stochastic gradient descent to optimize W
        loss_history = []
        for it in range(num_iters):
            # sample a mini-batch with replacement
            indices = np.random.choice(num_train, size=batch_size, replace=True)
            X_batch = X[indices]
            y_batch = y[indices]

            # evaluate loss and gradient
            loss, grad = self.loss(X_batch, y_batch, reg)
            loss_history.append(loss)

            # parameter update
            self.W -= learning_rate * grad

            if verbose and it % 100 == 0:
                print('iteration %d / %d: loss %f' % (it, num_iters, loss))
        return loss_history

    def predict(self, X):
        y_pred = np.argmax(X.dot(self.W), axis=1)
        return y_pred

    def loss(self, X_batch, y_batch, reg):
        pass


class LinearSVM(LinearClassifier):
    """ A subclass that uses the Multiclass SVM loss function """

    def loss(self, X_batch, y_batch, reg):
        return svm_loss_vectorized(self.W, X_batch, y_batch, reg)


class Softmax(LinearClassifier):
    """ A subclass that uses the Softmax + Cross-entropy loss function """

    def loss(self, X_batch, y_batch, reg):
        return softmax_loss_vectorized(self.W, X_batch, y_batch, reg)
```