CS231n: Linear Classifiers

Background:

Score function:
The score function is defined as a mapping from the raw image pixels to class scores. In this model we take the linear mapping

$$ f(x_i, W, b) = W x_i + b $$

where $x_i$ is an image flattened into a single column vector, the matrix $W$ (the weights) holds one row of weights per class, and the vector $b$ holds the biases.
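A minimal sketch of this mapping on CIFAR-10 shapes (the arrays and random values below are only illustrative), including the bias trick of folding $b$ into $W$, which is why the assignment code later appends a column of ones to the data:

```python
import numpy as np

x = np.random.randn(3072)                 # one 32x32x3 image flattened (illustrative data)
W = np.random.randn(10, 3072) * 0.0001    # one row of weights per class
b = np.zeros(10)                          # one bias per class

scores = W.dot(x) + b                     # 10 class scores

# Bias trick: append a constant 1 to x and fold b into W as an extra column.
x_ext = np.hstack([x, 1.0])               # shape (3073,)
W_ext = np.hstack([W, b.reshape(-1, 1)])  # shape (10, 3073)
scores_same = W_ext.dot(x_ext)            # identical scores
```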
Loss function:

1. SVM:

$$ L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta) $$

i.e., written out in terms of the weights,

$$ L_i = \sum_{j \neq y_i} \max(0, w_j^T x_i - w_{y_i}^T x_i + \Delta) $$

where the margin $\Delta$ is usually set to 1.0. The gradients with respect to the rows of $W$ are

$$ \nabla_{w_{y_i}} L_i = -\left( \sum_{j \neq y_i} \mathbb{1}(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0) \right) x_i, \qquad \nabla_{w_j} L_i = \mathbb{1}(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0)\, x_i \quad (j \neq y_i) $$

where $\mathbb{1}(\cdot)$ is the indicator function: $\mathbb{1}(\text{true expression}) = 1$ and $\mathbb{1}(\text{false expression}) = 0$.
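As a quick worked check of the formula (all numbers below are made up for illustration):

```python
import numpy as np

scores = np.array([13.0, -7.0, 11.0])   # class scores for one example
y_i = 0                                 # the first class is the correct one
delta = 10.0                            # margin (the assignment uses 1.0)

margins = np.maximum(0, scores - scores[y_i] + delta)
margins[y_i] = 0                        # the correct class never contributes
L_i = margins.sum()                     # max(0,-7-13+10) + max(0,11-13+10) = 0 + 8 = 8
```

The second class is already more than the margin below the correct score, so it contributes nothing; the third class violates the margin by 8, which becomes the loss.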
2. Softmax:

$$ L_i = -\log\!\left( \frac{e^{f_{y_i}}}{\sum_j e^{f_j}} \right) $$

For numerical stability during the computation we use

$$ \frac{e^{f_{y_i}}}{\sum_j e^{f_j}} = \frac{C\, e^{f_{y_i}}}{C \sum_j e^{f_j}} = \frac{e^{f_{y_i} + \log C}}{\sum_j e^{f_j + \log C}} $$

and the constant is usually chosen so that $\log C = -\max_j f_j$, where $f_j$ denotes the $j$-th element of the vector of class scores $f$.
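A minimal sketch of why the shift matters (the scores are deliberately huge):

```python
import numpy as np

f = np.array([123.0, 456.0, 789.0])          # large scores make np.exp overflow
p_unstable = np.exp(f) / np.sum(np.exp(f))   # inf / inf -> nan

f_shifted = f - np.max(f)                    # the largest score becomes 0
p_stable = np.exp(f_shifted) / np.sum(np.exp(f_shifted))  # same probabilities, no overflow
```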
Differences between the two: the SVM loss only wants the correct class score to beat the incorrect ones by the margin $\Delta$ and is indifferent to anything beyond that, while Softmax interprets the scores as unnormalized log probabilities and keeps pushing the probability of the correct class toward 1, so its loss is never exactly zero.
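A small comparison on made-up scores makes the point concrete (the helper functions below are just illustrative single-example losses, not the assignment's API):

```python
import numpy as np

def svm_loss_i(s, y, delta=1.0):
    """Multiclass SVM loss for one example with scores s and label y."""
    margins = np.maximum(0, s - s[y] + delta)
    margins[y] = 0
    return margins.sum()

def softmax_loss_i(s, y):
    """Softmax (cross-entropy) loss for one example."""
    s = s - np.max(s)                   # shift for numerical stability
    return -np.log(np.exp(s[y]) / np.sum(np.exp(s)))

s_a = np.array([10.0, 9.0, 9.0])        # correct class just meets the margin
s_b = np.array([10.0, -100.0, -100.0])  # correct class wins overwhelmingly

svm_loss_i(s_a, 0), svm_loss_i(s_b, 0)          # 0.0 and 0.0: the SVM is equally satisfied
softmax_loss_i(s_a, 0), softmax_loss_i(s_b, 0)  # ~0.55 and ~0.0: Softmax still prefers s_b
```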
Regularization: we add an L2 penalty on the weights,

$$ R(W) = \sum_k \sum_l W_{k,l}^2 $$

so the full loss becomes

$$ L = \frac{1}{N} \sum_i L_i + \lambda R(W) $$

where $N$ is the number of training examples and $\lambda$ is the regularization strength, a hyperparameter tuned on the validation set.
Data preprocessing:

For image data, the most common preprocessing step is mean subtraction.

Note: split the data into training/validation/test sets first, compute the mean image from the training set only, and then subtract that mean from the images in every split (training/validation/test).

Assignment (Assignment 1):
Part 2: Training an SVM
Steps:
- Implement a fully-vectorized SVM loss function
- Implement the fully-vectorized analytic gradient
- Check the analytic gradient against the numerical gradient
- Use the validation set to tune the learning rate and regularization strength
- Optimize the loss with SGD (stochastic gradient descent)
- Visualize the final learned weights (a sketch is given at the end of this post)
svm.ipynb (the main notebook):
```python
# Split the data into train, val, and test sets.
# In addition we create a small "development" set from the training data;
# we use the development set so the code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500  # sampled from the training set

mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape
```

```
Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (1000, 32, 32, 3)
Test labels shape:  (1000,)
```

```python
# Reshape each image into a row vector
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# Print the shapes to make sure the reshape worked
print 'Training data shape: ', X_train.shape
print 'Validation data shape: ', X_val.shape
print 'Test data shape: ', X_test.shape
print 'dev data shape: ', X_dev.shape
```

```
Training data shape:  (49000, 3072)   # training set
Validation data shape:  (1000, 3072)  # validation set
Test data shape:  (1000, 3072)        # test set
dev data shape:  (500, 3072)
```

```python
# Preprocessing (note: the mean image is computed from the training set only)
mean_image = np.mean(X_train, axis=0)  # shape (3072,)
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

# Append a bias dimension of ones so the bias is folded into the weight matrix
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print X_train.shape, X_val.shape, X_test.shape, X_dev.shape
```

```
(49000, 3073) (1000, 3073) (1000, 3073) (500, 3073)
```
Next, fill in the vectorized SVM loss function:
```python
def svm_loss_vectorized(W, X, y, reg):
    """
    Inputs:
    - W: (D, C) weight matrix
    - X: (N, D) minibatch of data
    - y: (N,) labels
    - reg: (float) regularization strength
    Returns:
    - loss (float)
    - dW: gradient of the loss with respect to W
    """
    loss = 0.0
    num_train = X.shape[0]
    num_classes = W.shape[1]
    dW = np.zeros(W.shape)

    scores = X.dot(W)  # class scores, (N, C)
    correct_class_scores = scores[range(num_train), list(y)].reshape(-1, 1)  # correct class scores, (N, 1)
    margins = np.maximum(0, scores - np.tile(correct_class_scores, (1, num_classes)) + 1)
    margins[range(num_train), list(y)] = 0
    loss = np.sum(margins)
    loss /= num_train
    # Add the regularization term
    loss += 0.5 * reg * np.sum(W * W)

    # Compute the gradient
    margins[margins > 0] = 1.0
    row_sum = np.sum(margins, axis=1)
    margins[np.arange(num_train), y] = -row_sum
    dW += np.dot(X.T, margins) / num_train + reg * W

    return loss, dW
```
The other choice of loss function, Softmax:
```python
import numpy as np

def softmax_loss_vectorized(W, X, y, reg):
    loss = 0.0
    dW = np.zeros_like(W)  # (D, C)
    num_train, dim = X.shape

    f = X.dot(W)  # (N, C)
    f_max = np.reshape(np.max(f, axis=1), (num_train, 1))  # (N, 1)
    prob = np.exp(f - f_max) / np.sum(np.exp(f - f_max), axis=1, keepdims=True)
    y_trueClass = np.zeros_like(prob)
    y_trueClass[range(num_train), y] = 1.0  # N by C one-hot labels
    loss += -np.sum(y_trueClass * np.log(prob)) / num_train + 0.5 * reg * np.sum(W * W)
    dW += -np.dot(X.T, y_trueClass - prob) / num_train + reg * W
    return loss, dW
```
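A quick sanity check you can run here (assuming the X_dev and y_dev arrays prepared above): with a tiny random W the ten classes are nearly equiprobable, so the loss should come out close to -log(0.1) ≈ 2.302.

```python
W = np.random.randn(3073, 10) * 0.0001
loss, _ = softmax_loss_vectorized(W, X_dev, y_dev, 0.0)
# loss should be roughly -log(0.1) ~= 2.302, since each of the 10 classes
# gets about 10% probability under a near-zero weight matrix
```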
Check the gradient (gradient_check.py):
```python
import numpy as np
from random import randrange

def eval_numerical_gradient_array(f, x, df, h=1e-5):
    """Numerically compute the gradient of an array-valued f at x."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        oldval = x[ix]
        x[ix] = oldval + h
        pos = f(x).copy()
        x[ix] = oldval - h
        neg = f(x).copy()
        x[ix] = oldval
        grad[ix] = np.sum((pos - neg) * df) / (2 * h)
        it.iternext()
    return grad

# Gradient check
def grad_check_sparse(f, x, analytic_grad, num_checks=10, h=1e-5):
    """Sample num_checks random elements of x and compare the numerical and analytic gradients there."""
    for i in xrange(num_checks):
        ix = tuple([randrange(m) for m in x.shape])
        oldval = x[ix]
        x[ix] = oldval + h
        fxph = f(x)
        x[ix] = oldval - h
        fxmh = f(x)
        x[ix] = oldval
        grad_numerical = (fxph - fxmh) / (2 * h)
        grad_analytic = analytic_grad[ix]
        rel_error = abs(grad_numerical - grad_analytic) / (abs(grad_numerical) + abs(grad_analytic))
        print 'numerical: %f analytic: %f, relative error: %e' % (grad_numerical, grad_analytic, rel_error)
```
Back in the main notebook:
```python
# First compute the loss
from cs231n.classifiers.linear_svm import svm_loss_naive

# Generate a small random weight matrix
W = np.random.randn(3073, 10) * 0.0001
loss, grad = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
print 'loss: %f' % (loss, )
```

```
loss: 9.391001
```
```python
# Then check the gradient numerically
from cs231n.gradient_check import grad_check_sparse

# With regularization turned off
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# With regularization turned on
loss, grad = svm_loss_naive(W, X_dev, y_dev, 1e2)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 1e2)[0]
grad_numerical = grad_check_sparse(f, W, grad)
```
Note: the numerical and analytic gradients may disagree in a few dimensions, because the SVM loss is not differentiable at certain points (exactly on a margin boundary, i.e. at the kinks of the hinge).
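A minimal illustration of such a kink, using max(0, x) as a stand-in for a single hinge term (the point x and the step h below are chosen arbitrarily so that the finite difference straddles the kink):

```python
import numpy as np

f = lambda x: np.maximum(0.0, x)  # one hinge term; not differentiable at x = 0
x, h = 1e-6, 1e-5                 # the evaluation point is closer to 0 than the step size

numeric = (f(x + h) - f(x - h)) / (2 * h)  # straddles the kink: ~0.55
analytic = 1.0                             # the subgradient the analytic formula uses for x > 0
# The two disagree even though both are reasonable at a non-differentiable point.
```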
Minimizing the loss

Complete linear_classifier.py:
```python
import numpy as np
from cs231n.classifiers.linear_svm import *
from cs231n.classifiers.softmax import *


class LinearClassifier(object):

    def __init__(self):
        self.W = None

    def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
        """
        Train this linear classifier with SGD.
        Inputs:
        - X: (N, D)
        - y: (N,)
        - learning_rate: (float) learning rate
        - reg: (float) regularization strength
        - num_iters: (integer) number of optimization steps
        - batch_size: (integer) number of examples used at each step
        - verbose: (boolean) if true, print progress during optimization
        """
        num_train, dim = X.shape
        num_classes = np.max(y) + 1
        if self.W is None:
            # Simple random initialization; better schemes come later
            self.W = 0.001 * np.random.randn(dim, num_classes)

        loss_history = []
        for it in xrange(num_iters):
            X_batch = None
            y_batch = None
            idx = np.random.choice(num_train, batch_size, replace=True)
            X_batch = X[idx]
            y_batch = y[idx]

            loss, grad = self.loss(X_batch, y_batch, reg)
            loss_history.append(loss)

            self.W = self.W - learning_rate * grad  # SGD update

            if verbose and it % 100 == 0:
                print 'iteration %d / %d: loss %f' % (it, num_iters, loss)

        return loss_history

    def predict(self, X):
        """
        Use the trained weights to predict labels.
        Inputs:
        - X: (N, D)
        Returns:
        - y_pred: (N,)
        """
        y_pred = np.zeros(X.shape[0])
        scores = X.dot(self.W)
        y_pred = np.argmax(scores, axis=1)
        return y_pred

    def loss(self, X_batch, y_batch, reg):
        """
        Inputs:
        - X_batch: (N, D)
        - y_batch: (N,)
        - reg: (float) regularization strength
        Returns:
        - loss
        - dW
        """
        pass


class LinearSVM(LinearClassifier):

    def loss(self, X_batch, y_batch, reg):
        return svm_loss_vectorized(self.W, X_batch, y_batch, reg)


class Softmax(LinearClassifier):

    def loss(self, X_batch, y_batch, reg):
        return softmax_loss_vectorized(self.W, X_batch, y_batch, reg)
```
Back in the main notebook, use SGD to optimize the loss:
```python
from cs231n.classifiers import LinearSVM

svm = LinearSVM()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,
                      num_iters=1500, verbose=True)
```

```
iteration 0 / 1500: loss 791.911030
iteration 100 / 1500: loss 287.042915
iteration 200 / 1500: loss 108.863656
iteration 300 / 1500: loss 42.827962
iteration 400 / 1500: loss 19.091675
iteration 500 / 1500: loss 10.296198
iteration 600 / 1500: loss 7.087291
iteration 700 / 1500: loss 5.551409
iteration 800 / 1500: loss 5.375510
iteration 900 / 1500: loss 5.378444
iteration 1000 / 1500: loss 5.079874
iteration 1100 / 1500: loss 5.385146
iteration 1200 / 1500: loss 4.970695
iteration 1300 / 1500: loss 5.197667
iteration 1400 / 1500: loss 5.032890
```
Plot the loss against the iteration number to check that training converges:
```python
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()
```
Compute the accuracy of the trained SVM on the training and validation sets:
```python
y_train_pred = svm.predict(X_train)
print 'training accuracy: %f' % (np.mean(y_train == y_train_pred), )
y_val_pred = svm.predict(X_val)
print 'validation accuracy: %f' % (np.mean(y_val == y_val_pred), )
```

```
training accuracy: 0.376735
validation accuracy: 0.385000
```

Next, use the validation set to tune the hyperparameters (regularization strength and learning rate). You should experiment with learning rates and regularization strengths drawn from different ranges; if you tune them carefully, you should be able to reach about 0.4 accuracy on the validation set.
```python
learning_rates = [1e-7]  # the learning rate is kept fixed here
regularization_strengths = [(j + 0.1 * i) * 1e4 for j in range(1, 5) for i in range(0, 10)]
# values from 1e4 to 4.9e4 in steps of 0.1e4, 40 values in total

results = {}     # maps (learning_rate, regularization_strength) to (train_accuracy, val_accuracy)
best_val = -1    # highest validation accuracy seen so far
best_svm = None  # the LinearSVM object that achieved it

for reg in regularization_strengths:
    for lr in learning_rates:
        svm = LinearSVM()
        loss_hist = svm.train(X_train, y_train, lr, reg, num_iters=1500)
        y_train_pred = svm.predict(X_train)
        train_accuracy = np.mean(y_train == y_train_pred)
        y_val_pred = svm.predict(X_val)
        val_accuracy = np.mean(y_val == y_val_pred)
        if val_accuracy > best_val:
            best_val = val_accuracy
            best_svm = svm
        results[(lr, reg)] = train_accuracy, val_accuracy

# Print the results
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy)

print 'best validation accuracy achieved during cross-validation: %f' % best_val
```
```
lr 1.000000e-07 reg 1.000000e+04 train accuracy: 0.378388 val accuracy: 0.382000
lr 1.000000e-07 reg 1.100000e+04 train accuracy: 0.379510 val accuracy: 0.379000
lr 1.000000e-07 reg 1.200000e+04 train accuracy: 0.379633 val accuracy: 0.393000
lr 1.000000e-07 reg 1.300000e+04 train accuracy: 0.384653 val accuracy: 0.396000
lr 1.000000e-07 reg 1.400000e+04 train accuracy: 0.378878 val accuracy: 0.381000
lr 1.000000e-07 reg 1.500000e+04 train accuracy: 0.387633 val accuracy: 0.387000
lr 1.000000e-07 reg 1.600000e+04 train accuracy: 0.386082 val accuracy: 0.411000
lr 1.000000e-07 reg 1.700000e+04 train accuracy: 0.383959 val accuracy: 0.393000
lr 1.000000e-07 reg 1.800000e+04 train accuracy: 0.383776 val accuracy: 0.385000
lr 1.000000e-07 reg 1.900000e+04 train accuracy: 0.384694 val accuracy: 0.397000
lr 1.000000e-07 reg 2.000000e+04 train accuracy: 0.379633 val accuracy: 0.386000
lr 1.000000e-07 reg 2.100000e+04 train accuracy: 0.375367 val accuracy: 0.384000
lr 1.000000e-07 reg 2.200000e+04 train accuracy: 0.381367 val accuracy: 0.382000
lr 1.000000e-07 reg 2.300000e+04 train accuracy: 0.380041 val accuracy: 0.394000
lr 1.000000e-07 reg 2.400000e+04 train accuracy: 0.384878 val accuracy: 0.392000
lr 1.000000e-07 reg 2.500000e+04 train accuracy: 0.381286 val accuracy: 0.384000
lr 1.000000e-07 reg 2.600000e+04 train accuracy: 0.374633 val accuracy: 0.386000
lr 1.000000e-07 reg 2.700000e+04 train accuracy: 0.383224 val accuracy: 0.387000
lr 1.000000e-07 reg 2.800000e+04 train accuracy: 0.376245 val accuracy: 0.384000
lr 1.000000e-07 reg 2.900000e+04 train accuracy: 0.380327 val accuracy: 0.393000
lr 1.000000e-07 reg 3.000000e+04 train accuracy: 0.377980 val accuracy: 0.381000
lr 1.000000e-07 reg 3.100000e+04 train accuracy: 0.368510 val accuracy: 0.364000
lr 1.000000e-07 reg 3.200000e+04 train accuracy: 0.382592 val accuracy: 0.382000
lr 1.000000e-07 reg 3.300000e+04 train accuracy: 0.376020 val accuracy: 0.378000
lr 1.000000e-07 reg 3.400000e+04 train accuracy: 0.377020 val accuracy: 0.387000
lr 1.000000e-07 reg 3.500000e+04 train accuracy: 0.374571 val accuracy: 0.372000
lr 1.000000e-07 reg 3.600000e+04 train accuracy: 0.372469 val accuracy: 0.382000
lr 1.000000e-07 reg 3.700000e+04 train accuracy: 0.373837 val accuracy: 0.382000
lr 1.000000e-07 reg 3.800000e+04 train accuracy: 0.369469 val accuracy: 0.375000
lr 1.000000e-07 reg 3.900000e+04 train accuracy: 0.375245 val accuracy: 0.384000
lr 1.000000e-07 reg 4.000000e+04 train accuracy: 0.370531 val accuracy: 0.382000
lr 1.000000e-07 reg 4.100000e+04 train accuracy: 0.373469 val accuracy: 0.398000
lr 1.000000e-07 reg 4.200000e+04 train accuracy: 0.374531 val accuracy: 0.386000
lr 1.000000e-07 reg 4.300000e+04 train accuracy: 0.372959 val accuracy: 0.379000
lr 1.000000e-07 reg 4.400000e+04 train accuracy: 0.373286 val accuracy: 0.378000
lr 1.000000e-07 reg 4.500000e+04 train accuracy: 0.375653 val accuracy: 0.391000
lr 1.000000e-07 reg 4.600000e+04 train accuracy: 0.373122 val accuracy: 0.373000
lr 1.000000e-07 reg 4.700000e+04 train accuracy: 0.372327 val accuracy: 0.387000
lr 1.000000e-07 reg 4.800000e+04 train accuracy: 0.370122 val accuracy: 0.384000
lr 1.000000e-07 reg 4.900000e+04 train accuracy: 0.367776 val accuracy: 0.381000
best validation accuracy achieved during cross-validation: 0.411000
```

Run the best model on the test set:
```python
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'linear SVM on raw pixels final test set accuracy: %f' % test_accuracy
```

```
linear SVM on raw pixels final test set accuracy: 0.380000
```
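Finally, the last step from the list above: visualizing the learned weights. A rough sketch (assuming the trained best_svm, the plt plotting module used earlier, and the standard CIFAR-10 class names; this mirrors the assignment's visualization cell rather than reproducing it exactly):

```python
# Strip the bias row and reshape each class's weights back into a 32x32x3 image
w = best_svm.W[:-1, :]          # (3072, 10)
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
    # Rescale the weights into 0..255 so they can be displayed as an image
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
plt.show()
```

Each learned template should look like a blurry prototype of its class, which is a useful qualitative check that training worked.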