Stanford CS231n Project Walkthrough (3): Softmax Linear Classification
My CSDN blog: 紅色石頭的專欄
My Zhihu homepage: 紅色石頭 | My Zhihu column: 紅色石頭的機器學習之路 | My WeChat official account: 紅色石頭的機器學習之路 (ID: redstonewill). Welcome to follow me, and let's learn and improve together!
The loss function of the Softmax linear classifier is:

$$L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_{j=1}^{C} e^{s_j}}\right)$$

where $s_j = f(x_i, W)_j$ is the score function, $y_i$ is the label of the correct class for sample $i$, and $C$ is the total number of classes. This loss function is also known as the cross-entropy loss.
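To make the formula concrete, here is a tiny numpy sketch (the score vector is made up for illustration: three classes, with correct class y_i = 0) that evaluates the loss for a single sample:

import numpy as np

# hypothetical scores s = f(x_i, W) for C = 3 classes; the correct class is y_i = 0
scores = np.array([3.2, 5.1, -1.7])
y_i = 0

probs = np.exp(scores) / np.sum(np.exp(scores))   # softmax probabilities, sum to 1
loss_i = -np.log(probs[y_i])                      # cross-entropy loss of the true class
print(probs)     # ~[0.13, 0.87, 0.00]
print(loss_i)    # ~2.04: the true class gets low probability, so the loss is large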
In practice, because of the exponentiation, the intermediate values can become very large. To avoid numerical overflow we usually normalize the scores s first. The idea is as follows:
Let $\log C = -\max_j s_j$, i.e. shift every score by the same constant. The softmax probabilities are unchanged:

$$\frac{e^{s_{y_i}}}{\sum_j e^{s_j}} = \frac{C\,e^{s_{y_i}}}{C\sum_j e^{s_j}} = \frac{e^{s_{y_i}+\log C}}{\sum_j e^{s_j+\log C}}$$

After the shift the largest exponent is 0, so no exponential can overflow.
The corresponding Python code is:
scores = np.array([123, 456, 789])   # example with 3 classes and each having large scores
scores -= np.max(scores)             # scores becomes [-666, -333, 0]
p = np.exp(scores) / np.sum(np.exp(scores))
The Softmax classifier computes a probability for each class, and its loss reflects the predicted probability of the true label: the closer that probability is to 1, the closer the loss is to 0. Because a regularization term is added, a larger hyperparameter λ penalizes the weights W more strongly, making W smaller and more uniformly distributed, which in turn makes the probability distribution over the classes more uniform. An example:
- When λ is small: W is larger, the scores are more spread out, and the probability distribution is sharply peaked on the highest-scoring class.
- When λ is large: W shrinks, the scores are compressed, and the probability distribution becomes more uniform.
As the example shows, strong regularization drives the probability distribution toward uniform. Note, however, that the relative magnitudes of the probabilities, i.e. their ordering, do not change. The short sketch below illustrates both points.
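A minimal numpy sketch with a made-up score vector, assuming (for illustration only) that stronger regularization roughly halves the learned scores:

import numpy as np

def softmax(s):
    s = s - np.max(s)                    # numeric stability shift
    return np.exp(s) / np.sum(np.exp(s))

scores = np.array([1.0, -2.0, 0.0])      # hypothetical scores under weak regularization
print(softmax(scores))                   # ~[0.71, 0.04, 0.26]: peaked distribution
print(softmax(scores / 2))               # ~[0.55, 0.12, 0.33]: flatter, but same ordering

Halving the scores (what a smaller W would produce) spreads the probability mass out, yet the class ranking is preserved.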
The main difference between the linear SVM classifier and the Softmax linear classifier is the loss function. The SVM only cares about the margin Δ between the correct class score and the incorrect class scores: once that margin is satisfied, it does not care by how much, and ignores the details. In Softmax, every class score contributes to the loss. For example, with C = 3 classes, consider two samples whose score vectors are [10, -10, -10] and [10, 9, 9], both with true label 0. For the SVM, both losses L_i are 0; for Softmax, the two losses are 0.00 and 0.55 respectively, a large difference.
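To verify these numbers, here is a small sketch (assuming the standard margin Δ = 1 for the multiclass hinge loss) that evaluates both losses on the two score vectors:

import numpy as np

def svm_loss(scores, y, delta=1.0):
    # multiclass hinge loss: sum over j != y of max(0, s_j - s_y + delta)
    margins = np.maximum(0, scores - scores[y] + delta)
    margins[y] = 0
    return np.sum(margins)

def softmax_loss(scores, y):
    scores = scores - np.max(scores)     # numeric stability shift
    probs = np.exp(scores) / np.sum(np.exp(scores))
    return -np.log(probs[y])

for s in [np.array([10.0, -10.0, -10.0]), np.array([10.0, 9.0, 9.0])]:
    print(svm_loss(s, 0), softmax_loss(s, 0))
# SVM loss is 0.0 for both; softmax loss is ~0.00 for the first and ~0.55 for the second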
Below is example code for the Softmax linear classifier. The complete code for this post is available on my:
- Github
- Gitee (碼雲)
1. Load the CIFAR10 dataset
# Load the raw CIFAR-10 data.
cifar10_dir = 'CIFAR10/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
Training data shape: (50000, 32, 32, 3)
Training labels shape: (50000,)
Test data shape: (10000, 32, 32, 3)
Test labels shape: (10000,)
Show some CIFAR10 images
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
num_each_class = 7

for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, num_each_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + (y + 1)
        plt.subplot(num_each_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()
Subsample the data for more efficient code execution
# Split the data into train, val, test sets and dev sets
num_train = 49000
num_val = 1000
num_test = 1000
num_dev = 500

# Validation set
mask = range(num_train, num_train + num_val)
X_val = X_train[mask]
y_val = y_train[mask]

# Train set
mask = range(num_train)
X_train = X_train[mask]
y_train = y_train[mask]

# Test set
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

# Development set
mask = np.random.choice(num_train, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
print('Development data shape: ', X_dev.shape)
print('Development labels shape: ', y_dev.shape)
Train data shape: (49000, 32, 32, 3)
Train labels shape: (49000,)
Validation data shape: (1000, 32, 32, 3)
Validation labels shape (1000,)
Test data shape: (1000, 32, 32, 3)
Test labels shape: (1000,)
Development data shape: (500, 32, 32, 3)
Development labels shape: (500,)
2. Preprocessing
Reshape the images data into rows
# Preprocessing: reshape the images data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

print('Train data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
print('Development data shape: ', X_dev.shape)
Train data shape: (49000, 3072)
Validation data shape: (1000, 3072)
Test data shape: (1000, 3072)
Development data shape: (500, 3072)
Subtract the mean images
# Preprocessing: subtract the mean image
mean_image = np.mean(X_train, axis=0)
plt.figure(figsize=(4, 4))
plt.imshow(mean_image.reshape((32, 32, 3)).astype('uint8'))
plt.show()
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image
Append the bias dimension of ones
# Append the bias dimension of ones (i.e. bias trick)
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print('Train data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
print('Development data shape: ', X_dev.shape)
Train data shape: (49000, 3073)
Validation data shape: (1000, 3073)
Test data shape: (1000, 3073)
Development data shape: (500, 3073)
3. Define a linear Softmax classifier
class Softmax(object):
    def __init__(self):
        self.W = None

    def loss_naive(self, X, y, reg):
        """
        Structured Softmax loss function, naive implementation (with loops).
        Inputs:
        - X: A numpy array of shape (num_train, D) contain the training data
          consisting of num_train samples each of dimension D
        - y: A numpy array of shape (num_train,) contain the training labels,
          where y[i] is the label of X[i]
        - reg: float, regularization strength
        Return:
        - loss: the loss value between predict value and ground truth
        - dW: gradient of W
        """
        # Initialize loss and dW
        loss = 0.0
        dW = np.zeros(self.W.shape)

        # Compute the loss and dW
        num_train = X.shape[0]
        num_classes = self.W.shape[1]
        for i in range(num_train):
            scores = np.dot(X[i], self.W)
            scores -= np.max(scores)
            correct_class = y[i]
            correct_score = scores[correct_class]
            loss_i = -correct_score + np.log(np.sum(np.exp(scores)))
            loss += loss_i
            for j in range(num_classes):
                softmax_output = np.exp(scores[j]) / np.sum(np.exp(scores))
                if j == correct_class:
                    dW[:, j] += (-1 + softmax_output) * X[i, :]
                else:
                    dW[:, j] += softmax_output * X[i, :]
        loss /= num_train
        loss += 0.5 * reg * np.sum(self.W * self.W)
        dW /= num_train
        dW += reg * self.W

        return loss, dW

    def loss_vectorized(self, X, y, reg):
        """
        Structured Softmax loss function, vectorized implementation (without loops).
        Inputs:
        - X: A numpy array of shape (num_train, D) contain the training data
          consisting of num_train samples each of dimension D
        - y: A numpy array of shape (num_train,) contain the training labels,
          where y[i] is the label of X[i]
        - reg: float, regularization strength
        Return:
        - loss: the loss value between predict value and ground truth
        - dW: gradient of W
        """
        # Initialize loss and dW
        loss = 0.0
        dW = np.zeros(self.W.shape)

        # Compute the loss and dW
        num_train = X.shape[0]
        num_classes = self.W.shape[1]

        # loss
        scores = np.dot(X, self.W)
        scores -= np.max(scores, axis=1).reshape(-1, 1)
        softmax_output = np.exp(scores) / np.sum(np.exp(scores), axis=1).reshape(-1, 1)
        loss = np.sum(-np.log(softmax_output[range(softmax_output.shape[0]), list(y)]))
        loss /= num_train
        loss += 0.5 * reg * np.sum(self.W * self.W)

        # dW
        dS = softmax_output
        dS[range(dS.shape[0]), list(y)] += -1
        dW = np.dot(X.T, dS)
        dW /= num_train
        dW += reg * self.W

        return loss, dW

    def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, print_flag=False):
        """
        Train Softmax classifier using SGD
        Inputs:
        - X: A numpy array of shape (num_train, D) contain the training data
          consisting of num_train samples each of dimension D
        - y: A numpy array of shape (num_train,) contain the training labels,
          where y[i] is the label of X[i], y[i] = c, 0 <= c < C
        - learning_rate: (float) learning rate for optimization
        - reg: (float) regularization strength
        - num_iters: (integer) number of steps to take when optimizing
        - batch_size: (integer) number of training examples to use at each step
        - print_flag: (boolean) If true, print the progress during optimization
        Outputs:
        - loss_history: A list containing the loss at each training iteration
        """
        loss_history = []
        num_train = X.shape[0]
        dim = X.shape[1]
        num_classes = np.max(y) + 1

        # Initialize W
        if self.W is None:
            self.W = 0.001 * np.random.randn(dim, num_classes)

        # iteration and optimization
        for t in range(num_iters):
            idx_batch = np.random.choice(num_train, batch_size, replace=True)
            X_batch = X[idx_batch]
            y_batch = y[idx_batch]
            loss, dW = self.loss_vectorized(X_batch, y_batch, reg)
            loss_history.append(loss)
            self.W += -learning_rate * dW
            if print_flag and t % 100 == 0:
                print('iteration %d / %d: loss %f' % (t, num_iters, loss))

        return loss_history

    def predict(self, X):
        """
        Use the trained weights of Softmax to predict data labels
        Inputs:
        - X: A numpy array of shape (num_train, D) contain the training data
        Outputs:
        - y_pred: A numpy array, predicted labels for the data in X
        """
        y_pred = np.zeros(X.shape[0])
        scores = np.dot(X, self.W)
        y_pred = np.argmax(scores, axis=1)
        return y_pred
4. Gradient Check
Define loss function
def loss_naive1(X, y, W, reg):
    """
    Structured Softmax loss function, naive implementation (with loops).
    Inputs:
    - X: A numpy array of shape (num_train, D) contain the training data
      consisting of num_train samples each of dimension D
    - y: A numpy array of shape (num_train,) contain the training labels,
      where y[i] is the label of X[i]
    - W: A numpy array of shape (D, C) contain the weights
    - reg: float, regularization strength
    Return:
    - loss: the loss value between predict value and ground truth
    - dW: gradient of W
    """
    # Initialize loss and dW
    loss = 0.0
    dW = np.zeros(W.shape)

    # Compute the loss and dW
    num_train = X.shape[0]
    num_classes = W.shape[1]
    for i in range(num_train):
        scores = np.dot(X[i], W)
        scores -= np.max(scores)
        correct_class = y[i]
        correct_score = scores[correct_class]
        loss_i = -correct_score + np.log(np.sum(np.exp(scores)))
        loss += loss_i
        for j in range(num_classes):
            softmax_output = np.exp(scores[j]) / np.sum(np.exp(scores))
            if j == correct_class:
                dW[:, j] += (-1 + softmax_output) * X[i, :]
            else:
                dW[:, j] += softmax_output * X[i, :]
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)
    dW /= num_train
    dW += reg * W

    return loss, dW


def loss_vectorized1(X, y, W, reg):
    """
    Structured Softmax loss function, vectorized implementation (without loops).
    Inputs:
    - X: A numpy array of shape (num_train, D) contain the training data
      consisting of num_train samples each of dimension D
    - y: A numpy array of shape (num_train,) contain the training labels,
      where y[i] is the label of X[i]
    - W: A numpy array of shape (D, C) contain the weights
    - reg: float, regularization strength
    Return:
    - loss: the loss value between predict value and ground truth
    - dW: gradient of W
    """
    # Initialize loss and dW
    loss = 0.0
    dW = np.zeros(W.shape)

    # Compute the loss and dW
    num_train = X.shape[0]
    num_classes = W.shape[1]

    # loss
    scores = np.dot(X, W)
    scores -= np.max(scores, axis=1).reshape(-1, 1)
    softmax_output = np.exp(scores) / np.sum(np.exp(scores), axis=1).reshape(-1, 1)
    loss = np.sum(-np.log(softmax_output[range(softmax_output.shape[0]), list(y)]))
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)

    # dW
    dS = softmax_output
    dS[range(dS.shape[0]), list(y)] += -1
    dW = np.dot(X.T, dS)
    dW /= num_train
    dW += reg * W

    return loss, dW
Gradient check
from gradient_check import grad_check_sparse
import time

# generate a random Softmax weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001

# Without regularization
loss, dW = loss_naive1(X_dev, y_dev, W, 0)
f = lambda W: loss_naive1(X_dev, y_dev, W, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, dW)

# With regularization
loss, dW = loss_naive1(X_dev, y_dev, W, 5e1)
f = lambda W: loss_naive1(X_dev, y_dev, W, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, dW)
numerical: 1.382074 analytic: 1.382074, relative error: 2.603780e-08
numerical: 0.587997 analytic: 0.587997, relative error: 2.764543e-08
numerical: 2.466843 analytic: 2.466843, relative error: 8.029571e-09
numerical: -1.840196 analytic: -1.840196, relative error: 1.781980e-09
numerical: 1.444645 analytic: 1.444645, relative error: 6.200972e-08
numerical: -1.381959 analytic: -1.381959, relative error: 1.643225e-08
numerical: 1.122692 analytic: 1.122692, relative error: 1.600617e-08
numerical: 1.249459 analytic: 1.249459, relative error: 2.936177e-09
numerical: 1.556929 analytic: 1.556929, relative error: 1.452262e-08
numerical: 1.976238 analytic: 1.976238, relative error: 1.619212e-08
numerical: 2.308430 analytic: 2.308430, relative error: 7.769452e-10
numerical: -2.698441 analytic: -2.698440, relative error: 2.672068e-08
numerical: 1.991475 analytic: 1.991475, relative error: 3.035301e-08
numerical: -1.891048 analytic: -1.891048, relative error: 1.407403e-08
numerical: 1.409085 analytic: 1.409085, relative error: 1.916174e-08
numerical: 1.688600 analytic: 1.688600, relative error: 6.298778e-10
numerical: -0.140043 analytic: -0.140043, relative error: 7.654000e-08
numerical: -0.563577 analytic: -0.563577, relative error: 5.109196e-08
numerical: 0.224879 analytic: 0.224879, relative error: 1.218421e-07
numerical: -5.497099 analytic: -5.497099, relative error: 1.992705e-08
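The relative errors above are all tiny (around 1e-8), which indicates the analytic gradient matches the numerical one. grad_check_sparse is a helper from the CS231n assignment utilities; if you do not have that file, a minimal centered-difference checker over a few random coordinates could look like this sketch (my own simplified version, not the course implementation):

import numpy as np

def grad_check_sparse_sketch(f, W, analytic_grad, num_checks=10, h=1e-5):
    # Compare the analytic gradient with a centered numerical difference
    # at a few randomly chosen coordinates of W.
    for _ in range(num_checks):
        ix = tuple(np.random.randint(m) for m in W.shape)
        old = W[ix]
        W[ix] = old + h
        fxph = f(W)                      # f(W + h) at this coordinate
        W[ix] = old - h
        fxmh = f(W)                      # f(W - h)
        W[ix] = old                      # restore the original value
        grad_numerical = (fxph - fxmh) / (2 * h)
        grad_analytic = analytic_grad[ix]
        rel_error = (abs(grad_numerical - grad_analytic) /
                     (abs(grad_numerical) + abs(grad_analytic) + 1e-12))
        print('numerical: %f analytic: %f, relative error: %e'
              % (grad_numerical, grad_analytic, rel_error))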
5. Stochastic Gradient Descent
softmax = Softmax()
loss_history = softmax.train(X_train, y_train, learning_rate=1e-7, reg=2.5e4,
                             num_iters=1500, batch_size=200, print_flag=True)
iteration 0 / 1500: loss 389.013148
iteration 100 / 1500: loss 235.704700
iteration 200 / 1500: loss 142.948192
iteration 300 / 1500: loss 87.236112
iteration 400 / 1500: loss 53.494956
iteration 500 / 1500: loss 33.153764
iteration 600 / 1500: loss 20.907861
iteration 700 / 1500: loss 13.442687
iteration 800 / 1500: loss 8.929345
iteration 900 / 1500: loss 6.238832
iteration 1000 / 1500: loss 4.559590
iteration 1100 / 1500: loss 3.501153
iteration 1200 / 1500: loss 2.924789
iteration 1300 / 1500: loss 2.552109
iteration 1400 / 1500: loss 2.370926
# Plot the loss_history
plt.plot(loss_history)
plt.xlabel('Iteration number')
plt.ylabel('loss value')
plt.show()
# Use the softmax classifier to predict
# Training set
y_pred = softmax.predict(X_train)
num_correct = np.sum(y_pred == y_train)
accuracy = np.mean(y_pred == y_train)
print('Training correct %d/%d: The accuracy is %f' % (num_correct, X_train.shape[0], accuracy))

# Test set
y_pred = softmax.predict(X_test)
num_correct = np.sum(y_pred == y_test)
accuracy = np.mean(y_pred == y_test)
print('Test correct %d/%d: The accuracy is %f' % (num_correct, X_test.shape[0], accuracy))
Training correct 17023/49000: The accuracy is 0.347408
Test correct 359/1000: The accuracy is 0.359000
6. Validation and Test
Cross-validation
learning_rates = [1.4e-7, 1.5e-7, 1.6e-7]
regularization_strengths = [8000.0, 9000.0, 10000.0, 11000.0, 18000.0, 19000.0, 20000.0, 21000.0]

results = {}
best_lr = None
best_reg = None
best_val = -1        # The highest validation accuracy that we have seen so far.
best_softmax = None  # The Softmax object that achieved the highest validation rate.

for lr in learning_rates:
    for reg in regularization_strengths:
        softmax = Softmax()
        loss_history = softmax.train(X_train, y_train, learning_rate=lr, reg=reg, num_iters=3000)
        y_train_pred = softmax.predict(X_train)
        accuracy_train = np.mean(y_train_pred == y_train)
        y_val_pred = softmax.predict(X_val)
        accuracy_val = np.mean(y_val_pred == y_val)
        results[(lr, reg)] = accuracy_train, accuracy_val
        if accuracy_val > best_val:
            best_lr = lr
            best_reg = reg
            best_val = accuracy_val
            best_softmax = softmax
        print('lr: %e reg: %e train accuracy: %f val accuracy: %f' %
              (lr, reg, results[(lr, reg)][0], results[(lr, reg)][1]))
print('Best validation accuracy during cross-validation:\nlr = %e, reg = %e, best_val = %f' %
      (best_lr, best_reg, best_val))
lr: 1.400000e-07 reg: 8.000000e+03 train accuracy: 0.376388 val accuracy: 0.381000
lr: 1.400000e-07 reg: 9.000000e+03 train accuracy: 0.378061 val accuracy: 0.393000
lr: 1.400000e-07 reg: 1.000000e+04 train accuracy: 0.375061 val accuracy: 0.394000
lr: 1.400000e-07 reg: 1.100000e+04 train accuracy: 0.370918 val accuracy: 0.389000
lr: 1.400000e-07 reg: 1.800000e+04 train accuracy: 0.361857 val accuracy: 0.378000
lr: 1.400000e-07 reg: 1.900000e+04 train accuracy: 0.354327 val accuracy: 0.373000
lr: 1.400000e-07 reg: 2.000000e+04 train accuracy: 0.357531 val accuracy: 0.370000
lr: 1.400000e-07 reg: 2.100000e+04 train accuracy: 0.351837 val accuracy: 0.374000
lr: 1.500000e-07 reg: 8.000000e+03 train accuracy: 0.380429 val accuracy: 0.387000
lr: 1.500000e-07 reg: 9.000000e+03 train accuracy: 0.375959 val accuracy: 0.393000
lr: 1.500000e-07 reg: 1.000000e+04 train accuracy: 0.373857 val accuracy: 0.397000
lr: 1.500000e-07 reg: 1.100000e+04 train accuracy: 0.371918 val accuracy: 0.386000
lr: 1.500000e-07 reg: 1.800000e+04 train accuracy: 0.359735 val accuracy: 0.379000
lr: 1.500000e-07 reg: 1.900000e+04 train accuracy: 0.359796 val accuracy: 0.373000
lr: 1.500000e-07 reg: 2.000000e+04 train accuracy: 0.352041 val accuracy: 0.365000
lr: 1.500000e-07 reg: 2.100000e+04 train accuracy: 0.356531 val accuracy: 0.372000
lr: 1.600000e-07 reg: 8.000000e+03 train accuracy: 0.378265 val accuracy: 0.394000
lr: 1.600000e-07 reg: 9.000000e+03 train accuracy: 0.377980 val accuracy: 0.391000
lr: 1.600000e-07 reg: 1.000000e+04 train accuracy: 0.371429 val accuracy: 0.389000
lr: 1.600000e-07 reg: 1.100000e+04 train accuracy: 0.374224 val accuracy: 0.391000
lr: 1.600000e-07 reg: 1.800000e+04 train accuracy: 0.360796 val accuracy: 0.386000
lr: 1.600000e-07 reg: 1.900000e+04 train accuracy: 0.355592 val accuracy: 0.371000
lr: 1.600000e-07 reg: 2.000000e+04 train accuracy: 0.356122 val accuracy: 0.368000
lr: 1.600000e-07 reg: 2.100000e+04 train accuracy: 0.354143 val accuracy: 0.367000
Best validation accuracy during cross-validation:
lr = 1.500000e-07, reg = 1.000000e+04, best_val = 0.397000
Visualize the cross-validation result
import math

x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]
marker_size = 100

plt.figure(figsize=(10, 10))

# training accuracy
plt.subplot(2, 1, 1)
colors = [results[x][0] for x in results]
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')

# validation accuracy
plt.subplot(2, 1, 2)
colors = [results[x][1] for x in results]
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.show()
Use the best softmax to test
y_pred = best_softmax.predict(X_test)
num_correct = np.sum(y_pred == y_test)
accuracy = np.mean(y_pred == y_test)
print('Test correct %d/%d: The accuracy is %f' % (num_correct, num_test, accuracy))
Test correct 379/1000: The accuracy is 0.379000
Visualize the weights for each class
W = best_softmax.W[:-1, :]   # delete the bias
W = np.reshape(W, (32, 32, 3, 10))
W_max, W_min = np.max(W), np.min(W)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

plt.figure(figsize=(10, 5))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    # Rescale the weights to be between 0 and 255
    imgW = 255.0 * (W[:, :, :, i] - W_min) / (W_max - W_min)
    plt.imshow(imgW.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])