cs224n assignment1: a summary of implementing softmax


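  1. Softmax is monotonically increasing in the sense that a larger input always gets a larger output, so it does not change the ordering of the original data.
  2. It maps the raw inputs into the interval (0,1) with a total of 1, so the outputs are commonly used to represent probabilities.
  3. softmax(x) = softmax(x+c) for any constant c added to every element; this property is what keeps the computation numerically stable.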
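The three properties above can be checked numerically. This is only a minimal sketch: `naive_softmax` and the sample values are my own choices for illustration, not part of the assignment.

```python
import numpy as np

def naive_softmax(x):
    """Plain softmax of a 1-D array (no overflow protection)."""
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x)

x = np.array([3.0, 1.0, 2.0])
p = naive_softmax(x)

# Property 1: the ranking of the elements is unchanged.
order_preserved = np.array_equal(np.argsort(x), np.argsort(p))

# Property 2: every output lies in (0, 1) and the outputs sum to 1.
in_unit_interval = bool(np.all((p > 0) & (p < 1)))
sums_to_one = bool(np.isclose(np.sum(p), 1.0))

# Property 3: adding the same constant c to every element changes nothing.
c = -5.0
shift_invariant = bool(np.allclose(p, naive_softmax(x + c)))

print(order_preserved, in_unit_interval, sums_to_one, shift_invariant)
```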




import numpy as np

def softmax(x):
    """Compute the softmax of vector x."""
    exp_x = np.exp(x)
    softmax_x = exp_x / np.sum(exp_x)
    return softmax_x


softmax([1, 2, 3])

array([0.09003057, 0.24472847, 0.66524096])


softmax([1000, 2000, 3000])

array([nan, nan, nan])
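The nan values come from floating-point overflow: a float64 cannot hold e^x once x is larger than roughly 709.78, so `np.exp` returns inf, and inf / inf is nan. A small sketch of the two steps (the variable names `big` and `ratio` are mine; `np.errstate` is only there to silence the runtime warnings):

```python
import numpy as np

with np.errstate(over="ignore", invalid="ignore"):
    # Each exponential is too large for float64 and becomes inf.
    big = np.exp(np.array([1000.0, 2000.0, 3000.0]))
    # The sum is inf as well, and inf / inf is nan.
    ratio = big / np.sum(big)

print(big)
print(ratio)
```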


To fix this problem, which comes from np.exp overflowing on large inputs, we can use the third property of softmax: softmax(x) = softmax(x+c).

In practice, c is usually set to -max(x).
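The identity follows directly from the definition of softmax. A short derivation, where c is a scalar added to every component (the index names i and j are my own notation):

```latex
\[
\operatorname{softmax}(x + c)_i
  = \frac{e^{x_i + c}}{\sum_j e^{x_j + c}}
  = \frac{e^{c}\, e^{x_i}}{e^{c} \sum_j e^{x_j}}
  = \frac{e^{x_i}}{\sum_j e^{x_j}}
  = \operatorname{softmax}(x)_i
\]
```

With c = -max(x), every exponent is at most 0, so each term lies in (0, 1] and the largest term is exactly 1. np.exp can no longer overflow, and the denominator is at least 1.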


import numpy as np

def softmax(x):
    """Compute the softmax in a numerically stable way."""
    # Shift so the largest element is 0; the result is unchanged.
    x = x - np.max(x)
    exp_x = np.exp(x)
    softmax_x = exp_x / np.sum(exp_x)
    return softmax_x
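As a quick check that the shift removes the nan result, here is a self-contained sketch run on the same large input. The name `stable_softmax` is mine, chosen so it does not clash with the assignment's `softmax`.

```python
import numpy as np

def stable_softmax(x):
    """Softmax of a 1-D input, shifted by -max(x) to avoid overflow."""
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x)          # largest element becomes 0
    exp_shifted = np.exp(shifted)    # every value is now in [0, 1]
    return exp_shifted / np.sum(exp_shifted)

result = stable_softmax([1000, 2000, 3000])
print(result)  # [0. 0. 1.]
```

The two smaller entries underflow to exactly 0 because e^{-1000} and e^{-2000} are below the smallest positive float64, which is harmless here since the true probabilities are vanishingly small.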


One way to handle both a single vector and an M x N matrix uses np.apply_along_axis:

import numpy as np

def softmax(x):
    """
    Compute the softmax function for each row of the input x.

    Arguments:
    x -- A N dimensional vector or M x N dimensional numpy matrix.

    Return:
    x -- You are allowed to modify x in-place
    """
    orig_shape = x.shape

    if len(x.shape) > 1:
        # Matrix
        # Exponentiate each row after subtracting that row's maximum.
        exp_minmax = lambda x: np.exp(x - np.max(x))
        # Reciprocal of a row's sum.
        denom = lambda x: 1.0 / np.sum(x)
        x = np.apply_along_axis(exp_minmax, 1, x)
        denominator = np.apply_along_axis(denom, 1, x)

        # Turn the (M,) result into an (M, 1) column so it broadcasts per row.
        if len(denominator.shape) == 1:
            denominator = denominator.reshape((denominator.shape[0], 1))

        x = x * denominator
    else:
        # Vector
        x_max = np.max(x)
        x = x - x_max
        numerator = np.exp(x)
        denominator = 1.0 / np.sum(numerator)
        # denominator is a scalar, so this is an element-wise scaling.
        x = numerator * denominator

    assert x.shape == orig_shape
    return x

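See the NumPy documentation for np.apply_along_axis. For example, with x = np.array([[1,2], [3,4]]) and denom = lambda x: 1.0 / np.sum(x), np.apply_along_axis(denom, 1, x) returns array([ 0.33333333, 0.14285714]). Because axis 1 is specified, denom is applied to each row, giving 1.0/(1+2) and 1.0/(3+4). Changing the axis argument to 0 applies it to each column instead, giving 1.0/(1+3) and 1.0/(2+4), so the result is array([ 0.25 , 0.16666667]).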
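The axis behaviour described above can be reproduced as follows. The sketch also compares it with the `axis` argument of `np.sum`, which gives the same numbers without calling a Python function once per row; I am assuming a plain `np.array` input here.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
denom = lambda row: 1.0 / np.sum(row)

# axis=1: denom sees one row at a time -> 1/(1+2), 1/(3+4)
per_row = np.apply_along_axis(denom, 1, x)
# axis=0: denom sees one column at a time -> 1/(1+3), 1/(2+4)
per_col = np.apply_along_axis(denom, 0, x)

# The same values computed by summing along an axis directly.
per_row_vec = 1.0 / np.sum(x, axis=1)
per_col_vec = 1.0 / np.sum(x, axis=0)

print(per_row, per_col)
```

np.apply_along_axis loops in Python under the hood, so on large matrices the `axis=` form is usually the faster choice.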

The same computation using axis arguments and broadcasting, filled into the assignment skeleton:

import numpy as np

def softmax(x):
    """Compute the softmax function for each row of the input x.
    It is crucial that this function is optimized for speed because
    it will be used frequently in later code. You might find numpy
    functions np.exp, np.sum, np.reshape, np.max, and numpy
    broadcasting useful for this task.
    Numpy broadcasting documentation:
    http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
    You should also make sure that your code works for a single
    N-dimensional vector (treat the vector as a single row) and
    for M x N matrices. This may be useful for testing later. Also,
    make sure that the dimensions of the output match the input.
    You must implement the optimization in problem 1(a) of the
    written assignment!
    Arguments:
    x -- A N dimensional vector or M x N dimensional numpy matrix.
    Return:
    x -- You are allowed to modify x in-place
    """
    orig_shape = x.shape

    if len(x.shape) > 1:
        # Matrix
        ### YOUR CODE HERE
        # Per-row maximum as an (M, 1) column so it broadcasts across each row.
        x_max = np.max(x, axis=1).reshape(x.shape[0], 1)
        x -= x_max
        # Per-row sum of exponentials, also shaped (M, 1).
        exp_sum = np.sum(np.exp(x), axis=1).reshape(x.shape[0], 1)
        x = np.exp(x) / exp_sum
        ### END YOUR CODE
    else:
        # Vector
        ### YOUR CODE HERE
        x_max = np.max(x)
        x -= x_max
        exp_sum = np.sum(np.exp(x))
        x = np.exp(x) / exp_sum
        ### END YOUR CODE

    assert x.shape == orig_shape
    return x


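Walking through the matrix branch with [[1,2], [3,4]]:

[[1,2], [3,4]] => [2, 4] (the maximum of each row) => [[-1, 0], [-1, 0]] (subtract each row's maximum from that row) => [[e^{-1}, e^{0}], [e^{-1}, e^{0}]] (exponentiate every element) => [e^{-1}+e^{0}, e^{-1}+e^{0}] = [1.36787944, 1.36787944] (sum the exponentials in each row)

Finally, each row of [[e^{-1}, e^{0}], [e^{-1}, e^{0}]] is divided by that row's sum, which gives [[0.26894142, 0.73105858], [0.26894142, 0.73105858]].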
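The walkthrough can be reproduced step by step. This sketch uses `keepdims=True` instead of the explicit reshape calls, so one code path covers both vectors and matrices. It is an alternative I am suggesting, not the assignment's reference solution, and `softmax_rows` is a hypothetical name.

```python
import numpy as np

def softmax_rows(x):
    """Row-wise stable softmax for a 1-D vector or an M x N matrix."""
    x = np.asarray(x, dtype=float)
    # keepdims=True keeps the reduced axis with length 1, so the result
    # broadcasts against x without a manual reshape.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp_shifted = np.exp(shifted)
    return exp_shifted / np.sum(exp_shifted, axis=-1, keepdims=True)

x = np.array([[1.0, 2.0], [3.0, 4.0]])
row_max = np.max(x, axis=1, keepdims=True)   # [[2.], [4.]]
shifted = x - row_max                        # [[-1., 0.], [-1., 0.]]
row_sums = np.sum(np.exp(shifted), axis=1)   # [1.36787944, 1.36787944]
probs = softmax_rows(x)

print(row_sums)
print(probs)
```

Because the input is copied into a float array first, this version does not modify the caller's x in place, unlike the `x -= x_max` form.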

PS: here is the GitHub address of my clumsy implementation; experts, please go easy on me...




