cs224n assignment1: Notes on implementing softmax

02-05

Important properties of softmax

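1. Softmax is monotonically increasing, so it preserves the ordering of the original values.
2. It maps the raw inputs into the interval (0, 1), and the outputs sum to 1, so it is commonly used to represent probabilities.
3. softmax(x) = softmax(x + c) for any constant c. This property is used to ensure numerical stability.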

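The last property is important, and it is the key trick when optimizing a softmax implementation. My rough understanding, combined with property 2, was: since everything gets mapped into (0, 1) anyway, adding the same constant to every element of the array does not change the final result. More precisely, $e^{x_i + c} = e^{c} e^{x_i}$, so the common factor $e^{c}$ cancels between the numerator and the denominator.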
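A quick numerical check of the three properties, as a minimal sketch: it uses a hypothetical helper `naive_softmax` (defined here only for this check) on a small vector, where overflow is not yet a concern.

```python
import numpy as np

def naive_softmax(x):
    """Plain softmax; fine for small inputs only (hypothetical helper)."""
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x)

x = np.array([3.0, 1.0, 2.0])
s = naive_softmax(x)

# Property 1: ordering is preserved (same argsort as the input)
print(np.array_equal(np.argsort(s), np.argsort(x)))

# Property 2: every output lies in (0, 1) and the outputs sum to 1
print(np.all((s > 0) & (s < 1)), np.isclose(s.sum(), 1.0))

# Property 3: adding a constant c to every element leaves the result unchanged
print(np.allclose(naive_softmax(x + 5.0), s))
```

All three checks should print `True`.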

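From the definition of softmax: suppose we have an array $V$, and $V_i$ denotes the $i$-th element of $V$. The softmax value of this element is

$$S_i = \frac{e^{V_i}}{\sum_j e^{V_j}}$$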
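Substituting a shifted vector $V + c$ into the definition $S_i = e^{V_i} / \sum_j e^{V_j}$, term by term:

$$
S_i(V + c) = \frac{e^{V_i + c}}{\sum_j e^{V_j + c}}
= \frac{e^{c}\, e^{V_i}}{e^{c} \sum_j e^{V_j}}
= \frac{e^{V_i}}{\sum_j e^{V_j}}
= S_i(V)
$$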

So when writing this function with NumPy, one naturally (and naively) ends up with something like this:

```python
import numpy as np

def softmax(x):
    """Compute the softmax of vector x."""
    exp_x = np.exp(x)
    softmax_x = exp_x / np.sum(exp_x)
    return softmax_x
```

For small values everything works fine:

```python
>>> softmax([1, 2, 3])
array([0.09003057, 0.24472847, 0.66524096])
```

But things break once the vector contains large values, haha:

```python
>>> softmax([1000, 2000, 3000])
array([nan, nan, nan])
```

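This is caused by the limited range of NumPy's floating-point types. A float64 can only hold values up to about 1.8e308, so `np.exp(x)` overflows to `inf` once x exceeds roughly 709.78. The division then computes inf / inf, which is nan.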
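A small sketch of where the limit sits, assuming the default float64 dtype: `np.finfo` reports the largest representable value, and `np.errstate` is used only to silence the overflow and invalid-value warnings NumPy would otherwise print.

```python
import numpy as np

# Largest finite float64, and the largest x for which exp(x) is still finite
print(np.finfo(np.float64).max)
print(np.log(np.finfo(np.float64).max))   # about 709.78

with np.errstate(over="ignore", invalid="ignore"):
    print(np.exp(709.0))                  # still finite, about 8.2e307
    big = np.exp(np.array([1000.0, 2000.0, 3000.0]))
    print(big)                            # every entry overflowed to inf
    print(big / np.sum(big))              # inf / inf gives nan
```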

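To fix this, we can use the third property of softmax: softmax(x) = softmax(x + c).

In practice, c is usually set to -max(x). After the shift the largest element is 0, so every exponential is at most e^0 = 1 and nothing can overflow. The denominator also contains that e^0 = 1 term, so it is at least 1 and there is no division by zero.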
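Applying this to the failing input from above, where $\max(x) = 3000$:

$$
x - \max(x) = [1000 - 3000,\ 2000 - 3000,\ 3000 - 3000] = [-2000,\ -1000,\ 0]
$$

$$
\mathrm{softmax}(x) = \left[\frac{e^{-2000}}{e^{-2000} + e^{-1000} + e^{0}},\ \frac{e^{-1000}}{e^{-2000} + e^{-1000} + e^{0}},\ \frac{e^{0}}{e^{-2000} + e^{-1000} + e^{0}}\right] \approx [0,\ 0,\ 1]
$$

In float64, $e^{-1000}$ and $e^{-2000}$ underflow to exactly 0, which is harmless: the result comes out as [0, 0, 1] instead of nan.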

I only found this out after searching online. Experts, please forgive this newbie...

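```python
import numpy as np

def softmax(x):
    """Compute the softmax in a numerically stable way."""
    x = np.asarray(x, dtype=np.float64)  # also accept plain Python lists
    x = x - np.max(x)                    # shift so the largest value becomes 0
    exp_x = np.exp(x)
    softmax_x = exp_x / np.sum(exp_x)    # normalize by the sum of exponentials
    return softmax_x
```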
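To see the difference side by side, here is a minimal self-contained sketch with two hypothetical helpers, `softmax_naive` (the first version) and `softmax_stable` (the shifted version), run on the two inputs used above.

```python
import numpy as np

def softmax_naive(x):
    """Direct translation of the formula; overflows for large inputs."""
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x)

def softmax_stable(x):
    """Shift by -max(x) first so the largest exponential is exp(0) = 1."""
    x = np.asarray(x, dtype=np.float64)
    exp_x = np.exp(x - np.max(x))
    return exp_x / np.sum(exp_x)

small = [1, 2, 3]
large = [1000, 2000, 3000]

# On small inputs the two versions agree
print(np.allclose(softmax_naive(small), softmax_stable(small)))

# On large inputs only the stable version gives a usable answer
with np.errstate(over="ignore", invalid="ignore"):
    print(softmax_naive(large))   # all nan
print(softmax_stable(large))      # [0. 0. 1.]
```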

Next comes the matrix version. From what I've seen, there are basically two ways to write it:

```python
import numpy as np

def softmax(x):
    """
    Compute the softmax function for each row of the input x.

    Arguments:
    x -- A N dimensional vector or M x N dimensional numpy matrix.

    Return:
    x -- You are allowed to modify x in-place
    """
    orig_shape = x.shape

    if len(x.shape) > 1:
        # Matrix
        exp_minmax = lambda x: np.exp(x - np.max(x))
        denom = lambda x: 1.0 / np.sum(x)
        x = np.apply_along_axis(exp_minmax, 1, x)
        denominator = np.apply_along_axis(denom, 1, x)

        if len(denominator.shape) == 1:
            denominator = denominator.reshape((denominator.shape[0], 1))

        x = x * denominator
    else:
        # Vector
        x_max = np.max(x)
        x = x - x_max
        numerator = np.exp(x)
        denominator = 1.0 / np.sum(numerator)
        x = numerator.dot(denominator)

    assert x.shape == orig_shape
    return x
```

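For details on `np.apply_along_axis`, see the NumPy documentation. For example, with `x = np.matrix([[1,2], [3,4]])` and `denom = lambda x: 1.0 / np.sum(x)`, calling `np.apply_along_axis(denom, 1, x)` gives `array([ 0.33333333, 0.14285714])`. Because axis 1 is specified, the function is applied to each row, so the result is 1.0/(1+2) and 1.0/(3+4). If the axis argument of `apply_along_axis` is changed to 0, the function is applied to each column instead, giving 1.0/(1+3) and 1.0/(2+4), i.e. `array([ 0.25 , 0.16666667])`.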
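The same example as a runnable sketch. One swap from the text: it uses a plain `np.array` instead of `np.matrix`, since NumPy's documentation no longer recommends the matrix class. The last line shows that the same numbers come from ordinary `sum(axis=...)` reductions, which avoid calling a Python function once per row.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

def denom(v):
    """Reciprocal of the sum of a 1-D slice."""
    return 1.0 / np.sum(v)

# axis=1: denom receives each row, [1, 2] and [3, 4] -> 1/3, 1/7
print(np.apply_along_axis(denom, 1, x))

# axis=0: denom receives each column, [1, 3] and [2, 4] -> 1/4, 1/6
print(np.apply_along_axis(denom, 0, x))

# Equivalent vectorized reductions
print(1.0 / x.sum(axis=1), 1.0 / x.sum(axis=0))
```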

```python
import numpy as np

def softmax(x):
    """Compute the softmax function for each row of the input x.
    It is crucial that this function is optimized for speed because
    it will be used frequently in later code. You might find numpy
    functions np.exp, np.sum, np.reshape, np.max, and numpy
    broadcasting useful for this task.

    Numpy broadcasting documentation:
    http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

    You should also make sure that your code works for a single
    N-dimensional vector (treat the vector as a single row) and
    for M x N matrices. This may be useful for testing later. Also,
    make sure that the dimensions of the output match the input.

    You must implement the optimization in problem 1(a) of the
    written assignment!

    Arguments:
    x -- A N dimensional vector or M x N dimensional numpy matrix.

    Return:
    x -- You are allowed to modify x in-place
    """
    orig_shape = x.shape

    if len(x.shape) > 1:
        # Matrix
        ### YOUR CODE HERE
        x_max = np.max(x, axis=1).reshape(x.shape[0], 1)
        x -= x_max
        exp_sum = np.sum(np.exp(x), axis=1).reshape(x.shape[0], 1)
        x = np.exp(x) / exp_sum
        ### END YOUR CODE
    else:
        # Vector
        ### YOUR CODE HERE
        x_max = np.max(x)
        x -= x_max
        exp_sum = np.sum(np.exp(x))
        x = np.exp(x) / exp_sum
        ### END YOUR CODE

    assert x.shape == orig_shape
    return x
```
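A more compact variant of this second approach, offered as a sketch rather than the assignment's reference solution. `keepdims=True` keeps the reduced axis, so broadcasting works without a manual `reshape`; `axis=-1` lets the same code handle both a 1-D vector and an M x N matrix; `np.exp` is computed once instead of twice; and the caller's array is not modified in place. The name `softmax_rows` is hypothetical.

```python
import numpy as np

def softmax_rows(x):
    """Row-wise softmax for a 1-D vector or an M x N matrix (hypothetical helper)."""
    x = np.asarray(x, dtype=np.float64)
    # keepdims=True gives shape (M, 1) for a matrix or (1,) for a vector,
    # so the subtraction and division broadcast along each row
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp_x = np.exp(shifted)   # computed once and reused
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

print(softmax_rows([1, 2, 3]))
print(softmax_rows([[1, 2], [3, 4]]))
print(softmax_rows([[1000, 2000], [3000, 4000]]))
```

The output always has the same shape as the input, which is what the assignment's `assert x.shape == orig_shape` checks.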

The general idea can be summarized as follows, using [[1, 2], [3, 4]] as the input:

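1. Find the maximum of each row: [[1, 2], [3, 4]] => [2, 4].
2. Subtract each row's maximum from that row: => [[-1, 0], [-1, 0]].
3. Sum the exponentials of the elements in each row: $[e^{-1}+e^{0}, e^{-1}+e^{0}]$, i.e. [1.36787944, 1.36787944].
4. Exponentiate every element of the shifted matrix [[-1, 0], [-1, 0]]: $[[e^{-1}, e^{0}], [e^{-1}, e^{0}]]$.
5. Divide each row of this matrix by its own row sum from [1.36787944, 1.36787944].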
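The walkthrough above, traced in NumPy as a minimal sketch (the variable names are illustrative only):

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])

row_max = np.max(x, axis=1)               # maximum of each row: [2, 4]
shifted = x - row_max.reshape(-1, 1)      # subtract it row by row: [[-1, 0], [-1, 0]]
exp_x = np.exp(shifted)                   # [[e^-1, e^0], [e^-1, e^0]]
row_sum = exp_x.sum(axis=1)               # [1.36787944, 1.36787944]
result = exp_x / row_sum.reshape(-1, 1)   # divide each row by its own sum

print(result)   # each row is about [0.26894142, 0.73105858]
```

The `reshape(-1, 1)` turns the row sums into a column so broadcasting divides row by row. In this particular example both row sums are equal, so dividing by the flat `(2,)` array would coincidentally give the same answer; for general input it would divide column j by the sum of row j, which is wrong.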

P.S. Here is the GitHub repo with my clumsy implementation. Experts, please go easy on me...

willwinworld/cs224n-my-version (github.com)