The Best Scientific Computing Library for Python: A NumPy Quickstart Tutorial (Part 3)

08-16

From the column: Deep Learning + Natural Language Processing (NLP)

This article continues from the previous one, The Best Scientific Computing Library for Python: A NumPy Quickstart Tutorial (Part 2).

import numpy as np
%matplotlib inline

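A Deeper Look at NumPy

Broadcasting

Broadcasting allows universal functions to deal in a meaningful way with inputs that do not have exactly the same shape.

The first rule of broadcasting is that if the input arrays do not all have the same number of dimensions, a "1" is repeatedly prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.

The second rule of broadcasting ensures that an array with a size of 1 along a particular dimension acts as if it had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the "broadcast" array.

After the broadcasting rules are applied, the sizes of all arrays must match. For more details, see the Broadcasting section of the NumPy documentation.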
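The two rules are easier to follow on a concrete case. Below is a minimal sketch; the arrays a, b and col are made-up values chosen only for illustration, and np.broadcast_to is used just to make the "stretched" array visible.

```python
import numpy as np

# Hypothetical example arrays, chosen only to illustrate the two rules.
a = np.arange(12).reshape(3, 4)   # shape (3, 4)
b = np.array([10, 20, 30, 40])    # shape (4,)

# Rule 1: b has fewer dimensions, so its shape is treated as (1, 4).
# Rule 2: the size-1 axis then acts as if it had size 3.
total = a + b
print(total.shape)                # (3, 4)

# np.broadcast_to makes the stretched version of b explicit.
b_stretched = np.broadcast_to(b, a.shape)
print(np.array_equal(total, a + b_stretched))   # True

# A column of shape (3, 1) is broadcast along the other axis.
col = np.array([[100], [200], [300]])
print((a + col)[2])               # [308 309 310 311]
```

Shapes such as (3, 4) and (3,) would not broadcast, because after prepending a 1 the trailing sizes 4 and 3 differ and neither of them is 1.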

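Advanced Indexing and Index Tricks

NumPy offers more indexing facilities than regular Python sequences. In addition to the integer and slice indexing we saw before, arrays can be indexed by arrays of integers and arrays of booleans.

Indexing with Arrays of Indices

>>> a = np.arange(12)**2                     # the first 12 square numbers (0**2 through 11**2)
>>> i = np.array([1, 1, 3, 8, 5])            # an array of indices
>>> a[i]                                     # the elements of a at the positions i
array([ 1,  1,  9, 64, 25])
>>> j = np.array([[3, 4], [9, 7]])           # a two-dimensional array of indices
>>> a[j]                                     # the result has the same shape as j
array([[ 9, 16],
       [81, 49]])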

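When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a. The following example shows this behavior by converting an image of labels into a color image using a palette.

>>> palette = np.array([[0, 0, 0],           # black
...                     [255, 0, 0],         # red
...                     [0, 255, 0],         # green
...                     [0, 0, 255],         # blue
...                     [255, 255, 255]])    # white
>>> image = np.array([[0, 1, 2, 0],          # each value is an index into palette
...                   [0, 3, 4, 0]])
>>> # The result has shape (2, 4, 3): the index array has shape (2, 4), and each
>>> # element along the first axis of palette is an array of shape (3,).
>>> palette[image]
array([[[  0,   0,   0],
        [255,   0,   0],
        [  0, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0, 255],
        [255, 255, 255],
        [  0,   0,   0]]])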

We can also give indices for more than one dimension. The index arrays for each dimension must have the same shape.

>>> a = np.arange(12).reshape(3, 4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> i = np.array([[0, 1],                    # indices for the first dimension of a
...               [1, 2]])
>>> j = np.array([[2, 1],                    # indices for the second dimension of a
...               [3, 3]])
>>> b = a[i, j]                              # i and j must have the same shape; b[1, 0] is a[i[1, 0], j[1, 0]]
>>> b
array([[ 2,  5],
       [ 7, 11]])
>>> a[i, 2]
array([[ 2,  6],
       [ 6, 10]])
>>> a[:, j]
array([[[ 2,  1],
        [ 3,  3]],

       [[ 6,  5],
        [ 7,  7]],

       [[10,  9],
        [11, 11]]])

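Naturally, we can put i and j in a sequence and then index with that sequence. In current NumPy versions the sequence should be a tuple, since a[i, j] is exactly a[(i, j)]; a plain list is treated like an array of indices instead.

>>> l = (i, j)
>>> a[l]                                     # equivalent to a[i, j]
array([[ 2,  5],
       [ 7, 11]])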

However, we cannot do this by putting i and j into an array, because indexing with a single array is interpreted as indexing the first dimension of a.

>>> s = np.array([i, j])
>>> a[s]                                     # not what we want
Traceback (most recent call last):
  ...
IndexError: index 3 is out of bounds for axis 0 with size 3
>>> a[tuple(s)]                              # equivalent to a[i, j]
array([[ 2,  5],
       [ 7, 11]])

Another common use of indexing with arrays is the search for the maximum value of time-dependent series.

>>> time = np.linspace(20, 145, 5)                  # time scale
>>> data = np.sin(np.arange(20)).reshape(5, 4)      # 4 time-dependent series
>>> time
array([ 20.  ,  51.25,  82.5 , 113.75, 145.  ])
>>> data
array([[ 0.        ,  0.84147098,  0.90929743,  0.14112001],
       [-0.7568025 , -0.95892427, -0.2794155 ,  0.6569866 ],
       [ 0.98935825,  0.41211849, -0.54402111, -0.99999021],
       [-0.53657292,  0.42016704,  0.99060736,  0.65028784],
       [-0.28790332, -0.96139749, -0.75098725,  0.14987721]])
>>> ind = data.argmax(axis=0)                       # index of the maximum of each series
>>> ind
array([2, 0, 3, 1])
>>> time_max = time[ind]                            # times corresponding to the maxima
>>> data_max = data[ind, range(data.shape[1])]      # => data[ind[0], 0], data[ind[1], 1], ...
>>> time_max
array([ 82.5 ,  20.  , 113.75,  51.25])
>>> data_max
array([0.98935825, 0.84147098, 0.99060736, 0.6569866 ])
>>> np.all(data_max == data.max(axis=0))            # same result as data.max(axis=0)
True
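As an aside, the pairing of ind with a column range can also be written with np.take_along_axis. This is only an alternative sketch, not something the original tutorial uses; it assumes a NumPy version that provides take_along_axis (1.15 or later).

```python
import numpy as np

data = np.sin(np.arange(20)).reshape(5, 4)   # same 4 series as in the example above
ind = data.argmax(axis=0)                    # row index of the maximum in each column

# take_along_axis expects an index array with the same number of dimensions
# as data, so ind is given a leading axis of length 1.
data_max = np.take_along_axis(data, ind[np.newaxis, :], axis=0)[0]

print(np.array_equal(data_max, data.max(axis=0)))   # True
```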

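You can also use indexing with arrays as a target to assign to:

>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1, 3, 4]] = 0
>>> a
array([0, 0, 2, 0, 0])

However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value.

>>> a = np.arange(5)
>>> a[[0, 0, 2]] = [1, 2, 3]
>>> a
array([2, 1, 3, 3, 4])

This is reasonable enough, but be careful if you want to use Python's += construct, as it may not do what you expect:

>>> a = np.arange(5)
>>> a[[0, 0, 2]] += 1
>>> a
array([1, 1, 3, 3, 4])

Even though 0 occurs twice in the list of indices, the 0th element is only incremented once. This is because Python requires "a += 1" to be equivalent to "a = a + 1".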
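If the intent really is to increment once per occurrence of an index, one option is the ufunc method np.add.at, which applies the operation unbuffered. The sketch below is an addition for illustration and is not part of the original tutorial.

```python
import numpy as np

a = np.arange(5)
a[[0, 0, 2]] += 1            # buffered: index 0 is incremented only once
print(a)                     # [1 1 3 3 4]

b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)   # unbuffered: index 0 is incremented twice
print(b)                     # [2 1 3 3 4]
```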

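Indexing with Boolean Arrays

When we index arrays with arrays of integer indices, we provide the list of indices to pick. With boolean indices the approach is different: we explicitly choose which items in the array we want and which ones we don't.

The most natural way to use boolean indexing is with a boolean array that has the same shape as the original array.

>>> a = np.arange(12).reshape(3, 4)
>>> b = a > 4
>>> b                                        # b is a boolean array with the same shape as a
array([[False, False, False, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]])
>>> a[b]                                     # 1-D array with the selected elements
array([ 5,  6,  7,  8,  9, 10, 11])

This property can be very useful in assignments:

>>> a[b] = 0                                 # all elements of a greater than 4 become 0
>>> a
array([[0, 1, 2, 3],
       [4, 0, 0, 0],
       [0, 0, 0, 0]])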

The following example shows how to use boolean indexing to generate an image of the Mandelbrot set.

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> def mandelbrot(h, w, maxit=20):
...     """Returns an image of the Mandelbrot fractal of size (h,w)."""
...     y, x = np.ogrid[-1.4:1.4:h*1j, -2:0.8:w*1j]
...     c = x + y*1j
...     z = c
...     divtime = maxit + np.zeros(z.shape, dtype=int)
...
...     for i in range(maxit):
...         z = z**2 + c
...         diverge = z*np.conj(z) > 2**2            # who is diverging
...         div_now = diverge & (divtime == maxit)   # who is diverging now
...         divtime[div_now] = i                     # note when
...         z[diverge] = 2                           # avoid diverging too much
...
...     return divtime
>>> plt.imshow(mandelbrot(400, 400))
>>> plt.show()

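The second way of indexing with booleans is more similar to integer indexing: for each dimension of the array we give a 1-D boolean array selecting the slices we want.

>>> a = np.arange(12).reshape(3, 4)
>>> b1 = np.array([False, True, True])           # first-dimension selection
>>> b2 = np.array([True, False, True, False])    # second-dimension selection
>>> a[b1, :]                                     # selecting rows
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a[:, b2]                                     # selecting columns
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
>>> a[b1, b2]                                    # a surprising result
array([ 4, 10])

Note that the length of each 1-D boolean array must match the length of the dimension (axis) it indexes. In the example above, b1 has length 3 (the number of rows of a) and b2 has length 4 (the number of columns of a).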
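The surprising a[b1, b2] result becomes clearer once the masks are rewritten as integer indices: the two index arrays are paired element by element rather than crossed. The sketch below is an illustrative addition; it uses np.nonzero to show the equivalent integer indices and np.ix_ to get the rectangular block instead.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# The boolean masks are equivalent to these integer index arrays.
rows = np.nonzero(b1)[0]     # array([1, 2])
cols = np.nonzero(b2)[0]     # array([0, 2])

# Paired element by element: picks a[1, 0] and a[2, 2].
paired = a[b1, b2]
print(paired)                # [ 4 10]

# np.ix_ crosses every selected row with every selected column.
block = a[np.ix_(b1, b2)]
print(block)
# [[ 4  6]
#  [ 8 10]]
```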

The `ix_()` function

The ix_ function can be used to combine different vectors so as to obtain the result for each n-tuple of their elements. For example, to compute a + b * c for all the triplets taken from the vectors a, b and c:

>>> a = np.array([2, 3, 4, 5])
>>> b = np.array([8, 5, 4])
>>> c = np.array([5, 4, 6, 8, 3])
>>> ax, bx, cx = np.ix_(a, b, c)
>>> ax
array([[[2]],

       [[3]],

       [[4]],

       [[5]]])
>>> bx
array([[[8],
        [5],
        [4]]])
>>> cx
array([[[5, 4, 6, 8, 3]]])
>>> ax.shape, bx.shape, cx.shape
((4, 1, 1), (1, 3, 1), (1, 1, 5))
>>> result = ax + bx*cx
>>> result
array([[[42, 34, 50, 66, 26],
        [27, 22, 32, 42, 17],
        [22, 18, 26, 34, 14]],

       [[43, 35, 51, 67, 27],
        [28, 23, 33, 43, 18],
        [23, 19, 27, 35, 15]],

       [[44, 36, 52, 68, 28],
        [29, 24, 34, 44, 19],
        [24, 20, 28, 36, 16]],

       [[45, 37, 53, 69, 29],
        [30, 25, 35, 45, 20],
        [25, 21, 29, 37, 17]]])
>>> result[3, 2, 4]
17
>>> # Same value as the direct computation; ix_ saved us from writing three
>>> # nested loops to evaluate a + b*c for every combination.
>>> a[3] + b[2]*c[4]
17
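For comparison, here is a sketch of the three nested loops that the ix_ version replaces, with a check that both give the same array. The variable name expected is introduced here only for illustration.

```python
import numpy as np

a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

# Explicit loops: one iteration per (i, j, k) triplet.
expected = np.empty((a.size, b.size, c.size), dtype=a.dtype)
for i in range(a.size):
    for j in range(b.size):
        for k in range(c.size):
            expected[i, j, k] = a[i] + b[j] * c[k]

# Broadcasting version built from np.ix_.
ax, bx, cx = np.ix_(a, b, c)
result = ax + bx * cx

print(np.array_equal(result, expected))   # True
```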

You could also implement the reduce as follows:

>>> def ufunc_reduce(ufct, *vectors):
...     vs = np.ix_(*vectors)
...     r = ufct.identity
...     for v in vs:
...         r = ufct(r, v)
...     return r

and then use it like this:

>>> ufunc_reduce(np.add, a, b, c)
array([[[15, 14, 16, 18, 13],
        [12, 11, 13, 15, 10],
        [11, 10, 12, 14,  9]],

       [[16, 15, 17, 19, 14],
        [13, 12, 14, 16, 11],
        [12, 11, 13, 15, 10]],

       [[17, 16, 18, 20, 15],
        [14, 13, 15, 17, 12],
        [13, 12, 14, 16, 11]],

       [[18, 17, 19, 21, 16],
        [15, 14, 16, 18, 13],
        [14, 13, 15, 17, 12]]])

The advantage of this version of reduce over the normal ufunc.reduce is that it uses the broadcasting rules, which avoids creating an argument array whose size is the size of the output times the number of vectors.

Indexing with Strings

See Structured arrays: https://docs.scipy.org/doc/numpy/user/basics.rec.html#structured-arrays
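As a quick taste of what the linked page covers, a structured array can be indexed by field name. The following is a minimal sketch; the array people, its field names and its values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical records: the field names and values are illustrative only.
people = np.array(
    [("Rex", 9, 81.0), ("Fido", 3, 27.0)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f4")],
)

print(people["age"])         # [9 3]
print(people["name"][1])     # Fido

# Indexing by field name returns a view, so in-place updates
# are visible in the original array.
people["age"] += 1
print(people["age"])         # [10  4]
```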

Linear Algebra

We continue with basic linear algebra operations.

Simple Array Operations

See linalg.py in the numpy folder for more.

>>> import numpy as np
>>> a = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> print(a)
[[1. 2.]
 [3. 4.]]
>>> a.transpose()                            # transpose of the array
array([[1., 3.],
       [2., 4.]])
>>> np.linalg.inv(a)                         # matrix inverse (linalg = linear + algebra)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
>>> u = np.eye(2)                            # 2x2 identity matrix
>>> u
array([[1., 0.],
       [0., 1.]])
>>> j = np.array([[0.0, -1.0], [1.0, 0.0]])
>>> np.dot(j, j)                             # matrix product
array([[-1.,  0.],
       [ 0., -1.]])
>>> np.trace(u)                              # trace of the matrix
2.0
>>> y = np.array([[5.], [7.]])
>>> np.linalg.solve(a, y)                    # solve the linear system a x = y
array([[-3.],
       [ 4.]])
>>> np.linalg.eig(j)                         # eigenvalues and eigenvectors of the matrix
(array([0.+1.j, 0.-1.j]), array([[0.70710678+0.j        , 0.70710678-0.j        ],
       [0.        -0.70710678j, 0.        +0.70710678j]]))
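A simple way to sanity-check results like these is to plug them back into the defining equations. The sketch below is an illustrative addition; it uses np.allclose rather than exact equality because the results are floating-point.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0], [7.0]])

# The solution x of a x = y should reproduce y.
x = np.linalg.solve(a, y)
print(np.allclose(a @ x, y))                 # True

# A matrix times its inverse should give the identity.
a_inv = np.linalg.inv(a)
print(np.allclose(a @ a_inv, np.eye(2)))     # True

# Each eigenpair satisfies j v = w v; the eigenvectors are the columns of v.
j = np.array([[0.0, -1.0], [1.0, 0.0]])
w, v = np.linalg.eig(j)
print(np.allclose(j @ v, v * w))             # True
```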

Tricks and Tips

Here we list some short, useful tips.

"Automatic" Reshaping

To change the dimensions of an array, you can omit one of the sizes, which will then be deduced automatically:

>>> a = np.arange(30)
>>> a.shape = 2, -1, 3                       # -1 means "whatever is needed"
>>> a.shape
(2, 5, 3)
>>> a
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11],
        [12, 13, 14]],

       [[15, 16, 17],
        [18, 19, 20],
        [21, 22, 23],
        [24, 25, 26],
        [27, 28, 29]]])
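The same -1 placeholder works with the reshape method, which returns a new array object instead of modifying the shape in place. A minimal sketch, added here for illustration:

```python
import numpy as np

a = np.arange(30)

# The missing size is inferred as 30 / (2 * 3) = 5.
b = a.reshape(2, -1, 3)
print(b.shape)               # (2, 5, 3)

# A single -1 flattens the array.
flat = b.reshape(-1)
print(flat.shape)            # (30,)

# The inference fails when the sizes cannot divide the total evenly.
try:
    a.reshape(4, -1)         # 30 is not divisible by 4
except ValueError as exc:
    print("ValueError:", exc)
```

Only one dimension may be given as -1, since otherwise the sizes cannot be determined uniquely.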

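Vector Stacking

How do we construct a 2D array from a list of equally-sized row vectors? In MATLAB this is quite easy: if x and y are two vectors of the same length, you only need to write m=[x;y]. In NumPy this is done with the functions column_stack, dstack, hstack and vstack, depending on the dimension along which you want to stack. For example:

x = np.arange(0, 10, 2)        # x = [0, 2, 4, 6, 8]
y = np.arange(5)               # y = [0, 1, 2, 3, 4]
m = np.vstack([x, y])          # m = [[0, 2, 4, 6, 8],
                               #      [0, 1, 2, 3, 4]]
xy = np.hstack([x, y])         # xy = [0, 2, 4, 6, 8, 0, 1, 2, 3, 4]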
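The difference between the four functions is easiest to see from the shapes they produce for the same two vectors. A small sketch, added for illustration:

```python
import numpy as np

x = np.arange(0, 10, 2)      # [0, 2, 4, 6, 8]
y = np.arange(5)             # [0, 1, 2, 3, 4]

print(np.vstack([x, y]).shape)         # (2, 5)    x and y become rows
print(np.hstack([x, y]).shape)         # (10,)     x and y are concatenated
print(np.column_stack([x, y]).shape)   # (5, 2)    x and y become columns
print(np.dstack([x, y]).shape)         # (1, 5, 2) stacked along a third axis

# Row 1 of the column-stacked array pairs x[1] with y[1].
print(np.column_stack([x, y])[1])      # [2 1]
```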

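Histograms

The NumPy histogram function applied to an array returns a pair of vectors: the histogram of the array and the vector of bin edges. Note that matplotlib also has a function to build histograms (called hist). The difference is that plt.hist plots the histogram directly, whereas numpy.histogram only generates the data.

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> # Build a vector of 10000 normal deviates with variance 0.5^2 and mean 2
>>> mu, sigma = 2, 0.5
>>> v = np.random.normal(mu, sigma, 10000)
>>> # Plot a normalized histogram with 50 bins
>>> plt.hist(v, bins=50, density=True)       # matplotlib version (plot)
>>> plt.show()

>>> # Compute the histogram with NumPy and then plot it
>>> # (density=True replaces the old normed=True argument, which recent NumPy versions no longer accept)
>>> (n, bins) = np.histogram(v, bins=50, density=True)   # NumPy version (no plot)
>>> plt.plot(.5*(bins[1:] + bins[:-1]), n)
>>> plt.show()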
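Because the example above uses random data, its numbers change on every run. The sketch below uses a tiny hand-made vector (hypothetical values, chosen so the counts are easy to verify by eye) to show what np.histogram returns.

```python
import numpy as np

# Hypothetical data, small enough to count by hand.
v = np.array([0.5, 1.5, 1.7, 2.2, 2.9, 3.0])

# Three equal-width bins on [0, 3]; the last bin includes its right edge.
counts, edges = np.histogram(v, bins=3, range=(0.0, 3.0))
print(counts)    # [1 2 3]
print(edges)     # [0. 1. 2. 3.]

# With density=True the histogram integrates to 1 over the bin widths.
density, _ = np.histogram(v, bins=3, range=(0.0, 3.0), density=True)
print(np.isclose(np.sum(density * np.diff(edges)), 1.0))   # True
```

Note that edges has one more entry than counts, which is why the plotting example uses the midpoints .5*(bins[1:] + bins[:-1]) on the x axis.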

Thanks for reading. If you have any questions, feel free to leave a comment.

Original source: https://docs.scipy.org/doc/numpy/user/quickstart.html#less-basic
