Implementing Machine Learning Algorithms in Python: Logistic Regression

02-07

Logistic Regression

https://github.com/lawlite19/MachineLearning_Python/tree/master/LogisticRegression

Full code

https://github.com/lawlite19/MachineLearning_Python/blob/master/LogisticRegression/LogisticRegression.py

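Cost function

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\mathrm{cost}\left(h_\theta(x^{(i)}), y^{(i)}\right), \qquad \mathrm{cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & y = 1 \\ -\log(1 - h_\theta(x)) & y = 0 \end{cases}$$

The two cases can be combined into:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$

where:

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$

Why not use the cost function of linear regression? With the sigmoid as the hypothesis, the squared-error cost of linear regression may be non-convex, so for a classification problem gradient descent has difficulty reaching the minimum. The cost function above is convex.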

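Figure: graph of $-\log(h_\theta(x))$, i.e. the case y=1.

As the graph shows, when $h_\theta(x)$ approaches 1 and y=1, the prediction agrees with the label and the cost approaches 0. When $h_\theta(x)$ approaches 0 and y=1, the cost becomes very large. Our ultimate goal is to minimize the cost.

Figure: similarly, graph of $-\log(1 - h_\theta(x))$ (y=0).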
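To make the two curves concrete, the following minimal sketch evaluates the per-sample cost at a few values of h. The helper name `sample_cost` is introduced here only for illustration and is not part of the original repository.

```python
import numpy as np

def sample_cost(h, y):
    """Per-sample logistic cost: -log(h) if y == 1, -log(1 - h) if y == 0."""
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

h_values = np.array([0.01, 0.5, 0.99])
cost_y1 = sample_cost(h_values, 1)  # shrinks as h moves towards 1
cost_y0 = sample_cost(h_values, 0)  # shrinks as h moves towards 0
print(cost_y1)
print(cost_y0)
```

For y=1 the cost falls as h grows, and for y=0 it is the mirror image, which matches the two graphs described above.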

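Gradient

As before, take the partial derivative of the cost function:

$$\frac{\partial J(\theta)}{\partial\theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

This has the same form as the partial derivative for linear regression.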

Derivation
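One way to carry out the derivation, assuming $h_\theta(x) = g(\theta^T x)$ with $g$ the sigmoid and using the identity $g'(z) = g(z)(1 - g(z))$, is sketched below. The superscript $(i)$ is dropped inside the sums in the middle steps to keep the notation short.

```latex
\begin{aligned}
\frac{\partial J(\theta)}{\partial\theta_j}
&= -\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y}{h_\theta(x)} - \frac{1-y}{1-h_\theta(x)}\right]\frac{\partial h_\theta(x)}{\partial\theta_j} \\
&= -\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y}{h_\theta(x)} - \frac{1-y}{1-h_\theta(x)}\right]h_\theta(x)\left(1-h_\theta(x)\right)x_j \\
&= -\frac{1}{m}\sum_{i=1}^{m}\left[y\left(1-h_\theta(x)\right) - (1-y)\,h_\theta(x)\right]x_j \\
&= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}
\end{aligned}
```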

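Regularization

The purpose of regularization is to prevent overfitting.

Add one term to the cost function:

$$\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

Note that j starts from 1. theta(0) is a constant term: a column of 1s is prepended to X, so the product is still theta(0). It is unrelated to any feature and does not need to be regularized.

Regularized cost:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$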

import numpy as np

# Cost function
def costFunction(initial_theta, X, y, initial_lambda):
    m = len(y)
    J = 0
    h = sigmoid(np.dot(X, initial_theta))  # compute h(z)
    # The regularization sum starts at j=1 and excludes theta(0),
    # so work on a copy of theta with theta(0) set to 0
    theta1 = initial_theta.copy()
    theta1[0] = 0
    temp = np.dot(np.transpose(theta1), theta1)
    J = (-np.dot(np.transpose(y), np.log(h)) - np.dot(np.transpose(1-y), np.log(1-h)) + temp*initial_lambda/2)/m  # regularized cost
    return J

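Gradient of the regularized cost

$$\frac{\partial J(\theta)}{\partial\theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \quad (j \ge 1)$$

For j=0 the regularization term is dropped.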

# Compute the gradient
def gradient(initial_theta, X, y, initial_lambda):
    m = len(y)
    grad = np.zeros((initial_theta.shape[0]))
    h = sigmoid(np.dot(X, initial_theta))  # compute h(z)
    theta1 = initial_theta.copy()
    theta1[0] = 0  # theta(0) is not regularized
    grad = np.dot(np.transpose(X), h-y)/m + initial_lambda/m*theta1  # regularized gradient
    return grad
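A common sanity check for a hand-written gradient is to compare it with a finite-difference approximation of the cost. The sketch below is self-contained: it re-implements the regularized cost and gradient under the shorter names `cost` and `grad`, and the helper `numerical_grad` and the random test data are assumptions made for illustration, not part of the original repository.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, lam):
    m = len(y)
    h = sigmoid(X @ theta)
    reg = theta.copy()
    reg[0] = 0.0  # the bias term is not regularized
    return (-(y @ np.log(h)) - ((1 - y) @ np.log(1 - h)) + lam / 2 * (reg @ reg)) / m

def grad(theta, X, y, lam):
    m = len(y)
    h = sigmoid(X @ theta)
    reg = theta.copy()
    reg[0] = 0.0
    return X.T @ (h - y) / m + lam / m * reg

def numerical_grad(f, theta, eps=1e-6):
    """Central-difference approximation of the gradient of f at theta."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        g[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return g

# Small random problem (illustrative data only)
rng = np.random.default_rng(0)
X = np.hstack((np.ones((20, 1)), rng.normal(size=(20, 3))))
y = (rng.random(20) > 0.5).astype(float)
theta = rng.normal(size=4)
lam = 0.5

analytic = grad(theta, X, y, lam)
numeric = numerical_grad(lambda t: cost(t, X, y, lam), theta)
max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)  # should be tiny if the gradient matches the cost
```

If the two disagree by more than rounding error, the usual culprits are a missing 1/m factor or regularizing theta(0) by mistake.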

Sigmoid function (i.e. $h_\theta(x)$)

Implementation:

# Sigmoid function
def sigmoid(z):
    h = 1.0/(1.0+np.exp(-z))  # element-wise, so h has the same shape as z
    return h
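For inputs with a very large magnitude, `np.exp(-z)` overflows and NumPy emits a runtime warning, even though the final value is still correct. As a possible alternative (not used in the original code), `scipy.special.expit` computes the same function without the warning. A minimal comparison:

```python
import numpy as np
from scipy.special import expit

z = np.array([-1000.0, 0.0, 1000.0])

# Naive formula: np.exp(1000) overflows to inf, so silence the warning here
with np.errstate(over="ignore"):
    naive = 1.0 / (1.0 + np.exp(-z))

stable = expit(z)  # same sigmoid, computed without overflow
print(naive)
print(stable)
```

Both versions still saturate to exactly 0 or 1 at the extremes, so the log terms in the cost function can remain a problem for such inputs.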

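Mapping to polynomial features

The data may have very few features, which leads to high bias, so we create additional features by combining the existing ones.

E.g. mapping to degree 2:

$$1 + x_1 + x_2 + x_1^2 + x_1 x_2 + x_2^2$$

Implementation: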

# Map to polynomial features
def mapFeature(X1, X2):
    degree = 3  # highest degree of the mapping
    out = np.ones((X1.shape[0], 1))  # mapped result (replaces X)
    # Taking degree=2 as an example, the mapped columns are
    # 1, x1, x2, x1^2, x1*x2, x2^2
    for i in np.arange(1, degree+1):
        for j in range(i+1):
            temp = X1**(i-j)*(X2**j)  # element-wise product, like .* in MATLAB
            out = np.hstack((out, temp.reshape(-1, 1)))
    return out
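To see which columns the mapping produces, here is a small self-contained variant that takes the degree as a parameter. The name `map_feature` and its `degree` argument are assumptions for this illustration; the original function hard-codes the degree.

```python
import numpy as np

def map_feature(x1, x2, degree=2):
    """Return the columns 1, x1, x2, x1^2, x1*x2, x2^2, ... up to `degree`."""
    out = np.ones((x1.shape[0], 1))
    for i in range(1, degree + 1):
        for j in range(i + 1):
            term = (x1 ** (i - j)) * (x2 ** j)  # element-wise product
            out = np.hstack((out, term.reshape(-1, 1)))
    return out

x1 = np.array([2.0, 3.0])
x2 = np.array([5.0, 7.0])
mapped = map_feature(x1, x2, degree=2)
print(mapped)
# [[ 1.  2.  5.  4. 10. 25.]
#  [ 1.  3.  7.  9. 21. 49.]]
```

In general, degree d yields (d+1)(d+2)/2 columns, so degree 3 gives 10 columns.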

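Using scipy's optimization routine

Instead of a hand-written gradient descent loop, the minimization uses the fmin_bfgs function from scipy.optimize.

Call scipy's optimization algorithm fmin_bfgs (the quasi-Newton method Broyden-Fletcher-Goldfarb-Shanno):

costFunction is the self-implemented function that computes the cost,

initial_theta is the initial value of theta,

fprime specifies the gradient of costFunction,

args holds the remaining arguments, passed in as a tuple. The call returns the theta that minimizes costFunction.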

result = optimize.fmin_bfgs(costFunction, initial_theta, fprime=gradient, args=(X,y,initial_lambda))
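The call above can be tried end to end on synthetic data. The following is a minimal sketch, not the author's script: the data, the short names `cost` and `grad`, and the clipping of h (added here so that `log` never sees exactly 0) are all assumptions made for illustration.

```python
import numpy as np
from scipy import optimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, lam):
    m = len(y)
    # Clipping is an extra safeguard (not in the original code) against log(0)
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)
    reg = theta.copy()
    reg[0] = 0.0  # the bias term is not regularized
    return (-(y @ np.log(h)) - ((1 - y) @ np.log(1 - h)) + lam / 2 * (reg @ reg)) / m

def grad(theta, X, y, lam):
    m = len(y)
    h = sigmoid(X @ theta)
    reg = theta.copy()
    reg[0] = 0.0
    return X.T @ (h - y) / m + lam / m * reg

# Synthetic two-feature data with noisy labels (illustrative only)
rng = np.random.default_rng(0)
m = 200
features = rng.normal(size=(m, 2))
noise = rng.normal(scale=0.5, size=m)
y = (features[:, 0] + features[:, 1] + noise > 0).astype(float)
X = np.hstack((np.ones((m, 1)), features))  # prepend the bias column

lam = 1.0
initial_theta = np.zeros(X.shape[1])
initial_cost = cost(initial_theta, X, y, lam)

theta = optimize.fmin_bfgs(cost, initial_theta, fprime=grad,
                           args=(X, y, lam), disp=False)
final_cost = cost(theta, X, y, lam)
accuracy = np.mean((sigmoid(X @ theta) >= 0.5) == (y == 1))
print(initial_cost, final_cost, accuracy)
```

Passing `disp=False` only suppresses the convergence message. The returned `theta` is a 1-D array with one entry per column of X.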

Results

Figure: data1 decision boundary and accuracy.

Figure: data2 decision boundary and accuracy.

Implementation using the logistic regression model from the scikit-learn library

https://github.com/lawlite19/MachineLearning_Python/blob/master/LogisticRegression/LogisticRegression_scikit-learn.py

Import packages

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
import numpy as np

Split into training and test sets

x_train,x_test,y_train,y_test = train_test_split(X,y,test_size=0.2)

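Feature scaling (standardization)

scaler = StandardScaler()
scaler.fit(x_train)  # learn the mean and standard deviation from the training set only
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)  # reuse the training statistics; do not refit on the test set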
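The reason for calling `transform` rather than `fit_transform` on the test set shows up clearly on a toy example. The sketch below uses made-up numbers: refitting on the test set recenters it around its own mean, so the model would see test features on a different scale from the one it was trained on.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x_train = np.array([[0.0], [2.0], [4.0]])
x_test = np.array([[10.0], [12.0]])

scaler = StandardScaler().fit(x_train)      # mean and std come from the training data
test_scaled = scaler.transform(x_test)      # test data expressed in training units
test_refit = StandardScaler().fit_transform(x_test)  # statistics recomputed on the test data

print(test_scaled.ravel())  # far from 0: the test points lie well above the training mean
print(test_refit.ravel())   # [-1.  1.]: that information is lost
```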

Logistic regression

model = LogisticRegression()
model.fit(x_train, y_train)

Prediction

predict = model.predict(x_test)
right = sum(predict == y_test)
predict = np.hstack((predict.reshape(-1,1), y_test.reshape(-1,1)))  # put predictions and true values side by side for easy inspection
print(predict)
print('Test set accuracy: %f%%' % (right*100.0/predict.shape[0]))  # accuracy on the test set

Logistic Regression: Handwritten Digit Recognition with OneVsAll

https://github.com/lawlite19/MachineLearning_Python/blob/master/LogisticRegression

Full code

https://github.com/lawlite19/MachineLearning_Python/blob/master/LogisticRegression/LogisticRegression_OneVsAll.py

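Randomly display 100 digits

I did not use the dataset bundled with scikit-learn. Each image is 20*20 px.

Figure: the 100 digits in color.

Figure: the same digits in grayscale.

Implementation: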

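import matplotlib.pyplot as plt

# Display 100 digits
def display_data(imgData):
    count = 0
    # Drawing the digits one by one would be very slow, so arrange them
    # in a single matrix and display that matrix:
    # - initialize a 2-D array
    # - reshape each row of data into an image matrix and place it in the 2-D array
    # - display the array
    pad = 1
    display_array = -np.ones((pad+10*(20+pad), pad+10*(20+pad)))
    for i in range(10):
        for j in range(10):
            # order="F" means column-major as in MATLAB; NumPy defaults to row-major
            display_array[pad+i*(20+pad):pad+i*(20+pad)+20, pad+j*(20+pad):pad+j*(20+pad)+20] = (imgData[count,:].reshape(20,20,order="F"))
            count += 1
    plt.imshow(display_array, cmap='gray')  # show as a grayscale image
    plt.axis('off')
    plt.show()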

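OneVsAll

How can logistic regression be used for multi-class problems? OneVsAll treats one class as the positive class and all remaining classes together as the negative class, which turns the task into a binary classification problem again.

For example, to split the data in the figure into three classes, first treat the red points as one class and everything else as the other class and run logistic regression. Then treat the blue points as one class and the rest as the other, and so on.

With more than 2 classes, logistic regression therefore has to be run once per class.

Handwritten digit recognition covers the 10 digits 0-9, so 10 classifiers are needed.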

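The y in the dataset is given as the digits 0,1,2...9, but logistic regression needs 0/1 labels, so y has to be preprocessed.

About the dataset: the first 500 samples are 0, samples 500-1000 are 1, and so on. In the processed y (see figure), the first column is 1 for the first 500 rows and 0 elsewhere, the second column is 1 for rows 500-1000 and 0 elsewhere, and so on.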
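The mapping from digit labels to 0/1 columns can also be written without a loop by using NumPy broadcasting. This is an alternative sketch on a tiny made-up label vector, not the approach taken in the original code.

```python
import numpy as np

num_labels = 3
y = np.array([[0], [0], [1], [2], [1]])  # column vector, like y loaded from a .mat file

# Broadcasting compares shape (m, 1) against shape (num_labels,),
# giving an (m, num_labels) boolean matrix with one True per row
class_y = (y == np.arange(num_labels)).astype(np.int32)
print(class_y)
# [[1 0 0]
#  [1 0 0]
#  [0 1 0]
#  [0 0 1]
#  [0 1 0]]
```

Column i of `class_y` is then the 0/1 target for the classifier of class i.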

Then call the optimization routine (fmin_bfgs, as before) to solve for theta.

Implementation:

# Compute the theta of each class and return them all as all_theta
def oneVsAll(X, y, num_labels, Lambda):
    # initialize variables
    m, n = X.shape
    all_theta = np.zeros((n+1, num_labels))  # each column holds the theta of one class, 10 columns in total
    X = np.hstack((np.ones((m,1)), X))  # prepend a column of 1s (bias) to X
    class_y = np.zeros((m, num_labels))  # y holds 0-9 and must be mapped to 0/1
    initial_theta = np.zeros((n+1, 1))  # initial theta for one class
    # map y
    for i in range(num_labels):
        class_y[:,i] = np.int32(y==i).reshape(1,-1)  # note: reshape(1,-1) is needed for the assignment
    #np.savetxt("class_y.csv", class_y[0:600,:], delimiter=',')
    # loop over the classes and compute the theta of each one
    for i in range(num_labels):
        result = optimize.fmin_bfgs(costFunction, initial_theta, fprime=gradient, args=(X,class_y[:,i],Lambda))  # call the optimizer
        all_theta[:,i] = result.reshape(1,-1)  # store into all_theta
    all_theta = np.transpose(all_theta)
    return all_theta

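Prediction

As noted earlier, the prediction is a probability. Substituting the learned theta into the sigmoid function gives one probability per class for each sample. The maximum in each row is the highest probability over the digits, and the column index of that maximum is the predicted digit, because during training all samples labeled 0 were mapped to the first column, those labeled 1 to the second column, and so on.

Implementation: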

# Predict
def predict_oneVsAll(all_theta, X):
    m = X.shape[0]
    num_labels = all_theta.shape[0]
    p = np.zeros((m,1))
    X = np.hstack((np.ones((m,1)), X))  # prepend a column of 1s to X
    h = sigmoid(np.dot(X, np.transpose(all_theta)))  # predict
    # Return the column index of the maximum in each row of h:
    # - np.max(h, axis=1) returns the maximum of each row (the highest probability over the digits)
    # - np.where then finds the column index of that maximum (the column index is the digit)
    p = np.array(np.where(h[0,:] == np.max(h, axis=1)[0]))
    for i in np.arange(1, m):
        t = np.array(np.where(h[i,:] == np.max(h, axis=1)[i]))
        p = np.vstack((p,t))
    return p
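The row-wise search for the maximum can be expressed more compactly with `np.argmax`, which also returns exactly one label per row when two classes tie. The sketch below is an alternative written for illustration, with a hand-made `all_theta` so the expected labels are easy to check. The name `predict_one_vs_all` is not from the original repository.

```python
import numpy as np

def predict_one_vs_all(all_theta, X):
    """Return an (m, 1) array with the most probable class of each row of X."""
    m = X.shape[0]
    X = np.hstack((np.ones((m, 1)), X))           # prepend the bias column
    h = 1.0 / (1.0 + np.exp(-(X @ all_theta.T)))  # (m, num_labels) probabilities
    return np.argmax(h, axis=1).reshape(-1, 1)    # column index = predicted class

# One row of theta per class: [bias, weight for feature 1, weight for feature 2]
all_theta = np.array([[0.0, 1.0, 0.0],     # class 0 responds to feature 1
                      [0.0, 0.0, 1.0],     # class 1 responds to feature 2
                      [0.0, -1.0, -1.0]])  # class 2 responds when both are negative
X = np.array([[3.0, 0.0], [0.0, 3.0], [-3.0, -3.0]])
p = predict_one_vs_all(all_theta, X)
print(p.ravel())  # [0 1 2]
```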

Results

Figure: accuracy on the training set after the 10 classifications.

Implementation using the logistic regression model from the scikit-learn library

https://github.com/lawlite19/MachineLearning_Python/blob/master/LogisticRegression/LogisticRegression_OneVsAll_scikit-learn.py

Import packages

from scipy import io as spio
import numpy as np
from sklearn import svm
from sklearn.linear_model import LogisticRegression

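Load the data

data = loadmat_data("data_digits.mat")  # loadmat_data is a helper defined in the linked script
X = data['X']  # X data; each row is one 20x20 px digit
y = data['y']  # y read from the mat file has shape (5000, 1)
y = np.ravel(y)  # sklearn needs a 1-D array of shape (5000,)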

Fit the model

model = LogisticRegression()
model.fit(X, y)  # fit

Prediction

predict = model.predict(X)  # predict
print("Prediction accuracy: %f%%" % np.mean(np.float64(predict == y)*100))

Output (accuracy on the training set)

(To be continued)