Implementing Linear Regression in Python

04-30

The complete code and data are available at: https://github.com/xiaohuzai/ML-Excercise.git

Loading the data

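    import numpy as np

    # Each row of the file holds one city: population, profit
    data = np.loadtxt('linear_regression_data1.txt', delimiter=',')
    # Prepend a column of ones so that theta[0] acts as the intercept
    X = np.c_[np.ones(data.shape[0]), data[:,0]]
    # Keep y as a column vector
    y = np.c_[data[:,1]]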

data:

    6.1101 17.592
    5.5277 9.1302
    8.5186 13.662
    7.0032 11.854
    5.8598 6.8233
    8.3829 11.886
    7.4764 4.3483
    8.5781 12
    6.4862 6.5987
    5.0546 3.8166
    ......

X:

    1 6.1101
    1 5.5277
    1 8.5186
    1 7.0032
    1 5.8598
    1 8.3829
    1 7.4764
    1 8.5781
    1 6.4862
    1 5.0546
    ......

y:

    17.592
    9.1302
    13.662
    11.854
    6.8233
    11.886
    4.3483
    12
    6.5987
    3.8166
    ......
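To see what the two np.c_ calls produce without the data file at hand, here is a minimal sketch. It is not part of the original code: the sample_text string is a hypothetical stand-in for the file, built from the first three rows listed above, and the comma-separated layout is an assumption.

```python
import io
import numpy as np

# Hypothetical stand-in for linear_regression_data1.txt: the first three rows
# listed above, assumed to be comma-separated.
sample_text = "6.1101,17.592\n5.5277,9.1302\n8.5186,13.662\n"

# np.loadtxt accepts a file-like object as well as a file name.
data = np.loadtxt(io.StringIO(sample_text), delimiter=',')

# Prepend a column of ones so that theta[0] acts as the intercept.
X = np.c_[np.ones(data.shape[0]), data[:, 0]]
# Turn the 1-D profit column into an (m, 1) column vector.
y = np.c_[data[:, 1]]

print(X.shape, y.shape)
```

Keeping y as a column vector matters later: X.dot(theta) has shape (m, 1), so subtracting a flat (m,) array would broadcast to an (m, m) matrix instead of an elementwise difference.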

Plotting the data

    import matplotlib.pyplot as plt

    # Scatter plot of profit against population, drawn with red crosses
    plt.scatter(X[:,1], y, s=30, c='r', marker='x', linewidths=1)
    plt.xlim(4,24)
    plt.xlabel('Population of City in 10,000s')
    plt.ylabel('Profit in $10,000s')

Computing the cost function

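    # theta defaults to [0, 0]^T
    def computeCost(X, y, theta=[[0],[0]]):
        m = y.size
        J = 0
        # Matrix product of X and theta gives the predictions
        h = X.dot(theta)
        J = 1.0/(2*m)*(np.sum(np.square(h-y)))
        return J

    # Value of the cost function when theta is the default [0, 0]^T
    computeCost(X, y)
    Out[15]: 32.072733877455676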
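The cost here is half the mean squared error of the predictions. A small hand-checkable case makes the formula concrete. The sketch below uses made-up toy data and a hypothetical compute_cost helper (renamed so it is not confused with the function in this post); it is an illustration, not the original code.

```python
import numpy as np

def compute_cost(X, y, theta):
    """Half the mean squared error of the predictions X.dot(theta)."""
    m = y.size
    residual = X.dot(theta) - y
    return float(np.sum(np.square(residual)) / (2 * m))

# Made-up toy data: three points lying exactly on y = 1 + 2x.
X_toy = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y_toy = np.array([[1.0], [3.0], [5.0]])

# With theta = 0 every prediction is 0, so the cost is (1 + 9 + 25) / (2 * 3).
cost_at_zero = compute_cost(X_toy, y_toy, np.zeros((2, 1)))
# With the true parameters the residuals vanish and the cost is 0.
cost_at_fit = compute_cost(X_toy, y_toy, np.array([[1.0], [2.0]]))

print(cost_at_zero, cost_at_fit)
```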

Gradient descent

    # Defaults: 1500 iterations, learning rate alpha = 0.01
    def gradientDescent(X, y, theta=[[0],[0]], alpha=0.01, num_iters=1500):
        m = y.size
        # J_history stores the value of the cost function J after each iteration
        J_history = np.zeros(num_iters)
        # Iterate
        for iter in np.arange(num_iters):
            h = X.dot(theta)
            theta = theta - alpha*(1.0/m)*(X.T.dot(h-y))
            J_history[iter] = computeCost(X, y, theta)
        return(theta, J_history)

    # Value of theta after 1500 iterations
    theta, Cost_J = gradientDescent(X, y)
    theta
    Out[27]: array([[-3.63029144], [ 1.16636235]])
    # That is, theta0 = -3.63029144 and theta1 = 1.16636235

    # Plot the cost function against the iteration number
    plt.plot(Cost_J)
    plt.ylabel('Cost J')
    plt.xlabel('Iterations')

The plot shows that after 1500 iterations the value of the cost function J has nearly converged.
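"Nearly converged" can also be checked numerically rather than read off the plot. The following is a rough sketch under stated assumptions: the data is made up (the post's data file is not used), gradient_descent is a hypothetical self-contained variant of the function in this post, and the 100-iteration window is an arbitrary choice.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1500):
    """Batch gradient descent for least squares; returns theta and the cost history."""
    m = y.size
    theta = np.zeros((X.shape[1], 1))
    J_history = np.zeros(num_iters)
    for i in range(num_iters):
        theta = theta - alpha * (1.0 / m) * X.T.dot(X.dot(theta) - y)
        J_history[i] = np.sum(np.square(X.dot(theta) - y)) / (2 * m)
    return theta, J_history

# Made-up data: a noisy line, with x in the same units as the post (10,000s).
rng = np.random.default_rng(0)
x = rng.uniform(5, 10, size=50)
y = (-4.0 + 1.2 * x + rng.normal(0, 1, size=50)).reshape(-1, 1)
X = np.c_[np.ones(x.size), x]

theta, J_history = gradient_descent(X, y)

# With a small enough learning rate the cost should never increase.
is_monotone = bool(np.all(np.diff(J_history) <= 0))
# Relative drop in cost over the final 100 iterations; a small value suggests a plateau.
late_drop = (J_history[-101] - J_history[-1]) / J_history[-101]

print(is_monotone, late_drop)
```

If is_monotone comes out False, the learning rate is probably too large for the scale of the features. A late_drop that is still noticeably above zero means more iterations would keep lowering the cost.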

Plotting the fit

    # Plot the regression line from our own implementation
    xx = np.arange(5,23)
    yy = theta[0]+theta[1]*xx
    plt.scatter(X[:,1], y, s=30, c='r', marker='x', linewidths=1)
    plt.plot(xx,yy, label='Linear regression (Gradient descent)')

    # Compare with the linear regression in Scikit-learn
    from sklearn.linear_model import LinearRegression

    # Regression line from our own gradient descent
    xx = np.arange(5,23)
    yy = theta[0]+theta[1]*xx
    plt.scatter(X[:,1], y, s=30, c='r', marker='x', linewidths=1)
    plt.plot(xx,yy, label='Linear regression (Gradient descent)')

    # Regression line computed by Scikit-learn
    regr = LinearRegression()
    regr.fit(X[:,1].reshape(-1,1), y.ravel())
    plt.plot(xx, regr.intercept_+regr.coef_*xx, label='Linear regression (Scikit-learn GLM)')

    plt.xlim(4,24)
    plt.xlabel('Population of City in 10,000s')
    plt.ylabel('Profit in $10,000s')
    plt.legend(loc=4)

The two lines essentially coincide.
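The overlap can be measured as well as eyeballed. The sketch below is not the post's code: it uses made-up data (skewed toward small x and clipped to the range 5 to 22) and np.polyfit as a dependency-free stand-in for the Scikit-learn fit, then reports the largest vertical gap between the two lines over the plotted range.

```python
import numpy as np

# Made-up data, roughly mimicking many small cities and a few large ones.
rng = np.random.default_rng(1)
x = np.clip(5 + rng.exponential(3.0, size=60), 5, 22)
y = -4.0 + 1.2 * x + rng.normal(0, 1, size=60)
X = np.c_[np.ones(x.size), x]

# Gradient descent with the post's defaults (alpha = 0.01, 1500 iterations).
theta = np.zeros(2)
for _ in range(1500):
    theta = theta - 0.01 / x.size * X.T.dot(X.dot(theta) - y)

# Least-squares reference; for deg=1, np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# Largest vertical distance between the two lines on the plotted grid.
xx = np.arange(5, 23)
gap = np.abs((theta[0] + theta[1] * xx) - (intercept + slope * xx))

print(gap.max())
```

Because gradient descent is stopped after a fixed number of iterations, a small nonzero gap is expected. It tends to be largest at the ends of the range, where few data points constrain the line.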

Prediction

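Using the fitted linear regression model, we predict the profit for cities with populations of 35,000 and 70,000. Since the model's input is the population in units of 10,000, these correspond to 3.5 and 7.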

    print(theta.T.dot([1, 3.5])*10000)
    [ 4519.7678677]
    print(theta.T.dot([1, 7])*10000)
    [ 45342.45012945]
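The unit conversions are easy to get wrong, so it can help to wrap them in a function. The predict_profit helper below is hypothetical (it does not appear in the original code) and hard-codes the theta values reported above after 1500 iterations.

```python
import numpy as np

# Parameters reported above after 1500 iterations of gradient descent.
theta = np.array([[-3.63029144], [1.16636235]])

def predict_profit(population, theta):
    """Hypothetical helper: population in people -> predicted profit in dollars."""
    x = population / 10000.0       # the model's input unit is 10,000 people
    features = np.array([1.0, x])  # the leading 1 multiplies the intercept theta0
    return float(features.dot(theta)[0] * 10000.0)  # the model's output unit is $10,000

for population in (35000, 70000):
    print(population, round(predict_profit(population, theta), 2))
```

Doubling the population raises the predicted profit roughly tenfold here because of the negative intercept: both inputs sit close to the point where the fitted line crosses zero.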