Python Machine Learning Cookbook - Study Notes 3: Linear Regression

Continuing from the previous note.

  • Building a ridge regressor

The main problem with linear regression is that it is sensitive to outliers. Real-world data collection often produces erroneous measurements, and ordinary least squares, which linear regression uses, aims to minimize the squared error. An outlier with a large error therefore contributes a disproportionately large squared term, which can distort the entire model, as in the figure below.

(Figure: the points circled in red are the outliers in the data.)

The two data points in the lower right are clearly outliers, but the model has to fit every data point, so the whole fit is thrown off. Just by eyeballing it, the fit shown in the next figure looks better.

Ordinary least squares weighs every data point when building the model, so the final model ends up like the straight line shown above, which is clearly not optimal. To avoid this problem, we introduce a regularization term whose coefficient controls how strongly large model weights are penalized, which limits the influence of outliers. This method is called ridge regression.
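
As a quick illustration of how the alpha parameter of linear_model.Ridge controls this penalty: the model minimizes the squared error plus alpha times the squared norm of the coefficients, so increasing alpha shrinks the coefficients and limits how far a single bad point can pull the fit. Here is a minimal, self-contained sketch on made-up toy data (not the book's dataset):

# Toy sketch of ridge shrinkage (made-up data, not the book's dataset)
import numpy as np
from sklearn import linear_model

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.0, 2.0, 3.0, 4.0, 30.0])  # the last point is an outlier

for alpha in [0.0, 1.0, 10.0]:  # alpha=0.0 is plain least squares
    model = linear_model.Ridge(alpha=alpha)
    model.fit(X, y)
    print("alpha =", alpha, "slope =", round(float(model.coef_[0]), 3))

As alpha grows from 0 to 10, the slope printed by this sketch shrinks steadily, because the penalty term increasingly outweighs the pull of the outlier.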

# -*- coding: utf-8 -*-
"""
Created on Fri Mar 30 17:01:42 2018
@author: imace
"""
import numpy as np

# Load the data: each line holds three input values and one target, comma-separated
X = []
y = []
with open('C://Users/imace/Desktop/python_execise/Chapter01/data_multivar.txt', 'r') as f:
    for line in f.readlines():
        data = [float(i) for i in line.split(',')]
        xt, yt = data[:-1], data[-1]
        X.append(xt)
        y.append(yt)

# Train/test split: first 80% for training, the rest for testing
num_training = int(0.8 * len(X))
num_test = len(X) - num_training

# Training data
X_train = np.array(X[:num_training])
y_train = np.array(y[:num_training])

# Test data
X_test = np.array(X[num_training:])
y_test = np.array(y[num_training:])

# Create linear regression and ridge regression objects
from sklearn import linear_model
linear_regressor = linear_model.LinearRegression()
ridge_regressor = linear_model.Ridge(alpha=0.01, fit_intercept=True, max_iter=10000)

# Train the models using the training sets
linear_regressor.fit(X_train, y_train)
ridge_regressor.fit(X_train, y_train)

# Predict the output
y_test_pred = linear_regressor.predict(X_test)
y_test_pred_ridge = ridge_regressor.predict(X_test)

# Measure performance
import sklearn.metrics as sm
print("LINEAR:")
print("Mean absolute error =", round(sm.mean_absolute_error(y_test, y_test_pred), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test, y_test_pred), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred), 2))
print("Explained variance score =", round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))

print("\nRIDGE:")
print("Mean absolute error =", round(sm.mean_absolute_error(y_test, y_test_pred_ridge), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test, y_test_pred_ridge), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred_ridge), 2))
print("Explained variance score =", round(sm.explained_variance_score(y_test, y_test_pred_ridge), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred_ridge), 2))

# Polynomial regression: expand the features into degree-10 polynomial terms
from sklearn.preprocessing import PolynomialFeatures
polynomial = PolynomialFeatures(degree=10)
X_train_transformed = polynomial.fit_transform(X_train)
datapoint = np.array([0.39, 2.78, 7.11]).reshape(1, -1)
poly_datapoint = polynomial.fit_transform(datapoint)
poly_linear_model = linear_model.LinearRegression()
poly_linear_model.fit(X_train_transformed, y_train)
print("\nLinear regression:\n", linear_regressor.predict(datapoint))
print("\nPolynomial regression:\n", poly_linear_model.predict(poly_datapoint))

# Stochastic Gradient Descent regressor
# (the book's original used n_iter=50, which was removed in scikit-learn 0.21;
# max_iter is the current name of the parameter)
sgd_regressor = linear_model.SGDRegressor(loss='huber', max_iter=50)
sgd_regressor.fit(X_train, y_train)
print("\nSGD regressor:\n", sgd_regressor.predict(datapoint))

Supplement:

datapoint = np.array([0.39, 2.78, 7.11]).reshape(1, -1)
print(datapoint)

prints one row:

[[0.39 2.78 7.11]]

datapoint = np.array([0.39, 2.78, 7.11]).reshape(-1, 1)
print(datapoint)

prints one column:

[[0.39]
 [2.78]
 [7.11]]

scikit-learn estimators expect a 2D array of shape (n_samples, n_features), which is why predict() above is given the reshape(1, -1) form: a single sample with three features.

The program output is:

LINEAR:
Mean absolute error = 3.95
Mean squared error = 23.15
Median absolute error = 3.69
Explained variance score = 0.84
R2 score = 0.83

RIDGE:
Mean absolute error = 3.95
Mean squared error = 23.15
Median absolute error = 3.69
Explained variance score = 0.84
R2 score = 0.83

Linear regression:
 [-11.0587295]

Polynomial regression:
 [-8.14917282]

SGD regressor:
 [-7.93671074]

From the output above, linear regression and ridge regression produce identical metrics on this dataset. That is expected: with alpha=0.01 the regularization penalty is negligible, so the ridge fit is almost indistinguishable from the ordinary least-squares fit, at least to the two decimal places printed. A larger alpha is needed before the two models diverge.
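
To see the penalty actually doing something, we can refit the ridge regressor with larger alpha values and watch the coefficients and the test error move. A minimal sketch, assuming X_train, y_train, X_test, y_test, linear_model, np, and sm from the script above are still in scope:

# Sketch: increasing alpha shrinks the coefficients and changes the test error
for alpha in [0.01, 1, 100, 10000]:
    ridge = linear_model.Ridge(alpha=alpha, fit_intercept=True)
    ridge.fit(X_train, y_train)
    y_pred = ridge.predict(X_test)
    print("alpha =", alpha,
          "| MSE =", round(sm.mean_squared_error(y_test, y_pred), 2),
          "| coef =", np.round(ridge.coef_, 3))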

