Using GridSearchCV (Grid Search) to Quickly Select Hyperparameters

04-28

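In a machine learning model, parameters that must be chosen by hand are called hyperparameters. Examples include the number of decision trees in a random forest, the number of hidden layers and the number of nodes per layer in an artificial neural network, and the size of the constant in a regularization term. All of these must be specified before training. Poorly chosen hyperparameters lead to underfitting or overfitting.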

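There are two ways to choose hyperparameters:

1. Rely on experience.

2. Plug different parameter values into the model and pick the ones that perform best.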
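The second approach can be sketched by hand as a simple loop. This is a minimal illustration rather than a recommended workflow: the candidate values for `C`, the linear-kernel `SVC`, and the 5-fold `cross_val_score` evaluation are all assumptions chosen for the example.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score, train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

# Candidate values picked only for illustration.
candidate_C = [1, 10, 100, 1000]

scores = {}
for C in candidate_C:
    model = svm.SVC(kernel="linear", C=C)
    # Mean accuracy over 5 cross-validation folds of the training data.
    scores[C] = cross_val_score(model, X_train, y_train, cv=5).mean()

# Keep the value with the highest mean cross-validated accuracy.
best_C = max(scores, key=scores.get)
print(scores)
print("best C:", best_C)
```

With more than one hyperparameter, the loop has to be nested once per parameter, which is the bookkeeping that a grid search tool takes over.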

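When taking the second approach, you can use the GridSearchCV class from Python's scikit-learn library. It automatically forms every combination of the parameter values you supply, evaluates each one with cross-validation, and picks the best-performing set.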

from sklearn.model_selection import train_test_split
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score

# Use the iris dataset bundled with sklearn as an example
iris = datasets.load_iris()
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=0)

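Set the ranges and configuration of the parameters to tune. The values here are all specified by hand, written as a list of dictionaries in which each dictionary maps a parameter name to the list of values to try.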

param_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]
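To see which combinations such a grid expands to, one option is scikit-learn's `ParameterGrid` helper, which enumerates the candidates for a grid specification. The sketch below only lists the candidates and trains nothing; the variable name `candidates` is made up for the example.

```python
from sklearn.model_selection import ParameterGrid

param_grid = [
    {"C": [1, 10, 100, 1000], "kernel": ["linear"]},
    {"C": [1, 10, 100, 1000], "gamma": [0.001, 0.0001], "kernel": ["rbf"]},
]

# Each dictionary in the list is expanded separately, then the results are concatenated.
candidates = list(ParameterGrid(param_grid))

print(len(candidates))  # 4 linear + 4 * 2 rbf = 12
for params in candidates:
    print(params)
```

Splitting the grid into two dictionaries keeps `gamma` out of the linear-kernel candidates, where varying it would only repeat equivalent fits.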

Pass the hyperparameter configuration and the model to GridSearchCV to run the search automatically.

svm_model = svm.SVC()
clf = GridSearchCV(svm_model, param_grid, cv=5)
clf.fit(X_train, y_train)
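After fitting, the per-candidate cross-validation scores are available in the `cv_results_` attribute, which can help judge how close the runner-up settings were. The following is a self-contained sketch that repeats the setup above so it runs on its own; the output formatting is just one possible choice.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

param_grid = [
    {"C": [1, 10, 100, 1000], "kernel": ["linear"]},
    {"C": [1, 10, 100, 1000], "gamma": [0.001, 0.0001], "kernel": ["rbf"]},
]

clf = GridSearchCV(svm.SVC(), param_grid, cv=5)
clf.fit(X_train, y_train)

# cv_results_ is a dict of arrays with one entry per candidate.
results = clf.cv_results_
for params, mean, std in zip(
    results["params"], results["mean_test_score"], results["std_test_score"]
):
    print(f"{mean:.3f} (+/- {std:.3f}) for {params}")

# Mean cross-validated score of the winning candidate.
print("best cross-validated accuracy:", clf.best_score_)
```

Note that `best_score_` is measured on validation folds taken from the training data, so it is not a substitute for the held-out test accuracy.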

Get the best model that was selected

best_model = clf.best_estimator_

View the best hyperparameter configuration that was selected

print(clf.best_params_)

Predict

y_pred = best_model.predict(X_test)
print('accuracy', accuracy_score(y_test, y_pred))
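As a side note, with the default `refit=True` the fitted `GridSearchCV` object retrains the best configuration on the whole training set and can be used for prediction directly, so extracting `best_estimator_` is optional. The sketch below repeats the setup so it is self-contained and checks that both routes give the same predictions.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

param_grid = [
    {"C": [1, 10, 100, 1000], "kernel": ["linear"]},
    {"C": [1, 10, 100, 1000], "gamma": [0.001, 0.0001], "kernel": ["rbf"]},
]

clf = GridSearchCV(svm.SVC(), param_grid, cv=5)  # refit=True by default
clf.fit(X_train, y_train)

best_model = clf.best_estimator_
y_pred = best_model.predict(X_test)

# Predicting through the search object delegates to best_estimator_.
same = (clf.predict(X_test) == y_pred).all()
print("identical predictions:", same)

# For a classifier with default scoring, clf.score reports accuracy.
print("accuracy", clf.score(X_test, y_test))
```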