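Python Machine Learning Cookbook in Python 3, Chapter 3: Predictive Modeling (11)
Example 1: Building an event predictor
Next, let's apply what we have learned to a real-world problem. We will build an SVM that uses counts of people entering and leaving a building to predict whether an event is taking place there. The dataset can be downloaded from the CalIt2 Building People Counts Data Set. We adjust the dataset slightly to simplify the analysis; the adjusted data is stored in building_event_binary.txt and building_event_multiclass.txt.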
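Changes made
Errors encountered when running the book's code, and the fix for each:
- Error: class_weight must be dict, 'balanced', or None, got: 'auto'. Fix: params = {'kernel': 'rbf', 'probability': True, 'class_weight': 'balanced'}
- Error: bad input shape (). Fix: stepping through the code shows that input_data[i] is a single string, while the transform method expects a list, so pass [input_data[i]] instead.
- Error: Expected 2D array, got 1D array instead. Fix: input_data_encoded = np.array(input_data_encoded).reshape(1, -1)
- Add parentheses to every print call, since print is a function in Python 3.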
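The two shape-related fixes above can be reproduced in isolation. The following is a minimal sketch, assuming a hypothetical list of weekday labels that is not taken from the dataset: it wraps a single string in a list before calling transform, then reshapes a 1D feature vector into a single-row 2D array.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical weekday labels, used only for illustration.
encoder = preprocessing.LabelEncoder()
encoder.fit(["Monday", "Tuesday", "Wednesday"])

# transform() expects an array-like of labels, so the single string
# is wrapped in a list; the result is a one-element array.
encoded = encoder.transform(["Tuesday"])
day_code = int(encoded[0])

# Estimators expect input of shape (n_samples, n_features).
# reshape(1, -1) turns a 1D vector into one sample with three features.
sample = np.array([day_code, 21, 23])
sample_2d = sample.reshape(1, -1)

print(sample.shape, "->", sample_2d.shape)
```

Passing the bare string "Tuesday" to transform, or passing the 1D sample to predict, is what triggers the two errors listed above.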
Step-by-step code
- Packages and data loading
import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVC

input_file = 'building_event_binary.txt'
#input_file = 'building_event_multiclass.txt'

# Reading the data
X = []
with open(input_file, 'r') as f:
    for line in f.readlines():
        # Drop the trailing newline and split on commas
        data = line[:-1].split(',')
        # Keep column 0 and columns 2 onward; column 1 is skipped
        X.append([data[0]] + data[2:])

X = np.array(X)

# Convert string data to numerical data
label_encoder = []
X_encoded = np.empty(X.shape)
for i, item in enumerate(X[0]):
    if item.isdigit():
        X_encoded[:, i] = X[:, i]
    else:
        # One LabelEncoder per non-numeric column, stored in column order
        label_encoder.append(preprocessing.LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])

# The last column is the label; the rest are features
X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)
- Building the SVM classifier
# Build SVM
params = {'kernel': 'rbf', 'probability': True, 'class_weight': 'balanced'}
classifier = SVC(**params)
classifier.fit(X, y)
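To see what class_weight='balanced' does, it helps to compute the weights by hand. The scikit-learn documentation gives the formula n_samples / (n_classes * np.bincount(y)). The sketch below applies that formula to made-up labels; the helper name balanced_class_weights is hypothetical and not part of scikit-learn.

```python
import numpy as np

def balanced_class_weights(y):
    """Hypothetical helper: weight = n_samples / (n_classes * class_count)."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Made-up imbalanced labels: nine samples of class 0, one of class 1.
weights = balanced_class_weights([0] * 9 + [1] * 1)
print(weights)
```

The rare class receives the larger weight, so its misclassifications are penalized more heavily during training.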
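- Running cross-validation
# Cross validation
# sklearn.cross_validation was removed in scikit-learn 0.20;
# cross_val_score now lives in sklearn.model_selection
from sklearn.model_selection import cross_val_score

accuracy = cross_val_score(classifier, X, y, scoring='accuracy', cv=3)
print("Accuracy of the classifier: " + str(round(100*accuracy.mean(), 2)) + "%")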
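For a classifier, an integer cv value makes scikit-learn split the data with StratifiedKFold, so each fold keeps roughly the same class proportions. The sketch below spells out that loop by hand on synthetic data invented for illustration; it is meant to show the idea behind the cross-validation call, not to reproduce the numbers reported for the building dataset.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic two-class data, made up purely for illustration.
rng = np.random.RandomState(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, (30, 2)),
                    rng.normal(3.0, 1.0, (30, 2))])
y_demo = np.array([0] * 30 + [1] * 30)

model = SVC(kernel="rbf", class_weight="balanced")

fold_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=3).split(X_demo, y_demo):
    # Fit a fresh copy on the training folds, score on the held-out fold
    fold_model = clone(model).fit(X_demo[train_idx], y_demo[train_idx])
    predictions = fold_model.predict(X_demo[test_idx])
    fold_scores.append(accuracy_score(y_demo[test_idx], predictions))

mean_accuracy = float(np.mean(fold_scores))
print("Per-fold accuracy:", [round(s, 2) for s in fold_scores])
print("Mean accuracy:", round(mean_accuracy, 2))
```

The mean of the per-fold scores corresponds to the single accuracy figure printed in the cross-validation step.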
- Testing the SVM on a new data point:
# Testing encoding on single data instance
input_data = ['Tuesday', '12:30:00', '21', '23']
input_data_encoded = [-1] * len(input_data)
count = 0
for i, item in enumerate(input_data):
    if item.isdigit():
        input_data_encoded[i] = int(input_data[i])
    else:
        # transform() expects a list, so wrap the single string
        input_data_encoded[i] = int(label_encoder[count].transform([input_data[i]])[0])
        count = count + 1

# predict() expects a 2D array of shape (n_samples, n_features)
input_data_encoded = np.array(input_data_encoded).reshape(1, -1)

# Predict and print output for a particular datapoint
output_class = classifier.predict(input_data_encoded)
print("Output class:", label_encoder[-1].inverse_transform(output_class)[0])
The output, however, differs from what the book shows:
Accuracy of the classifier: 93.95%
Output class: noevent
If building_event_multiclass.txt is used as the input file instead of building_event_binary.txt:
#input_file = 'building_event_binary.txt'
input_file = 'building_event_multiclass.txt'
the following result appears in the terminal (this time matching the book):
Accuracy of the classifier: 65.33%
Output class: eventA