Time Series and Regression Forecasting (Part 1)
Figure: the S&P Composite Index from 1900 to 2014 (note how the fitted model trend breaks down just before each of several financial crises).
Current approaches to time-series pattern recognition fall broadly into two camps: one is the complex-systems tradition, the other is machine learning. The complex-systems approach fits the data to a known model class, such as the classical AR, MA, ARMA, and ARIMA models, whereas machine learning uses a family of general-purpose models, such as neural networks, to fit the data by "brute force". A time series is a function whose main independent variable is time, and a great deal of everyday sequence data falls into this category: stock indices, ECG/EEG traces, even speech signals or the wind speed at some spot on a grassland all vary with their own internal structure.

This article uses the deep learning framework Keras for time-series forecasting. The data is the classic international airline passengers dataset: monthly passenger counts from 1949 to 1960, 144 points in total. The experiment follows this article: http://machinelearningmastery.com/time-series-prediction-with-deep-learning-in-python-with-keras/
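For contrast with the classical route, here is a minimal sketch of fitting one of those classical models (ARIMA) to the same passenger series introduced below, using statsmodels rather than anything from this article's dependency list; the order (2, 1, 2) is an illustrative assumption, not a tuned choice.

# Illustrative sketch only: a classical ARIMA fit on the airline-passenger series.
# Assumes statsmodels >= 0.12; the (p, d, q) order here is an arbitrary example.
import pandas
from statsmodels.tsa.arima.model import ARIMA

series = pandas.read_csv("international-airline-passengers.csv", usecols=[1], engine="python", skipfooter=3)
model = ARIMA(series, order=(2, 1, 2))
fitted = model.fit()
print(fitted.summary())
print(fitted.forecast(steps=12))  # forecast the next 12 months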
The dataset file is international-airline-passengers.csv. Let's first visualize it with pandas:

import pandas
import matplotlib.pyplot as plt

dataset = pandas.read_csv("international-airline-passengers.csv", usecols=[1], engine="python", skipfooter=3)
plt.plot(dataset)
plt.show()
The plot shows that this time series has clear seasonality and a clear upward trend.
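As an optional aside (not part of the original walkthrough), one quick way to confirm the seasonality and trend is a seasonal decomposition with statsmodels; the 12-month period is an assumption based on the monthly data.

# Optional aside: decompose the monthly series into trend, seasonal, and residual parts.
# Assumes statsmodels >= 0.11 and a 12-month seasonal period.
import pandas
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

dataset = pandas.read_csv("international-airline-passengers.csv", usecols=[1], engine="python", skipfooter=3)
result = seasonal_decompose(dataset, model="multiplicative", period=12)
result.plot()
plt.show()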
Next, let's get on with the regression forecast.
Time-series regression with a multilayer perceptron (MLP)
First we need to read the data out of the file and store it as a numpy.array. Dependencies to install: numpy, scipy, pandas, keras (with a Theano or TensorFlow backend).
We start by loading the environment:

import numpy
import matplotlib.pyplot as plt
import pandas
from keras.models import Sequential
from keras.layers import Dense
Set the random seed:
# fix random seed for reproducibility
numpy.random.seed(7)
Load the dataset:
# load the dataset
dataframe = pandas.read_csv("international-airline-passengers.csv", usecols=[1], engine="python", skipfooter=3)
dataset = dataframe.values
dataset = dataset.astype("float32")
To train the model and also check how well it generalizes, we hold out part of the data (a simple train/test split rather than full cross-validation), with a training-to-test ratio of roughly 2:1.
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
print(len(train), len(test))
Next we need to choose features for the time series. For a series like this one, with no other candidate features but clear seasonality and trend, its own history is the best feature available.
So how far back should the input window reach? Let's set that question aside for now and treat it as a tunable parameter:

# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)
Here look_back is the length of history used as input: with look_back=1 we are predicting D(t) from D(t-1) alone, while with look_back=N the inputs are D(t-N), D(t-N+1), ..., D(t-1).
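To make this concrete, here is a small hypothetical example (not from the original article) of what create_dataset, as defined above, produces on a toy series; note that the -1 in the loop bound means the last usable window is skipped, matching the reference code.

# Toy illustration of create_dataset using the function defined above.
import numpy

toy = numpy.array([[112.], [118.], [132.], [129.], [121.]])  # shape (5, 1)
X, y = create_dataset(toy, look_back=2)
print(X)  # [[112. 118.]
          #  [118. 132.]]
print(y)  # [132. 129.]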
Build the datasets:

# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
Now we build the multilayer perceptron (MLP) model and train it.
# create and fit Multilayer Perceptron model
model = Sequential()
model.add(Dense(8, input_dim=look_back, activation="relu"))
model.add(Dense(1))
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(trainX, trainY, nb_epoch=200, batch_size=2, verbose=2)
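A quick note: this follows the Keras 1.x API used in the reference article; in Keras 2.x the nb_epoch argument was renamed to epochs, so the last line would become model.fit(trainX, trainY, epochs=200, batch_size=2, verbose=2).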
Let's check how well the trained model performs:
# Estimate model performance
trainScore = model.evaluate(trainX, trainY, verbose=0)
print("Train Score: ", trainScore)
testScore = model.evaluate(testX, testY, verbose=0)
print("Test Score: ", testScore)
Running it produces output like this:
...
Epoch 195/200
0s - loss: 551.1626
Epoch 196/200
0s - loss: 542.7755
Epoch 197/200
0s - loss: 539.6731
Epoch 198/200
0s - loss: 539.1133
Epoch 199/200
0s - loss: 539.8144
Epoch 200/200
0s - loss: 539.8541
("Train Score: ", 531.45189520653253)
("Test Score: ", 2353.351849099864)
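Since the loss is mean squared error, these scores are in squared passenger counts; taking the square root gives an error of roughly 23 passengers on the training set and 49 on the test set, noticeably worse out of sample. A minimal sketch, reusing the scores computed above:

# Optional: convert the MSE scores above into RMSE, in passenger-count units.
import math
print("Train RMSE: %.2f" % math.sqrt(trainScore))  # about 23.05
print("Test RMSE: %.2f" % math.sqrt(testScore))    # about 48.51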
The raw numbers alone may not convey how good the predictions really are, so let's visualize them.
# generate predictions for training
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict

# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict

# plot baseline and predictions
plt.plot(dataset)
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
This puts the original series and both sets of predictions on the same figure:
From here we can tune the model by adjusting look_back, the number of hidden layers, and the number of nodes per hidden layer. For example, we can set look_back to 3 and see how the result changes (a sketch of the required tweaks follows the full listing below). Finally, here is the complete script.

# Multilayer Perceptron to Predict International Airline Passengers (t+1, given t)
import numpy
import matplotlib.pyplot as plt
import pandas
from keras.models import Sequential
from keras.layers import Dense

# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)

# fix random seed for reproducibility
numpy.random.seed(7)

# load the dataset
dataframe = pandas.read_csv("international-airline-passengers.csv", usecols=[1], engine="python", skipfooter=3)
dataset = dataframe.values
dataset = dataset.astype("float32")

# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
print(len(train), len(test))

# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# create and fit Multilayer Perceptron model
model = Sequential()
model.add(Dense(8, input_dim=look_back, activation="relu"))
model.add(Dense(1))
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(trainX, trainY, nb_epoch=200, batch_size=2, verbose=2)

# Estimate model performance
trainScore = model.evaluate(trainX, trainY, verbose=0)
print("Train Score: ", trainScore)
testScore = model.evaluate(testX, testY, verbose=0)
print("Test Score: ", testScore)

# generate predictions for training
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict

# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict

# plot baseline and predictions
plt.plot(dataset)
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
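As promised above, here is a sketch of the tweaks for trying look_back=3 with a slightly larger network; only these lines change relative to the full script above, and the extra hidden layer is an illustrative choice of mine, not the reference article's configuration.

# Sketch only: widen the input window to 3 lags and add one hidden layer.
# Everything else in the full script above stays the same.
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

model = Sequential()
model.add(Dense(12, input_dim=look_back, activation="relu"))  # input is now 3 lagged values
model.add(Dense(8, activation="relu"))                        # illustrative extra hidden layer
model.add(Dense(1))
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(trainX, trainY, nb_epoch=200, batch_size=2, verbose=2)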