Fun Applications | Predicting Stock Prices with an RNN, Part 1
Author: readilen
Original link: https://www.jianshu.com/p/a2ceb69c98a6. For more professional articles, visit the "人工智慧LeadAI" WeChat official account; for more course and product information, visit the brand-new official website: www.leadai.org.
The main text is about 11,490 characters long with 16 images; estimated reading time: 29 minutes.
01
Overview
We will explain how to build an RNN model with LSTM cells to predict the price of the S&P 500 index. The dataset can be downloaded from Yahoo! Finance. In the example, we use S&P 500 data from January 3, 1950 (the earliest date Yahoo! Finance goes back to) through June 23, 2017. For simplicity, we only use the daily close prices for prediction. Along the way, I will also demonstrate how to use TensorBoard for easy debugging and model tracking.
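As a rough sketch of the data-loading step (the CSV path and column name here are assumptions, not the author's exact code), the daily close prices from a Yahoo! Finance export can be read with pandas:

import pandas as pd

# Hypothetical file exported from Yahoo! Finance; we keep only the daily close price.
data = pd.read_csv("data/SP500.csv")
close_prices = data["Close"].values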
02
About RNN and LSTM
RNNs are designed to handle sequence data. In a traditional neural network, data flows from the input layer to the hidden layer(s) and then to the output layer; adjacent layers are fully connected, but nodes within the same layer are not connected to each other. Such ordinary networks are powerless for many problems. For example, to predict the next word in a sentence you generally need the preceding words, because the words in a sentence are not independent of one another.
An RNN is called a recurrent neural network because the current output of a sequence also depends on the previous outputs. Concretely, the network memorizes the preceding information and applies it to the computation of the current output: the hidden-layer nodes are now connected across time steps, and the input to the hidden layer includes not only the output of the input layer but also the hidden layer's output from the previous time step. In theory, an RNN can process sequences of any length.
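To make the recurrence concrete, a single vanilla RNN step can be sketched as follows (an illustrative NumPy snippet, not part of the model built later):

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # The new hidden state depends on both the current input x_t
    # and the previous hidden state h_prev.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)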
A Long Short-Term Memory network, usually just called an LSTM, is a special type of RNN. What distinguishes an LSTM from a plain RNN is that it adds a "processor" that judges whether a piece of information is useful; the structure this processor acts on is called a cell. Three gates are placed inside a cell: the input gate, the forget gate, and the output gate. When information enters the LSTM network, rules decide whether it is useful.
Only information that passes the algorithm's check is kept; information that does not is discarded through the forget gate. It sounds like nothing more than a one-in, two-out mechanism, yet under repeated iteration it solves a long-standing problem of neural networks. LSTM has been shown to be an effective technique for handling long-range dependencies, and the technique is highly general, which opens up many possible variations. Researchers have proposed their own variants of LSTM, allowing it to handle a wide variety of domain-specific problems.
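A single LSTM step with the three gates described above can be sketched like this (the standard textbook formulation in NumPy, for illustration only; the actual model below uses TensorFlow's LSTMCell):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b are dicts of parameters for the input (i), forget (f), output (o)
    # gates and the candidate cell state (g).
    i = sigmoid(x_t @ W["i"] + h_prev @ U["i"] + b["i"])  # input gate: how much new info to write
    f = sigmoid(x_t @ W["f"] + h_prev @ U["f"] + b["f"])  # forget gate: how much old info to keep
    o = sigmoid(x_t @ W["o"] + h_prev @ U["o"] + b["o"])  # output gate: how much state to expose
    g = np.tanh(x_t @ W["g"] + h_prev @ U["g"] + b["g"])  # candidate cell state
    c_t = f * c_prev + i * g   # update the cell state
    h_t = o * np.tanh(c_t)     # new hidden output
    return h_t, c_t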
Data Preparation
The stock prices form a time series of length N, defined as p_0, p_1, ..., p_{N-1}, where p_i is the close price on day i, 0 ≤ i < N. We have a sliding window of fixed size w (later referred to as input_size), and each time we move the window to the right by w, so that the data in any two sliding windows do not overlap.
We use the content of one sliding window to predict the next one, with no overlap between two consecutive windows.
We will build an RNN model with LSTM cells as the basic hidden units. We use the values from the very first sliding window W_0 up to the window W_t at time t to predict the prices in the next window W_{t+1}. In other words, we try to learn an approximation function f(W_0, W_1, ..., W_t) ≈ W_{t+1}.
The unrolled version of the RNN
Considering how backpropagation through time (BPTT) works, we usually train an RNN in an "unrolled" version, so that we do not have to propagate gradients over too many steps, which reduces the training complexity.
Here is how the TensorFlow tutorial explains this unrolled version and num_steps:
By design, the output of a recurrent neural network (RNN) depends on arbitrarily distant inputs. Unfortunately, this makes backpropagation computation difficult. In order to make the learning process tractable, it is common practice to create an 「unrolled」 version of the network, which contains a fixed number (num_steps) of LSTM inputs and outputs. The model is then trained on this finite approximation of the RNN. This can be implemented by feeding inputs of length num_steps at a time and performing a backward pass after each such input block.
The sequence of prices is first split into non-overlapping small windows. Each window contains input_size numbers, and each is treated as one independent input element. Then any num_steps consecutive input elements are grouped into one training input, forming the "unrolled" version of the RNN trained on TensorFlow. The corresponding label is the input element right after them.
For example, if input_size = 3 and num_steps = 2, the first few training examples in our first batch look like this:
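Reconstructed from the description above (the original figure is not reproduced here), the first few training samples would look roughly like:

Input_1 = [[p_0, p_1, p_2], [p_3, p_4, p_5]],    Label_1 = [p_6, p_7, p_8]
Input_2 = [[p_3, p_4, p_5], [p_6, p_7, p_8]],    Label_2 = [p_9, p_10, p_11]
Input_3 = [[p_6, p_7, p_8], [p_9, p_10, p_11]],  Label_3 = [p_12, p_13, p_14]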
Here is the key part of the data formatting code:
seq = [np.array(seq[i * self.input_size: (i + 1) * self.input_size])
       for i in range(len(seq) // self.input_size)]

# Split into groups of `num_steps`
X = np.array([seq[i: i + self.num_steps] for i in range(len(seq) - self.num_steps)])
y = np.array([seq[i + self.num_steps] for i in range(len(seq) - self.num_steps)])
Train/Test Split
Since we always want to predict the future, we take the latest 10% of the data as the test data.
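A minimal sketch of that split, assuming X and y are the arrays built in the snippet above:

train_size = int(len(X) * 0.9)
train_X, test_X = X[:train_size], X[train_size:]
train_y, test_y = y[:train_size], y[train_size:]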
Normalization
The S&P 500 index increases over time, so most values in the test set fall outside the range of the training set, and the model would have to predict numbers it has never seen before. This, however, is not ideal.
To solve this out-of-scale problem, we normalize the prices within each sliding window. The task then becomes predicting the relative change rate instead of the absolute values. In the normalized sliding window W'_t at time t, all the values are divided by the last unknown price, i.e. the last price in W_{t-1}:

W'_t = (p_{tw} / p_{tw-1}, p_{tw+1} / p_{tw-1}, ..., p_{(t+1)w-1} / p_{tw-1})
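A minimal sketch of this normalization (an assumed implementation, not the author's exact code), applied to the list seq of window arrays built earlier:

# Every value in window W_t is divided by the last price of the previous window W_{t-1};
# the very first window, having no predecessor, is divided by its own first price.
normalized_seq = [seq[0] / seq[0][0]] + [
    curr / seq[i][-1] for i, curr in enumerate(seq[1:])
]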
Building the Model
Definitions
- lstm_size: number of LSTM units in one LSTM layer.
- num_layers: number of stacked LSTM layers.
- keep_prob: percentage of cell units kept in the dropout operation.
- init_learning_rate: the learning rate to start with.
- learning_rate_decay: decay rate of the learning rate during later training epochs.
- init_epoch: number of epochs that use the constant learning rate init_learning_rate.
- max_epoch: total number of epochs in training.
- input_size: size of the sliding window / one training data point.
- batch_size: number of data points used in one mini-batch.
The LSTM model has num_layers stacked LSTM layer(s), and each layer contains lstm_size LSTM cells. Then a dropout mask with keep probability keep_prob is applied to the output of every LSTM cell. The goal of dropout is to remove the potential strong dependency on one dimension so as to prevent overfitting.
The training requires max_epoch epochs in total; an epoch is a single full pass over all the training data points. In one epoch, the training data points are split into mini-batches of size batch_size. We send one mini-batch to the model for one BPTT learning step. The learning rate is set to init_learning_rate during the first init_epoch epochs and then decays by learning_rate_decay during every succeeding epoch.
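Concretely, with the defaults below (init_learning_rate = 0.001, learning_rate_decay = 0.99, init_epoch = 5), the first five epochs use a learning rate of 0.001, the sixth uses 0.001 × 0.99 = 0.00099, the seventh uses 0.001 × 0.99² ≈ 0.00098, and so on.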
# Configuration is wrapped in one object for easy tracking and passing.
class RNNConfig():
    input_size = 1
    num_steps = 30
    lstm_size = 128
    num_layers = 1
    keep_prob = 0.8
    batch_size = 64
    init_learning_rate = 0.001
    learning_rate_decay = 0.99
    init_epoch = 5
    max_epoch = 50

config = RNNConfig()
Defining the Graph
(1) Initialize a new graph first.
import tensorflow as tf

tf.reset_default_graph()
lstm_graph = tf.Graph()
(2) How the graph works should be defined within its scope.
with lstm_graph.as_default():
(3) Define the data required for computation. Here we need three input variables, all defined as tf.placeholder because we don't know what they are at the graph construction stage.
- inputs: the training data X, a tensor of shape (# data examples, num_steps, input_size); the number of data examples is unknown, so it is None. In our case, it would be batch_size during the training session. Check the input format example above if confused.
- targets: the training label y, a tensor of shape (# data examples, input_size).
- learning_rate: a simple float.
# Dimension = (
#     number of data examples,
#     number of inputs in one computation step,
#     number of numbers in one input
# )
# We don't know the number of examples beforehand, so it is None.
inputs = tf.placeholder(tf.float32, [None, config.num_steps, config.input_size])
targets = tf.placeholder(tf.float32, [None, config.input_size])
learning_rate = tf.placeholder(tf.float32, None)
(4) This function returns one LSTMCell with or without the dropout operation.
def _create_one_cell():
    lstm_cell = tf.contrib.rnn.LSTMCell(config.lstm_size, state_is_tuple=True)
    if config.keep_prob < 1.0:
        # Wrap the cell so that dropout is applied to its output.
        lstm_cell = tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=config.keep_prob)
    return lstm_cell
(5) Let's stack the cells into multiple layers if needed.
MultiRNNCell helps connect multiple simple cells sequentially to compose one cell.
cell = tf.contrib.rnn.MultiRNNCell(
    [_create_one_cell() for _ in range(config.num_layers)],
    state_is_tuple=True
) if config.num_layers > 1 else _create_one_cell()
(6) tf.nn.dynamic_rnn constructs a recurrent neural network specified by cell (an RNNCell). It returns a pair (model outputs, state), where the outputs val is of size (batch_size, num_steps, lstm_size) by default. The state refers to the current state of the LSTM cell and is not consumed here.
val, _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
(7) tf.transpose converts the outputs from the dimension (batch_size, num_steps, lstm_size) to (num_steps, batch_size, lstm_size). Then the last output is picked.
# Before transpose, val.get_shape() = (batch_size, num_steps, lstm_size)
# After transpose, val.get_shape() = (num_steps, batch_size, lstm_size)
val = tf.transpose(val, [1, 0, 2])

# last.get_shape() = (batch_size, lstm_size)
last = tf.gather(val, int(val.get_shape()[0]) - 1, name="last_lstm_output")
(8) Define weights and biases between the hidden and output layers.
weight = tf.Variable(tf.truncated_normal([config.lstm_size, config.input_size]))
bias = tf.Variable(tf.constant(0.1, shape=[config.input_size]))
prediction = tf.matmul(last, weight) + bias
(9) We use mean square error as the loss metric and the RMSPropOptimizer algorithm for gradient descent optimization.
loss = tf.reduce_mean(tf.square(prediction - targets))
optimizer = tf.train.RMSPropOptimizer(learning_rate)
minimize = optimizer.minimize(loss)
Start the Training Process
(1) To start training the graph with real data, we need to start a tf.Session first.
with tf.Session(graph=lstm_graph) as sess:
(2) Initialize the variables as defined.
tf.global_variables_initializer().run()
(0) The learning rates for training epochs should have been precomputed beforehand. The index refers to the epoch index.
learning_rates_to_use = [
    config.init_learning_rate * (
        config.learning_rate_decay ** max(float(i + 1 - config.init_epoch), 0.0)
    ) for i in range(config.max_epoch)
]
(3) Each loop below completes one epoch of training.
for epoch_step in range(config.max_epoch):
    current_lr = learning_rates_to_use[epoch_step]

    # Check https://github.com/lilianweng/stock-rnn/blob/master/data_wrapper.py
    # if you are curious about what StockDataSet is and how generate_one_epoch()
    # is implemented.
    for batch_X, batch_y in stock_dataset.generate_one_epoch(config.batch_size):
        train_data_feed = {
            inputs: batch_X,
            targets: batch_y,
            learning_rate: current_lr
        }
        train_loss, _ = sess.run([loss, minimize], train_data_feed)
(4) Don't forget to save your trained model at the end.
saver.save(sess, "your_awesome_model_path_and_name", global_step=max_epoch_step)
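Note that saver above is a tf.train.Saver object. Its creation is not shown in the snippets; as a minimal sketch (assuming you add it yourself), it would be created inside the graph scope:

with lstm_graph.as_default():
    # Create the saver alongside the rest of the graph so it can checkpoint all variables.
    saver = tf.train.Saver()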
Using TensorBoard
Building the graph without visualization is like drawing in the dark: obscure and error-prone. TensorBoard provides an easy way to visualize the graph structure and the learning process. Take a look at the brief summary below; it is very practical:
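For instance, here is a hedged sketch (not the article's exact graph code) of grouping the loss under a name scope and attaching summaries to it; prediction, targets, and last are the tensors defined in the graph section above:

with lstm_graph.as_default():
    with tf.name_scope("train"):
        # Ops under one name scope collapse into a single expandable node in TensorBoard.
        loss = tf.reduce_mean(tf.square(prediction - targets), name="loss_mse")
        tf.summary.scalar("loss_mse", loss)             # track the scalar loss over iterations
        tf.summary.histogram("last_lstm_output", last)  # track the distribution of the last LSTM output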
Brief Summary
- Use with [tf.name_scope](https://www.tensorflow.org/api_docs/python/tf/name_scope)("your_awesome_module_name"): to wrap elements working toward a similar goal together.
- Many tf.* methods accept a name= argument. Assigning a customized name can make your life much easier when reading the graph.
- Methods like tf.summary.scalar and tf.summary.histogram help track the values of variables in the graph during iterations.
- In the training session, define a log file using tf.summary.FileWriter.
with tf.Session(graph=lstm_graph) as sess:
    merged_summary = tf.summary.merge_all()
    writer = tf.summary.FileWriter("location_for_keeping_your_log_files", sess.graph)
    writer.add_graph(sess.graph)
Later, write the training progress and summary results into the file.
_summary = sess.run([merged_summary], test_data_feed)
writer.add_summary(_summary, global_step=epoch_step)  # epoch_step in range(config.max_epoch)
Results
We used the following configuration in the example.
num_layers = 1
keep_prob = 0.8
batch_size = 64
init_learning_rate = 0.001
learning_rate_decay = 0.99
init_epoch = 5
max_epoch = 100
num_steps = 30
Overall, predicting stock prices is not an easy task. Especially after normalization, the price trends look very noisy.
Prediction results for the last 200 days in the test data. The model was trained with input_size = 1 and lstm_size = 32.
Prediction results for the last 200 days in the test data. The model was trained with input_size = 1 and lstm_size = 128.
Prediction results for the last 200 days in the test data. The model was trained with input_size = 5 and lstm_size = 128.
Code:
stock-rnn/main.py
import os
import pandas as pd
import pprint
import tensorflow as tf
import tensorflow.contrib.slim as slim

from data_model import StockDataSet
from model_rnn import LstmRNN

flags = tf.app.flags
flags.DEFINE_integer("stock_count", 100, "Stock count [100]")
flags.DEFINE_integer("input_size", 5, "Input size [5]")
flags.DEFINE_integer("num_steps", 30, "Num of steps [30]")
flags.DEFINE_integer("num_layers", 1, "Num of layer [1]")
flags.DEFINE_integer("lstm_size", 128, "Size of one LSTM cell [128]")
flags.DEFINE_integer("batch_size", 64, "The size of batch images [64]")
flags.DEFINE_float("keep_prob", 0.8, "Keep probability of dropout layer. [0.8]")
flags.DEFINE_float("init_learning_rate", 0.001, "Initial learning rate at early stage. [0.001]")
flags.DEFINE_float("learning_rate_decay", 0.99, "Decay rate of learning rate. [0.99]")
flags.DEFINE_integer("init_epoch", 5, "Num. of epoches considered as early stage. [5]")
flags.DEFINE_integer("max_epoch", 50, "Total training epoches. [50]")
flags.DEFINE_integer("embed_size", None, "If provided, use embedding vector of this size. [None]")
flags.DEFINE_string("stock_symbol", None, "Target stock symbol [None]")
flags.DEFINE_integer("sample_size", 4, "Number of stocks to plot during training. [4]")
flags.DEFINE_boolean("train", False, "True for training, False for testing [False]")

FLAGS = flags.FLAGS

pp = pprint.PrettyPrinter()

if not os.path.exists("logs"):
    os.mkdir("logs")


def show_all_variables():
    model_vars = tf.trainable_variables()
    slim.model_analyzer.analyze_vars(model_vars, print_info=True)


def load_sp500(input_size, num_steps, k=None, target_symbol=None, test_ratio=0.05):
    if target_symbol is not None:
        return [
            StockDataSet(
                target_symbol,
                input_size=input_size,
                num_steps=num_steps,
                test_ratio=test_ratio)
        ]

    # Load metadata of S&P 500 stocks
    info = pd.read_csv("data/constituents-financials.csv")
    info = info.rename(columns={col: col.lower().replace(' ', '_') for col in info.columns})
    info['file_exists'] = info['symbol'].map(lambda x: os.path.exists("data/{}.csv".format(x)))
    print info['file_exists'].value_counts().to_dict()

    info = info[info['file_exists'] == True].reset_index(drop=True)
    info = info.sort('market_cap', ascending=False).reset_index(drop=True)

    if k is not None:
        info = info.head(k)

    print "Head of S&P 500 info:\n", info.head()

    # Generate embedding meta file
    info[['symbol', 'sector']].to_csv(os.path.join("logs/metadata.tsv"), sep='\t', index=False)

    return [
        StockDataSet(row['symbol'],
                     input_size=input_size,
                     num_steps=num_steps,
                     test_ratio=0.05)
        for _, row in info.iterrows()]


def main(_):
    pp.pprint(flags.FLAGS.__flags)

    # gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
    run_config = tf.ConfigProto()
    run_config.gpu_options.allow_growth = True

    with tf.Session(config=run_config) as sess:
        rnn_model = LstmRNN(
            sess,
            FLAGS.stock_count,
            lstm_size=FLAGS.lstm_size,
            num_layers=FLAGS.num_layers,
            num_steps=FLAGS.num_steps,
            input_size=FLAGS.input_size,
            keep_prob=FLAGS.keep_prob,
            embed_size=FLAGS.embed_size,
        )

        show_all_variables()

        stock_data_list = load_sp500(
            FLAGS.input_size,
            FLAGS.num_steps,
            k=FLAGS.stock_count,
            target_symbol=FLAGS.stock_symbol,
        )

        if FLAGS.train:
            rnn_model.train(stock_data_list, FLAGS)
        else:
            if not rnn_model.load()[0]:
                raise Exception("[!] Train a model first, then run test mode")


if __name__ == '__main__':
    tf.app.run()
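Based on the flags defined above, a training run could be launched roughly like this (the symbol SP500 is an assumption; the script expects a matching CSV such as data/SP500.csv to exist):

python main.py --stock_symbol=SP500 --train --input_size=1 --lstm_size=128 --max_epoch=50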