Tags: Natural Language Processing

Show and Tell: code reading notes

02-27

Training phase

https://github.com/ruotianluo/ImageCaptioning.pytorch/blob/master/models/ShowTellModel.py#L49-L82

fc_feats: the FC-layer features of the image after it passes through ResNet, with shape [batch_size*5, 2048]. The 5 is there because each image has 5 caption texts.
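One way to picture the [batch_size*5, 2048] shape is to repeat each image's feature row once per caption. The snippet below is only a sketch of that idea: the names `img_feats` and `seq_per_img` are hypothetical, and the repository's data loader may build the tensor differently.

```python
import torch

batch_size, seq_per_img, feat_dim = 4, 5, 2048  # illustrative sizes only

# One ResNet FC feature per image (random stand-in data).
img_feats = torch.randn(batch_size, feat_dim)

# Repeat every row 5 times so each caption gets its own copy of the image feature.
fc_feats = img_feats.repeat_interleave(seq_per_img, dim=0)

print(fc_feats.shape)  # [batch_size*5, 2048]
```

Rows 0 to 4 then belong to the first image, rows 5 to 9 to the second, and so on.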

At step 1, fc_feats passes through a Linear layer to become the input [batch_size*5, 512]; the state is all zeros.

output, state = LSTM(input, state)

At each later step i, the input becomes the embedding of word i-1 of the caption text [batch_size*5, 512], and the state is the state from the previous step.

output, state = LSTM(input, state)

Each output is mapped by a Linear layer to [batch_size*5, vocab_size].
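The training steps above can be sketched roughly as follows. This is a simplified stand-in, not the repository's code: the class name `TinyShowTell`, the single `nn.LSTMCell`, and the vocabulary size of 100 are assumptions made for illustration, and the sketch returns raw logits for every step, including the image step.

```python
import torch
import torch.nn as nn


class TinyShowTell(nn.Module):
    """Hypothetical, simplified model that follows the steps in the notes."""

    def __init__(self, vocab_size=100, feat_dim=2048, hidden=512):
        super().__init__()
        self.img_embed = nn.Linear(feat_dim, hidden)   # fc_feats -> step-1 input
        self.embed = nn.Embedding(vocab_size, hidden)  # word index -> embedding
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.logit = nn.Linear(hidden, vocab_size)     # output -> vocabulary scores

    def forward(self, fc_feats, seq):
        n = fc_feats.size(0)
        # The state (h, c) starts as all zeros.
        h = fc_feats.new_zeros(n, self.lstm.hidden_size)
        c = fc_feats.new_zeros(n, self.lstm.hidden_size)
        outputs = []
        for t in range(seq.size(1) + 1):
            if t == 0:
                xt = self.img_embed(fc_feats)     # step 1: image information
            else:
                xt = self.embed(seq[:, t - 1])    # later steps: previous ground-truth word
            h, c = self.lstm(xt, (h, c))
            outputs.append(self.logit(h))
        return torch.stack(outputs, dim=1)        # [n, steps, vocab_size]


model = TinyShowTell()
fc_feats = torch.randn(10, 2048)        # stands in for [batch_size*5, 2048]
seq = torch.randint(0, 100, (10, 7))    # 7 caption tokens per row
out = model(fc_feats, seq)
print(out.shape)
```

Because the ground-truth word is fed in at every step, this is teacher forcing; the model's own predictions are never fed back during training.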

Prediction phase (without beam_search)

https://github.com/ruotianluo/ImageCaptioning.pytorch/blob/master/models/ShowTellModel.py#L121-L168

At step 1, the input is the image information and the state is all zeros.

output, state = LSTM(input, state)

At step 2, the input is <begin of sentence> (all zeros), and the state is the state from the previous step.

output, state = LSTM(input, state)

From step 3 onward, the input is the embedding of the word with the highest probability in the previous output, and the state is the state from the previous step.

output, state = LSTM(input, state)

What is finally returned is the list of highest-probability words taken from each output.
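The greedy decoding loop described above might look like the sketch below. Several things here are assumptions rather than facts from the repository: the function name `greedy_sample`, the use of `nn.LSTMCell`, treating <begin of sentence> as word index 0, and running for a fixed `max_len` without stopping early at an end token. The output of the image step is not turned into a word.

```python
import torch
import torch.nn as nn


def greedy_sample(img_embed, embed, lstm, logit, fc_feats, max_len=16):
    """Hypothetical greedy decoder: pick the most probable word at each step."""
    n = fc_feats.size(0)
    h = fc_feats.new_zeros(n, lstm.hidden_size)   # state starts as all zeros
    c = fc_feats.new_zeros(n, lstm.hidden_size)
    words = []
    with torch.no_grad():
        # Step 1: feed the image information.
        h, c = lstm(img_embed(fc_feats), (h, c))
        # Step 2: feed <begin of sentence>, assumed here to be index 0.
        it = torch.zeros(n, dtype=torch.long, device=fc_feats.device)
        for _ in range(max_len):
            h, c = lstm(embed(it), (h, c))
            # Later steps reuse the argmax word as the next input.
            it = logit(h).argmax(dim=1)
            words.append(it)
    return torch.stack(words, dim=1)              # [n, max_len] word indices


torch.manual_seed(0)
vocab_size, hidden = 100, 512
img_embed = nn.Linear(2048, hidden)
embed = nn.Embedding(vocab_size, hidden)
lstm = nn.LSTMCell(hidden, hidden)
logit = nn.Linear(hidden, vocab_size)

caps = greedy_sample(img_embed, embed, lstm, logit, torch.randn(3, 2048), max_len=6)
print(caps.shape)
```

Taking the argmax of the scores gives the same word as taking the argmax of the softmax probabilities, so the sketch skips the softmax.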
