Reading Notes: Neural Machine Translation with Word Predictions

03-02

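Idea: a seq2seq machine translation model that, while translating, also predicts a small vocabulary of target words.

This is very similar to multi-task learning.

The paper "RNN-based Encoder-decoder Approach with Word Frequency Estimation" also predicts a vocabulary, but it differs from this paper.

Earlier work on vocabulary prediction mainly used it to decide word choice and to restrict the target vocabulary, whereas this paper uses word prediction as a control mechanism for model training.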

Word Prediction for the Initial State

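Note that the initial state the encoder passes to the decoder should, in theory, contain all of the vocabulary information.

Word prediction for the initial state can therefore also be viewed as an improvement to the encoder. It is expressed as follows:

Then assume each target word is independent of each other. I take this to mean the target is treated as a set of words.

where y is the target word sequence,

f is a fully connected (FC) layer followed by softmax,

t is an FC layer followed by tanh,

c is the attention-related term.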
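
The pieces defined above can be put together in a minimal sketch. It assumes that the initial state (called `s0` here) and the attention term `c` are concatenated before being passed through t and then f, and that the loss sums the negative log-probability of each distinct target word; neither detail is spelled out in these notes. All function names, shapes, and values are hypothetical.

```python
import math


def linear(weights, bias, x):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_prediction_probs(s0, c, W_t, b_t, W_f, b_f):
    """f(t([s0; c])): t is FC + tanh, f is FC + softmax.

    Concatenating s0 and c is an assumption of this sketch.
    """
    hidden = [math.tanh(v) for v in linear(W_t, b_t, s0 + c)]
    return softmax(linear(W_f, b_f, hidden))


def initial_state_wp_loss(probs, target_ids):
    """Negative log-likelihood of the target words, treated as a set."""
    return -sum(math.log(probs[i]) for i in set(target_ids))


# Toy example: 2-dim state, 2-dim context, 3 hidden units, vocabulary of 4 words.
# All-zero weights give a uniform distribution, which keeps the result easy to check.
s0, c = [0.5, -0.2], [0.1, 0.3]
W_t, b_t = [[0.0] * 4 for _ in range(3)], [0.0] * 3
W_f, b_f = [[0.0] * 3 for _ in range(4)], [0.0] * 4
probs = word_prediction_probs(s0, c, W_t, b_t, W_f, b_f)
loss = initial_state_wp_loss(probs, target_ids=[1, 3, 3])
```

Because the target is treated as a set, the repeated word id 3 contributes to the loss only once, which matches the independence assumption noted above.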

The decoder's hidden states can also be used to predict the vocabulary.

The only difference is that we remove the already generated words from the prediction task
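
A minimal sketch of that difference, assuming "already generated" is decided by position (at step j, only the words from position j onward remain as prediction targets) and that each step's word distribution has already been computed from the decoder hidden state. The function names and the toy numbers are hypothetical.

```python
import math


def remaining_targets(target_ids, step):
    """Target words not yet generated at decoding step `step` (0-based)."""
    return target_ids[step:]


def decoder_wp_loss(step_probs, target_ids):
    """Sum, over decoding steps, of the set-style loss on the remaining words."""
    total = 0.0
    for step, probs in enumerate(step_probs):
        for word in set(remaining_targets(target_ids, step)):
            total -= math.log(probs[word])
    return total


# Toy example: 3 decoding steps, vocabulary of 4 words, uniform predictions.
target_ids = [2, 0, 2]
step_probs = [[0.25] * 4 for _ in target_ids]
loss = decoder_wp_loss(step_probs, target_ids)
```

In this example the target sets shrink from {2, 0} at the first step to {2} at the last, so the prediction task gets smaller as decoding proceeds.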

Formula:

These two losses are added to the original NMT loss.
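
As a small sketch of how the three terms combine: the notes only say the losses are added, so the optional weights below are a hypothetical extension that defaults to a plain sum.

```python
def total_loss(nmt_loss, initial_state_wp_loss, decoder_wp_loss,
               initial_weight=1.0, decoder_weight=1.0):
    """Translation loss plus the two word-prediction losses.

    With the default weights of 1.0 this is a plain sum; the weights
    themselves are an assumption of this sketch.
    """
    return (nmt_loss
            + initial_weight * initial_state_wp_loss
            + decoder_weight * decoder_wp_loss)


combined = total_loss(2.0, 0.5, 0.25)
```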