Question Answer Matching
From the column: Notes on Natural Language Processing Papers
Title: Improved Representation Learning for Question Answer Matching (2016)
Introduction
Definition: given a question and a pool of candidate passages, select the passages that contain the correct answer.
Challenge: the complex and versatile semantic relations between questions and passage answers.
- factoid QA may be largely cast as a textual entailment problem.
- for non-factoid QA, what makes one answer better than another in the real world often depends on many factors.
- although a good answer must relate to the question, the two often do not share common lexical units.
- the system should be capable of capturing the nuances between the best answer and an acceptable one.
- task is usually approached as a pairwise-ranking problem.
- propose two independent models, Convolutional-pooling LSTM and Convolution-based LSTM.
- introduce an effective attention mechanism to generate answer representations according to the question.
Related work
- the answer selection problem was transformed into syntactic matching between the question and answer parse trees.
- Such methods might suffer from the availability of additional resources, the effort of feature engineering, and the systematic complexity introduced by linguistic tools such as parse trees and dependency trees.
- the task can be converted into a classification or ranking problem.
- the question and answer representations can be learned and then matched by certain similarity metrics.
Approaches
QA-LSTM
Generate a fixed-size distributed vector representation using one of the following three approaches:
- the concatenation of the last vectors on both directions of the biLSTM.
- average pooling over all the output vectors of the biLSTM.
- max pooling over all the output vectors.
Cosine similarity sim(q, a) is used to score the input (q, a) pair; a sketch of the encoder and scorer follows.
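A minimal PyTorch sketch of the QA-LSTM scorer, assuming max pooling over the biLSTM outputs; the class name, embedding size, and hidden size are illustrative assumptions, not the paper's exact settings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QALSTM(nn.Module):
    """Shared biLSTM encoder; question and answer are scored by cosine similarity."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)

    def encode(self, tokens):                      # tokens: (batch, seq_len) word ids
        out, _ = self.bilstm(self.emb(tokens))     # (batch, seq_len, 2*hidden)
        return out.max(dim=1).values               # max pooling over time steps

    def forward(self, q_tokens, a_tokens):
        q, a = self.encode(q_tokens), self.encode(a_tokens)
        return F.cosine_similarity(q, a, dim=-1)   # sim(q, a), one score per pair
```

Average pooling or the concatenation of the final hidden states on both directions can be swapped in by changing the pooling line in `encode`.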
During training, K negative answers are randomly sampled for each question, but only the one with the highest loss L is used to update the model (sketched below).
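The objective is a max-margin hinge loss over (q, a+, a−) triples, L = max{0, M − sim(q, a+) + sim(q, a−)}. A sketch of one training step that keeps only the hardest sampled negative; the margin value and function name are assumptions:

```python
import torch
import torch.nn.functional as F

def max_margin_step(model, q, a_pos, a_negs, margin=0.2):
    """L = max(0, M - sim(q, a+) + sim(q, a-)); a_negs is a list of K sampled
    negative answers, and only the hardest one contributes to the update."""
    pos = model(q, a_pos)                                        # (batch,)
    losses = torch.stack([F.relu(margin - pos + model(q, a_neg))
                          for a_neg in a_negs])                  # (K, batch)
    return losses.max(dim=0).values.mean()                       # hardest negative only
```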
Convolutional LSTMs
- LSTMs keep useful information from long-range dependencies, but this strength comes with the trade-off of ignoring local n-gram coherence.
- Conversely, convolutional representations take no long-range dependencies into account when the convolution vectors are formed.
Convolutional-pooling LSTMs
- replace the simple pooling layer (average/max pooling) over the biLSTM outputs with a convolutional layer (sketched below).
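A possible reading of this architecture in PyTorch: the convolution runs over the biLSTM outputs and is followed here by max pooling over time; filter count and width are assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

class ConvPoolingEncoder(nn.Module):
    """biLSTM outputs -> 1-D convolution over time -> max pooling."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128, filters=400, width=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.conv = nn.Conv1d(2 * hidden, filters, kernel_size=width)

    def forward(self, tokens):                          # (batch, seq_len)
        h, _ = self.bilstm(self.emb(tokens))            # (batch, seq_len, 2*hidden)
        c = torch.tanh(self.conv(h.transpose(1, 2)))    # (batch, filters, seq_len-width+1)
        return c.max(dim=-1).values                     # (batch, filters)
```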
Convolution-based LSTMs
- capture the local n-gram interaction at the lower level using a convolution.
- At the higher level, bidirectional LSTMs extract long-range dependencies over the convolved n-gram vectors.
- After the biLSTM step, max pooling is applied over the biLSTM output vectors to obtain the representations of both q and a (see the sketch below).
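A sketch of the Convolution-based LSTM encoder under the same illustrative assumptions about layer sizes:

```python
import torch
import torch.nn as nn

class ConvBasedEncoder(nn.Module):
    """Embeddings -> 1-D convolution (local n-grams) -> biLSTM -> max pooling."""
    def __init__(self, vocab_size, emb_dim=100, filters=200, width=3, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, filters, kernel_size=width, padding=width // 2)
        self.bilstm = nn.LSTM(filters, hidden, bidirectional=True, batch_first=True)

    def forward(self, tokens):                              # (batch, seq_len)
        x = self.emb(tokens).transpose(1, 2)                # (batch, emb_dim, seq_len)
        ngrams = torch.tanh(self.conv(x)).transpose(1, 2)   # (batch, seq_len, filters)
        h, _ = self.bilstm(ngrams)                          # (batch, seq_len, 2*hidden)
        return h.max(dim=1).values                          # (batch, 2*hidden)
```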
Attentive LSTMs
- The answers might be long and contain many words that are not related to the question at hand.
- Prior to pooling, each answer biLSTM output vector is multiplied by a softmax weight, which is determined by the question representation from the biLSTM (sketched below).
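A sketch of the attention step, assuming the common parametrisation m(t) = tanh(W_am·h_a(t) + W_qm·o_q) and s(t) ∝ exp(w_ms·m(t)); the weight names and the max pooling on top are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class AnswerAttention(nn.Module):
    """Re-weights each answer biLSTM output h_a(t) by a softmax score conditioned
    on the pooled question vector o_q, then pools as before."""
    def __init__(self, dim):                    # dim = 2 * hidden (biLSTM output size)
        super().__init__()
        self.W_am = nn.Linear(dim, dim, bias=False)
        self.W_qm = nn.Linear(dim, dim, bias=False)
        self.w_ms = nn.Linear(dim, 1, bias=False)

    def forward(self, h_a, o_q):
        # h_a: (batch, seq_len, dim) answer outputs; o_q: (batch, dim) question vector
        m = torch.tanh(self.W_am(h_a) + self.W_qm(o_q).unsqueeze(1))
        s = torch.softmax(self.w_ms(m).squeeze(-1), dim=1)   # (batch, seq_len) weights
        weighted = h_a * s.unsqueeze(-1)                      # attention-weighted outputs
        return weighted.max(dim=1).values                     # pooled answer representation
```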
Experiments