Question Answer Matching



Title (2016)

Improved Representation Learning for Question Answer Matching

Introduction

Task definition: given a question and a pool of candidate passages, select the passages that contain the correct answer.

Challenge: the complex and versatile semantic relations observed between questions and answer passages.

  1. Factoid QA can largely be cast as a textual entailment problem.
  2. For non-factoid QA, what makes one answer better than another in the real world often depends on many factors.
  3. Although a good answer must relate to the question, the two often do not share common lexical units.
  4. The system should be able to capture the nuances between the best answer and a merely acceptable one.
  5. The task is usually approached as a pairwise ranking problem.
  6. The authors propose two independent models, Convolutional-pooling LSTM and Convolution-based LSTM.
  7. They also introduce an effective attention mechanism to generate answer representations according to the question.

Figure: An example of a question with the ground-truth answer and a negative answer, extracted from the InsuranceQA dataset.

Related work

  1. Early approaches transformed the answer selection problem into syntactic matching between the question/answer parse trees.
  • Such methods might suffer from the availability of additional resources, the effort of feature engineering, and the systematic complexity introduced by linguistic tools.
  2. Alternatively, the task can be converted into a classification or ranking problem.
  • The question and answer representations can be learned and then matched by certain similarity metrics.

Approaches

QA-LSTM

Figure: Basic Model, QA-LSTM.

Generate fixed-size distributed vector representations using one of the following three approaches:

  • the concatenation of the last output vectors from both directions of the biLSTM.
  • average pooling over all the output vectors of the biLSTM.
  • max pooling over all the output vectors.

Cosine similarity sim(q, a) is then used to score the input (q, a) pair.
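A minimal PyTorch sketch (not the authors' code) of the three pooling strategies over a shared biLSTM and the cosine score; vocabulary size, embedding size, and hidden size are illustrative assumptions.

```python
# Minimal sketch of QA-LSTM pooling + cosine scoring.
# All sizes (vocab_size, embed_dim, hidden_dim) are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QALSTMEncoder(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=141):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, token_ids, pooling="max"):
        # token_ids: (batch, seq_len) -> outputs: (batch, seq_len, 2 * hidden_dim)
        outputs, _ = self.bilstm(self.embed(token_ids))
        h = outputs.size(2) // 2
        if pooling == "last":
            # concatenate the last forward state with the last backward state
            return torch.cat([outputs[:, -1, :h], outputs[:, 0, h:]], dim=1)
        if pooling == "avg":
            return outputs.mean(dim=1)        # average pooling over time
        return outputs.max(dim=1).values      # max pooling over time

encoder = QALSTMEncoder()                     # the same encoder is shared by q and a
q = encoder(torch.randint(0, 10000, (2, 20)))
a = encoder(torch.randint(0, 10000, (2, 60)))
score = F.cosine_similarity(q, a, dim=1)      # sim(q, a) for each pair in the batch
```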

During training, K negative answers are randomly sampled for each question, but only the one with the highest loss L is used to update the model.
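A hedged sketch of this step, assuming the usual max-margin hinge loss L = max(0, margin - cos(q, a+) + cos(q, a-)) for this family of models; the encoder, margin value, and number of negatives K are illustrative placeholders.

```python
# Sketch of hardest-negative training: compute the hinge loss for each of the
# K sampled negatives and back-propagate only through the one with the highest L.
# `encoder` is any module mapping token ids to a fixed-size vector (e.g. QALSTMEncoder).
import torch
import torch.nn.functional as F

def hardest_negative_loss(encoder, q_ids, pos_ids, neg_ids_list, margin=0.2):
    q = encoder(q_ids)                                   # (batch, dim)
    pos_sim = F.cosine_similarity(q, encoder(pos_ids), dim=1)
    losses = []
    for neg_ids in neg_ids_list:                         # K randomly sampled negatives
        neg_sim = F.cosine_similarity(q, encoder(neg_ids), dim=1)
        losses.append(torch.clamp(margin - pos_sim + neg_sim, min=0.0))
    # keep only the hardest negative (highest hinge loss) per question
    return torch.stack(losses, dim=0).max(dim=0).values.mean()
```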

Convolutional LSTMs

  1. LSTMs keep useful information from long-range dependencies, but this strength comes with the trade-off of ignoring local n-gram coherence.
  2. Conversely, no long-range dependencies are taken into account when convolution vectors are formed.

Convolutional-pooling LSTMs

  1. Replace the simple pooling layer (average/max pooling) with a convolutional layer (see the sketch below).

Figure: Convolutional-pooling LSTM.
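A sketch of the convolutional pooling head, under assumed shapes and filter counts: the convolution runs along the time axis over the biLSTM output vectors and is followed by 1-max pooling.

```python
# Sketch of replacing average/max pooling with a convolution over the biLSTM
# outputs, followed by 1-max pooling. Sizes (lstm_dim, num_filters) are assumptions.
import torch
import torch.nn as nn

class ConvPoolingHead(nn.Module):
    def __init__(self, lstm_dim=282, num_filters=500, kernel_size=2):
        super().__init__()
        # convolve along the time axis over the biLSTM output vectors
        self.conv = nn.Conv1d(lstm_dim, num_filters, kernel_size)

    def forward(self, bilstm_outputs):
        # bilstm_outputs: (batch, seq_len, lstm_dim); Conv1d expects (batch, lstm_dim, seq_len)
        feats = torch.tanh(self.conv(bilstm_outputs.transpose(1, 2)))
        return feats.max(dim=2).values        # 1-max pooling -> (batch, num_filters)

head = ConvPoolingHead()
rep = head(torch.randn(2, 60, 282))           # a fixed-size representation per sequence
```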

Convolution-based LSTMs

  1. Capture the local n-gram interaction at the lower level with a convolution over the word embeddings.
  2. At the higher level, bidirectional LSTMs extract long-range dependencies based on the convolved n-gram features.
  3. After the biLSTM step, max pooling over the biLSTM output vectors yields the representations of both q and a (see the sketch below).

Figure: Convolution-based LSTM.
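A sketch of the full pipeline under assumed dimensions: convolution over word embeddings for local n-grams, a biLSTM on top of the convolved vectors for long-range dependencies, and max pooling for the final representation.

```python
# Sketch of Convolution-based LSTM: embeddings -> convolution (n-grams) ->
# biLSTM (long-range dependencies) -> max pooling. All sizes are assumptions.
import torch
import torch.nn as nn

class ConvBasedLSTM(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100,
                 num_filters=282, kernel_size=2, hidden_dim=141):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size)
        self.bilstm = nn.LSTM(num_filters, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        x = self.embed(token_ids).transpose(1, 2)           # (batch, embed_dim, seq_len)
        ngrams = torch.tanh(self.conv(x)).transpose(1, 2)   # (batch, seq_len - 1, num_filters)
        outputs, _ = self.bilstm(ngrams)
        return outputs.max(dim=1).values                    # (batch, 2 * hidden_dim)

model = ConvBasedLSTM()
rep = model(torch.randint(0, 10000, (2, 60)))
```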

Attentive LSTMs

  1. The answer might be long and contain many words that are not related to the question at hand.
  2. Each biLSTM output vector of the answer is multiplied by a softmax weight, which is determined by the question representation from the biLSTM (see the sketch below).

Figure: Attentive LSTM.
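A hedged sketch of the attention step: each answer biLSTM output h_a(t) is re-weighted by a softmax score computed from h_a(t) and the question representation o_q before pooling; the parameter names mirror the paper's W_am, W_qm, w_ms, while the sizes and the final max pooling are assumptions.

```python
# Sketch of question-driven attention over the answer biLSTM outputs.
# Parameter names follow the paper's notation; dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionAttention(nn.Module):
    def __init__(self, dim=282, attn_dim=282):
        super().__init__()
        self.W_am = nn.Linear(dim, attn_dim, bias=False)
        self.W_qm = nn.Linear(dim, attn_dim, bias=False)
        self.w_ms = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, answer_outputs, question_rep):
        # answer_outputs: (batch, seq_len, dim); question_rep: (batch, dim)
        m = torch.tanh(self.W_am(answer_outputs) + self.W_qm(question_rep).unsqueeze(1))
        scores = F.softmax(self.w_ms(m).squeeze(2), dim=1)   # softmax weights over time
        weighted = answer_outputs * scores.unsqueeze(2)      # re-weight each h_a(t)
        return weighted.max(dim=1).values                    # pool to a fixed-size vector

attn = QuestionAttention()
a_rep = attn(torch.randn(2, 60, 282), torch.randn(2, 282))
```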

Experiments

Table: Experimental results on InsuranceQA (K is the negative answer count).

Table: Test set results on TREC-QA.

