Question Answer Matching
From the column: Notes on Natural Language Processing Papers
Title: Improved Representation Learning for Question Answer Matching (2016)
Introduction
Definition: given a question and a pool of candidate passages, select the passages that contain the correct answer.
Challenge: the complex and versatile semantic relations between questions and passage answers.
- factoid QA may be largely cast as a textual entailment problem.
- for non-factoid QA, what makes one answer better than another in the real world often depends on many factors.
- although a good answer must relate to the question, the two often do not share common lexical units.
- the system should be capable of capturing the nuances between the best answer and an acceptable one.
- task is usually approached as a pairwise-ranking problem.
- propose two independent models, Convolutional-pooling LSTM and Convolution-based LSTM.
- introduce an effective attention mechanism to generate answer representations according to the question.
Related work
- the answer selection problem was transformed into syntactic matching between the question and answer parse trees.
- Such methods might suffer from the availability of additional resources, the effort of feature engineering, and the systematic complexity introduced by linguistic tools such as parse trees and dependency trees.
- the task can be converted into a classification or ranking problem.
- the question and answer representations can be learned and then matched by certain similarity metrics.
Approaches
QA-LSTM
Generate a fixed-size distributed vector representation using one of the following three approaches:
- the concatenation of the last vectors on both directions of the biLSTM.
- average pooling over all the output vectors of the biLSTM.
- max pooling over all the output vectors.
Cosine similarity sim(q, a) is used to score the input (q, a) pair; a sketch of the encoder and scorer follows.
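A minimal PyTorch sketch of the QA-LSTM scorer, assuming max pooling over the biLSTM outputs; the class name, embedding size, and hidden size are illustrative assumptions, not the paper's exact settings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QALSTM(nn.Module):
    """Shared biLSTM encoder; question and answer are scored by cosine similarity."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)

    def encode(self, tokens):                      # tokens: (batch, seq_len) word ids
        out, _ = self.bilstm(self.emb(tokens))     # (batch, seq_len, 2*hidden)
        return out.max(dim=1).values               # max pooling over time steps

    def forward(self, q_tokens, a_tokens):
        q, a = self.encode(q_tokens), self.encode(a_tokens)
        return F.cosine_similarity(q, a, dim=-1)   # sim(q, a), one score per pair
```

Average pooling or the concatenation of the final hidden states on both directions can be swapped in by changing the pooling line in `encode`.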
During training, K negative answers are randomly sampled for each question, but only the one with the highest loss L is used to update the model (sketched below).
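The objective is a max-margin hinge loss over (q, a+, a−) triples, L = max{0, M − sim(q, a+) + sim(q, a−)}. A sketch of one training step that keeps only the hardest sampled negative; the margin value and function name are assumptions:

```python
import torch
import torch.nn.functional as F

def max_margin_step(model, q, a_pos, a_negs, margin=0.2):
    """L = max(0, M - sim(q, a+) + sim(q, a-)); a_negs is a list of K sampled
    negative answers, and only the hardest one contributes to the update."""
    pos = model(q, a_pos)                                        # (batch,)
    losses = torch.stack([F.relu(margin - pos + model(q, a_neg))
                          for a_neg in a_negs])                  # (K, batch)
    return losses.max(dim=0).values.mean()                       # hardest negative only
```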
Convolutional LSTMs
- LSTMs keep useful information from long-range dependencies, but this strength comes with the trade-off of ignoring local n-gram coherence.
- Conversely, convolutional representations take no long-range dependencies into account when the convolution vectors are formed.
Convolutional-pooling LSTMs
- replace the simple pooling layer (average/max pooling) over the biLSTM outputs with a convolutional layer (sketched below).
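A possible reading of this architecture in PyTorch: the convolution runs over the biLSTM outputs and is followed here by max pooling over time; filter count and width are assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

class ConvPoolingEncoder(nn.Module):
    """biLSTM outputs -> 1-D convolution over time -> max pooling."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128, filters=400, width=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.conv = nn.Conv1d(2 * hidden, filters, kernel_size=width)

    def forward(self, tokens):                          # (batch, seq_len)
        h, _ = self.bilstm(self.emb(tokens))            # (batch, seq_len, 2*hidden)
        c = torch.tanh(self.conv(h.transpose(1, 2)))    # (batch, filters, seq_len-width+1)
        return c.max(dim=-1).values                     # (batch, filters)
```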
Convolution-based LSTMs
- capture the local n-gram interaction at the lower level using a convolution.
- At the higher level, bidirectional LSTMs extract long-range dependencies over the convolved n-gram vectors.
- After the biLSTM step, max pooling is applied over the biLSTM output vectors to obtain the representations of both q and a (see the sketch below).
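A sketch of the Convolution-based LSTM encoder under the same illustrative assumptions about layer sizes:

```python
import torch
import torch.nn as nn

class ConvBasedEncoder(nn.Module):
    """Embeddings -> 1-D convolution (local n-grams) -> biLSTM -> max pooling."""
    def __init__(self, vocab_size, emb_dim=100, filters=200, width=3, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, filters, kernel_size=width, padding=width // 2)
        self.bilstm = nn.LSTM(filters, hidden, bidirectional=True, batch_first=True)

    def forward(self, tokens):                              # (batch, seq_len)
        x = self.emb(tokens).transpose(1, 2)                # (batch, emb_dim, seq_len)
        ngrams = torch.tanh(self.conv(x)).transpose(1, 2)   # (batch, seq_len, filters)
        h, _ = self.bilstm(ngrams)                          # (batch, seq_len, 2*hidden)
        return h.max(dim=1).values                          # (batch, 2*hidden)
```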
Attentive LSTMs
- The answers might be long and contain many words that are not related to the question at hand.
- Prior to pooling, each answer biLSTM output vector is multiplied by a softmax weight, which is determined by the question representation from the biLSTM (sketched below).
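A sketch of the attention step, assuming the common parametrisation m(t) = tanh(W_am·h_a(t) + W_qm·o_q) and s(t) ∝ exp(w_ms·m(t)); the weight names and the max pooling on top are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class AnswerAttention(nn.Module):
    """Re-weights each answer biLSTM output h_a(t) by a softmax score conditioned
    on the pooled question vector o_q, then pools as before."""
    def __init__(self, dim):                    # dim = 2 * hidden (biLSTM output size)
        super().__init__()
        self.W_am = nn.Linear(dim, dim, bias=False)
        self.W_qm = nn.Linear(dim, dim, bias=False)
        self.w_ms = nn.Linear(dim, 1, bias=False)

    def forward(self, h_a, o_q):
        # h_a: (batch, seq_len, dim) answer outputs; o_q: (batch, dim) question vector
        m = torch.tanh(self.W_am(h_a) + self.W_qm(o_q).unsqueeze(1))
        s = torch.softmax(self.w_ms(m).squeeze(-1), dim=1)   # (batch, seq_len) weights
        weighted = h_a * s.unsqueeze(-1)                      # attention-weighted outputs
        return weighted.max(dim=1).values                     # pooled answer representation
```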
Experiments