Text Summarization


Title (2017):

Improving Semantic Relevance for Sequence-to-Sequence Learning of Chinese Social Media Text Summarization

Overview:

  1. Although the generated summaries are literally similar to the source texts, they have low semantic relevance.
  2. We introduce a Semantic Relevance Based neural model to encourage high semantic similarity between texts and summaries.

Introduction:

Figure: an example of an RNN-generated summary; it is literally similar to the source text, but has low semantic relevance.

In this work, our goal is to improve the semantic relevance between source texts and generated summaries for Chinese social media text summarization.

During training, the model maximizes the similarity score.

The representation of source texts is produced by an encoder, while that of summaries is computed by a decoder.

Background:

Current Chinese social media text summarization models are based on the encoder-decoder framework.

The encoder-decoder model compresses the source text x into a continuous vector representation with the encoder, and then generates the summary y with the decoder.

Here, f is the output function of the recurrent neural network, and the initial decoder state s_0 is the last hidden state h_N of the encoder.
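The decoder equation referred to here is not reproduced in these notes; a standard seq2seq formulation consistent with the description (the output projection W is only an illustrative assumption) would be:

s_t = f(y_{t-1}, s_{t-1}),   s_0 = h_N
p(y_t | y_{<t}, x) = softmax(W s_t)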

The attention mechanism is introduced to better capture the context information of the source texts:

Here, g(s_t, h_i) is a relevance score between the decoder hidden state s_t and the encoder hidden state h_i.

When predicting an output word, the decoder takes the attention (context) vector into account.
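The attention equations themselves are likewise not reproduced here; the standard formulation implied by the description is:

α_{t,i} = exp(g(s_t, h_i)) / Σ_j exp(g(s_t, h_j))
c_t = Σ_i α_{t,i} h_i

where c_t is the attention (context) vector used by the decoder when predicting the output word at step t.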

Model:

Figure: the model architecture, with the decoder (above), the encoder (below), and the cosine similarity function.

The model consists of three components: an encoder, a decoder, and a similarity function.

  1. The encoder compresses source texts into semantic vectors.
  2. The decoder generates summaries and produces semantic vectors of the generated summaries.
  3. Finally, the similarity function evaluates the relevance between the semantic vectors of the source texts and the generated summaries.
  4. Our training objective is to maximize the similarity score.
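To make the data flow concrete, here is a minimal PyTorch-style sketch of this idea (teacher forcing, no attention; the class and function names, the GRU choice, and the weight lam are illustrative assumptions, not the paper's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRBSeq2Seq(nn.Module):
    """Sketch of a Semantic Relevance Based seq2seq model (illustrative only)."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src, tgt):
        # Encoder: compress the source text; h_n plays the role of h_N.
        _, h_n = self.encoder(self.embed(src))      # h_n: (1, batch, hid)
        v_text = h_n.squeeze(0)                     # semantic vector of the source

        # Decoder: initialised with h_N; s_m plays the role of s_M.
        dec_out, s_m = self.decoder(self.embed(tgt), h_n)
        logits = self.out(dec_out)                  # (batch, tgt_len, vocab)

        # Semantic vector of the summary: s_M minus h_N, as described below.
        v_summary = s_m.squeeze(0) - v_text

        # Cosine similarity between the two semantic vectors.
        sim = F.cosine_similarity(v_summary, v_text, dim=-1)
        return logits, sim

def srb_loss(logits, tgt_labels, sim, lam=0.5, pad_id=0):
    # Negative log-likelihood of the reference summary, minus the similarity
    # score scaled by an assumed balancing weight lam.
    nll = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                          tgt_labels.reshape(-1), ignore_index=pad_id)
    return nll - lam * sim.mean()
```

The sketch omits attention for brevity; the model described in the paper uses the attention mechanism from the background section.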

Text Representation:

There are several methods to represent a text or a sentence, such as mean pooling of the RNN outputs or taking the last hidden state of the RNN.

We select the last output h_N of the RNN encoder as the semantic vector of the source text.

In fact, the last decoder output s_M contains information about both the source text and the generated summary. We simply compute the semantic vector of the summary by subtracting h_N from s_M:
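Written out with the notation used below (V_t for the source text, V_s for the summary), the two semantic vectors are:

V_t = h_N
V_s = s_M - h_N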

Semantic Relevance:

Here, we use cosine similarity to measure the semantic relevance, which is represented with a dot product and magnitudes:

It computes the semantic relevance of the source text and the generated summary given their semantic vectors V_t and V_s.
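Spelled out, the similarity score is the standard cosine of the two vectors:

cos(V_s, V_t) = (V_s · V_t) / (||V_s|| ||V_t||)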

Source text and summary share the same language, so it is reasonable to assume that their semantic vectors are distributed in the same space.

Training:

The objective is to minimize the loss function:

where p(y|x; θ) is the conditional probability of the summary given the source text, computed by the encoder-decoder model.
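The loss itself is not reproduced in these notes. A reconstruction consistent with the description (a likelihood term plus the similarity score to be maximized, combined with an assumed balancing weight λ) is:

L(θ) = − log p(y|x; θ) − λ · cos(V_s, V_t)

Both the weight λ and the use of the log-likelihood here are assumptions made for illustration.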

Experiments:

Table: results of our model and the baseline systems. Our models achieve substantial improvements in all ROUGE scores over the baseline systems. (W: word level; C: character level).

