AI 論道, Issue 2: GANs for Abstractive Summarization in Practice - Generating More Readable Summaries

Author: 楊敏 @MirandaYang

Editor: 溫偉煌 @溫涼

This article presents the practical work of our undergraduate student 劉林青 and collaborators on applying GANs to abstractive summarization. We mainly want to share some of the challenges we ran into during implementation and how we addressed them. Feedback is very welcome: ailundao2017@gmail.com

With the explosive growth of information in recent years, quickly extracting key information from text has become a pressing need. Automatic text summarization aims to solve this problem. It has many application scenarios, such as generating key sentences for news articles, and it supports many downstream applications. Here we focus on abstractive summarization, which condenses and paraphrases sentences from the source text to produce a more compact summary, sometimes containing phrases that do not appear in the original at all. To make the generated sentences closer to natural human language, ideally to the point where readers find it hard to tell whether a summary was written by a machine or a person, we try to use a Generative Adversarial Network (GAN) to improve the quality of the generated sentences.

Abstractive summarization models have made great progress in recent years, but problems remain: (1) the generated summaries are often ungrammatical and hard to read; (2) much prior work trains sequence-to-sequence models with maximum likelihood estimation (MLE), which causes exposure bias to accumulate and hurts performance at test time. To address these problems, we propose an adversarial framework that jointly trains a generative model G (Generator) and a discriminative model D (Discriminator). Specifically, the generator G takes the source text as input and produces a summary, and we use reinforcement learning (policy gradient) to optimize G toward summaries that receive high rewards. This effectively sidesteps both exposure bias and the problem of non-differentiable task metrics. The discriminator D must distinguish human-written summaries from those produced by G, while G is trained to maximize the probability that D makes a mistake. This adversarial process ultimately tunes G to produce coherent, high-quality abstractive summaries.

Generator

The generator G adopts the pointer-generator model. On top of a standard sequence-to-sequence attentional model, it computes both the probability of copying a word from the source text (pointing or copying) and the probability of freely generating a word from the vocabulary (generating). The model details are shown in the figure below (this part mainly follows [1]).

Under the copy mechanism, if w is an out-of-vocabulary word, its generation probability is set to 0; conversely, under the pointer mechanism, if w does not appear in the source text, the weighted sum over the attention distribution is set to 0. According to the following formula, the probability of producing word w equals the probability of generating it from the vocabulary plus the probability of pointing to any of its occurrences in the source text. One of the main advantages of the pointer-generator is precisely this ability to produce words outside the preset vocabulary.
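The formula referenced here appeared as an image in the original post; reconstructed from the pointer-generator paper [1], the final distribution is:

$$P(w) = p_{\text{gen}} \, P_{\text{vocab}}(w) + (1 - p_{\text{gen}}) \sum_{i:\, w_i = w} a_i$$

where $p_{\text{gen}}$ is the probability of generating from the vocabulary, $P_{\text{vocab}}$ is the vocabulary distribution, and $a_i$ is the attention weight on source position $i$.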

Discriminator

The discriminator D is a CNN-based binary text classifier. We encode the input text sequence with a CNN, using multiple filters to extract several groups of features, apply max-over-time pooling, and finally feed the pooled features into a fully connected softmax layer to obtain the classification probabilities (see [6]).
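As a concrete illustration, here is a minimal PyTorch sketch of such a classifier in the style of [6]. The embedding size, number of filters, and filter widths are illustrative assumptions, not our exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNDiscriminator(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, num_filters=100,
                 filter_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # One 1-D convolution per filter width, each yielding num_filters feature maps.
        self.convs = nn.ModuleList([
            nn.Conv1d(emb_dim, num_filters, kernel_size=k) for k in filter_sizes
        ])
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, emb_dim, seq_len)
        # Max-over-time pooling: keep the strongest activation of each filter.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = torch.cat(pooled, dim=1)            # (batch, num_filters * 3)
        return F.softmax(self.fc(features), dim=1)     # P(real) vs. P(generated)
```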

Model Parameter Updates

During adversarial training, the discriminator D serves as the reward for G. We can further improve G by dynamically updating D. With G fixed, D is updated with the following objective:
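The objective shown as an image in the original post matches the standard SeqGAN-style discriminator objective [2]; a reconstruction:

$$\min_{\phi} \; -\mathbb{E}_{Y \sim p_{\text{data}}}\left[\log D_{\phi}(Y)\right] - \mathbb{E}_{Y \sim G_{\theta}}\left[\log\left(1 - D_{\phi}(Y)\right)\right]$$

That is, D is trained as a binary classifier that assigns high probability to human-written summaries and low probability to generated ones.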

Once D's parameters are fixed, we go on to update G. The loss of G has two parts: Jpg, computed by policy gradient, and Jml, computed by maximum likelihood estimation (MLE). The final objective of G is J = β·Jpg + (1 - β)·Jml, where β is a scaling factor balancing the magnitudes of Jpg and Jml. Jpg and Jml are computed as follows:

Loss by Policy Gradient:

The generator G is treated as a stochastic parameterized policy and is trained to maximize the final reward:

We use the REINFORCE algorithm, taking the discriminator's output (the probability that a summary generated by G is judged to be a real one) as the reward:

Since D can only evaluate a fully generated sequence, we follow [2] and estimate the reward of intermediate states via Monte Carlo tree search (MCTS); the framework is shown in the figure:
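A minimal sketch of this reward estimation, assuming two hypothetical interfaces: generator.sample_continuation(prefix, max_len), which completes a partial summary by sampling from G, and discriminator.prob_real(seq), which returns D's probability that a finished sequence is human-written.

```python
def rollout_reward(generator, discriminator, prefix, max_len, n_rollouts=16):
    """Estimate the reward of a partial sequence: complete it n_rollouts times
    with the generator's own policy, then average the discriminator's scores."""
    total = 0.0
    for _ in range(n_rollouts):
        full_seq = generator.sample_continuation(prefix, max_len)
        total += discriminator.prob_real(full_seq)
    return total / n_rollouts
```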

The gradient of the objective Jpg(θ) is computed as shown below:
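The gradient was shown as a figure in the original post; it matches the REINFORCE estimator of [2], reconstructed as:

$$\nabla_{\theta} J_{pg}(\theta) \simeq \sum_{t=1}^{T} \mathbb{E}_{Y_{1:t-1} \sim G_{\theta}} \left[ \nabla_{\theta} \log G_{\theta}(y_t \mid Y_{1:t-1}) \cdot Q_{D_{\phi}}^{G_{\theta}}(Y_{1:t-1}, y_t) \right]$$

where $Q_{D_{\phi}}^{G_{\theta}}$ is the action value, estimated by the Monte Carlo rollouts described above.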

Loss by Maximum Likelihood Estimation:

The most conventional way to train a sequence-generation decoder is to minimize the maximum-likelihood loss at every decoding step:
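In symbols, for a ground-truth summary $y^* = (y_1^*, \ldots, y_T^*)$ and source text $x$, this is the standard MLE loss:

$$J_{ml}(\theta) = -\sum_{t=1}^{T} \log p_{\theta}\left(y_t^* \mid y_1^*, \ldots, y_{t-1}^*, x\right)$$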

Dataset

We evaluate on the CNN/Dailymail news dataset, where the news highlights serve as the ground-truth summaries during training; the dataset is available from cs.nyu.edu/~kcho/DMQA/. It contains 287,226 training pairs, 13,368 validation pairs, and 11,490 test pairs, making it one of the largest abstractive summarization datasets to date.

Hyperparameter Settings

We first pre-train the generator G for 13 epochs. We then use the pre-trained generator to produce 10,000 summaries, which, together with the corresponding ground-truth summaries, pre-train the discriminator D. Finally, we run adversarial training and stop once the loss on the validation set no longer decreases, which happens after 5 epochs. The batch size is 16; the learning rate is 0.15 for training the generator, 0.01 for training the discriminator, and 0.0001 for adversarial training.
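For reference, here are the hyperparameters above collected in one place; the dict layout is our own illustration, not an actual configuration file from the project.

```python
config = {
    "pretrain_generator_epochs": 13,
    "discriminator_pretrain_pairs": 10000,  # generated summaries + ground truth
    "adversarial_epochs": 5,                # stop when validation loss plateaus
    "batch_size": 16,
    "lr_generator": 0.15,
    "lr_discriminator": 0.01,
    "lr_adversarial": 1e-4,
}
```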

Compared Methods

We compare against the abstractive model (ABS) [3], the pointer-generator coverage networks (PGC) [1], and the abstractive deep reinforced model (DeepRL) [4].

Results and Analysis

1. ROUGE

ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation and is the most widely used evaluation metric in summarization. It computes the precision, recall, and F-score of the n-gram overlap between the generated summary and the ground-truth summary.
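A minimal sketch of the ROUGE-N overlap computation is below; real evaluations should use a standard package (e.g. pyrouge), since official ROUGE also applies stemming and other preprocessing this sketch omits.

```python
from collections import Counter

def rouge_n(candidate_tokens, reference_tokens, n=1):
    """Return (precision, recall, f1) of clipped n-gram overlap between a
    candidate summary and a reference summary, both given as token lists."""
    cand = Counter(zip(*[candidate_tokens[i:] for i in range(n)]))
    ref = Counter(zip(*[reference_tokens[i:] for i in range(n)]))
    overlap = sum((cand & ref).values())  # n-gram matches, clipped by reference counts
    p = overlap / max(sum(cand.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1
```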

Note: DeepRL [4] also reports results from pure RL training, but according to the original authors' own assessment, the pure-RL model is unhelpful for summarization because its readability is very poor and much of its output is ungrammatical. In the ROUGE evaluation we therefore compare only against the ML+RL version, i.e. the most readable results reported by the original authors.

2. Novel n-grams

This metric evaluates a model's abstractive ability, i.e. how many of the sentences or phrases in the generated summary do not appear in the source text. We measure novelty at the 1-, 2-, 3-, 4-gram and sentence levels.
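One simple way to compute this metric, sketched below: count the fraction of distinct n-grams in the generated summary that never appear in the source article.

```python
def novel_ngram_ratio(summary_tokens, source_tokens, n=2):
    """Fraction of distinct n-grams in the summary that do not occur in the source."""
    summ = set(zip(*[summary_tokens[i:] for i in range(n)]))
    src = set(zip(*[source_tokens[i:] for i in range(n)]))
    if not summ:
        return 0.0
    return len(summ - src) / len(summ)
```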

As the results show, our model produces more novel n-grams than the PGC model, generating summaries that are both more diverse and more abstractive.

3. Human Evaluation

We randomly selected 50 summaries and their corresponding source texts for human evaluation, asking annotators to rate the quality of each generated summary on a scale from 1 (lowest readability) to 5 (highest readability).

As the results show, our system also achieves a measurable improvement in readability.

Below are some sample summaries produced by our model:


Source Text

an 11-year-old chinese schoolboy has become mute after he drank a glass of water which had been laced with perfume and chalk dust as part of a prank . pupil xiao gao , from fujian province in south eastern china , has not said a word in five days after a classmate who had allegedly bullied him gave him the water , reports people 's daily online . but medical experts are at a loss to explain his sudden inability to talk and say that the combination of water , perfume and dust - while nasty - should not have caused that type of damage . pupil xiao gao , aged 11 , (pictured) from fujian in south east china , has not spoken a word in five days after a classmate he ' does not get on with ' gave him the water which had been laced with perfume and chalk dust . gao , who is in year two at haidu number 8 middle school , is now awaiting further medical examination as well as an appointment with a psychologist . school officials told reporters that after gao lost his voice the classmate who gave him the water ' became very afraid ' and told teachers it was meant as a joke and that she did not think he would drink it . she said she only added perfume and chalk dust to the water - and not nail varnish as was rumoured . she said when gao drank the water his classmates told him to spit it out but by then he had already swallowed it . gao , who is a pupil at haidu number 8 middle school , ( pictured ) is now awaiting further medical examination . gao 's form teacher was informed of his condition and immediately notified his family before taking him to hospital . the unnamed girl has now been excluded from the school . a doctor at fujian provincial hospital told reporters that in all the years he has practised medicine he has never seen a case like this before . qiu bin gaosu said as perfume consists mainly of alcohol , perfume fragrance and methanol , consuming a small amount would not impact the body , although a large amount would irritate the throat and stomach . he said consuming small amounts of chalk dust , which is nontoxic , should also not have this sort of effect . xiao gao 's cousin told reporters that a number of doctors have seen gao and they found no physical abnormalities . gao claims the girl who gave him the water let him drink it and then said ' you will die after drinking this ' . relatives told reporters that gao 's mother was ill and his father worked at a vegetable market . they said the boy was well-behaved and doubted the condition was a hoax - particularly as he had been unable to speak for five days . his form teacher , mr lin , said gao was a little introverted , quiet and very honest and was often bullied by other children in his class . he said : ' whenever a teacher saw any bullying we would chastise the offender and teach them about why their behaviour was not acceptable . ' gao 's classroom - a doctor said that in all the years he has practised he has never seen a case like this before . through written messages gao told reporters that he was thirsty so he drank the water his classmate handed him - but after swallowing a couple of mouthfuls he realised something was n't right . he claims it was then the girl took away the water and said ' you will die after drinking this . ' gao told reporters that he did n't get along with the classmate who handed him the water . the story has gone viral in china and has been reported on all the main news websites . the people 's daily reported that the local public security bureau has started an investigation into the incident .


Reference Summary

schoolboy xiao gao , 11 , drank water spiked with perfume and chalk dust .

he has not spoken since the prank which went wrong five days ago .

doctors ca n't explain sudden voice loss and think it may be psychological .


ABS Summary

xiao gao , from fujian province , has not said a word in five days .

but medical experts are at a loss to explain his sudden inability to talk .

but medical experts are at a loss to explain his sudden inability to talk .


PGC Summary

pupil xiao gao has not said a word in five days after a classmate he ' does not get on with ' gave him the water which had been laced with perfume and chalk dust .

but medical experts are at a loss to explain his sudden inability to talk and say that the combination of water , perfume and dust - while nasty - should not have caused that type of damage .


Our Summary

xiao gao , 11 , drank a glass of water which had been laced with perfume and chalk dust .

medical experts are at a loss to explain his sudden inability to talk and say that the combination of water , perfume and dust - while nasty - should not have caused that type of damage .

gao , who is in year two at haidu number 8 middle school , is now awaiting further medical examination as well as an appointment with a psychologist .


Training Details and Tricks

1. Choosing the number of rollouts (balancing accuracy against compute)

We sample completions of the generated sequence via rollouts. In theory, the more rollouts, the closer the estimate gets to the true value. In practice, however, for a generation task with sequence length 100, running an MCTS at every position with 4 rollouts each already amounts to 4*100=400 forward passes, which is a substantial amount of computation. The number of rollouts therefore has to be balanced against computational efficiency. We tried four settings: 4, 8, 16, and 24 rollouts. With 4 and 8 there was almost no improvement, since the sample size is too small to produce an accurate estimate. 16 and 24 gave similar results, so considering efficiency (work on optimizing machine translation with GANs made a similar exploration [5]), 16 rollouts achieve reasonably good results. In theory, of course, more samples are always better.

2. Simplifying MCTS (cutting computational cost by 50-70%)

The AlphaGo paper proposes one possible remedy for the high cost of MCTS: use a simpler model to perform the search, saving sampling time. But in our model, the seq2seq backbone, attention, and copy-and-gen mechanisms each have a substantial impact on the results, so we cannot adopt that strategy. In this task, the purpose of MCTS is to help the model discover potentially better outputs and avoid getting trapped in local optima; in other words, it broadens the search. We therefore believe the following two strategies can both widen the diversity of generation, finding more potentially high-quality outputs, and save computation.

(a) In the summaries produced by our seq2seq model, the maximum length is 100, and both the mean and median length are around 70; 80% of sentences are shorter than 80 tokens. This means that during training, once a sentence has been generated up to length 80, any MCTS sample over the last 30 tokens is very likely to be truncated anyway before reaching the discriminator, because an <eos> has already appeared within the first 70 tokens. The computation spent there is simply wasted. We can therefore shorten the range of MCTS sampling positions from max_dec_steps to some acceptable length, while still ensuring that most sentences go through a complete MCTS.

(b) Our goal is to find more potentially high-quality generations when the model may be stuck in a local optimum. Sampling at every position yields more precise estimates, but sampling at only a subset of positions achieves a similar effect. In other words, where the original scheme searched at every position, we can adopt a strategy that searches only at selected positions: at fixed intervals, at randomly chosen positions with a fixed sampling probability, or with some decay schedule that samples more densely at early positions (where more continuations are possible) and less densely later. In our experiments we tested fixed-interval search with intervals from 3 to 5, and all of them improved results.

In summary, if we truncate the generated sequence to a certain length and sample at intervals, then for a length-100 generation task, truncating to length 75 with an interval of 3 requires MCTS at only 25 positions. Compared with MCTS at all 100 positions, this greatly reduces the computational cost. In our tests, this strategy, especially the truncation, had almost no effect on the model's results, thanks to the existing <eos> truncation mechanism.
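A sketch of this truncation-plus-interval strategy, just to make the arithmetic concrete; the function name and defaults are illustrative.

```python
def rollout_positions(max_dec_steps=100, truncate=75, stride=3):
    """Return the decoder positions at which Monte Carlo rollouts are run:
    every `stride`-th position, up to the truncation length."""
    return list(range(stride, min(truncate, max_dec_steps) + 1, stride))

# With the defaults, rollout_positions() yields 25 positions (3, 6, ..., 75)
# instead of the 100 positions of a full per-step search.
```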

MCTS is a highly parallelizable process. With the simplifications above plus multi-GPU acceleration, the very expensive computation of the GAN framework can be reduced by half or more.

References:

  1. Get To The Point: Summarization with Pointer-Generator Networks.
  2. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient.
  3. Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond.
  4. A Deep Reinforced Model for Abstractive Summarization.
  5. Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets.
  6. Convolutional Neural Networks for Sentence Classification.
