



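As others have said, there is nothing remarkable about the model itself. It integrates methods from several works of the past few years that produced significant results, and it achieves the best single-model result. Much of Google's recent work has a similar style (for example, the February paper this year, Exploring the Limits of Language Modeling). BTW, I don't work on machine translation.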

By comparison, the part of the GNMT paper [1] that interests me more is the section Quantizable Model and Quantized Inference. It is also one reason the paper feels so engineering-heavy.

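This section mainly describes how Google put this complex model into production. To deploy a complex model like a deep LSTM, you have to solve the latency problem well. As many TFBoys (TensorFlow users) probably know, the online inference that ships with the TensorFlow framework performs badly in terms of time cost (at least as far as I know). So the first thing I did was check whether the paper covers this, and fortunately Google does describe how they deploy the model online.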

Put simply, Google did two things: Quantized Inference and TPUs. Both deserve attention. In the Quantized Inference part, Google mainly explains how they quantize the LSTM model:

Many of those previous studies [18, 19, 41, 26] however mostly focus on CNN models with relatively few layers. Deep LSTMs with long sequences pose a novel challenge in that quantization errors can be significantly amplified after many unrolled steps or after going through a deep LSTM stack.

With quantization, perplexity (the loss) gets slightly worse, but BLEU stays the same.
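To make the concern in the quote above concrete, here is a minimal sketch. It assumes simple symmetric per-tensor 8-bit linear quantization, a generic scheme and not necessarily the exact one GNMT uses. It quantizes a small hypothetical weight vector, then unrolls a toy scalar recurrence `h_t = w * h_{t-1} + 1` for many steps, once with the float weight and once with its quantized copy. The weights and the recurrence are made up for illustration. The point is that a per-weight rounding error well under 1% can grow into a much larger drift in the state after many unrolled steps.

```python
def quantize_symmetric(values, num_bits=8):
    """Symmetric linear quantization of floats to signed integer codes.

    Returns (codes, scale); a code c dequantizes to c * scale.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    return [c * scale for c in codes]


def unroll(w, steps):
    """Toy recurrence h_t = w * h_{t-1} + 1 with h_0 = 0; returns h_1..h_steps."""
    h, history = 0.0, []
    for _ in range(steps):
        h = w * h + 1.0
        history.append(h)
    return history


# Hypothetical weights; only the second one is fed to the recurrence.
weights = [1.0, 0.95, -0.5]
codes, scale = quantize_symmetric(weights)
weights_q = dequantize(codes, scale)

w, w_q = weights[1], weights_q[1]
weight_error = abs(w - w_q)

steps = 100
drift = [abs(a - b) for a, b in zip(unroll(w, steps), unroll(w_q, steps))]

print(f"scale={scale:.6f} codes={codes}")
print(f"weight error: {weight_error:.6f}")
print(f"state drift after 1 step: {drift[0]:.6f}, after {steps} steps: {drift[-1]:.4f}")
```

Clipping values to a fixed range during training, as GNMT's quantizable model does for some of its values, is one way to keep this kind of drift bounded. The toy recurrence here is deliberately unclipped so that the amplification stays visible.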



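The views above come only from my own limited experience. I haven't done a side-by-side comparison, and I don't know how many companies or organizations can architect a deep LSTM like this for low-latency inference, so the focus of my answer may be skewed. Corrections welcome.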

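The Chinese tech news item that has annoyed me most in the last two days is "Google Neural Machine Translation achieves a disruptive breakthrough." It uses every clickbait trick there is and tries to win clicks with the headline alone. What Google actually published is a journal-style "integration" paper. It examines how neural machine translation, given a good engineering implementation, compares with statistical machine translation. Chinese news outlets then described this result in exaggerated terms.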

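My point is this: Rome wasn't built in a day. This 85% reduction in errors comes from two or three years of round-the-clock work by machine translation researchers. Seq2Seq + Attention cut the error rate by X points. Stacked LSTMs plus N landmark systems papers from Google cut it by another Y points. Then directly minimizing a sentence-level loss (BLEU, etc.) instead of maximizing likelihood cut it by another Z points. And now, damn: from the headline I thought this paper had improved NMT by yet another 85%. I read the Chinese news first and then the arXiv paper. It turned out the disruptive leap in machine translation came from the Chinese journalists' pens. To be rigorous, the coverage should have stated that this paper does not improve the methodology. The breakthrough in neural machine translation came from earlier work by Google and other researchers.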
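For readers unfamiliar with "directly optimizing a sentence-level metric", here is a minimal sketch of sentence-level BLEU. It computes clipped n-gram precisions up to 4-grams, takes their geometric mean, and applies a brevity penalty. It assumes whitespace tokenization and a single reference. It also applies add-one smoothing to the 2- to 4-gram precisions, one common choice, used here so that short sentences do not score exactly zero. Treat it as an illustration of the kind of score such training uses as a reward. It is not the exact objective from the GNMT paper, which uses a variant it calls GLEU.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with add-one smoothing for n >= 2 (single reference)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = ngram_counts(cand, n)
        ref_ngrams = ngram_counts(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        matches = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(len(cand) - n + 1, 0)
        if n > 1:  # add-one smoothing so a missing 4-gram does not zero the score
            matches, total = matches + 1, total + 1
        if matches == 0 or total == 0:
            return 0.0
        log_precision_sum += math.log(matches / total)
    geo_mean = math.exp(log_precision_sum / max_n)
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean


reference = "the cat sat on the mat"
for hyp in (reference, "the cat sat on the rug", "the cat"):
    print(f"{hyp!r}: {sentence_bleu(hyp, reference):.3f}")
```

The score is a non-differentiable function of the whole output sentence. That is why optimizing it directly needs something other than plain maximum-likelihood training, such as a reinforcement-learning-style objective.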

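As for the Google paper itself: it stitches together existing models and is very engineering-heavy. It integrates the good methods natural language processing has produced over the past few years: Seq2Seq + Attention + Stack LSTM + Minimize Sentence Loss. Its methodological contribution is small, while its contribution in experimental experience is large. I don't work on machine translation, yet I could follow the paper from beginning to end just from its figures, formulas, and brief explanations of notation. The system is still based on the Seq2Seq framework. The Seq2Seq paper at NIPS two years ago was simple yet powerful, the proverbial "heavy sword without an edge." It opened a new path for machine translation and even natural language generation, and it brought new hope to many areas that had stalled. The attention mechanism then made Seq2Seq models more elegant and more effective. On top of that, recent work found that directly optimizing a sentence-level or corpus-level objective during training, such as BLEU itself, makes training more effective.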
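As a reminder of what attention adds on top of Seq2Seq, here is a minimal sketch of one attention step in plain Python. It scores each encoder state against the current decoder state, turns the scores into weights with a softmax, and takes the weighted sum of encoder states as the context vector. For brevity it uses dot-product scoring, a swapped-in simplification, since GNMT computes its attention scores with a small feed-forward network. The vectors are made-up toy values.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(decoder_state, encoder_states):
    """Dot-product attention: returns (weights over source positions, context vector)."""
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context


# Toy encoder states for a 3-word source sentence (2-dimensional, made up).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
decoder_state = [0.0, 3.0]  # most similar to the second source position

weights, context = attention(decoder_state, encoder_states)
print("attention weights:", [round(w, 3) for w in weights])
print("context vector:", [round(c, 3) for c in context])
```

The decoder recomputes this context at every output step. As a result, the model no longer has to squeeze the whole source sentence into one fixed vector, which is the basic Seq2Seq bottleneck that attention relieves.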

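Finally, a gripe about "error rate," a metric that is very good at making big news. On a 6-point scale, the score went from 3.694 to 4.263, which is not all that exciting. But "a 60% improvement in error rate" is big news!!! Out of a 23-page paper, domestic media read nothing but this metric and put it in the headline (and the 85% improvement holds for only one language pair). To put it harshly, their motives are suspect. Now consider a drop in error rate: suppose it used to be 3% and is now 1.5% (I can't tell whether that counts as a 50% drop or a 100% drop). That is a gain of 1.5 points, and on many datasets a 1.5-point gain would not pass a statistical significance test.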
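The numbers in this gripe are easy to mix up, so here is a small arithmetic sketch. It assumes the headline "60%" is the share of the PBMT-to-human gap that GNMT closes on the 0-to-6 side-by-side rating scale, which is how I read the paper's "relative improvement." The human-translation score of 4.636 for Chinese→English is my reading of the paper's table and should be checked against the original. The sketch also separates relative from absolute reduction for the 3% → 1.5% example. Finally, it runs a two-proportion z-test at two hypothetical test-set sizes to show that whether a 1.5-point gain is significant depends on how many examples it is measured on.

```python
import math


def gap_closed(baseline, system, human):
    """Fraction of the baseline-to-human gap closed by the system."""
    return (system - baseline) / (human - baseline)


def relative_reduction(old_rate, new_rate):
    return (old_rate - new_rate) / old_rate


def two_proportion_z_test(err_a, err_b, n):
    """Two-sided z-test for two error rates, each measured on n independent examples."""
    pooled = (err_a + err_b) / 2  # pooled rate; equal n on both sides
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (err_a - err_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value


# Chinese->English side-by-side scores (0-6). PBMT and GNMT come from the text;
# the human score 4.636 is an assumption based on my reading of the paper.
pbmt, gnmt, human = 3.694, 4.263, 4.636
print(f"raw score gain: {gnmt - pbmt:.3f} ({(gnmt - pbmt) / pbmt:.1%} relative)")
print(f"share of PBMT-to-human gap closed: {gap_closed(pbmt, gnmt, human):.0%}")

# 3% -> 1.5%: a 1.5-point absolute drop is a 50% relative reduction.
print(f"relative reduction: {relative_reduction(0.03, 0.015):.0%}")

# Hypothetical test-set sizes: the same 1.5-point gain may or may not be significant.
for n in (300, 1000):
    z, p = two_proportion_z_test(0.03, 0.015, n)
    print(f"n={n}: z={z:.2f}, p={p:.3f}")
```

The z-test treats each example as an independent pass/fail trial. For corpus-level metrics like BLEU, paired bootstrap resampling is the more usual check.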

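In short, I don't really believe in any "singularity" theory. Bricks get moved one at a time, machine translation code gets written line by line, and each paper moves things forward a small step. When it comes to AI, don't believe in today, but do believe in tomorrow.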



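  1. Compared with traditional phrase-based machine translation (PBMT), Google's neural machine translation (GNMT) does show a significant improvement. Across different language pairs, GNMT narrows the gap between PBMT and human translation by 58% to 87%, and on some language pairs it arguably comes close to human quality.
  2. However, it is too early to say GNMT will replace human translators. GNMT still makes silly mistakes from time to time. The last page of the paper lists some, and sharp-eyed netizens have found plenty more. Real-world translation, especially written translation, has very little tolerance for errors like these.
  3. GNMT's contributions lie mainly in the technical side that users never see. Compared with PBMT, neural machine translation has a much "cleaner" model, with one neural network handling everything, but it has long lagged behind PBMT in quality and speed. GNMT brings out the potential of neural translation in both quality and speed, and I think neural translation will become mainstream in the near future.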


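Deep learning is what made translating at the level of whole sentences possible.
From continuous word embeddings to RNNs over long sentences, deep learning has in effect reduced the dimensionality of the parameter space for the MT task. If you don't believe it, hand-write a 6-gram language model and try running it.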
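The "try a 6-gram model yourself" challenge is about sparsity. Here is a minimal sketch on a made-up two-sentence corpus. It counts how many of a test sentence's n-grams never occur in training, for n = 1 to 6. It then compares the number of possible 6-grams with the size of a single embedding table. The vocabulary size (50,000) and embedding dimension (512) are hypothetical round numbers. Real n-gram models use smoothing and back-off rather than raw counts, so this only illustrates why such tricks are needed.

```python
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unseen_fraction(train_tokens, test_tokens, n):
    """Fraction of the test sentence's n-grams that never appear in training."""
    seen = set(ngrams(train_tokens, n))
    test = ngrams(test_tokens, n)
    return sum(1 for g in test if g not in seen) / len(test)


# Toy corpus: every test word is seen in training, but not every word sequence.
train = "the cat sat on the mat . the dog sat on the rug .".split()
test = "the cat sat on the rug .".split()

for n in range(1, 7):
    print(f"{n}-gram unseen fraction: {unseen_fraction(train, test, n):.2f}")

# Hypothetical sizes: possible 6-grams vs. parameters in one embedding table.
vocab_size, embed_dim = 50_000, 512
print(f"possible 6-grams: {vocab_size ** 6:.3e}")
print(f"embedding parameters: {vocab_size * embed_dim:,}")
```

Every word of the test sentence appears in training, yet none of its 6-grams do. A count-based model has nothing to say about such sequences. Continuous embeddings let similar contexts share parameters instead.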

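Deep learning also lets the whole pipeline, from the representations onward, be trained end-to-end, so it can take full advantage of massive data and massive numbers of machines.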


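That said, I don't think the current NMT breakthrough is a semantic one. Nobody can even clearly define what semantics is (says I, weeping over my dissertation). For now it is mostly brute force with a memory-based approach. With enough data, a powerful enough model, and enough machines, rote memorization works very well. That does not mean the system truly understands the semantics of a text the way a human does.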





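Take a look at this 2012 video of Microsoft's latest computer "simultaneous interpretation." If you can't wait, skip to 6:30, where speech-to-text translation starts. At 7:20 the real black magic begins: speech-to-speech simultaneous interpretation that even preserves the original speaker's voice.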


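(They borrowed my workstation for this demo back then, because at MSRA only my Z820 with dual GPUs was powerful enough to run it. Later it only needed an ordinary computer. The technology is now used in Skype Translator.)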








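I skimmed this latest paper (I need to read it carefully over the weekend). My impression is that the progress is mainly in engineering, not a breakthrough in models or ideas. It is definitely impressive work, but it is certainly not the "disruptive progress" the media are hyping. Scientific progress is always incremental. When editors who don't understand the technology run tech media, this is the inevitable result. They cover whoever is most famous and then pull an eye-catching number out of context. As with most error-rate comparisons in papers, it is unclear how the 85% figure was derived and what it is being compared against. Judging translation quality is also highly subjective. Lay journalists simply latch onto the number 85%.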


Machine translation is by no means solved. GNMT can still make significant errors that a human translator would never make, like dropping words and mistranslating proper names or rare terms, and translating sentences in isolation rather than considering the context of the paragraph or page.

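To test this claim, I had Google Translate translate the passage itself (as a commenter pointed out, I should have translated from Chinese into English instead, so this wasn't rigorous):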










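The foundational work on neural machine translation comes from Kyunghyun Cho et al. in 2014, and many models followed it. This Google paper makes good use of Google's enormous hardware advantage and systems-architecture capability. It is mainly an engineering paper. I don't see much innovation in the model or algorithm, but the LSTM stack is deep and the amount of training is huge.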

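I think the main credit should go to Kyunghyun Cho and his colleagues. That said, Google's successful commercialization of the neural machine translation model is also excellent work.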


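Actually, the TensorFlow website has had a tutorial implementing NMT for a long time. It is poorly written, though, so probably not many people read it.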

However, TensorLayer provides tutorials in both Chinese and English, and with EmbeddingAttentionSeq2seqWrapper you can implement the whole network!

GitHub - zsdonghao/tensorlayer: TensorLayer: A Deep Learning and Reinforcement Learning Library for TensorFlow.

Welcome to TensorLayer



1. Google's.

2. Baidu's.

3. Bing's.







第一章 黃金時代

我們的故事要從1887年的德國開始。位於萊茵河邊的卡爾斯魯厄是一座風景秀麗的城市,在它的城中心,矗立著著名的18世紀的宮殿。鬱鬱蔥蔥的森林和溫暖的氣候也使得這座小城成為了歐洲的一個旅遊名勝。然而這些怡人的景色似乎沒有分散海因里希 魯道夫 赫茲(Heinrich Rudolf Hertz)的注意力:現在他正在卡爾斯魯厄大學的一間實驗室里專心致志地擺弄他的儀器。那時候,赫茲剛剛30歲,也許不會想到他將在科學史上成為和他的老師赫耳姆霍茲(Hermann von Helmholtz)一樣鼎鼎有名的人物,不會想到他將和卡爾 本茨(Carl Benz)一樣成為這個小城的驕傲。現在他的心思,只是完完全全地傾注在他的那套裝置上。



Baidu Translate output (2016.9.30)

The first chapter of the golden age

Our story starts in Germany in 1887.Karlsruhe is located in the Rhine river is a beautiful scenery of the city, in its city center, stands the famous eighteenth Century palace. A wild profusion of vegetation forest and warm climate also makes the city become a tourist spot in europe. However, the pleasant scenery does not seem to distract Heinrich? Rudolf? Hz (Heinrich Rudolf Hertz) attention: he is now a laboratory at University of Karlsruhe in intently with his instrument. At that time, at just 30 years old, may not think he will become a teacher and his sister Hotz Hector in the history of Science(Hermann von Helmholtz) as a celebrated figure, he will not think and Carle Benz? (Carl Benz) has become the pride of the town. Now his mind, but completely devoted to his set of devices.

Hertz"s device today seems to be very simple: its main part is an electric spark generator, there are two very close to the small copper ball as a capacitor. Hertz stared at the two opposite brass balls, then closed the circuit switch. Suddenly, the power of the magic began to show in this simple system: the invisible current through the device in the induction coil, and began to charge the copper ball capacitor. Hertz looked at him coldly, in the heart of the two section of the capacitor to imagine the situation of rising voltage. In the field of electricity for so long, have full confidence in Hertz, on their own knowledge he knew, as the voltage rise quickly between the two ball air will be punctured, then the system oscillation circuit will form a high-frequency (LC loop), but he now wants to see not this.

Sure enough, after a while, with slight "bang", a bunch of blue electric beautiful flowers burst open between two copper ball, the whole system has formed a complete loop current, small beam in the air kept twisting, blooming out faint fluorescence.


"Karlsruhe is located in the Rhine river is a beautiful scenery of the city," what on earth is this?

"At that time, at just 30 years old, may not think he will become a teacher and his sister Hotz Hector in the history of Science (Hermann von Helmholtz) as a celebrated figure," where in the original is there anything that could be translated as "sister"?


Bing Translator output (2016.9.30)

The first chapter of the golden age

Our story in 1887 from Germany began. Rhine River is a scenic city of Karlsruhe, in the city centre, stands the famous 18th century palace. The lush forests and warmclimate make this town became a tourist attraction in Europe. However, these delightful scenery seemed to have dispersed Heinrich? Rudolph? Hertz (Heinrich Rudolf Hertz)attention: he is now a laboratory of the University of Karlsruhe became absorbed in playing with his instrument. At that time, Hertz was just 30 years old, may not think about him in the history of science and his teacher Heer Helmholtz (Hermann vonHelmholtz) famous figures, does not think he will and Karl? Benz (Carl Benz) become the pride of the town. Now his mind is completely devoted to her on his device.

Hertz equipment today is very simple: it is a major part of EDM generator, two near the little copper ball apart as a capacitor. Hertz was so engrossed in watching these two opposing and copper balls, and then closes the circuit switch. Suddenly, the magic started to show in this simple system: invisible currents through the Inductioncoil in the device and started to charge the ball capacitor. Hertz drily watched his device, imagining the capacitance of two voltage rise in my heart. In the electrical field for so long, and Hertz have sufficient confidence in their knowledge, he knows, as the voltage rises, soon between the two balls of air being penetrated, then the whole system will form a high-frequency oscillation circuit (LC circuit), but he now wants to observe not this one.

And sure enough, after a while, with the nuances of "pop" sound, a bouquet of beautiful blue flowers burst open between the two copper ball, the system forms a complete loop, small current beam keeps twisting in the air, bursting out with faint fluorescence.



Chapter 1 The Golden Age

Our story begins in 1887 in Germany. Karlsruhe, on the Rhine, is a beautiful city with its famous 18th-century palaces in the heart of the city. The lush forest and warm climate make this small town a European tourist attraction. However, these pleasant views do not seem to distract Heinrich Rudolf Hertz, who is now playing with his instruments in a laboratory at the University of Karlsruhe. At that time,Hertz was just 30 years old and probably would not have thought that he would become as famous as his teacher Hermann von Helmholtz in the history of science and would not have thought that he would be with Carl Benz) as the pride of this small town. Now his mind, but completely poured into his set of devices.

Hertz"s device appears to be very simple today: its main part is an electric spark generator, there are two small copper balls as close to the capacitor. Hertz stared intently at the two opposing copper balls, and then closed the circuit switch. Suddenly, the magic of electricity began to show in this simple system: the invisible current through the device in the induction coil, and began to charge the copper ball capacitor. Hertz stared coldly at his device, imagining the rising voltage of the capacitor in the heart. In the field of electrical study for so long, Hertz on their knowledge is full of confidence, he knew that with the rise in voltage, and soon the air between the two balls will be breakdown, and then the whole system will To form a high-frequency oscillation circuit (LC circuit), but he is not want to observe this.

Sure enough, after a while, with the subtle "pop" is heard, a bouquet of beautiful blue electric flower burst open between the two copper balls, the entire system to form a complete loop, a small current beam in the air Kept twisting, blooming out of the faint fluorescence.

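Apart from tense (which none of the three engines handles well) and one spot where "his own knowledge" was rendered as "their knowledge" (all three engines make this mistake), the rest reads fluently. The English names given in parentheses in the original are used directly as the names in the translation, and the parentheses are dropped. In word choice it leans toward simple phrases rather than big words (compared with Bing). In my personal impression, this makes the basic grammatical errors in the text feel less jarring.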





So bored in the afternoon, writing a bibliography. I am a comparative literature, in that time, contact a lot of theoretical knowledge of books, but often wonder, if when you undergraduate reading a little more would be nice.


Afternoon too boring, writing bibliography. I was reading comparative literature, in this period of time, contact a lot of theoretical knowledge of the book, but often think, if the time to read more than just undergraduate.



