【神文】用文言文英語和白話文三語評述機器學習之產生式模型

談吐風援筆立

——生成式模型 選錄

翻譯:尹肖貽

校對:孫月傑

機器學習之中,以產生(/生成)式模型最為波譎雲詭,近來多得學界巨擘青睞。現掬三則材料,翻譯此處,以備討論。

同仁多諳熟機器學習,朝乾而夕惕。此番多有班門弄斧之嫌;倘有不周之處,貽笑大方,望讀者不吝賜教。

  • 生成式模型的魔術

[Extraction] Understanding and Implementing Deepmind's DRAW Model

【取自】理解和實現Deepmind小組的DRAW模型

By Eric Jang

…We'd like to be able to draw samples c~P, or basically generate new pictures of houses that may not even exist in real life. However, the distribution of pixel values over these kinds of images is very complex. Sampling the correct images needs to account for many long-range spatial and semantic correlations,

吾等取樣某值於某概率之域,譬如勾畫房舍圖樣,神思或可飛出三界之外。然其點描畫紙各處,其功甚巨。此類採樣,須裁量絲縷於巨幅,細思關聯於鴻篇。

我們考慮這樣的任務:嘗試生成圖中的建築的圖像。每個圖像c~P,P代表某種代表「房子」的概率分布,可以是現實生活中的,也可以是虛構的。可是如何確定圖片中每個像素的值,是極其複雜的。採樣到正確的圖像,需要許多大尺度空間或語義的聯繫,

such as:

Adjacent pixels tend to be similar (a property of the distribution of natural images)

The top half of the image is usually sky-colored

The roof and body of the house probably have different colors

Any windows are likely to be reflecting the color of the sky

The house has to be structurally stable

The house has to be made out of materials like brick, wood, and cement rather than cotton or meat

... and so on.

諸如:鄰筆常類同;上而為天;蓬牖或異色;玻璃映天藍;屋體聳立;房上多泥石瓦木,而非錦衣玉食;諸如此類。

比方說:

  • 相鄰像素傾向於相似(自然圖像的一個性質);

  • 圖像的上半部分往往是天空的顏色;

  • 房頂跟房體很可能顏色不一樣;

  • 窗子很可能反射著天空的色彩;

  • 房子必須有堅固的結構;

  • 房子由瓦、木頭、水泥什麼的造出來的,而不是棉花、肉什麼的;

  • ……

We need to specify these relationships via equations and code, even if implicitly rather than analytically (writing down the 360,000-dimensional equations for P).

吾輩演此技藝於算術編碼之上,顧隱含而非人為,於概率域而求索。

我們把這些關係寫成等式與代碼,即使是內隱的而非解析的(寫成360,000維的等式來擬合概率分布P)。【註:360000=600^2,是作者前文設定的圖像大小】
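The "implicitly rather than analytically" point can be sketched in code: instead of writing a 360,000-dimensional density for P, one writes a sampler that transforms simple noise into a sample. The toy transform below is a made-up stand-in for a trained decoder such as DRAW's, not the model itself:

```python
import random

# A minimal sketch of an *implicit* model of P: the density is never written
# down; we only write code that maps latent noise z to an "image" x.
# The decoder here is an invented toy, standing in for a trained network.
def sample_image(rng, n_pixels=16):
    z = [rng.gauss(0.0, 1.0) for _ in range(4)]  # latent noise vector
    # toy "decoder": each pixel is a deterministic function of z
    return [sum(zi * ((p + i) % 3 - 1) for i, zi in enumerate(z))
            for p in range(n_pixels)]

rng = random.Random(0)
x = sample_image(rng)  # one 16-"pixel" sample; its density was never written
```

The point of the sketch: the code *is* the specification of P, even though no analytic formula for the distribution ever appears.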

Trying to write down the equations that describe "house pictures" may seem insane, but in the field of Machine Learning, this is an area of intense research effort known as Generative Modeling.

如是演繹勾畫房舍之圖,似有愚公移山之浩繁。於機器學習,此類演繹,統稱生成式建模。

試圖寫出等式來描述「房舍圖像」,乍看起來非常「瘋狂」。在機器學習領域,這樣「瘋狂」地研究產生(或描述)圖像規則的領域,叫做生成式建模。

Formally, generative models allow us to create observation data out of thin air by sampling from the joint distribution over observation data and class labels. That is, when you sample from a generative distribution, you get back a tuple consisting of an image and a class label.

正言之,生成模型,即以採樣圖文並之標記,便可於虛無之間,織造所示所聞。換言之,採樣產生之分布,既得圖文,又得標記。

確切地說,生成式模型讓我們可以,通過在觀測數據並分類標籤的聯合分布上採樣的方式,從無到有地產生出觀測數據(create observation data out of thin air)。換句話說,當我們在生成式模型的分布上採樣,我們得到圖像與其標籤構成的元組。
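Sampling an (observation, label) tuple from a joint distribution can be illustrated with a minimal sketch; the class names, priors, and Gaussian parameters below are all made up, and a single number stands in for an image:

```python
import random

# Toy joint distribution p(x, y): draw a label y from the prior p(y),
# then an observation x from the class-conditional p(x | y).
CLASS_PRIORS = {"house": 0.5, "not_house": 0.5}                 # p(y), made up
CLASS_PARAMS = {"house": (2.0, 0.5), "not_house": (-2.0, 0.5)}  # (mean, std) of p(x|y)

def sample_joint(rng):
    """Return one (observation, label) tuple drawn from the joint p(x, y)."""
    y = rng.choices(list(CLASS_PRIORS), weights=list(CLASS_PRIORS.values()))[0]
    mu, sigma = CLASS_PARAMS[y]
    return rng.gauss(mu, sigma), y

rng = random.Random(0)
samples = [sample_joint(rng) for _ in range(5)]  # five (x, y) tuples "out of thin air"
```

Each call returns both pieces of the tuple at once, which is exactly the "image plus class label" behavior the excerpt describes.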

This is in contrast to discriminative models, which can only sample from the distribution of class labels, conditioned on observations (you need to supply it with the image for it to tell you what it is). Generative models are much harder to create than discriminative models.

異於判別模型,採樣需於觀察條件之下,得其類別標記,即告知模型,該圖為何物,如是這般。生成模型較判別模型,難度遠甚。

與判別式模型相對照,後者的採樣空間是給定觀測圖像條件下的類別標籤。(即,你需要告訴演算法,圖像裡面是什麼 )。從這個對比不難發現,建立生成式模型比判別式模型,難度大得多。

There have been some awesome papers in the last couple of years that have improved generative modeling performance by combining them with Deep Learning techniques. Many researchers believe that breakthroughs in generative models are key to solving "unsupervised learning", and maybe even general AI, so this stuff gets a lot of attention.

近年得益於深度模型進步,生成模型亦多有論文述論。研究界廣存同仁篤信生成模型,舉其為非監督學習之要津,乃至廣義人工智慧之秘訣,而頗受青睞。

近幾年,深度學習技術對生成式模型有所建樹。許多研究者認為,生成式模型方面的突破,是解決「非監督學習」的關鍵,乃至於廣義人工智慧的根基,所以生成式成為時代的寵兒。

生成式模型與判別式模型,你們分別是什麼?

  • [Extraction] On Discriminative vs. Generative classifiers: A comparison of logistic regression and naïve Bayes

【摘錄】判別式分類器 vs. 生成式分類器:邏輯回歸與樸素貝葉斯的對比

By Andrew Y. Ng, Michael I. Jordan

Generative classifiers learn a model of the joint probability, p(x, y), of the inputs x and the label y, and make their predictions by using Bayes rule to calculate p(y|x), and then picking the most likely label y.

生成式分類器統劃圖文甲及其標記乙,以貝氏規則籌算條件概率,若甲而得乙,則可任舉圖文,而饋其標記。

生成式分類器研究數據的聯合分布p(x, y),其中輸入數據為x,標籤為y。通過貝葉斯規則計算p(y|x),選取最可能的標籤y。
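That recipe — model p(x|y) and p(y), then apply Bayes' rule and pick the most likely label — can be sketched with made-up 1-D Gaussian class-conditionals standing in for a real model:

```python
import math

PRIORS = {"cat": 0.5, "dog": 0.5}                 # p(y), made-up values
PARAMS = {"cat": (-1.0, 1.0), "dog": (1.0, 1.0)}  # (mean, std) of p(x | y)

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def predict(x):
    """Bayes' rule up to the constant p(x): pick argmax_y p(x | y) p(y)."""
    scores = {y: gauss_pdf(x, *PARAMS[y]) * PRIORS[y] for y in PRIORS}
    return max(scores, key=scores.get)

label = predict(-2.0)  # → "cat": the "cat" class-conditional dominates at x = -2
```

Note the classifier never computes p(x); Bayes' rule only needs the numerator p(x|y)p(y) to rank the labels.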

Discriminative classifiers model the posterior p(y|x) directly, or learn a direct map from inputs x to the class labels.

判別式分類器則直入後驗,以圖文甲推得標記乙。

判別式分類器直接模擬後驗分布p(y|x),即從輸入x直接得到它的類別標籤。
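By contrast, a discriminative model parameterizes p(y|x) itself. A one-weight logistic-regression sketch makes the difference visible (the weight and bias are invented, not learned here):

```python
import math

W, B = 2.0, 0.0  # hypothetical "learned" parameters

def p_y_given_x(x):
    """Posterior p(y=1 | x) = sigmoid(W*x + B): input maps straight to a label probability."""
    return 1.0 / (1.0 + math.exp(-(W * x + B)))

p = p_y_given_x(0.0)  # → 0.5, i.e. exactly on the decision boundary
```

Nothing about the input distribution p(x) is represented anywhere, which is precisely Vapnik's point quoted below: the classification problem is solved directly.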

There are several compelling reasons for using discriminative rather than generative classifiers, one of which, succinctly articulated by Vapnik [6], is that "one should solve the [classification] problem directly and never solve a more general problem as an intermediate step [such as modeling p(x|y)]."

判別勝於產生,良有以也。嘗有雅士魏普尼,一言以蔽:處事之際,須簡明扼要,而非勞師襲遠。

有許多有力的理由,支持我們使用判別式分類器,而不是生成式分類器。其中一個,來源於Vapnik的原話,簡潔而有力:「我們在解決(分類)問題時,應該直接求解,而不是先解決一個更寬泛的問題作為中間步驟(比如建模p(x|y))」。

Indeed, leaving aside computational issues and matters such as handling missing data, the prevailing consensus seems to be that discriminative classifiers are almost always to be preferred to generative ones.

誠然,且不論生成模型之計算浩繁,條列闕如,世人偏愛判別之於產生,似成共識。

的確,即使撇開計算問題與數據缺失的情況不談,學界的共識也傾向於認為,判別式模型比生成式模型更好。

Another piece of prevailing folk wisdom is that the number of examples needed to fit a model is often roughly linear in the number of free parameters of a model. This has its theoretical basis in the observation that for "many" models, the VC dimension is roughly linear or at most some low-order polynomial in the number of parameters, and it is known that sample complexity in the discriminative setting is linear in the VC dimension.

另有鄉野智謀,以為模型參數之多寡,較於估算情況之多寡,乃伯仲之間。此源於別案另律,VC維度同參數之數量,若線性比齊,則不分軒輊,或稍有不足。另有定俗曰,公斷之集,採樣之簡明或繁冗,全在VC維度。

另一類普遍的「民間智慧」認為,模型需要擬合的樣本的數量,往往跟模型參數的數量呈近似的線性關係。這在許多模型中有理論基礎,VC維與參數接近線性相關,至多也只是呈較低次數的多項式關係。眾所周知,在判別集合中採樣的複雜度,跟VC維呈線性關係。

We consider the naive Bayes model (for both discrete and continuous inputs) and its discriminative analog, logistic regression/linear classification, and show: (a) The generative model does indeed have a higher asymptotic error (as the number of training examples becomes large) than the discriminative model, but (b) The generative model may also approach its asymptotic error much faster than the discriminative model, possibly with a number of training examples that is only logarithmic, rather than linear, in the number of parameters. This suggests, and our empirical results strongly support, that as the number of training examples is increased, there can be two distinct regimes of performance, the first in which the generative model has already approached its asymptotic error and is thus doing better, and the second in which the discriminative model approaches its lower asymptotic error and does better.

吾等推演樸素貝氏模型,所填圖文或斷或連,而比之以譬喻羅吉斯回歸或分類模型,斷言如是:(一)生成模型至於無窮處,確多有差額,然(二)其收斂之效,遠速於判別,對數之於線性,彰顯無遺。此二者揭示,並經驗范之,曰,以訓練數之增加,模型神態各異,初以生成模型鰲里奪尊,後以判別模型一騎絕塵。

我們研究了樸素貝葉斯模型(對於離散跟連續兩種輸入情況)和它對應的判別式版本邏輯斯蒂回歸/線性分類器,得出如下結論:(a)生成式模型(隨著訓練樣本的增大)的確有更高的漸進誤差,但是(b)生成式模型達到漸進誤差線的速度,比判別式模型快得多,也許只需要參數的對數的量的訓練樣本,而不是線性的量,就可以達到漸近線。這表明——而且我們的實驗結果強有力地支持——隨著訓練樣本的增加,不同的模型表現出不同的性態:生成式模型比對應的判別式模型更快地達到漸進誤差線,在判別式模型還未達到它的漸進誤差線時,取得長時間的領先;而在那一點以後,判別式模型具有更低的漸進誤差,而表現更佳。
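The two-regime claim can be summarized in notation of my own choosing (a hedged paraphrase of the abstract, not the paper's own formulas):

```latex
% d = number of free parameters; m = number of training examples.
% Generative (naive Bayes): reaches its asymptote after only m = O(\log d).
% Discriminative (logistic regression): needs m = O(d), but its asymptote is lower.
\varepsilon_{\mathrm{gen}}(m) \approx \varepsilon_{\mathrm{gen},\infty}
  \quad\text{once } m = O(\log d),
\qquad
\varepsilon_{\mathrm{dis}}(m) \approx \varepsilon_{\mathrm{dis},\infty}
  \quad\text{once } m = O(d),
\qquad
\varepsilon_{\mathrm{dis},\infty} \le \varepsilon_{\mathrm{gen},\infty}.
```

Hence the two regimes: naive Bayes wins while m is small, and logistic regression overtakes it once m grows large enough to approach its lower asymptote.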

  • What would be a practical use case for Generative models?

生成式模型在現實場景下有什麼功用咧?

By Yoshua Bengio

Because if you are able to generate the data generating distribution, you probably captured the underlying causal factors. Now, in principle, you are in the best possible position to answer any question about that data. That means AI.

蓋因通達概率分布之輩,必曉暢數據生成之要訣。茲以是理論,縱觀統籌,通演數據之變,捨生成式模型,概莫能外。奇技所至,謂之人工智慧。

因為如果你掌握了產生數據的概率分布,你很有可能找出了隱含在數據背後、產生它們的「原因」;所以,在理論上,你處於極其優越的位置上,回答與數據有關的任何問題。這意味著實現了人工智慧的本質。

But maybe this is too abstract of an explanation. A practical use-case is for simulating possible futures when planning a decision and reasoning. As I wrote earlier, I know what to do to avoid making a car accident even though I never experienced one. I actually have zero training example of that category! Nor anything close to it (thankfully). I am able to do so only because I can generate the sequence of events and their consequence for me, if I chose to do some (fatal) action. Self-driving cars? Robots? Dialogue systems? etc.

然前腔文飾太盛,練達之處在於判事處世之際,推衍緣由,預謀未然。縱余愚鈍,然即未歷車馬橫禍,尚曉規避;縱無此類教訓,不致引火燒身。蓋余之智力,於性命攸關之時,可瞻前而顧後。自駕車、機械人、高談闊論機,凡此種種,機緣一統,如是而得。

不過,這個回答太抽象了。較為實際的情況下,生成式模型的功用在於模擬未來的數據,以做出判斷或推理。正如我前文所寫的那樣,雖然沒有經歷過車禍,我也知道怎樣避開這樣的事情。可是,我從未參加過躲避車禍的培訓!(幸運的是)甚至沒有任何一件與之類似的事情!我之所以能夠做這件事,因為我能夠在(致命的)事件中,按時間的先後,估算事件的發展,並預判它們對我可能造成的影響。自動駕駛車、機器人、人機對話系統,其理論基礎都是這樣的。

Another practical example is structured outputs, where you want to generate Y conditionally on X. If you have good algorithms for generating Ys in the first place, the conditional extension is pretty straightforward. When Y is a very high-dimensional object (image, sentence, data structure, complex set of actions, choice of a combination of drug treatments, etc.), then these techniques can be useful.

另有例證,倘得後文勾股凜然,生成模型可由甲,闡證而得乙。若肇始之時,已知乙產生之算術,延展之事,勢如破竹。當乙處高維空間之中(圖、句、數桁、雜務集、藥物運籌之類),生成模型良堪一用。

生成式模型另外一個應用的例子是,輸出數據如果是結構化的,就可以在條件集X的基礎上產生數據Y。如果你手裡有一個Y的產生演算法,你可以很直接地得到概率條件的延展。當Y處於高維空間的時候(例如圖像、句子、數據結構、複雜的行動集合、醫藥治療的組合方案等),生成式模型都有用武之地。
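The "straightforward conditional extension" amounts to letting the condition X parameterize an existing sampler for Y. A toy sketch, in which the conditions, tokens, and mapping are all invented for illustration:

```python
import random

# Hypothetical condition X: a "style" that parameterizes the output's structure.
STYLES = {"short": 3, "long": 8}  # X → length of the generated sequence

def generate_y_given_x(x, rng):
    """Generate a structured output Y (a token sequence) conditioned on X."""
    return [rng.choice("abcd") for _ in range(STYLES[x])]

rng = random.Random(0)
y = generate_y_given_x("short", rng)  # a 3-token sequence conditioned on X="short"
```

The unconditional sampler (draw tokens) is unchanged; conditioning only steers it, which is why having a good generator for Y in the first place makes the p(Y|X) extension easy.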

We are using images because they are fun and tell us a lot (humans are highly visual animals), which helps to debug and understand the limitations of these algorithms.

吾輩效力於圖像之事業,蓋因其趣味盎然、神采豐厚(更兼人類視覺最銳);試煉或深思演算法之軒輊,無出其右。

我們大多跟圖像打交道,因為圖像是有趣的,蘊含了豐富的信息(人類是高度依賴視覺的動物)。圖像能幫助我們調試、理解演算法的制約和邊界。

致謝:

感謝中國科學院計算技術研究所劉昕博士有益的建議,以及孫月傑醫生對此文稿的辛勤校對。對我的導師及研究小組的同學們一併致以謝忱。

本文屬於深度學習大講堂原創,不可隨意轉發,如需要轉發,請聯繫@果果是枚開心果

譯者簡介:尹肖貽,曾就讀中國農業大學,獲得電子信息工程學士學位、英語語言與文化學士學位,現就讀於中國科學院計算技術研究所智能信息實驗室,攻讀計算機科學博士學位。熱愛知識,廣泛獵奇於各類學科,哲學、心理學、數學稍有心得。愛好運動,工於長跑。博學、博雅、博物、博採,雜俎廣納,博觀約取。郵箱:yxyuni@163.com

最後,歡迎關注我們的公眾號:深度學習大講堂,我們致力於推送人工智慧,深度學習方面最新的技術,產品以及活動!

