Flow-based Density Estimation

07-31

From the column: qsss sss's Math Study Notes

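Flow-based generative models are a hot topic at the moment. Their best-known applications are Parallel WaveNet and, more recently, Glow. This post introduces flow-based density estimation through Neural Autoregressive Flows, currently the most general model in this family.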

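First, normalizing flows. Suppose $f$ is an invertible, differentiable mapping (this is a flow). For a random variable $X$ and $Y = f(X)$, the probability density functions (pdfs) satisfy the change of variables formula:

$$p_Y(y) = p_X(x)\left|\det\frac{\partial f(x)}{\partial x}\right|^{-1}, \qquad y = f(x)$$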
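A quick numerical check of the change of variables formula, as a minimal sketch: we assume a standard normal $X$ and the affine map $f(x) = 2x + 1$ (both purely illustrative). Then $Y$ is exactly $\mathcal{N}(1, 2^2)$, so the density obtained from the formula should agree with that normal density. The helper names (`normal_pdf`, `f_inverse`, `p_y_change_of_variables`) are hypothetical.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Illustrative invertible map f(x) = a * x + b (a != 0), so df/dx = a everywhere.
a, b = 2.0, 1.0

def f_inverse(y):
    return (y - b) / a

def p_y_change_of_variables(y):
    """p_Y(y) = p_X(f^{-1}(y)) * |df/dx|^{-1}, with X ~ N(0, 1)."""
    x = f_inverse(y)
    return normal_pdf(x) / abs(a)

for y in (-1.0, 0.5, 3.0):
    # Both columns should match: Y = 2X + 1 is N(1, 4).
    print(y, p_y_change_of_variables(y), normal_pdf(y, mu=b, sigma=abs(a)))
```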

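Now, to use $f$ to approximate a target distribution $p_{target}$, we can minimize the KL divergence with SGD:

$$D_{KL}\left(p_Y \,\|\, p_{target}\right) = \mathbb{E}_{y \sim p_Y}\left[\log p_Y(y) - \log p_{target}(y)\right]$$

that is, substituting $y = f(x)$ and the change of variables formula:

$$D_{KL}\left(p_Y \,\|\, p_{target}\right) = \mathbb{E}_{x \sim p_X}\left[\log p_X(x) - \log\left|\det\frac{\partial f(x)}{\partial x}\right| - \log p_{target}(f(x))\right]$$

Note that this formula is misprinted in the paper: the $\log p_X$ term should be inside the expectation.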
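The KL objective can be estimated by Monte Carlo, with $\log p_X$ inside the expectation. This sketch assumes $p_X = p_{target} = \mathcal{N}(0, 1)$ and the illustrative flow $f(x) = 2x + 1$, for which the KL has a closed form to compare against; in a real model $f$ would be a neural network and this average would be minimized by SGD.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)

def log_normal(x):
    """log N(x; 0, 1)."""
    return -0.5 * (x * x + LOG_2PI)

# Illustrative flow f(x) = a * x + b; its log|det J| is log|a| for every x.
a, b = 2.0, 1.0

def kl_monte_carlo(n, seed=0):
    """Estimate E_{x~p_X}[log p_X(x) - log|det df/dx| - log p_target(f(x))]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)   # x ~ p_X = N(0, 1)
        y = a * x + b             # y = f(x)
        total += log_normal(x) - math.log(abs(a)) - log_normal(y)
    return total / n

# Closed form: Y ~ N(b, a^2), so this is KL(N(b, a^2) || N(0, 1)).
kl_exact = -math.log(abs(a)) + 0.5 * (a * a + b * b) - 0.5
print(kl_monte_carlo(100_000), kl_exact)
```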

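The paper gives two applications:

1. Approximating an unknown data distribution by maximum likelihood estimation (MLE). Here $x$ is a sample from the unknown distribution, and $f$ maps $x$ into a latent space whose distribution $p_{target}(y)$ should be easy to evaluate. Because $\mathbb{E}_{x \sim p_X}[\log p_X(x)]$ does not depend on $f$, minimizing the KL divergence amounts to minimizing the negative log-likelihood $\mathbb{E}_{x}\left[-\log p_{target}(f(x)) - \log\left|\det\frac{\partial f(x)}{\partial x}\right|\right]$.

2. Variational inference: Eq. (2) is in fact a special case of the reparameterization trick in variational inference. Here $x$ is drawn from a fixed, easy-to-sample base distribution $p_X$, and $f(x)$ is a sample from the approximate posterior $p_Y$.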

The training procedure is as follows:
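A hypothetical sketch of such a training loop for application 1 (MLE), not the paper's exact algorithm: a 1D flow $z = f(x) = e^{u} x + v$ is fitted to toy data drawn from $\mathcal{N}(1, 0.5^2)$ by minimizing the negative log-likelihood under a standard normal $p_{target}$. For simplicity it uses full-batch gradient descent with hand-derived gradients; SGD would use random minibatches instead.

```python
import math
import random

# Hypothetical toy data from an "unknown" distribution N(1.0, 0.5^2).
rng = random.Random(0)
data = [rng.gauss(1.0, 0.5) for _ in range(1000)]

# 1D affine flow into latent space: z = f(x) = exp(u) * x + v.
# Writing the scale as exp(u) keeps it positive, so f stays invertible.
u, v = 0.0, 0.0
lr = 0.1

def nll(u, v, xs):
    """Mean of -log p_target(f(x)) - log|f'(x)| with p_target = N(0, 1)."""
    s = math.exp(u)
    return sum(0.5 * (s * x + v) ** 2 + 0.5 * math.log(2 * math.pi) - u
               for x in xs) / len(xs)

for _ in range(1000):
    s = math.exp(u)
    grad_u = grad_v = 0.0
    for x in data:
        z = s * x + v
        grad_u += z * s * x - 1.0   # d/du of 0.5 * z^2 - u
        grad_v += z                 # d/dv of 0.5 * z^2
    u -= lr * grad_u / len(data)
    v -= lr * grad_v / len(data)

print(math.exp(u), v, nll(u, v, data))
```

At the optimum $e^{u} = 1/\hat\sigma$ and $v = -\hat\mu/\hat\sigma$, where $\hat\mu, \hat\sigma$ are the sample mean and standard deviation: the learned $f$ standardizes the data, which is exactly what mapping onto $\mathcal{N}(0, 1)$ requires.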

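Computing a general determinant is expensive ($O(D^3)$ for a $D \times D$ Jacobian), so in practice $f$ is designed so that its Jacobian matrix is triangular or diagonal; the determinant is then just the product of the diagonal entries. For example, the affine coupling layer proposed in this paper:

$$y_{1:d} = x_{1:d}, \qquad y_{d+1:D} = x_{d+1:D} \odot \exp\left(s(x_{1:d})\right) + t(x_{1:d})$$

The Jacobian of this transformation is

$$\frac{\partial y}{\partial x} = \begin{bmatrix} I_d & 0 \\ \dfrac{\partial y_{d+1:D}}{\partial x_{1:d}} & \operatorname{diag}\left(\exp\left(s(x_{1:d})\right)\right) \end{bmatrix},$$

so $\log\left|\det\frac{\partial y}{\partial x}\right| = \sum_j s(x_{1:d})_j$, and neither $s$ nor $t$ ever has to be inverted.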
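A minimal sketch of an affine coupling layer in plain Python, assuming toy stand-ins `s_net` and `t_net` for the scale and translation networks (hypothetical; any functions of $x_{1:d}$ would do). It shows that the log-determinant is just $\sum_j s_j$ and that the inverse only re-evaluates $s$ and $t$ on the unchanged half.

```python
import math

def s_net(x_a, out_dim):
    """Toy scale network (hypothetical): output bounded in (-1, 1) via tanh."""
    total = sum(x_a)
    return [math.tanh(0.5 * total + 0.1 * j) for j in range(out_dim)]

def t_net(x_a, out_dim):
    """Toy translation network (hypothetical); need not be invertible."""
    total = sum(x_a)
    return [total * total - 0.3 * j for j in range(out_dim)]

def coupling_forward(x, d):
    """y_{1:d} = x_{1:d}; y_{d+1:D} = x_{d+1:D} * exp(s(x_{1:d})) + t(x_{1:d})."""
    x_a, x_b = x[:d], x[d:]
    s = s_net(x_a, len(x_b))
    t = t_net(x_a, len(x_b))
    y_b = [xb * math.exp(sj) + tj for xb, sj, tj in zip(x_b, s, t)]
    log_det = sum(s)  # triangular Jacobian with diagonal [1, ..., 1, exp(s)]
    return x_a + y_b, log_det

def coupling_inverse(y, d):
    """Since y_{1:d} == x_{1:d}, s and t can be recomputed and undone exactly."""
    y_a, y_b = y[:d], y[d:]
    s = s_net(y_a, len(y_b))
    t = t_net(y_a, len(y_b))
    x_b = [(yb - tj) * math.exp(-sj) for yb, sj, tj in zip(y_b, s, t)]
    return y_a + x_b

x = [0.3, -1.2, 0.7, 2.0]
y, log_det = coupling_forward(x, d=2)
print(y, log_det, coupling_inverse(y, d=2))
```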

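Affine autoregressive flows are autoregressive models. Consider sequences $X[1:T], Y[1:T]$ in which each output depends only on the current and earlier inputs, $y_t = f(x_{1:t})$, so the Jacobian is lower triangular and its determinant is $\prod_t \partial y_t / \partial x_t$.

The rest is the same as for the normalizing flows above, and the transformation of each $x_t$ must stay simple. It is split into a conditioner $c$ and a transformer $\tau$ that is invertible in $x_t$:

$$y_t = \tau\left(c(x_{1:t-1}), x_t\right)$$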
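A sketch of a generic autoregressive flow $y_t = \tau(c(x_{1:t-1}), x_t)$, taking the affine transformer $\tau(\mu, \sigma, x_t) = \mu + \sigma x_t$ as the concrete choice and a toy `conditioner` (hypothetical; in practice a masked or causal network outputs $\mu_t$ and $\log\sigma_t$). The forward pass only reads $x$, so its $T$ steps are independent, whereas the inverse must recover $x_1, x_2, \dots$ one at a time.

```python
import math

def conditioner(x_prev):
    """Hypothetical conditioner c(x_{1:t-1}) -> (mu, log_sigma).
    It may be any function of the previous inputs; it never needs inverting."""
    h = sum(x_prev) / (1 + len(x_prev))
    return math.sin(h), 0.5 * math.tanh(h)

def affine_transformer(params, x_t):
    """tau(mu, sigma, x_t) = mu + sigma * x_t, with sigma = exp(log_sigma) > 0."""
    mu, log_sigma = params
    return mu + math.exp(log_sigma) * x_t

def ar_flow_forward(x):
    """y_t = tau(c(x_{1:t-1}), x_t); every step reads only x, so the steps are
    independent. log|det J| = sum_t log sigma_t (product of the diagonal)."""
    y, log_det = [], 0.0
    for t in range(len(x)):
        params = conditioner(x[:t])
        y.append(affine_transformer(params, x[t]))
        log_det += params[1]
    return y, log_det

def ar_flow_inverse(y):
    """Sequential: x_t needs x_{1:t-1}, which must be recovered first."""
    x = []
    for t in range(len(y)):
        mu, log_sigma = conditioner(x)   # x currently holds x_{1:t-1}
        x.append((y[t] - mu) * math.exp(-log_sigma))
    return x

x = [0.5, -1.0, 2.0]
y, log_det = ar_flow_forward(x)
print(y, log_det, ar_flow_inverse(y))
```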

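Affine autoregressive flows use an affine transformer, $\tau(\mu, \sigma, x_t) = \mu + \sigma x_t$ with $\sigma > 0$. Here is the example from Parallel WaveNet:

Parallel WaveNet also gives a tip for generating samples: in short, iterating the flow (stacking several flows on top of each other) improves sample quality.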

In general, normalising flows might require repeated iterations to transform uncorrelated noise into structured samples, with the output generated by the flow at each iteration passed in as input at the next one. This is less crucial for IAFs, as the autoregressive latents can induce significant structure in a single pass. Nonetheless we observed that having up to 4 flow iterations (which we implemented by simply stacking 4 such networks on top of each other) did improve the quality.
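The point in the quote can be sketched as stacking several inverse autoregressive flow (IAF) layers, each of the form $x_t = z_t \cdot s(z_{<t}) + \mu(z_{<t})$, and feeding each layer's output into the next. The toy $\mu$ and $s$ below are hypothetical stand-ins for the causal networks a real model would use; the sketch shows that sampling from noise takes one pass per layer and that the log-determinants of the stacked layers simply add.

```python
import math
import random

def make_iaf_layer(k):
    """Hypothetical IAF layer k: mu(z_{<t}) and log s(z_{<t}) come from a toy
    function of the earlier inputs (a real model would use a causal network)."""
    def layer(z):
        out, log_det = [], 0.0
        for t in range(len(z)):
            h = sum(z[:t]) / (1 + t)
            mu = math.sin(h + k)
            log_s = 0.3 * math.tanh(h - k)
            out.append(z[t] * math.exp(log_s) + mu)   # x_t = z_t * s + mu
            log_det += log_s
        return out, log_det
    return layer

def sample_stacked_iaf(z, layers):
    """Each layer's output is the next layer's input; log-dets accumulate."""
    total_log_det = 0.0
    for layer in layers:
        z, ld = layer(z)
        total_log_det += ld
    return z, total_log_det

layers = [make_iaf_layer(k) for k in range(4)]   # 4 stacked flows, as in the quote
rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(8)]
x, log_det = sample_stacked_iaf(z, layers)
# log p(x) = log p(z) - log|det dx/dz|, with z ~ N(0, I).
log_p_z = sum(-0.5 * (zi * zi + math.log(2 * math.pi)) for zi in z)
print(log_p_z - log_det)
```

Because every layer has a lower-triangular Jacobian, so does the whole stack, and its diagonal is the product of the layers' diagonals; that is why adding the per-layer log-determinants is enough.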

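Back to the main text: the paper uses a DNN as the transformer $\tau$:

The corresponding formula (the deep sigmoidal flow) is:

$$y_t = \sigma^{-1}\left(w^\top \sigma\left(a \cdot x_t + b\right)\right), \qquad a_j > 0,\ w_j > 0,\ \sum_j w_j = 1,$$

where $a, b, w$ are produced by the conditioner $c(x_{1:t-1})$ and $\sigma$ is the logistic sigmoid.

Overall architecture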

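If the affine layers have positive weights and the activation functions are strictly increasing (hence invertible), the DNN is strictly increasing in $x_t$ and therefore invertible. When the dimensionality is small, computing the determinant directly is also acceptable. One advantage of using a DNN is that the output dimensions are correlated with each other.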
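A sketch of the deep sigmoidal flow transformer for a single $x_t$, assuming the constraints are enforced with softplus ($a_j > 0$) and softmax ($w$ on the simplex), which is one common choice. The parameter values `raw_a`, `b`, `raw_w` are hypothetical; in the model they would come from the conditioner $c(x_{1:t-1})$. Because the map is strictly increasing in $x_t$, its derivative gives the diagonal Jacobian entry, and the inverse (which has no closed form) can be found by bisection.

```python
import math

def softplus(r):
    """log(1 + e^r), computed stably; always > 0."""
    return math.log1p(math.exp(-abs(r))) + max(r, 0.0)

def sigmoid(v):
    """Numerically stable logistic sigmoid."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def logit(p):
    """Inverse sigmoid: log(p / (1 - p)) for 0 < p < 1."""
    return math.log(p) - math.log1p(-p)

def dsf(x_t, raw_a, b, raw_w):
    """Return y_t = logit(sum_j w_j * sigmoid(a_j * x_t + b_j)) and dy_t/dx_t."""
    a = [softplus(r) for r in raw_a]              # a_j > 0
    m = max(raw_w)
    ex = [math.exp(r - m) for r in raw_w]
    total = sum(ex)
    w = [e / total for e in ex]                   # w_j > 0, sum_j w_j = 1
    sig = [sigmoid(aj * x_t + bj) for aj, bj in zip(a, b)]
    p = sum(wj * sj for wj, sj in zip(w, sig))    # convex combination in (0, 1)
    # Chain rule: dp/dx = sum_j w_j a_j s_j (1 - s_j); dlogit(p)/dp = 1 / (p (1 - p)).
    dp_dx = sum(wj * aj * sj * (1.0 - sj) for wj, aj, sj in zip(w, a, sig))
    return logit(p), dp_dx / (p * (1.0 - p))

def dsf_inverse(y_t, raw_a, b, raw_w, lo=-20.0, hi=20.0, iters=100):
    """Bisection on the strictly increasing map; assumes the root is in [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dsf(mid, raw_a, b, raw_w)[0] < y_t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

raw_a, b, raw_w = [0.5, -1.0, 2.0], [0.1, -0.3, 0.8], [0.2, 0.0, -0.5]
y, dy_dx = dsf(0.3, raw_a, b, raw_w)
print(y, math.log(dy_dx), dsf_inverse(y, raw_a, b, raw_w))
```

In an autoregressive flow, $\log(dy_t/dx_t)$ from each dimension is summed to give the log-determinant, just as $\log\sigma_t$ is in the affine case.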

The experiments are all fairly simple, possibly because the latent space is low-dimensional; the method may not handle much more complex tasks.