[Modelling] L2 Regularisation
UPDATE: Current view: the effect of L2 regularisation on the local dynamics is not catastrophic.
UPDATE: The original conclusion may be wrong.
UPDATE: The code is unstable; the figures are currently not very trustworthy.
Model regularisation is one of the standard tools against overfitting, and the L2 and L1 penalties are widely used because of their simple form. However, the non-eliminatability of the L2 penalty is a property worth worrying about. Below I apply a simple transformation to the L2 penalty and obtain an eliminatable logL2 penalty.
The L2 norm is odd in the sense that you have to set all weights to zero to eliminate its effect, i.e. the penalty sum_i w_i^2 reaches zero only when every w_i is exactly zero.
This does not seem to be a desirable property, since the optimiser may end up confusing its two objectives (fitting the data versus shrinking the weights). A possible improvement is to assert that the second moment of the weights equals a constant K, i.e. penalise the deviation of E(w^2) from K.
The classic L2 norm is recovered as the special case K = 0 with an L1 metric on the deviation, since |E(w^2) - 0| = E(w^2).
It is natural to consider K > 0 for common statistical models (it is impossible for your best model to have all-zero parameters, especially in neural networks). Let us use the most common metric first, mse/l2. Then the regulariser reads (E(w^2) - K)^2.
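As a quick illustration, here is a minimal sketch of this second-moment penalty in PyTorch; the function name moment_mse_reg and the default K are my own illustrative choices, not from the original code.

```python
import torch

def moment_mse_reg(params, K=1.0):
    # Squared deviation of the weights' second moment E(w^2) from the target K.
    sq = torch.cat([p.flatten() ** 2 for p in params])
    return (sq.mean() - K) ** 2
```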
But mse is often restrictive and assumes linear deviation in the local vicinity, which is generally not the case here, since E(x^2) responds quadratically to changes in x. Hence it makes sense to log-transform the soft constraint, which leads to (log E(w^2) - log K)^2.
Here we choose the squared difference (rather than an absolute value) to keep the function easily differentiable, and set K = 1 without loss of generality, so the penalty becomes (log E(w^2))^2.
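A corresponding sketch of the logL2 penalty, under the same assumptions as above (illustrative name, K = 1 by default, and a small eps added purely to guard the log at zero):

```python
import math
import torch

def log_l2_reg(params, K=1.0, eps=1e-12):
    # (log E(w^2) - log K)^2 over all parameter elements; equals (log E(w^2))^2 when K = 1.
    sq_mean = torch.cat([p.flatten() ** 2 for p in params]).mean()
    return (torch.log(sq_mean + eps) - math.log(K)) ** 2
```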
However, we notice this term is not distributive over individual parameters. Hence we can define a distributive counterpart, the mean over elements of (log w_i^2)^2, which then requires elementwise calculation of log(w_i^2).
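And a sketch of the distributive (elementwise) variant; the name and the eps guard are again my own additions:

```python
import torch

def log_l2_reg_distributive(params, eps=1e-12):
    # Mean over elements of (log w_i^2)^2, so the penalty distributes over parameters.
    logs = torch.cat([torch.log(p.flatten() ** 2 + eps) for p in params])
    return (logs ** 2).mean()
```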
(OLD, possibly wrong argument below; see the UPDATE at the top.)
We further show that logL2 permits a non-trivial limit cycle near its local minimum, after freezing weights close to zero (DOF = 41, figure 1).
In contrast, the L2 norm does not permit such a limit cycle and exhibits only trivial oscillation in its parameters (figures 2 and 3, DOF = 22 after freezing).
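The figures are not reproduced here, and (per the UPDATE above) the original code was unstable, so the following is only a rough sketch of how one might probe this, not the original experiment: train a small model under either penalty, freeze the weights that have collapsed towards zero by masking their gradients, keep optimising, and record a 2-D slice of the parameter trajectory. Model size, data, learning rate, threshold and penalty strength are all arbitrary assumptions.

```python
import torch

torch.manual_seed(0)
X = torch.randn(256, 8)
y = 2.0 * X[:, :1] + 0.1 * torch.randn(256, 1)

def plain_l2(params):
    return sum((p ** 2).sum() for p in params)

def pooled_log_l2(params, eps=1e-12):
    sq_mean = torch.cat([p.flatten() ** 2 for p in params]).mean()
    return torch.log(sq_mean + eps) ** 2

def run(reg_fn, lam=1e-2, steps=2000, freeze_at=1000, thresh=1e-2, lr=0.05):
    model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Tanh(),
                                torch.nn.Linear(8, 1))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    masks, traj = None, []
    for step in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(X), y) + lam * reg_fn(list(model.parameters()))
        loss.backward()
        if step == freeze_at:
            # Freeze weights that have collapsed towards zero by masking their gradients.
            with torch.no_grad():
                masks = [(p.abs() > thresh).float() for p in model.parameters()]
        if masks is not None:
            for p, m in zip(model.parameters(), masks):
                p.grad.mul_(m)
        opt.step()
        w = model[0].weight
        traj.append((w[0, 0].item(), w[0, 1].item()))  # a 2-D slice of the trajectory
    return traj

traj_l2 = run(plain_l2)
traj_log = run(pooled_log_l2)
# Plot the two trajectories to inspect the post-freeze dynamics.
```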
In summary, one should be cautious about applying the L2 norm, since it greatly distorts the loss landscape near local optima and hence the local dynamics. In general, I advocate that modellers assess the eliminatability of any given regulariser before applying it.