Translation-Invariant Regularized Linear Regression
Note: I previously wrote a series of machine learning posts discussing properties of popular machine learning models that I consider important but that are hard to find in papers and textbooks. Since they were hosted on a blog outside the Great Firewall, they are difficult to access from mainland China. A friend suggested moving them to Zhihu, hence this trial post. The main text follows.
In this post, I will discuss a seldom-documented trick for one of the simplest models: linear regression. I will show how to make the solution of ridge regression translation invariant, and what the bias term really means. Some readers might find it obvious; however, it matters in practice, and when we later deal with non-linear models, the same trick will be far less obvious.
For a recap, regularized linear regression (ridge regression) fits a linear model $y = w^\top x$ by minimizing a regularized least-squares target

$$J(w) = \sum_{i=1}^{n} \left(w^\top x_i - y_i\right)^2 + \lambda \|w\|^2. \qquad (1)$$

The solution is simply $w = \left(XX^\top + \lambda I\right)^{-1} X y$, where $X = [x_1, \dots, x_n]$ is the $d \times n$ matrix with one sample per column and $y = (y_1, \dots, y_n)^\top$ stacks the targets.
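As a minimal sketch (assuming the $d \times n$, one-sample-per-column convention above; the toy data are made up for illustration), the closed-form solution is a one-liner in Matlab:

% Closed-form ridge solution, no bias term yet.
% X is d-by-n (one sample per column), y is n-by-1.
X = rand(3, 50);                     % toy inputs, d = 3, n = 50
y = X' * [1; -2; 0.5];               % toy targets from a known weight vector
lambda = 0.1;
d = size(X, 1);
w = (X*X' + lambda*eye(d)) \ (X*y);  % backslash solves the system without an explicit inverse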
Actually, we want to fit a linear model with a bias term: $y = w^\top x + w_0$. Most textbooks will tell you that we don't have to worry about the bias, since we can always augment the variables by adding an extra dimension, $\tilde{x} = [x^\top, 1]^\top$ and $\tilde{w} = [w^\top, w_0]^\top$. What the textbooks miss is that the bias term $w_0$ actually matters when a regularizer is involved. Let's take a closer look at $w_0$. We can equivalently rewrite (1) as

$$J(w, w_0) = \sum_{i=1}^{n} \left(w^\top x_i + w_0 - y_i\right)^2 + \lambda \|w\|^2 + \lambda_0 w_0^2.$$
When augmented variables are used, what (1) really does is set $\lambda_0 = \lambda$. However, it actually makes little sense to use a non-zero $\lambda_0$. To see why, let's take a look at the solution for $w_0$: we solve

$$\frac{\partial J}{\partial w_0} = 2 \sum_{i=1}^{n} \left(w^\top x_i + w_0 - y_i\right) + 2 \lambda_0 w_0 = 0$$
and get

$$w_0 = \frac{\sum_{i=1}^{n} \left(y_i - w^\top x_i\right)}{n + \lambda_0} = \widetilde{\left\langle y - w^\top x \right\rangle}.$$

Here we define a pseudo version of the average notation:

$$\widetilde{\langle z \rangle} = \frac{1}{n + \lambda_0} \sum_{i=1}^{n} z_i.$$
This result means that, before we see the data, we assume there are $\lambda_0$ pseudo samples sitting at the origin $0$: the denominator $n + \lambda_0$ averages as if $\lambda_0$ extra samples, each equal to zero, had been observed. Why would one make such an assumption? In my opinion, there is absolutely no reason. So the conclusion is: never regularize the bias.
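To see the shrinkage numerically, here is a small sketch (the data and constants are made up for illustration): we fit targets generated with a true bias of 10 using the augmented-variable approach, so that $\lambda$ penalizes $w_0$ as well, and watch the recovered bias get pulled toward the origin as $\lambda$ grows.

% Demonstration: regularizing the bias drags it toward 0.
% True model: y = x + 10, so the true bias is 10.
rng(0);                                            % fix the seed for reproducibility
n = 20;
x = rand(1, n);                                    % 1-by-n inputs
y = x + 10 + 0.01*randn(1, n);
for lambda = [0 1 10]
    Xaug = [x; ones(1, n)];                        % augment with a constant-1 row
    wt = (Xaug*Xaug' + lambda*eye(2)) \ (Xaug*y'); % ridge on augmented variables
    fprintf('lambda = %4g: w = %6.3f, w0 = %6.3f\n', lambda, wt(1), wt(2));
end

With $\lambda = 0$ the recovered $w_0$ is close to 10; as $\lambda$ grows, $w_0$ is shrunk toward 0 and $w$ is distorted to compensate, exactly the pseudo-samples-at-the-origin effect derived above.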
If we want our solution to be invariant w.r.t. translation, we should let $\lambda_0 = 0$, which means we should minimize

$$J(w, w_0) = \sum_{i=1}^{n} \left(w^\top x_i + w_0 - y_i\right)^2 + \lambda \|w\|^2.$$

Substituting the optimal $w_0 = \bar{y} - w^\top \bar{x}$, we have

$$J(w) = \sum_{i=1}^{n} \left(w^\top (x_i - \bar{x}) - (y_i - \bar{y})\right)^2 + \lambda \|w\|^2,$$

where

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i.$$

The solution for $w$ then is

$$w = \left(X_c X_c^\top + \lambda I\right)^{-1} X_c\, y_c,$$

where $X_c = [x_1 - \bar{x}, \dots, x_n - \bar{x}]$ and $y_c = (y_1 - \bar{y}, \dots, y_n - \bar{y})^\top$ are the centered data.
This solution means that we first center the data $X$ and $y$, then regress on the centered data, recovering the bias as $w_0 = \bar{y} - w^\top \bar{x}$. In this way, we get a translation-invariant solution for $w$: shifting all inputs or all targets by a constant changes only $w_0$, never $w$, since we always center the data first.
Here is some Matlab code to show the idea.
function [w, w0] = linReg(X, y, lambda)
% Translation-invariant ridge regression.
% X: d-by-n data matrix (one sample per column), y: 1-by-n targets.
d = size(X, 1);
xbar = mean(X, 2);
ybar = mean(y, 2);
X = bsxfun(@minus, X, xbar);           % center the inputs
y = bsxfun(@minus, y, ybar);           % center the targets
w = (X*X' + lambda*eye(d)) \ (X*y');   % ridge on the centered data
w0 = ybar - dot(w, xbar);              % recover the unregularized bias
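As a quick usage check of the claimed invariance (the data here are made up for illustration): shifting every input entry and every target by a constant should leave $w$ untouched and only move $w_0$.

% Translation invariance check: shifting the data must not change w.
X = rand(3, 100);
y = [1 -2 0.5] * X + 0.1*randn(1, 100);
[w1, w01] = linReg(X, y, 1);
[w2, w02] = linReg(X + 5, y + 7, 1);   % translate inputs and targets
disp(max(abs(w1 - w2)));               % ~0 up to numerical precision; only w0 moves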