The Principle of Linear Least Squares from the Perspectives of Calculus and Linear Algebra

02-12

Linear Least Squares

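Linear least squares, often simply called least squares, is the core method in statistics for linear regression and linear fitting. Given a set of discrete data points, find the line for which the sum of squared errors between the line and the points is smallest. That is the method of least squares.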

Wikipedia: In statistics and mathematics, linear least squares is an approach to fitting a mathematical or statistical model to data in cases where the idealized value provided by the model for any data point is expressed linearly in terms of the unknown parameters of the model. The resulting fitted model can be used to summarize the data, to predict unobserved values from the same system, and to understand the mechanisms that may underlie the system.

Problem

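Q: Given three discrete data points $(1, 1), (2, 2), (3, 2)$ in the Cartesian coordinate system, find a line $l$ such that the sum of squared errors between $l$ and the data points is minimized.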

The Calculus Perspective

Let the line be $y = kx + b$. The sum of squared errors is then the function

$\begin{align*} f(k, b) &= (k + b - 1)^2 + (2k + b - 2)^2 + (3k + b - 2)^2 \\ &= 14k^2 + 3b^2 - 22k - 10b + 12kb + 9 \end{align*}$
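As a quick sanity check on the expansion, here is a minimal sketch that evaluates the sum of squared errors directly from the three points and compares it with the expanded polynomial. The names `POINTS`, `sum_squared_errors`, and `expanded_form` are illustrative choices, not part of the original post, and exact fractions are used so the comparison involves no rounding.

```python
from fractions import Fraction

# The three data points that appear in the error function f(k, b).
POINTS = [(1, 1), (2, 2), (3, 2)]

def sum_squared_errors(k, b, points=POINTS):
    """Sum of squared vertical errors between y = k*x + b and the points."""
    return sum((k * x + b - y) ** 2 for x, y in points)

def expanded_form(k, b):
    """The expanded polynomial 14k^2 + 3b^2 - 22k - 10b + 12kb + 9."""
    return 14 * k**2 + 3 * b**2 - 22 * k - 10 * b + 12 * k * b + 9

# Both forms should agree for any (k, b); spot-check a few exact values.
samples = [
    (Fraction(0), Fraction(0)),
    (Fraction(1), Fraction(-2)),
    (Fraction(1, 2), Fraction(2, 3)),
]
for k, b in samples:
    assert sum_squared_errors(k, b) == expanded_form(k, b)

print(sum_squared_errors(Fraction(1, 2), Fraction(2, 3)))  # 1/6
```

A spot check at a few points is not a proof of the identity, but two quadratics in $k, b$ that disagree would very likely differ at one of these samples.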

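Minimizing the sum of squared errors means finding the minimum of the function $f(k, b)$. Since $f(k, b)$ is a function of the two variables $k$ and $b$, we locate its extremum by taking the partial derivatives with respect to $k$ and $b$.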

$\begin{align*} f_k(k, b) &= \frac{\partial f(k, b)}{\partial k} = 28k + 12b - 22 \\ f_b(k, b) &= \frac{\partial f(k, b)}{\partial b} = 12k + 6b - 10 \end{align*}$

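Setting $f_k(k, b) = 0$ and $f_b(k, b) = 0$ and solving gives $k = \frac{1}{2}, b = \frac{2}{3}$. Because $f$ is a convex quadratic in $k$ and $b$, this stationary point is its minimum, so the line is $y = \frac{1}{2}x + \frac{2}{3}$.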
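The stationarity conditions form a 2×2 linear system, which can be sketched in a few lines. In the code below the gradient is computed directly from the definition of $f$, and the system $28k + 12b = 22$, $12k + 6b = 10$ is solved with Cramer's rule (scaling either equation by a nonzero constant does not change the solution). The helper names `gradient` and `solve_2x2` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

POINTS = [(1, 1), (2, 2), (3, 2)]

def gradient(k, b, points=POINTS):
    """Partial derivatives (f_k, f_b) of f(k, b) = sum((k*x + b - y)^2)."""
    f_k = sum(2 * x * (k * x + b - y) for x, y in points)
    f_b = sum(2 * (k * x + b - y) for x, y in points)
    return f_k, f_b

def solve_2x2(a11, a12, a21, a22, r1, r2):
    """Solve [[a11, a12], [a21, a22]] [u, v]^T = [r1, r2]^T by Cramer's rule."""
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        raise ValueError("singular system")
    u = (r1 * a22 - a12 * r2) / det
    v = (a11 * r2 - r1 * a21) / det
    return u, v

# f_k = 0  ->  28k + 12b = 22
# f_b = 0  ->  12k +  6b = 10
k, b = solve_2x2(28, 12, 12, 6, 22, 10)
print(k, b)  # 1/2 2/3

# Both partial derivatives vanish at the solution.
assert gradient(k, b) == (0, 0)
```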

The Linear Algebra Perspective

Again let the line $l$ be $y = kx + b$. Requiring $l$ to fit the three points $(1, 1), (2, 2), (3, 2)$ amounts to solving the following system of linear equations.

$\begin{equation} \left\{ \begin{aligned} k + b & = 1 \\ 2k + b & = 2 \\ 3k + b & = 2 \end{aligned} \right. \end{equation}$

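In matrix form $Ax = b$ (where the right-hand-side vector $b$ is not to be confused with the intercept $b$) this reads: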

$\begin{equation} \left[ \begin{matrix} 1&1\\ 2&1\\ 3&1 \end{matrix} \right] \left[ \begin{matrix} k\\ b\\ \end{matrix} \right] = \left[ \begin{matrix} 1\\ 2\\ 2 \end{matrix} \right] \end{equation}$

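Clearly $Ax = b$ has no solution: the first two equations force $k = 1$ and the intercept to be $0$, which violates the third. Determining $k$ and the intercept is therefore a matter of finding the best approximate solution of $Ax = b$, the one for which the vector $Ax$ is closest to the vector $b$. In other words, we minimize $\Vert Ax - b \Vert$, which is the size of the error $e$. To find this best solution, project the vector $b$ onto the column space of $A$ to obtain the projection $\hat{b} = A\hat{x}$. The error $b - A\hat{x}$ is then orthogonal to every column of $A$, that is $A^T(b - A\hat{x}) = 0$, so the linear system becomes $A^TA\hat{x} = A^Tb$. Solving it gives $\hat{x} = \left[ \begin{matrix} \frac{1}{2}\\ \frac{2}{3}\\ \end{matrix} \right]$. Hence $k = \frac{1}{2}$ and the intercept is $\frac{2}{3}$, and the line is $y = \frac{1}{2}x + \frac{2}{3}$.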
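The projection argument can be checked numerically. The following is a minimal sketch, using only the Python standard library, that forms $A^TA$ and $A^Tb$, solves the resulting 2×2 system, and confirms that the residual is orthogonal to the columns of $A$. The helper functions and the name `rhs` (used for the vector $b$ to avoid clashing with the intercept) are assumptions made for this illustration.

```python
from fractions import Fraction

A = [[1, 1], [2, 1], [3, 1]]  # first column: x-values, second column: ones
rhs = [1, 2, 2]               # the vector b on the right-hand side

def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def matmul(m, n):
    """Matrix-matrix product."""
    n_t = transpose(n)
    return [[sum(a * c for a, c in zip(row, col)) for col in n_t] for row in m]

At = transpose(A)
AtA = matmul(At, A)    # [[14, 6], [6, 3]]
Atb = matvec(At, rhs)  # [11, 5]

# Solve the normal equations (A^T A) x = A^T b by Cramer's rule.
det = Fraction(AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0])
k = (Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1]) / det
b = (AtA[0][0] * Atb[1] - Atb[0] * AtA[1][0]) / det
print(k, b)  # 1/2 2/3

# The projection of rhs onto the column space, and the residual e = rhs - A x.
projection = matvec(A, [k, b])
error = [r - p for r, p in zip(rhs, projection)]

# Orthogonality: A^T e should be the zero vector.
assert matvec(At, error) == [0, 0]
```

Cramer's rule is convenient here only because the system is 2×2; for larger problems a dedicated least-squares routine from a numerical library would be the more usual choice.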