
A question about the inner loop of the SVM SMO algorithm?

In the inner loop:

def innerL(i, oS):
    Ei = calcEk(oS, i)
    if ((oS.labelMat[i]*Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or ((oS.labelMat[i]*Ei > oS.tol) and (oS.alphas[i] > 0)):
        j, Ej = selectJ(i, oS, Ei)  # this has been changed from selectJrand
        alphaIold = oS.alphas[i].copy(); alphaJold = oS.alphas[j].copy()
        if (oS.labelMat[i] != oS.labelMat[j]):
            L = max(0, oS.alphas[j] - oS.alphas[i])
            H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i])
        else:
            L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C)
            H = min(oS.C, oS.alphas[j] + oS.alphas[i])
        if L == H: print "L==H"; return 0
        eta = 2.0 * oS.K[i,j] - oS.K[i,i] - oS.K[j,j]  # changed for kernel
        if eta >= 0: print "eta>=0"; return 0
        oS.alphas[j] -= oS.labelMat[j]*(Ei - Ej)/eta
        oS.alphas[j] = clipAlpha(oS.alphas[j], H, L)
        updateEk(oS, j)  # added this for the Ecache
        if (abs(oS.alphas[j] - alphaJold) < 0.00001): print "j not moving enough"; return 0
        oS.alphas[i] += oS.labelMat[j]*oS.labelMat[i]*(alphaJold - oS.alphas[j])  # update i by the same amount as j
        updateEk(oS, i)  # added this for the Ecache; the update is in the opposite direction
        b1 = oS.b - Ei - oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.K[i,i] - oS.labelMat[j]*(oS.alphas[j]-alphaJold)*oS.K[i,j]
        b2 = oS.b - Ej - oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.K[i,j] - oS.labelMat[j]*(oS.alphas[j]-alphaJold)*oS.K[j,j]
        if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]): oS.b = b1
        elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]): oS.b = b2
        else: oS.b = (b1 + b2)/2.0
        return 1
    else:
        return 0

What does if ((oS.labelMat[i]*Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or ((oS.labelMat[i]*Ei > oS.tol) and (oS.alphas[i] > 0)): mean?

I don't understand the algorithm's C and the tolerance very well. Any help would be much appreciated.


I'd just like to say: although this answer may not be worth much, I did spend some time writing it, and it's a little disheartening that people bookmark it without leaving a single upvote or thank-you...

---------------------------------

As it happens, I've been implementing an SVM myself recently, which is how I came across this question...

The C here is the regularization parameter. Sometimes the data would be linearly separable except for noise: a few points lie far from where they should, and such outliers can have a large effect on the SVM.

So we introduce a slack variable xi_i >= 0 for each sample, which relaxes the constraint to:

y_i*(w^T*x_i + b) >= 1 - xi_i

The objective function then becomes:

min over w, b, xi of (1/2)*||w||^2 + C * sum_i(xi_i), subject to the relaxed constraints above.

And the corresponding KKT conditions are:

yi*f(i) >= 1 and alpha = 0 (outside the boundary)
yi*f(i) == 1 and 0 < alpha < C (on the boundary)
yi*f(i) <= 1 and alpha = C (between the boundary)

Here yi is the label of the training sample (1 or -1), and f(i) is the SVM's predicted value for it.
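These three cases can be checked mechanically. Below is a minimal sketch (the helper name is mine, chosen only for illustration) that reports which case a training point falls into, given its multiplier alpha and its margin value yi*f(i):

def kktCase(alpha, yfi, C, eps=1e-8):
    # alpha == 0: the point should lie outside (or on) the margin, y*f >= 1
    if alpha <= eps:
        return "outside the boundary" if yfi >= 1 - eps else "KKT violated"
    # alpha == C: the point should lie inside (or on) the margin, y*f <= 1
    if alpha >= C - eps:
        return "between the boundary" if yfi <= 1 + eps else "KKT violated"
    # 0 < alpha < C: the point should lie exactly on the margin, y*f == 1
    return "on the boundary" if abs(yfi - 1) <= eps else "KKT violated"

# A support vector exactly on the margin: alpha strictly between 0 and C.
print(kktCase(0.5, 1.0, C=1.0))   # -> "on the boundary"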

C is not a variable the optimizer solves for; it is something you specify. With a larger C, fewer points get misclassified, but the model tends to overfit.
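To make the trade-off concrete, here is a minimal sketch with made-up numbers (the weight vector w and the slack values are purely illustrative) that evaluates the two terms of the soft-margin objective (1/2)*||w||^2 + C*sum(xi) for several values of C:

import numpy as np

w = np.array([1.0, -0.5])           # illustrative weight vector
slacks = np.array([0.0, 0.3, 1.2])  # illustrative slack values xi_i

for C in (0.1, 1.0, 10.0):
    margin_term = 0.5 * np.dot(w, w)   # rewards a wide margin (small ||w||)
    slack_term = C * slacks.sum()      # penalizes margin violations
    print("C=%5.1f  margin term=%.3f  slack term=%.3f" % (C, margin_term, slack_term))

As C grows, the slack term dominates, so the optimizer trades margin width for fewer misclassified points, which is exactly where overfitting comes from.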

As for the tolerance here: the KKT conditions themselves are quite strict, so we also set a tolerance value, and the optimization can stop once every violation is within that tolerance.

Concretely, for the if statement here:

if ((oS.labelMat[i]*Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or ((oS.labelMat[i]*Ei > oS.tol) and (oS.alphas[i] > 0)):

This is because, when selecting the Lagrange multipliers, we give priority to samples whose coefficient satisfies 0 < alpha < C.

Let me explain further by quoting John C. Platt, the original author of SMO, from his paper "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines":

The original link is below:

microsoft.com page

There are two separate choice heuristics: one for the first Lagrange multiplier and one for the second. The choice of the first heuristic provides the outer loop of the SMO algorithm. The outer loop first iterates over the entire training set, determining whether each example violates the KKT conditions (12). If an example violates the KKT conditions, it is then eligible for optimization. After one pass through the entire training set, the outer loop iterates over all examples whose Lagrange multipliers are neither 0 nor C (the non-bound examples). Again, each example is checked against the KKT conditions and violating examples are eligible for optimization. The outer loop makes repeated passes over the non-bound examples until all of the non-bound examples obey the KKT conditions within ε. The outer loop then goes back and iterates over the entire training set. The outer loop keeps alternating between single passes over the entire training set and multiple passes over the non-bound subset until the entire training set obeys the KKT conditions within ε, whereupon the algorithm terminates.

The first choice heuristic concentrates the CPU time on the examples that are most likely to violate the KKT conditions: the non-bound subset. As the SMO algorithm progresses, examples that are at the bounds are likely to stay at the bounds, while examples that are not at the bounds will move as other examples are optimized. The SMO algorithm will thus iterate over the non-bound subset until that subset is self-consistent, then SMO will scan the entire data set to search for any bound examples that have become KKT violated due to optimizing the non-bound subset.

The so-called non-bound examples are those whose alpha is not at either bound, 0 or C. The outer loop preferentially iterates over the non-bound examples because they are the ones most likely to need adjustment (most likely to violate the KKT conditions). As other examples are optimized during the run, the non-bound examples move, while the bound examples (alpha equal to 0 or C) tend to stay at the bounds. Since most training samples cannot be support vectors, once a sample's alpha reaches 0 it needs no further adjustment: that sample no longer has any effect on f(x).

By Osuna's theorem, for any pair of multipliers, as long as one of them violates the KKT conditions, a two-variable quadratic optimization over the pair is guaranteed to decrease the objective function. SMO uses heuristics to find such pairs, and the search for the first alpha forms SMO's outer loop: it first scans the entire training set looking for examples that violate the KKT conditions.
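Expressed as code, the alternating outer loop Platt describes looks roughly like the sketch below. It calls innerL from the question; the name smoOuter is mine, and the fields oS.m and oS.alphas (a NumPy matrix, hence the .A) follow the conventions of the code the question quotes, so treat this as an illustration rather than the definitive implementation:

from numpy import nonzero

def smoOuter(oS, maxIter):
    iterNum = 0
    entireSet = True           # start with one pass over the full training set
    alphaPairsChanged = 0
    while (iterNum < maxIter) and ((alphaPairsChanged > 0) or entireSet):
        alphaPairsChanged = 0
        if entireSet:
            # full pass: check every sample against the KKT conditions
            for i in range(oS.m):
                alphaPairsChanged += innerL(i, oS)
        else:
            # non-bound pass: only samples with 0 < alpha < C
            nonBound = nonzero((oS.alphas.A > 0) * (oS.alphas.A < oS.C))[0]
            for i in nonBound:
                alphaPairsChanged += innerL(i, oS)
        iterNum += 1
        if entireSet:
            entireSet = False          # switch to passes over the non-bound subset
        elif alphaPairsChanged == 0:
            entireSet = True           # non-bound subset is self-consistent; rescan everything
    return oS.b, oS.alphas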

Here Ei is the difference between the predicted and the true result, Ei = f(i) - y[i], so y[i]*Ei (labelMat[i]*Ei in the code) = y[i]*f(i) - y[i]^2 = y[i]*f(i) - 1, using y[i]^2 = 1. For example, with y[i] = 1 and f(i) = 0.7, Ei = -0.3 and y[i]*Ei = -0.3 = y[i]*f(i) - 1 = 0.7 - 1.

And since C is a value we specify that is greater than 0:

(1) If y[i]*Ei < 0, then y[i]*f(i) < 1; if additionally alpha < C, the KKT conditions above are violated;

(2) If y[i]*Ei > 0, then y[i]*f(i) > 1; if additionally alpha > 0, the KKT conditions are likewise violated;

So we need to pick out the alphas that violate the KKT conditions and optimize those.
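The same test can be pulled out into a standalone helper (a hypothetical function written only for illustration; the name is mine, not the book's). It also shows how tol loosens the strict KKT conditions into "violated by more than tol":

def violatesKKT(yi, Ei, alpha, C, tol):
    # yi*Ei < -tol  <=>  yi*f(i) < 1 - tol: the margin is violated by more
    # than tol, and alpha can only still be increased if alpha < C.
    if yi * Ei < -tol and alpha < C:
        return True
    # yi*Ei > tol  <=>  yi*f(i) > 1 + tol: the point lies beyond the margin,
    # so its alpha should be 0; it can only still be decreased if alpha > 0.
    if yi * Ei > tol and alpha > 0:
        return True
    return False

# y[i] = 1, f(i) = 0.7, so Ei = -0.3: inside the margin while alpha < C.
print(violatesKKT(1, -0.3, alpha=0.2, C=1.0, tol=0.001))   # -> True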

The formula images were taken from the slides of Stanford's Statistical Learning course.


@莫輕浮 :

Hello, and many thanks for this excellent answer. First, "C (i.e., oS.tol)"?? I believe that is a slip of the pen. You gave a very nice analysis of the cases y[i]*Ei < 0 and > 0, but what about < -tol and > tol? I did a small analysis of my own and there still seems to be a problem; I hope you can explain it. Thanks in advance. Here is my reasoning:

