1. Some commenters are pessimistic about the practical applications of deep learning.
a. I started research in machine vision in 2011, and back then I could never have imagined that computer vision would reach its current level. Technology develops at an accelerating pace, but human thinking prefers linear interpolation.
b. Deep learning investment is booming right now, and investors are no fools: if it couldn't create value in real applications, nobody would be willing to throw their own money away. Just wait and watch the billion-dollar deep learning unicorns run.
I'm not sure how to evaluate this. To me, this kind of work is "black magic": no one knows why, no one can explain, no exact strategy, it just works. A typical model has hundreds of nodes, each node with different settings, then gets trained with a pile of tricks. Who knows why it works so well... anyway, it just does... I also have no idea how many of these papers arrive at such fancy CNN architectures. Surely not by brute-force enumeration?
There are several new ImageNet results floating around that beat my 5.1% error rate on ImageNet. Most recently an interesting paper from Google that uses "batch normalization". I wanted to make a few comments regarding "surpassing human-level accuracy". The most critical one is this:
Human accuracy is not a point. It lives on a tradeoff curve.
Estimating the lower bound error
The 5.1% figure is an approximate upper bound on human error, achieved by a relatively dedicated labeler who trained on 500 images and was then evaluated on 1500. It is interesting to go further and estimate the lower bound on human error. We can do this approximately, since I have broken down my errors by category, some of which I feel are fixable (by more training, or more expert knowledge of dogs, etc.), and some of which I believe to be relatively insurmountable (e.g. multiple correct answers per image, or incorrect ground truth labels).
In detail, my human error types were:
1. Multiple correct objects in the image (12 mistakes)
2. Clearly incorrect ground truth label (5 mistakes)
3. Fine-grained recognition error (28 mistakes)
4. Class unawareness error (18 mistakes)
5. Insufficient training data (4 mistakes)
6. Unsorted/misc category (9 mistakes)
For a total of 76 mistakes, giving 76/1500 ~= 0.051 error. From these, I would argue that 1. and 2. are near insurmountable, while the rest could be further reduced by fine-grained experts (3.) and longer training period (4., 5.). For an optimistic lower bound, we could drop these errors down to 76 - 28 - 18 - 4 = 26, giving 26/1500 ~= 1.7% error, or even 1.1% if we drop all of (6.).
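The arithmetic above is easy to verify with a short script (category counts taken directly from the breakdown above; variable names are my own):

```python
# Breakdown of the 76 human errors on 1500 ImageNet validation images.
errors = {
    "multiple_correct_objects": 12,   # 1. near insurmountable
    "incorrect_ground_truth": 5,      # 2. near insurmountable
    "fine_grained": 28,               # 3. fixable by fine-grained experts
    "class_unawareness": 18,          # 4. fixable by longer training
    "insufficient_training_data": 4,  # 5. fixable by longer training
    "misc": 9,                        # 6. unsorted
}
n_images = 1500

total = sum(errors.values())          # 76 mistakes
upper_bound = total / n_images        # ~5.1% (dedicated labeler)

# Optimistic lower bound: drop the fixable categories (3., 4., 5.).
fixable = ("fine_grained", "class_unawareness", "insufficient_training_data")
remaining = total - sum(errors[k] for k in fixable)   # 26 mistakes
lower_bound = remaining / n_images                    # ~1.7%

# Even more optimistic: drop the misc category (6.) as well.
very_optimistic = (remaining - errors["misc"]) / n_images  # ~1.1%

print(f"{upper_bound:.1%}  {lower_bound:.1%}  {very_optimistic:.1%}")
# -> 5.1%  1.7%  1.1%
```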
In conclusion, when you read the "surpassing-human" headlines, we should all keep in mind that human accuracy is not a point - it's a tradeoff curve. We trade off human effort and expertise against the error rate: I am one point on that curve, at 5.1%. My labmates, with almost no training, are another point, with up to 15% error. And based on the above hypothetical calculations, it's not unreasonable to suggest that a group of very dedicated humans might push this down to 2% or so.
That being said, I'm very impressed with how quickly multiple groups have improved from 6.6% down to ~5% and now below! I did not expect to see such rapid progress. It seems that we're now surpassing a dedicated human labeler. And imo, once we are down to 3%, we'd be matching the performance of a hypothetical super-dedicated fine-grained expert ensemble of human labelers.
Both PReLU and the method for initializing W have strong motivation. A reply by 黃勛 also pointed to a related earlier paper on PReLU, "Object recognition with hierarchical discriminant saliency networks". ReLU does have a sparsity effect, but when compute allows, weakening sparsity's regularization in exchange for a better-performing model is perfectly reasonable. Initializing W based on variance reduces the complexity of parameter initialization, which is genuinely valuable. And the original paper is written modestly: in the section on initializing W, the authors even note that it is not the key to the performance gain, and that VGG's approach is also viable.
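As a rough numpy sketch of the two ideas being discussed (illustrative only, not the paper's code): PReLU replaces ReLU's fixed zero slope on negative inputs with a learnable slope a, and the variance-based initialization draws weights with std sqrt(2 / fan_in) so that activation variance stays roughly constant through stacked rectified layers.

```python
import numpy as np

def prelu(x, a):
    """PReLU: identity for x > 0, learnable slope `a` for x <= 0.
    With a = 0 this reduces to plain ReLU."""
    return np.where(x > 0, x, a * x)

def he_init(fan_in, fan_out, rng=np.random.default_rng(0)):
    """Variance-based initialization for a layer followed by a rectifier:
    std = sqrt(2 / fan_in) roughly preserves activation variance."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# Quick check: push data through several layers and watch the scale.
x = np.random.default_rng(1).normal(size=(1000, 256))
for _ in range(5):
    W = he_init(256, 256, np.random.default_rng())
    x = prelu(x @ W, a=0.25)
print(x.std())  # stays O(1) rather than exploding or vanishing
```

The point of the variance check is the one made in the original paper: without scaling the initial weights by fan-in, the activation magnitude grows or shrinks geometrically with depth, which is exactly the "complexity of parameter initialization" that this scheme removes.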
There's nothing wrong with the original paper using "surpassing". Andrej put out his 5.1% precisely as "an approximate upper bound on human error" for people to compete against; if someone beats the line he set, why can't they say "surpassing"? Nobody claimed that beating that line means beating all of humanity. Now that Andrej has offered the 3% and 1.1% estimates, everyone can simply keep pushing, and this year's competition should bring even more impressive results. Besides, both the 4.94% and Google's 4.8% are top-5 error rates; the single-model top-1 error is around ~20%, so by the look of it there is still plenty of room for improvement, and this dataset is not yet anywhere near being overfit.