The Ultimate Weapon of Kaggle Competitions: Model Ensembling
"If you don't have any good ideas left, just try model ensembling!"
Model ensembling is a highly effective technique that can noticeably improve performance on ML tasks. By combining multiple single models, it can reduce bias and variance, control overfitting, and raise accuracy.
The article below explains why ensembling achieves these effects and introduces several common ensembling methods: (weighted) voting, averaging, stacking, and blending.
Model ensembling is a very powerful technique to increase accuracy on a variety of ML tasks. In this article I will share my ensembling approaches for Kaggle Competitions.
For the first part we look at creating ensembles from submission files. The second part will look at creating ensembles through stacked generalization/blending.
"This is how you win ML competitions: you take other people's work and ensemble them together." (Vitaly Kuznetsov, NIPS 2014)
1. Creating ensembles from submission files
The most basic and convenient way to ensemble is to ensemble Kaggle submission CSV files. You only need the predictions on the test set for these methods — no need to retrain a model. This makes it a quick way to ensemble already existing model predictions, ideal when teaming up.
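As a concrete illustration, here is a minimal sketch of averaging submission files with pandas. The file names, the "id"/"prediction" column names, and the weights are hypothetical assumptions for illustration, not the original article's exact code; adapt them to the competition's submission format.

```python
import pandas as pd

# Hypothetical submission files produced by three different models.
files = ["model_a.csv", "model_b.csv", "model_c.csv"]
subs = [pd.read_csv(f) for f in files]

blend = subs[0][["id"]].copy()

# Simple unweighted average of the predicted probabilities.
blend["prediction"] = sum(s["prediction"] for s in subs) / len(subs)
blend.to_csv("avg_submission.csv", index=False)

# Weighted average; the weights here are made up and would normally be
# chosen based on each model's validation or public-leaderboard score.
weights = [0.5, 0.3, 0.2]
blend["prediction"] = sum(w * s["prediction"] for w, s in zip(weights, subs))
blend.to_csv("weighted_submission.csv", index=False)
```

Because these methods only touch the prediction files, each teammate can keep their own training pipeline and contribute a single CSV.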
2. Stacked Generalization & Blending
Averaging prediction files is nice and easy, but it's not the only method that the top Kagglers are using. The serious gains start with stacking and blending. Hold on to your top-hats and petticoats: Here be dragons. With 7 heads. Standing on top of 30 other dragons.
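To make the idea concrete, below is a minimal sketch of stacked generalization using scikit-learn: out-of-fold predictions from the base models become the training features for a second-stage meta-model. The synthetic dataset, model choices, and parameters are illustrative assumptions, not the original article's exact setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Synthetic stand-in for a competition dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, y_train, X_test = X[:800], y[:800], X[800:]

base_models = [
    RandomForestClassifier(n_estimators=100, random_state=0),
    GradientBoostingClassifier(random_state=0),
]

kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold predictions become the meta-features: the second-stage model
# never sees predictions made on data the base model was fitted on.
train_meta = np.zeros((len(X_train), len(base_models)))
test_meta = np.zeros((len(X_test), len(base_models)))

for i, model in enumerate(base_models):
    for fit_idx, oof_idx in kf.split(X_train):
        model.fit(X_train[fit_idx], y_train[fit_idx])
        train_meta[oof_idx, i] = model.predict_proba(X_train[oof_idx])[:, 1]
    # Refit on the full training set to generate test-set meta-features.
    model.fit(X_train, y_train)
    test_meta[:, i] = model.predict_proba(X_test)[:, 1]

# The second-stage (meta) model learns how to combine the base models.
stacker = LogisticRegression()
stacker.fit(train_meta, y_train)
final_prediction = stacker.predict_proba(test_meta)[:, 1]
```

Blending works the same way, except that the meta-features come from a single held-out split instead of out-of-fold predictions: simpler to set up, but it leaves less data for the second stage.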
Original article: https://mlwave.com/kaggle-ensembling-guide/