Mastering Go Without Human Knowledge: AlphaGo Zero [AlphaGo Zero and AlphaGo paper downloads included]

A long-standing goal of artificial intelligence is an algorithm that learns superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion at the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, with no human data, guidance, or domain knowledge beyond the rules of the game. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and the winner of AlphaGo's games. This neural network improves the strength of the tree search, yielding higher-quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
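The core idea above is a single network with two heads: a policy head that predicts AlphaGo Zero's own move selections, and a value head that predicts the winner. Below is a minimal sketch of such a dual-headed network, assuming PyTorch; the real AlphaGo Zero network is a much deeper residual network with batch normalization, so the layer sizes and depths here are placeholders chosen only to illustrate the structure.

```python
import torch.nn as nn
import torch.nn.functional as F

BOARD = 19  # standard Go board size

class PolicyValueNet(nn.Module):
    """Dual-headed network: a shared trunk, a policy head, and a value head."""
    def __init__(self, in_planes=17, channels=64):  # 17 binary board planes; channel count is a placeholder
        super().__init__()
        # Shared convolutional trunk over the board representation.
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Policy head: a distribution over every board point plus "pass".
        self.policy = nn.Linear(channels * BOARD * BOARD, BOARD * BOARD + 1)
        # Value head: a scalar in [-1, 1] estimating the eventual winner.
        self.value = nn.Sequential(
            nn.Linear(channels * BOARD * BOARD, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),
        )

    def forward(self, board):  # board: (batch, in_planes, 19, 19)
        h = self.trunk(board).flatten(1)
        return F.log_softmax(self.policy(h), dim=1), self.value(h).squeeze(1)
```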

AlphaGo Zero paper download:

tensorflownews.com/wp-c

Original article: nature.com/nature/journ

Mastering the game of Go with deep neural networks and tree search

AlphaGo paper: storage.googleapis.com/

A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and also the winner of AlphaGo's games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
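The training signal described in the abstract can be written compactly: for each self-play position, the tree search produces improved move probabilities pi and the finished game provides a winner z, and the network outputs (p, v) are pushed toward (pi, z). In the paper this becomes the combined loss l = (z - v)^2 - pi^T log p + c * ||theta||^2. Below is a minimal sketch of that loss, assuming PyTorch and a dual-headed network like the one above; the argument names and the regularisation weight are placeholders.

```python
import torch

def alphago_zero_loss(log_p, v, pi, z, params, c=1e-4):
    """log_p:  (batch, moves) log move probabilities from the policy head
    v:      (batch,) predicted winner from the value head
    pi:     (batch, moves) MCTS visit-count distribution (search-improved target)
    z:      (batch,) actual game outcome from the current player's perspective
    params: iterable of network parameters, for L2 regularisation
    c:      regularisation weight (placeholder value)"""
    value_loss = torch.mean((z - v) ** 2)                     # (z - v)^2
    policy_loss = -torch.mean(torch.sum(pi * log_p, dim=1))   # -pi^T log p
    l2 = c * sum((w ** 2).sum() for w in params)              # c * ||theta||^2
    return value_loss + policy_loss + l2
```

Each iteration alternates between generating self-play games with the current network plus tree search and taking gradient steps on this loss, which is what makes move selection and self-play stronger in the next iteration.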

More machine learning resources: tensorflownews.com/


Recommended reading:

An introduction to CNNs: how to understand the structure of a convolutional neural network
Making the games you love is like opening a book on a sunny afternoon
The third-generation humanoid robot is here: Boston Dynamics' bipedal robot officially arrives
A casual chat about deep learning and AI
AutoML | Google AI: using AI to write AI
