Reinforcement Learning (11): MCTS
05-09
Introduction
- Game Tree: a tree representation of the game's positions and moves
- Minimax: a strategy for making decisions based on the game tree
- Negamax: a formulation equivalent to Minimax for two-player zero-sum games
- Alpha–beta pruning: a search technique that reduces the search space without changing the Minimax result
- Monte Carlo tree search: a heuristic search algorithm whose estimates converge to Minimax
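The relationship between Negamax and alpha–beta pruning can be sketched in a few lines. This is a minimal illustration, not code from the project: the game tree is a hand-built nested list whose leaves hold scores from the root player's point of view.

```python
def negamax(node, alpha, beta, color):
    """Return the best score for the side to move (`color` is +1 or -1)."""
    if isinstance(node, int):          # leaf: a terminal score
        return color * node
    best = float("-inf")
    for child in node:                 # internal node: a list of children
        best = max(best, -negamax(child, -beta, -alpha, -color))
        alpha = max(alpha, best)
        if alpha >= beta:              # cut-off: the opponent will avoid this line
            break
    return best

# Depth-2 tree: root (maximizer) -> three minimizer nodes -> leaf scores.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(negamax(tree, float("-inf"), float("inf"), 1))  # → 3
```

Negating the child's value at each level lets one function play both the maximizing and minimizing roles, which is exactly why Negamax is equivalent to Minimax.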
Project
The MCTS implementation is based on the Reversi project ( @林小囧 ).
- reversi.py: implementation of the Reversi game
- search1.py: MCTS with a random default policy
- search2.py: MCTS with a custom default policy
- demo.py: search1.py vs search2.py
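The four MCTS phases used in search1.py (selection by UCB1, expansion, a random-rollout default policy, and backpropagation) can be sketched as below. This is not the project's code: a toy game of Nim (take 1–3 stones; taking the last stone wins) stands in for Reversi, and the `Node` and `best_move` names are illustrative.

```python
import math
import random

class Node:
    """A Nim position: `stones` left; `wins` belongs to the player who just moved."""
    def __init__(self, stones, parent=None):
        self.stones = stones
        self.parent = parent
        self.children = {}                                # move -> child Node
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.wins = 0.0        # playouts won by the player who moved INTO this node
        self.visits = 0

def best_move(stones, iters=2000, c=1.4):
    """Run `iters` MCTS iterations from a Nim position and return a move."""
    root = Node(stones)
    for _ in range(iters):
        node = root
        # 1. Selection: descend via UCB1 while the node is fully expanded.
        while not node.untried and node.children:
            node = max(node.children.values(),
                       key=lambda ch: ch.wins / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: try one unexplored move, if any.
        if node.untried:
            m = node.untried.pop()
            node.children[m] = Node(node.stones - m, parent=node)
            node = node.children[m]
        # 3. Simulation: random default policy; taking the last stone wins.
        s, turn = node.stones, 0          # turn 0 = player to move at `node`
        while s:
            s -= random.choice([m for m in (1, 2, 3) if m <= s])
            turn ^= 1
        reward = 1.0 if turn == 0 else 0.0  # 1 if the player who moved into `node` won
        # 4. Backpropagation: flip the reward at each level up the tree.
        while node is not None:
            node.visits += 1
            node.wins += reward
            reward = 1.0 - reward
            node = node.parent
    # Final decision: the most-visited child of the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

For example, `best_move(5)` learns to take 1 stone, leaving the opponent a losing multiple of 4. Swapping the `random.choice` rollout for a hand-crafted heuristic is the only difference between the random and custom default policies described above.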
Summary
This article introduces MCTS. The code is here.
Reference
Paper
- A Survey of Monte Carlo Tree Search Methods
- Monte-Carlo Tree Search
- Mastering the Game of Go with Deep Neural Networks and Tree Search
- Mastering the game of Go without human knowledge
Article
- 28 天自制你的 AlphaGo (6) : 蒙特卡洛樹搜索(MCTS)基礎
- AlphaGo論文的譯文,用深度神經網路和樹搜索征服圍棋:Mastering the game of Go with deep neural networks and tree search
- DeepMind新一代圍棋程序AlphaGo Zero再次登上Nature
- DeepMind 研發的圍棋 AI AlphaGo 是如何下棋的?
- 深度解讀 AlphaGo 演算法原理
Code
- Python Code
- Introduction to Monte Carlo Tree Search