Reinforcement Learning (11): MCTS

Introduction

  • Game Tree: a tree representation of the game's positions and moves
  • Minimax: a decision strategy over the game tree that assumes both players play optimally
  • Negamax: a simplified formulation of Minimax for zero-sum games
  • Alpha–beta pruning: a search technique that prunes branches which cannot affect the Minimax result, reducing the search space
  • Monte Carlo tree search: a heuristic search algorithm whose value estimates converge to the Minimax values
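The concepts above can be illustrated on a toy take-away game (not the project's Reversi code): players alternately remove 1–3 stones, and whoever takes the last stone wins. A minimal Negamax sketch with alpha–beta pruning, under those assumed rules:

```python
import math

def negamax(pile, alpha=-math.inf, beta=math.inf):
    """Negamax value of a toy take-away game (remove 1-3 stones,
    taking the last stone wins), from the side to move's viewpoint.

    Negamax exploits max(a, b) == -min(-a, -b), so one maximizing
    routine serves both players; alpha-beta pruning skips branches
    that cannot change the result.
    """
    if pile == 0:
        return -1  # no stones left: the previous player won, the side to move lost
    best = -math.inf
    for take in (1, 2, 3):
        if take > pile:
            break
        # Flip the sign and the search window for the opponent's turn
        score = -negamax(pile - take, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:  # the opponent would never allow this line: prune
            break
    return best
```

In this game the positions where the pile is a multiple of 4 are losses for the side to move, which the routine recovers exactly even with pruning, since alpha–beta never changes the root value.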

Project

mcts builds on the Reversi project by @林小囧.

  • reversi.py: the code of the Reversi game
  • search1.py: mcts with a random policy as the default policy
  • search2.py: mcts with custom policy as default policy
  • demo.py: search1.py vs search2.py
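search1.py itself is not reproduced here; as an illustration of MCTS with a random default policy, the following is a minimal UCT sketch on a toy take-away game (alternately remove 1–3 stones, taking the last stone wins). All names (`Node`, `rollout`, `mcts`) are illustrative, not the project's API:

```python
import math
import random

TAKE = (1, 2, 3)  # legal moves: remove 1-3 stones; taking the last stone wins

class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile = pile                 # stones remaining; side to move alternates each ply
        self.parent = parent
        self.move = move                 # the take that led to this node
        self.children = []
        self.untried = [t for t in TAKE if t <= pile]
        self.visits = 0
        self.wins = 0.0                  # from the viewpoint of the player who just moved

    def ucb1(self, c=1.4):
        # UCT: exploit the average win rate, explore rarely visited children
        return (self.wins / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def rollout(pile):
    """Default policy: random play; +1 if the side to move at `pile` wins."""
    turns = 0
    while pile > 0:
        pile -= random.choice([t for t in TAKE if t <= pile])
        turns += 1
    return 1 if turns % 2 == 1 else -1   # odd number of takes: the starter took the last stone

def mcts(root_pile, iterations=2000):
    root = Node(root_pile)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCB1 while fully expanded
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ch.ucb1())
        # 2. Expansion: add one untried child
        if node.untried:
            take = node.untried.pop()
            child = Node(node.pile - take, parent=node, move=take)
            node.children.append(child)
            node = child
        # 3. Simulation with the random default policy
        result = rollout(node.pile)
        # 4. Backpropagation, flipping the perspective at each ply
        score = -result                  # node.wins is the just-moved player's view
        while node is not None:
            node.visits += 1
            node.wins += (score + 1) / 2  # map -1/+1 onto a 0/1 win count
            score = -score
            node = node.parent
    # play the most-visited move (the usual robust-child criterion)
    return max(root.children, key=lambda n: n.visits).move
```

With enough iterations the search settles on the optimal take (leaving the opponent a multiple of 4); swapping `rollout` for a smarter heuristic is exactly the change search2.py makes to the default policy.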

Summary

This article introduces mcts. The code is here.

Reference

Paper

  • A Survey of Monte Carlo Tree Search Methods
  • Monte-Carlo Tree Search
  • Mastering the Game of Go with Deep Neural Networks and Tree Search
  • Mastering the game of Go without human knowledge

Article

  • Build Your Own AlphaGo in 28 Days (6): Monte Carlo Tree Search (MCTS) Basics
  • A translation of the AlphaGo paper, conquering Go with deep neural networks and tree search: Mastering the game of Go with deep neural networks and tree search
  • DeepMind's next-generation Go program AlphaGo Zero appears in Nature again
  • How does DeepMind's Go AI AlphaGo play?
  • A deep dive into the principles of the AlphaGo algorithm

Code

  • Python Code
  • Introduction to Monte Carlo Tree Search
