Reinforcement Learning (11): MCTS

Introduction

  • Game Tree: a tree representation of the game's positions and moves
  • Minimax: a decision strategy over the game tree that assumes both players play optimally
  • Negamax: a simplified formulation of Minimax for zero-sum games
  • Alpha–beta pruning: a search technique that prunes branches which cannot affect the Minimax result, reducing the search space
  • Monte Carlo tree search: a heuristic search algorithm whose value estimates converge to the Minimax values
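The concepts above can be illustrated on a toy take-away game (not the project's Reversi code): players alternately remove 1–3 stones, and whoever takes the last stone wins. A minimal Negamax sketch with alpha–beta pruning, under those assumed rules:

```python
import math

def negamax(pile, alpha=-math.inf, beta=math.inf):
    """Negamax value of a toy take-away game (remove 1-3 stones,
    taking the last stone wins), from the side to move's viewpoint.

    Negamax exploits max(a, b) == -min(-a, -b), so one maximizing
    routine serves both players; alpha-beta pruning skips branches
    that cannot change the result.
    """
    if pile == 0:
        return -1  # no stones left: the previous player won, the side to move lost
    best = -math.inf
    for take in (1, 2, 3):
        if take > pile:
            break
        # Flip the sign and the search window for the opponent's turn
        score = -negamax(pile - take, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:  # the opponent would never allow this line: prune
            break
    return best
```

In this game the positions where the pile is a multiple of 4 are losses for the side to move, which the routine recovers exactly even with pruning, since alpha–beta never changes the root value.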

Project

mcts builds on the Reversi project by @林小囧.

  • reversi.py: the code of the Reversi game
  • search1.py: mcts with a random policy as the default policy
  • search2.py: mcts with custom policy as default policy
  • demo.py: search1.py vs search2.py
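search1.py itself is not reproduced here; as an illustration of MCTS with a random default policy, the following is a minimal UCT sketch on a toy take-away game (alternately remove 1–3 stones, taking the last stone wins). All names (`Node`, `rollout`, `mcts`) are illustrative, not the project's API:

```python
import math
import random

TAKE = (1, 2, 3)  # legal moves: remove 1-3 stones; taking the last stone wins

class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile = pile                 # stones remaining; side to move alternates each ply
        self.parent = parent
        self.move = move                 # the take that led to this node
        self.children = []
        self.untried = [t for t in TAKE if t <= pile]
        self.visits = 0
        self.wins = 0.0                  # from the viewpoint of the player who just moved

    def ucb1(self, c=1.4):
        # UCT: exploit the average win rate, explore rarely visited children
        return (self.wins / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def rollout(pile):
    """Default policy: random play; +1 if the side to move at `pile` wins."""
    turns = 0
    while pile > 0:
        pile -= random.choice([t for t in TAKE if t <= pile])
        turns += 1
    return 1 if turns % 2 == 1 else -1   # odd number of takes: the starter took the last stone

def mcts(root_pile, iterations=2000):
    root = Node(root_pile)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCB1 while fully expanded
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ch.ucb1())
        # 2. Expansion: add one untried child
        if node.untried:
            take = node.untried.pop()
            child = Node(node.pile - take, parent=node, move=take)
            node.children.append(child)
            node = child
        # 3. Simulation with the random default policy
        result = rollout(node.pile)
        # 4. Backpropagation, flipping the perspective at each ply
        score = -result                  # node.wins is the just-moved player's view
        while node is not None:
            node.visits += 1
            node.wins += (score + 1) / 2  # map -1/+1 onto a 0/1 win count
            score = -score
            node = node.parent
    # play the most-visited move (the usual robust-child criterion)
    return max(root.children, key=lambda n: n.visits).move
```

With enough iterations the search settles on the optimal take (leaving the opponent a multiple of 4); swapping `rollout` for a smarter heuristic is exactly the change search2.py makes to the default policy.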

Summary

This article introduces mcts. The code is here.

Reference

Paper

  • A Survey of Monte Carlo Tree Search Methods
  • Monte-Carlo Tree Search
  • Mastering the Game of Go with Deep Neural Networks and Tree Search
  • Mastering the game of Go without human knowledge

Article

  • Build Your Own AlphaGo in 28 Days (6): Monte Carlo Tree Search (MCTS) Basics
  • A translation of the AlphaGo paper, conquering Go with deep neural networks and tree search: Mastering the game of Go with deep neural networks and tree search
  • DeepMind's next-generation Go program AlphaGo Zero appears in Nature again
  • How does DeepMind's Go AI AlphaGo play?
  • A deep dive into the principles of the AlphaGo algorithm

Code

  • Python Code
  • Introduction to Monte Carlo Tree Search
