Paper Sharing Three - Automated Curriculum Learning by Rewarding Temporally Rare Events

Name

Automated Curriculum Learning by Rewarding Temporally Rare Events

Attachment

  1. Original: arXiv
  2. Code: GitHub (will be published shortly)
  3. Video: YouTube

Detail

Problem or Challenge:

Shaped reward functions are difficult to design manually, especially for complex RL tasks.

Assumptions or hypotheses:

  1. Domain knowledge: a set of positive pre-defined events must be specified (an illustrative event set is sketched after this list).
  2. Sparse and/or delayed rewards.
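For concreteness, here is a hedged sketch of what such a pre-defined event set might look like. The event names are illustrative assumptions, loosely based on the kinds of VizDoom events the paper rewards (kills, movement, shooting, and item pickups), not taken from the authors' code:

```python
# Hypothetical event set for a VizDoom-like scenario. The names are
# illustrative assumptions, not from the paper's released code.
EVENTS = [
    "kill",           # enemy eliminated
    "movement",       # agent moved this step
    "shooting",       # weapon fired
    "health_pickup",  # health item collected
    "armor_pickup",   # armor item collected
    "ammo_pickup",    # ammunition collected
    "weapon_pickup",  # new weapon collected
]

# At each time step the environment emits an event vector x with one
# occurrence count per entry in EVENTS.
```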

Methods or Solutions:

  1. Rarity of Events (RoE): reward the reinforcement learning agent according to the rarity of the events it experiences, so that rare events are worth more than frequent ones.
  2. The goal of this approach is to learn through a process of curiosity rather than optimizing towards a difficult pre-defined goal.
  3. Reward function: $R_t(x) = \sum_{i=1}^{|x|} x_i \cdot \frac{1}{\max(\mu_x(\epsilon_i), \tau)}$, where $x_i$ is the occurrence count of event $\epsilon_i$ at time $t$, $\mu_x(\epsilon_i)$ is its temporal mean occurrence, and $\tau$ is a threshold that keeps the reward bounded for extremely rare events (a minimal implementation sketch follows this list).
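Below is a minimal Python sketch of the RoE reward, assuming the temporal mean $\mu_x$ is estimated from a buffer of recent per-episode event counts. The class name RoEReward and the parameters tau and buffer_size are assumptions for illustration, not the authors' released implementation:

```python
import numpy as np

class RoEReward:
    """Rarity of Events reward (a sketch, not the authors' code).

    A buffer of per-episode event counts estimates the temporal mean
    occurrence mu_x(eps_i) of each event; tau caps the reward for
    events that have (almost) never been observed.
    """

    def __init__(self, num_events, tau=0.01, buffer_size=100):
        self.num_events = num_events
        self.tau = tau                # lower bound on mu, keeps 1/mu finite
        self.buffer = []              # recent per-episode event counts
        self.buffer_size = buffer_size

    def end_episode(self, episode_event_counts):
        # Store this episode's event counts, keeping only recent episodes.
        self.buffer.append(np.asarray(episode_event_counts, dtype=float))
        if len(self.buffer) > self.buffer_size:
            self.buffer.pop(0)

    def reward(self, x):
        # x[i] = number of times event eps_i occurred at this time step.
        if self.buffer:
            mu = np.mean(self.buffer, axis=0)   # temporal mean mu_x
        else:
            mu = np.zeros(self.num_events)      # no history yet
        # R_t(x) = sum_i x_i / max(mu_x(eps_i), tau)
        return float(np.sum(np.asarray(x, dtype=float)
                            / np.maximum(mu, self.tau)))
```

In use, this intrinsic reward would simply replace the environment's extrinsic reward in the A2C update, e.g. `r = roe.reward(x)` at every step and `roe.end_episode(counts)` when an episode terminates.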

Experiment or Result:

  1. A2C vs. A2C+RoE: these results show that the selected scenarios are indeed difficult with extrinsic rewards alone, as plain A2C learned a weak policy in three of the five scenarios.
  2. The episodic mean occurrence of events shows how the agent's focus shifts between events during training.
  3. Evaluation results averaged over 100 runs show that A2C+RoE learned a more versatile policy, capable of using all the weapons on the map, which is why it adapts easily.
  4. Heat maps show that the A2C+RoE policy covers a more balanced distribution of map locations in the Deathmatch scenarios.

Limitation or Weakness:

For domains in which reward shaping is not necessary, i.e. where the extrinsic reward already leads smoothly to optimal behavior, the approach is less well suited.

Summary

  1. Rarity of Events (RoE) determines the reward based on the temporal rarity of pre-defined events.
  2. The approach is designed to work well in challenging environments that have a plethora of known events and sparse and/or delayed rewards.
  3. Novelty search and Rarity of Events have the ability to learn interesting behaviors without the need for a goal.
  4. The approach is designed to learn a policy that balances the occurrences of these events, which results in more versatile behavior.

Reference

[1] N. Justesen and S. Risi, "Automated Curriculum Learning by Rewarding Temporally Rare Events," arXiv preprint arXiv:1803.07131, 2018.

[2] Header image source: Visual Doom AI Competition 2017
