Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters

Alberto Maria Metelli*, Amarildo Likmeta*, and Marcello Restelli

Advances in Neural Information Processing Systems 32 (NeurIPS), 2019.

Acceptance rate: 1428/6743 (21.2%)
CORE 2018: A*   GGS 2018: A++

Abstract
How does the uncertainty of the value function propagate when performing temporal difference learning? In this paper, we address this question by proposing a Bayesian framework in which we employ approximate posterior distributions to model the uncertainty of the value function and Wasserstein barycenters to propagate it across state-action pairs. Leveraging these tools, we present an algorithm, Wasserstein Q-Learning (WQL), first in the tabular case, and then show how it can be extended to deal with continuous domains. Furthermore, we prove that, under mild assumptions, a slight variation of WQL enjoys desirable theoretical properties in the tabular setting. Finally, we present an experimental campaign showing the effectiveness of WQL on finite problems, compared with several RL algorithms, some of which are specifically designed for exploration, along with preliminary results on Atari games.
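
The barycentric update at the heart of this idea is easy to make concrete in the Gaussian case: in one dimension, the Wasserstein-2 barycenter of Gaussians is again Gaussian, with mean and standard deviation given by the corresponding weighted averages. The sketch below illustrates this for a tabular agent with a Gaussian posterior over each Q-value; it is a minimal illustration under our own assumptions, not the paper's reference implementation, and the names (GaussianWQL, w2_barycenter_gauss) as well as the posterior-sampling choice of the next action are ours.

    import numpy as np

    def w2_barycenter_gauss(mu1, sigma1, mu2, sigma2, alpha):
        # In 1-D, the Wasserstein-2 barycenter of two Gaussians with weights
        # (1 - alpha, alpha) is Gaussian, with the weighted averages of the
        # means and of the standard deviations.
        return (1 - alpha) * mu1 + alpha * mu2, (1 - alpha) * sigma1 + alpha * sigma2

    class GaussianWQL:
        # Illustrative tabular agent with a Gaussian posterior over each Q(s, a).
        def __init__(self, n_states, n_actions, gamma=0.99, alpha=0.1, prior_std=1.0):
            self.gamma, self.alpha = gamma, alpha
            self.mu = np.zeros((n_states, n_actions))
            self.sigma = np.full((n_states, n_actions), prior_std)

        def act(self, s, rng):
            # Posterior sampling: draw one Q-value per action, act greedily.
            return int(np.argmax(rng.normal(self.mu[s], self.sigma[s])))

        def update(self, s, a, r, s_next, done, rng):
            # Posterior of the TD target r + gamma * Q(s', a'); the next action
            # is chosen by posterior sampling (a crude surrogate for propagating
            # the uncertainty through the max operator).
            if done:
                target_mu, target_sigma = r, 0.0
            else:
                a_next = self.act(s_next, rng)
                target_mu = r + self.gamma * self.mu[s_next, a_next]
                target_sigma = self.gamma * self.sigma[s_next, a_next]
            # Move the posterior of Q(s, a) toward the target posterior along the
            # Wasserstein geodesic (barycenter with weights (1 - alpha, alpha)).
            self.mu[s, a], self.sigma[s, a] = w2_barycenter_gauss(
                self.mu[s, a], self.sigma[s, a], target_mu, target_sigma, self.alpha)

In this sketch, exploration comes for free: actions are selected by sampling from the current posteriors, so state-action pairs whose value is still uncertain are tried more often, and the barycentric update shrinks the posterior as evidence accumulates.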

[Link] [Poster] [Code] [BibTeX]

@inproceedings{metelli2019propagating,
    author = "Metelli*, Alberto Maria and Likmeta*, Amarildo and Restelli, Marcello",
    title = "Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters",
    booktitle = "Advances in Neural Information Processing Systems 32 ({NeurIPS})",
    year = "2019",
    pages = "4335--4347",
    url = "https://papers.nips.cc/paper/8685-propagating-uncertainty-in-reinforcement-learning-via-wasserstein-barycenters"
}