Authors
Will Dabney, Zebulun Lloyd Kurth-Nelson, Naoshige Uchida, Clara K Starkweather, Demis Hassabis, Rémi Munos, Matthew Botvinick
Venue
The 43rd Annual Meeting of the Japan Neuroscience Society
Volume/Issue/Pages · Publication Date
2020-06-16

Since its introduction, the reward prediction error (RPE) theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. In the present work, we propose a novel account of dopamine-based reinforcement learning. Inspired by recent artificial intelligence research on distributional reinforcement learning, we hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea leads immediately to a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.

The RPE theory of dopamine derives from work in the artificial intelligence (AI) field of reinforcement learning (RL). Since the link to neuroscience was first made, however, RL has made substantial advances, revealing factors that radically enhance the effectiveness of RL algorithms. In some cases, the relevant mechanisms invite comparison with neural function, suggesting new hypotheses concerning reward-based learning in the brain. Here, we examine one particularly promising recent development in AI research and investigate its potential neural correlates. Specifically, we consider a computational framework referred to as distributional reinforcement learning.
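To make the contrast concrete, below is a minimal sketch in Python of the two learning rules described above: the classical scalar RPE update, which converges to the mean of a stochastic reward, and a distributional variant in which a population of channels applies asymmetric scaling to positive versus negative prediction errors, so that each channel converges to a different expectile of the reward distribution. This is an illustration of the general framework only; the toy reward distribution, parameter values, and variable names are assumptions for this sketch, not taken from the study.

```python
# Sketch: scalar TD/RPE learning vs. a distributional (expectile-style) variant.
# All parameters and the two-outcome reward distribution are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def sample_reward():
    # Stochastic reward: 0.2 or 1.0, each with probability 0.5 (mean = 0.6).
    return 0.2 if rng.random() < 0.5 else 1.0

# --- Classical RPE: a single scalar prediction converges to the mean. ---
v = 0.0
alpha = 0.05
for _ in range(5000):
    r = sample_reward()
    delta = r - v          # reward prediction error
    v += alpha * delta     # symmetric update drives v toward the mean
print(f"scalar value estimate: {v:.3f}  (true mean = 0.6)")

# --- Distributional RPE: many channels with different asymmetries. ---
# Channel i weights positive errors by tau_i and negative errors by
# (1 - tau_i); its fixed point is the tau_i-th expectile of the reward
# distribution, so the population encodes the distribution, not just the mean.
taus = np.linspace(0.05, 0.95, 10)   # per-channel asymmetric scaling factors
values = np.zeros_like(taus)
for _ in range(20000):
    r = sample_reward()
    delta = r - values
    lr = np.where(delta > 0, taus, 1.0 - taus) * alpha
    values += lr * delta

print("expectile estimates per channel:")
for t, v_i in zip(taus, values):
    print(f"  tau = {t:.2f}: {v_i:.3f}")
```

Running the sketch shows optimistic channels (tau near 1) settling close to the high outcome and pessimistic channels (tau near 0) close to the low outcome, while a symmetric channel (tau = 0.5) recovers the classical mean. It is this population-level spread of predictions, driven by asymmetric responses to positive and negative RPEs, that yields the empirical predictions tested in the recordings.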