Authors
Vincent D Costa, Ramon Bartolo, Hua Tang, Bruno B Averbeck
Journal
The 43rd Annual Meeting of the Japan Neuroscience Society
Volume, issue, pages and publication date
2020-06-15

All organisms, from slime molds to humans, must decide whether to forgo immediate rewards, such as food, in order to explore an unknown option and learn whether it is better than something already experienced. This trade-off is referred to as the explore-exploit dilemma. To balance exploration and exploitation, biological agents need to know when exploration is advantageous. An efficient strategy for managing explore-exploit trade-offs is to predict the immediate and future outcomes of each available choice option. Predicting whether choices will be immediately rewarded or unrewarded is easily computed from past experience. Predicting how often choices will be rewarded or unrewarded in the future is a much more difficult computation, as it relies on prospection. Yet these predictions can be integrated to decide when exploration is advantageous. Given theoretical and lay beliefs that balancing exploration and exploitation is difficult, prior studies have focused on identifying cortical mechanisms of exploratory decision making, ignoring how subcortical motivational circuits aid in managing explore-exploit trade-offs. Here, we leverage theoretical advances in the use of partially observable Markov decision process models to understand how reward uncertainty motivates exploration, in order to characterize neural activity in the amygdala, ventral striatum, orbitofrontal cortex, and dorsolateral prefrontal cortex of macaque monkeys as they solve a multi-armed bandit task designed to query specific aspects of novelty seeking and explore-exploit decision making. Our findings challenge the widely held corticocentric view of how the brain solves the explore-exploit dilemma by emphasizing similarities in how subcortical and cortical regions encode value computations critical for deciding when exploration is warranted.