NeurIPS 2020

R-learning in actor-critic model offers a biologically relevant mechanism for sequential decision-making

Sergey Shuvaev, Sarah Starosta, Duda Kvitsiani, Ádám Kepecs, Alexei A. Koulakov

9 citations

Abstract

When should you continue with your ongoing plans and when should you instead decide to pursue better opportunities? We show in theory and experiment that such stay-or-leave decisions are consistent with deep R-learning both behaviorally and neuronally. Our results suggest that real-world agents leave depleting resources when their reward rate falls below its exponential average, which, we argue, is a Bayes-optimal rule in dynamic natural environments. Our work links reinforcement learning, the marginal value theorem, and Bayesian inference approaches to offer a learning algorithm and a decision rule for making sequential stay-or-leave choices.
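
The decision rule stated in the abstract (leave a depleting resource once its reward rate falls below the exponential average of past rewards) can be illustrated with a minimal Python sketch. This is not the deep R-learning actor-critic model from the paper. The function name `first_leave_step`, the learning rate `alpha`, the initial average, and the choice to compare before updating the average are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def first_leave_step(
    rewards: Sequence[float],
    avg_reward: float,
    alpha: float = 0.1,
) -> Tuple[Optional[int], float]:
    """Apply a simple stay-or-leave rule to a stream of rewards.

    At each step the agent compares the current reward with its running
    exponential average of past rewards. It leaves at the first step where
    the current reward is strictly below that average.

    Returns (step index at which the agent leaves, or None if it never
    leaves; the running average at that moment).

    Hypothetical helper for illustration only; `alpha` and the initial
    `avg_reward` are assumed parameters, not values from the paper.
    """
    for step, reward in enumerate(rewards):
        # Leave as soon as the current reward drops below the average.
        if reward < avg_reward:
            return step, avg_reward
        # Otherwise stay and update the exponential average:
        # avg <- avg + alpha * (reward - avg)
        avg_reward += alpha * (reward - avg_reward)
    return None, avg_reward


if __name__ == "__main__":
    # A depleting patch: rewards shrink on every visit.
    depleting = [10.0, 8.0, 6.0, 4.0, 2.0]
    step, avg = first_leave_step(depleting, avg_reward=5.0, alpha=0.5)
    print(step, avg)
```

With a larger `alpha` the average tracks recent rewards more closely, so in this sketch the agent reacts faster to changes in the environment; a smaller `alpha` averages over a longer history.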