Model-free and model-based strategies for rats' action selection
Presentation number: P1-m25
Akihiro Funamizu:1; Makoto Ito:2; Kenji Doya:2; Ryohei Kanzaki:1,3; Hirokazu Takahashi:1,3,4
1: Grad Sch of Inf Sci and Tech, Univ of Tokyo, Tokyo; 2: Okinawa Inst of Sci and Tech, Okinawa; 3: RCAST, Univ of Tokyo, Tokyo; 4: PRESTO-JST, Saitama
To investigate the roles of model-free and model-based strategies in action learning, we analyzed rats' performance in a choice task consisting of a random sequence of fixed-reward and variable-reward trials.
Rats were asked to nose-poke into a left or right hole and received a reward stochastically depending on their choice. While the reward probability was constant across fixed-reward trials, in variable-reward trials it varied among 4 settings, switching after every 23 to 275 trials. A light stimulus was presented only in fixed-reward trials. During the choice task, devaluation trials were also conducted, in which rats never received a reward in 10 successive trials. In 5 randomly selected trials of the 10, the light stimulus signaling fixed-reward trials was presented; no light stimulus was presented in the rest.
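The trial structure above can be sketched as a simple generator. This is an illustrative assumption of how such a sequence might be produced; the fixed/variable mixing ratio (`p_fixed`) and the exact switching rule are hypothetical, and only the stated constraints (4 settings, switches after 23 to 275 trials, light cue on fixed-reward trials, 10-trial devaluation blocks with 5 random light trials) come from the abstract.

```python
import random

random.seed(1)  # for a reproducible demo

def devaluation_block():
    """A 10-trial devaluation block: no reward is ever delivered, and the
    fixed-reward light cue appears on 5 randomly chosen trials."""
    light = set(random.sample(range(10), 5))
    return [{"type": "devaluation", "light": i in light, "reward_possible": False}
            for i in range(10)]

def trial_sequence(n_trials=500, p_fixed=0.5):
    """Illustrative generator of the task's trial sequence."""
    trials = []
    switch_in = random.randint(23, 275)  # lifetime of the current setting
    setting = 0
    for _ in range(n_trials):
        if random.random() < p_fixed:
            # fixed-reward trial: light cue on, reward probabilities constant
            trials.append({"type": "fixed", "light": True, "setting": None})
        else:
            # variable-reward trial: no light; one of 4 settings is active,
            # switching after 23-275 variable-reward trials
            switch_in -= 1
            if switch_in == 0:
                setting = (setting + 1) % 4
                switch_in = random.randint(23, 275)
            trials.append({"type": "variable", "light": False, "setting": setting})
    return trials

seq = trial_sequence()
blk = devaluation_block()
print(len(seq), sum(t["light"] for t in blk))
```

The generator keeps the light cue strictly coupled to fixed-reward trials, which is the cue the devaluation analysis relies on.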
In variable-reward trials, rats changed their choices depending on the reward probabilities. In contrast, in fixed-reward trials, rats selected the same choice in more than 80% of trials. In devaluation trials, rats persisted with the same choice significantly more in light-stimulus trials than in no-light-stimulus trials. These results suggest that choices were more flexible to changes in reward probability in variable-reward trials than in fixed-reward trials, implying the existence of multiple strategies for action selection.
We then fitted rats' choice sequences with a model-free strategy, Q-learning with gradual updates of the reward expectation for each action, and a model-based strategy that estimates the current reward-probability setting with a hidden Markov model. In variable-reward trials, rats' choice sequences tended to match the model-based strategy. In contrast, in fixed-reward trials, choice sequences were fitted significantly better by the model-free strategy than by the model-based strategy. These results suggest that rats can take model-free and model-based strategies depending on task complexity.
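A minimal sketch of the two strategies being compared. The reward-probability settings, learning rate, softmax temperature, and HMM stay-probability below are illustrative assumptions, not the study's actual parameters; the point is the contrast between gradually updating per-action values (model-free) and Bayesian filtering over discrete settings (model-based).

```python
import math
import random

random.seed(0)  # for a reproducible demo

# Hypothetical (p_reward_left, p_reward_right) settings; illustrative only.
SETTINGS = [(0.9, 0.1), (0.1, 0.9), (0.5, 0.5), (0.7, 0.3)]

def softmax_choice(q, beta=3.0):
    """Pick 0 (left) or 1 (right) with probability proportional to exp(beta*Q)."""
    p_left = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
    return 0 if random.random() < p_left else 1

def q_update(q, action, reward, alpha=0.2):
    """Model-free: nudge the chosen action's reward expectation toward the outcome."""
    q[action] += alpha * (reward - q[action])
    return q

def hmm_update(belief, action, reward, stay=0.98):
    """Model-based: Bayesian filter over which reward-probability setting is active."""
    n = len(SETTINGS)
    # transition step: small chance of switching to any other setting
    prior = [stay * b + (1 - stay) * (1 - b) / (n - 1) for b in belief]
    # observation step: likelihood of the outcome under each setting
    like = [s[action] if reward else 1 - s[action] for s in SETTINGS]
    post = [p * l for p, l in zip(prior, like)]
    z = sum(post)
    return [p / z for p in post]

# demo: run 200 trials with setting 0 ((0.9, 0.1)) generating rewards
q = [0.5, 0.5]
belief = [0.25] * 4
for _ in range(200):
    a = softmax_choice(q)
    r = 1 if random.random() < SETTINGS[0][a] else 0
    q = q_update(q, a, r)
    belief = hmm_update(belief, a, r)

print(q, belief)
```

After enough trials under one setting, the model-free values favor the richer action, while the model-based belief concentrates on the generating setting; fitting each model's trial-by-trial choice probabilities to the rats' actual choices is what distinguishes the two strategies.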