KOBV Portal


Online Resource

Transferring knowledge from human-demonstration trajectories to reinforcement learning

Wang, Guo-fang ; Fang, Zhou ; Li, Ping ; [et al.]

SAGE Publications ; 2018

In: Transactions of the Institute of Measurement and Control Vol. 40, No. 1 (2018-01), p. 94-101


Details


Abstract: Transfer learning (TL) has become an important technique for accelerating the slow optimization of reinforcement learning (RL) by reusing knowledge acquired in a previous, related task. However, most current research acquires that knowledge through RL training on the source task, which can be very time-consuming. To address this, we propose a novel TL framework in which the agent extracts knowledge from human-demonstration trajectories of the source task and reuses it for RL on the target task. As for what to transfer, two forms of knowledge derived from the demonstration trajectories are adopted: the k nearest neighbours of the current state among the source samples, and the visit frequency of homologous states. As for how to transfer, the first is used to recommend a preferred action whenever random exploration is needed, and the second is used to shape an instantaneous reward for RL. Simulation experiments on balancing Cart-Poles of varying difficulty suggest that each form of knowledge noticeably accelerates RL learning, and that the effect is even more pronounced when the two are combined. These results indicate that the framework benefits RL.
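The abstract names the two transfer mechanisms but not their exact formulas, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. Assumptions: demonstrations are hypothetical (state, action) pairs with tuple-valued states; the recommended action is a majority vote over the k demonstration states nearest in Euclidean distance; "homologous" states are taken to be states that fall into the same grid cell; and the visit-frequency knowledge is turned into a reward through potential-based reward shaping, which is a swapped-in standard technique rather than the paper's stated formula. All names and parameters (`knn_recommend`, `make_visit_potential`, `p_demo`, `bin_width`, and so on) are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical demonstration data: (state, action) pairs with tuple-valued states.
# Assumption: states are low-dimensional continuous vectors (e.g. Cart-Pole observations).


def knn_recommend(state, demos, k=5):
    """Return the majority action among the k demonstration states nearest to `state`."""
    nearest = sorted(demos, key=lambda sa: math.dist(state, sa[0]))[:k]
    votes = Counter(action for _, action in nearest)
    return votes.most_common(1)[0][0]


def discretize(state, bin_width=0.5):
    """Assumption: 'homologous' states are those that fall into the same grid cell."""
    return tuple(math.floor(x / bin_width) for x in state)


def make_visit_potential(demos, bin_width=0.5):
    """Build phi(s) = fraction of demonstration states sharing s's grid cell."""
    counts = Counter(discretize(s, bin_width) for s, _ in demos)
    total = sum(counts.values())

    def phi(state):
        # Counter returns 0 for cells never visited in the demonstrations.
        return counts[discretize(state, bin_width)] / total

    return phi


def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based shaping (a swapped-in choice): r + gamma * phi(s') - phi(s)."""
    return r + gamma * phi(s_next) - phi(s)


def choose_action(q_values, state, demos, epsilon, rng, p_demo=0.8, k=5):
    """Epsilon-greedy; on exploration steps, follow the kNN recommendation with prob. p_demo."""
    if rng.random() < epsilon:
        if rng.random() < p_demo:
            return knn_recommend(state, demos, k)
        # Otherwise fall back to a uniformly random action.
        return rng.randrange(len(q_values))
    # Exploit: greedy action with respect to the current Q-value estimates.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    demos = [((0.0, 0.1), 1), ((0.0, -0.1), 0), ((0.05, 0.12), 1)]
    phi = make_visit_potential(demos)
    rng = random.Random(0)
    print(knn_recommend((0.0, 0.11), demos, k=1))
    print(shaped_reward(1.0, (10.0, 10.0), (0.0, 0.1), phi, gamma=1.0))
    print(choose_action([0.1, 0.9], (0.0, 0.11), demos, epsilon=0.1, rng=rng))
```

In this sketch the demonstrations act as a fixed lookup table, so no RL training on the source task is required, in line with the motivation stated in the abstract. Keeping a small `p_demo` fallback to uniform random actions preserves some undirected exploration; a practical Cart-Pole agent would likely replace the linear scan in `knn_recommend` with a spatial index such as a k-d tree.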

Type of Medium: Online Resource

ISSN: 0142-3312 , 1477-0369


DOI: 10.1177/0142331216649655

Language: English

Publisher: SAGE Publications

Publication Date: 2018

ZDB-ID: 2025882-3

SSG: 3,2