Two-stage reward allocation with decay for multi-agent coordinated behavior for sequential cooperative task by using deep reinforcement learning

Bibliographic Details
Title: Two-stage reward allocation with decay for multi-agent coordinated behavior for sequential cooperative task by using deep reinforcement learning
Authors: Yuki Miyashita, Toshiharu Sugawara
Source: Autonomous Intelligent Systems, Vol 2, Iss 1, Pp 1-18 (2022)
Publisher Information: Springer, 2022.
Publication Year: 2022
Collection: LCC:Electronic computers. Computer science; LCC:Computer engineering. Computer hardware
Subject Terms: Cooperation, Coordination, Divisional cooperation, Multi-agent deep reinforcement learning, Electronic computers. Computer science, QA75.5-76.95, Computer engineering. Computer hardware, TK7885-7895
Abstract: We propose a two-stage reward allocation method with decay, together with an extension of replay memory that adapts this reward scheme to deep reinforcement learning (DRL), in order to generate coordinated behaviors for tasks that heterogeneous agents complete by executing a few subtasks sequentially. An independent learner in a cooperative multi-agent system must learn policies both for effectively executing the subtask it is responsible for and for behaving in a coordinated manner under a given coordination structure. Although the reward scheme is a key design issue in DRL, it is difficult to design one that induces both kinds of policy. Our proposed method attempts to generate these different behaviors in multi-agent DRL by dividing the timing of rewards into two stages and varying the ratio between them over time. By introducing the coordinated delivery and execution problem with an expiration time, in which a task is executed sequentially by two heterogeneous agents, we experimentally analyze how various reward-division ratios in the two-stage allocation affect the generated behaviors. The results demonstrate that the proposed method can improve overall performance relative to the conventional one-time or fixed reward and can establish robust coordinated behavior. (An illustrative sketch of the two-stage allocation follows the record fields below.)
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2730-616X
Relation: https://doaj.org/toc/2730-616X
DOI: 10.1007/s43684-022-00029-z
Access URL: https://doaj.org/article/6f38562147ca433b9b1f5be0d6eb883e
Accession Number: edsdoj.6f38562147ca433b9b1f5be0d6eb883e
Database: Directory of Open Access Journals
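
Method sketch: The two-stage allocation with decay described in the abstract can be illustrated with a short Python sketch. This is a minimal reading of the abstract, not the authors' implementation: the function name, the linear decay schedule, and all parameter values below are illustrative assumptions.

    def two_stage_reward(stage, total_reward, episode, decay_episodes, min_share=0.0):
        """Split a task's reward between two timing stages.

        Stage 1: an agent completes its own subtask.
        Stage 2: the whole sequential task is completed.
        The stage-1 share decays over training (linearly here, an assumption),
        shifting credit from individual subtask execution toward coordinated
        completion of the full task.
        """
        # Decaying ratio assigned to the first-stage reward.
        share = max(min_share, 1.0 - episode / decay_episodes)
        if stage == 1:
            return share * total_reward
        if stage == 2:
            return (1.0 - share) * total_reward
        raise ValueError("stage must be 1 or 2")

    # Early in training, most credit goes to finishing one's own subtask:
    #   two_stage_reward(1, 1.0, episode=0, decay_episodes=5000)     -> 1.0
    # Late in training, credit has shifted to completing the whole task:
    #   two_stage_reward(2, 1.0, episode=5000, decay_episodes=5000)  -> 1.0

Under such a schedule, an independent learner is first rewarded for mastering its own subtask and is then increasingly rewarded only when the sequential task as a whole succeeds, which matches the abstract's account of varying the ratio between the two reward stages over time.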