A new noise network and gradient parallelisation‐based asynchronous advantage actor‐critic algorithm

Bibliographic Details
Title: A new noise network and gradient parallelisation‐based asynchronous advantage actor‐critic algorithm
Authors: Zhengshun Fei, Yanping Wang, Jinglong Wang, Kangling Liu, Bingqiang Huang, Ping Tan
Source: IET Cyber-systems and Robotics, Vol 4, Iss 3, Pp 175-188 (2022)
Publisher Information: Wiley, 2022.
Publication Year: 2022
Collection: LCC:Cybernetics; LCC:Electronic computers. Computer science
Subject Terms: asynchronous advantage actor‐critic (A3C), generalised advantage estimation (GAE), parallelisation, reinforcement learning, Cybernetics, Q300-390, Electronic computers. Computer science, QA75.5-76.95
More Details: Abstract: The asynchronous advantage actor‐critic (A3C) algorithm is a commonly used policy optimisation algorithm in reinforcement learning, in which "asynchronous" refers to parallel interactive sampling and training, and "advantage" refers to a multi‐step reward estimation method used to compute the update weights. To address the low efficiency and poor convergence caused by the traditional heuristic exploration of the A3C algorithm, an improved A3C algorithm is proposed in this paper. In this algorithm, a noise network function that updates the noise tensor explicitly is constructed to train the agent. Generalised advantage estimation (GAE) is also adopted to estimate the advantage function. Finally, a new mean gradient parallelisation method is designed to update the parameters in both the primary and secondary networks by summing and averaging the gradients passed from all the sub‐processes to the main process. Simulation experiments were conducted in a gym environment using the PyTorch Agent Net (PTAN) advanced reinforcement learning library, and the results show that the method enables the agent to complete learning and training faster, with better convergence during training. The improved A3C algorithm performs better than the original algorithm and can provide new ideas for subsequent research on reinforcement learning algorithms.
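The abstract names three concrete mechanisms: a noise network whose noise tensor is updated explicitly, GAE for the advantage estimate, and a mean gradient parallelisation step that sums and averages sub-process gradients in the main process. The Python sketch below illustrates all three under stated assumptions; it is not the authors' code, and the class and function names (NoisyLinear, compute_gae, mean_gradient_step) and all hyperparameter values are illustrative choices based on standard NoisyNet/GAE formulations, not taken from the paper.

```python
# Illustrative sketch (not the authors' implementation) of the three
# mechanisms described in the abstract: a NoisyNet-style layer with an
# explicitly resampled noise tensor, generalised advantage estimation
# (GAE), and a mean-gradient update over gradients from all sub-processes.
import torch
import torch.nn.functional as F


class NoisyLinear(torch.nn.Linear):
    """Linear layer with learnable Gaussian weight noise (NoisyNet style).

    The noise tensor is a buffer that the training loop resamples
    explicitly via sample_noise(), rather than implicitly on every call.
    """

    def __init__(self, in_features, out_features, sigma_init=0.017):
        super().__init__(in_features, out_features)
        self.sigma_weight = torch.nn.Parameter(
            torch.full((out_features, in_features), sigma_init))
        self.register_buffer("eps_weight",
                             torch.zeros(out_features, in_features))

    def sample_noise(self):
        self.eps_weight.normal_()  # explicit update of the noise tensor

    def forward(self, x):
        noisy_weight = self.weight + self.sigma_weight * self.eps_weight
        return F.linear(x, noisy_weight, self.bias)


def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """GAE over one sampled trajectory.

    values must hold one more entry than rewards (the bootstrap value
    of the final state).
    """
    advantages, gae = [], 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        gae = delta + gamma * lam * gae  # exponentially weighted sum
        advantages.insert(0, gae)
    return torch.tensor(advantages)


def mean_gradient_step(shared_net, worker_grads, optimizer):
    """Sum the gradients passed from all sub-processes, average them, and
    apply a single update to the shared (primary) network's parameters."""
    n_workers = len(worker_grads)
    for param, *grads in zip(shared_net.parameters(), *worker_grads):
        param.grad = torch.stack(grads).sum(dim=0) / n_workers
    optimizer.step()
    optimizer.zero_grad()
```

In the assumed wiring, each sub-process would call sample_noise() before acting, compute its advantages with compute_gae(), and push its gradients to the main process, which then runs mean_gradient_step on the shared model.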
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2631-6315
Relation: https://doaj.org/toc/2631-6315
DOI: 10.1049/csy2.12059
Access URL: https://doaj.org/article/aa080922baa346f999cf7aa32107eee0
Accession Number: edsdoj.080922baa346f999cf7aa32107eee0
Database: Directory of Open Access Journals