Dual-Layer Reinforcement Learning for Quadruped Robot Locomotion and Speed Control in Complex Environments

Bibliographic Details
Title: Dual-Layer Reinforcement Learning for Quadruped Robot Locomotion and Speed Control in Complex Environments
Authors: Yilin Zhang, Jiayu Zeng, Huimin Sun, Honglin Sun, Kenji Hashimoto
Source: Applied Sciences, Vol 14, Iss 19, p 8697 (2024)
Publisher Information: MDPI AG, 2024.
Publication Year: 2024
Collection: LCC:Technology; LCC:Engineering (General). Civil engineering (General); LCC:Biology (General); LCC:Physics; LCC:Chemistry
Subject Terms: walking robots, dual-layer reinforcement learning, proximal policy optimization, deep double Q-network, adaptive control, dynamic speed adjustment, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
More Details: Walking robots have been widely applied in complex terrains due to their good terrain adaptability and trafficability. However, in some environments (such as disaster relief and field navigation), a single control strategy, even one that can adapt to various environments, cannot strike a balance between speed and stability. Existing control schemes such as model predictive control (MPC) and traditional incremental control can manage certain environments, but they typically rely on a single strategy and lack the adaptability to adjust dynamically to different terrains. To address this limitation, this paper proposes an innovative dual-layer reinforcement learning algorithm. The algorithm combines Deep Double Q-Network (DDQN) and Proximal Policy Optimization (PPO), leveraging their complementary strengths to achieve both fast adaptation and high stability in complex terrains. It uses terrain information and the robot’s state as observations, determines the walking speed command of the quadruped robot Unitree Go1 through DDQN, and dynamically adjusts the current walking speed in complex terrains through a PPO-based robot action control system. The speed command serves as the crucial link between the robot’s perception and movement, dictating how fast the robot should walk depending on the environment and its internal state. DDQN ensures that the robot sets an appropriate speed based on what it observes, such as changes in terrain or obstacles; PPO then executes this speed command, allowing the robot to navigate difficult or uneven surfaces in real time with smooth and stable movement. The proposed model is verified in detail in Isaac Gym. We compare the distances walked by the robot using six different control methods within 10 s.
The experimental results indicate that the proposed method demonstrates excellent speed adjustment ability in complex terrains. On the designed test route, the quadruped robot Unitree Go1 not only maintains a high walking speed but also remains highly stable when switching between different terrains. Our algorithm enables the robot to walk 25.5 m in 10 s, outperforming the other methods.
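The abstract's dual-layer idea (a value-based high-level layer choosing a discrete speed command, executed by a separately trained low-level policy) can be illustrated with a toy sketch. This is not the authors' implementation: the paper uses DDQN over terrain and state observations with a PPO locomotion controller in Isaac Gym, whereas the sketch below substitutes tabular double Q-learning on an assumed two-terrain model with an assumed reward (distance minus a speed-dependent instability penalty on rough ground); all names, states, and reward shapes are illustrative assumptions.

```python
import random

TERRAINS = ["flat", "rough"]   # toy observation space (assumed)
SPEEDS = [0.3, 0.6, 1.0]       # candidate speed commands in m/s (assumed)

def reward(terrain, speed):
    # Assumed reward: distance covered minus an instability penalty
    # that grows with speed on rough ground.
    penalty = 2.0 * speed if terrain == "rough" else 0.0
    return speed - penalty

def train(episodes=5000, alpha=0.1, eps=0.2, seed=0):
    """Tabular double Q-learning: the bias-reduction idea behind DDQN."""
    rng = random.Random(seed)
    qa = {(t, s): 0.0 for t in TERRAINS for s in SPEEDS}
    qb = {(t, s): 0.0 for t in TERRAINS for s in SPEEDS}
    for _ in range(episodes):
        t = rng.choice(TERRAINS)
        # Epsilon-greedy action selection over the summed tables.
        if rng.random() < eps:
            s = rng.choice(SPEEDS)
        else:
            s = max(SPEEDS, key=lambda a: qa[(t, a)] + qb[(t, a)])
        r = reward(t, s)
        # Randomly update one of the two tables, decorrelating action
        # selection from value estimation (as in double Q-learning).
        if rng.random() < 0.5:
            qa[(t, s)] += alpha * (r - qa[(t, s)])
        else:
            qb[(t, s)] += alpha * (r - qb[(t, s)])
    return qa, qb

def speed_command(qa, qb, terrain):
    # High-level layer: pick the speed command for the observed terrain.
    # In the paper, this command is then tracked by a PPO-trained
    # low-level locomotion controller (omitted here).
    return max(SPEEDS, key=lambda s: qa[(terrain, s)] + qb[(terrain, s)])
```

Under the assumed reward, the learned high-level policy commands the fastest speed on flat terrain and slows down on rough terrain, mirroring the speed-stability trade-off the paper targets.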
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2076-3417
Relation: https://www.mdpi.com/2076-3417/14/19/8697; https://doaj.org/toc/2076-3417
DOI: 10.3390/app14198697
Access URL: https://doaj.org/article/1a4b9cbe24b44fe58192c694a334b420
Accession Number: edsdoj.1a4b9cbe24b44fe58192c694a334b420
Database: Directory of Open Access Journals