Research on Gait Switching Method Based on Speed Requirement
Weijun Tian, Kuiyue Zhou, Jian Song, Xu Li, Zhu Chen, Ziteng Sheng, Ruizhi Wang, Jiang Lei & Qian Cong
Journal of Bionic Engineering, 2024, 21(6): 2817-2829.
DOI: 10.1007/s42235-024-00589-1
Real-time gait switching of a quadruped robot as its speed changes is a difficult problem in robotics research, and applying reinforcement learning (RL) to it is a novel solution. In this paper, a quadruped robot simulation platform is built on the Robot Operating System (ROS), openai-gym is used as the RL framework, and the Proximal Policy Optimization (PPO) algorithm is used for gait switching. The training task is to learn different gait parameters for different speed inputs, including gait type, gait cycle, gait offset, and gait interval. The trained gait parameters are then used as the input of a Model Predictive Control (MPC) controller, which calculates the joint forces/torques. The calculated joint forces are transmitted to the joint motors of the quadruped robot to control joint rotation, realizing gait switching at different speeds. The robot can thus more realistically imitate the gait transitions of animals: walking at very low speed, trotting at medium speed, and galloping at high speed.

This paper integrates a variety of factors affecting the gait training of a quadruped robot and applies reward constraints from several aspects, including a velocity reward, a time reward, an energy reward, and a balance reward. Each reward is given its own weight, and the instant reward at each training step is obtained by multiplying each reward by its weight and summing, which ensures the reliability of the training results. Multiple groups of comparative simulation experiments were also carried out. The results show that the priority of the balance, velocity, energy, and time rewards decreases in that order, and that no reward weight exceeds 0.5. Training works best when the policy network and the value network are three-layer neural networks with 64 neurons per layer and the discount factor is 0.99.
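The abstract describes a pipeline in which an RL policy outputs gait parameters that an MPC controller converts into joint forces/torques. As a minimal sketch of that structure only (the environment class, the parameter bounds, and the MPCController interface are illustrative assumptions, not the authors' ROS implementation; compute_reward is sketched further below):

```python
import numpy as np
import gym
from gym import spaces

class GaitSwitchEnv(gym.Env):
    """Sketch: the action is a gait-parameter vector, as in the paper.

    The action encodes gait type, gait cycle, gait offset, and gait
    interval; the bounds and the MPC interface here are assumptions.
    """

    def __init__(self, mpc_controller, commanded_speed):
        self.mpc = mpc_controller          # hypothetical MPC wrapper
        self.commanded_speed = commanded_speed
        # [gait_type, gait_cycle (s), gait_offset, gait_interval]
        self.action_space = spaces.Box(
            low=np.array([0.0, 0.2, 0.0, 0.0], dtype=np.float32),
            high=np.array([2.0, 1.0, 1.0, 1.0], dtype=np.float32),
        )
        # e.g. body velocity, orientation, joint states, commanded speed
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(13,), dtype=np.float32)

    def step(self, action):
        # The MPC turns the gait parameters into joint forces/torques
        # and advances the simulated robot one control interval.
        state = self.mpc.apply_gait_parameters(action)  # assumed interface
        reward = compute_reward(state, self.commanded_speed)
        done = state.fallen
        return state.observation, reward, done, {}
```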
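The instant reward is described as a weighted sum of velocity, time, energy, and balance terms, with the balance reward weighted highest and no weight above 0.5. A sketch under those constraints (the specific weight values and the form of each term are assumptions; only the weighting scheme and the priority ordering come from the paper):

```python
import numpy as np

# Example weights respecting the reported ordering
# balance > velocity > energy > time, each <= 0.5 (values assumed).
WEIGHTS = {"balance": 0.4, "velocity": 0.3, "energy": 0.2, "time": 0.1}

def compute_reward(state, commanded_speed):
    """Instant reward = sum of each reward term times its own weight."""
    terms = {
        # Track the commanded speed (negative tracking error).
        "velocity": -abs(state.forward_speed - commanded_speed),
        # Penalize elapsed time per step to encourage fast convergence.
        "time": -state.dt,
        # Penalize actuation effort (sum of |torque * joint velocity|).
        "energy": -float(np.sum(np.abs(state.torques * state.joint_velocities))),
        # Penalize body tilt to keep the trunk level.
        "balance": -(abs(state.roll) + abs(state.pitch)),
    }
    return sum(WEIGHTS[k] * terms[k] for k in WEIGHTS)
```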
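Finally, the abstract reports that three-layer policy and value networks with 64 neurons per layer and a discount factor of 0.99 trained best. One way to express that configuration, here using stable-baselines3's PPO as a stand-in for whatever PPO implementation the authors used:

```python
from stable_baselines3 import PPO

# env = GaitSwitchEnv(...)  # the sketch above, or any gym-compatible env
model = PPO(
    "MlpPolicy",
    env,
    gamma=0.99,  # discount factor reported in the paper
    # Three hidden layers of 64 neurons for both policy and value networks.
    policy_kwargs=dict(net_arch=dict(pi=[64, 64, 64], vf=[64, 64, 64])),
)
model.learn(total_timesteps=1_000_000)  # training budget assumed
```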