Multi-robot hierarchical safe reinforcement learning autonomous decision-making strategy based on uniformly ultimate boundedness constraints.


Authors: Hui Jiang, Sen Qian, Huihui Sun, Changlin Wu, Long Zhang

Language: eng

Classification: 631.5 Cultivation and harvesting

Publication information: England: Scientific Reports, 2025


Collection: NCBI

ID: 203648


Deep reinforcement learning has shown strong capabilities in a variety of sequential decision-making problems, providing a standardized learning paradigm for the development of intelligent multi-robot systems. Nevertheless, in dynamic and unstructured environments, the safety of decision-making strategies faces serious challenges. Without safety guarantees, multi-robot systems are susceptible to unknown risks and potential physical damage. To tackle the safety challenges in autonomous decision-making of multi-robot systems, this manuscript presents a uniformly ultimately bounded constrained hierarchical safe reinforcement learning strategy (UBSRL). First, an event-triggered hierarchical safe reinforcement learning framework is proposed, based on the constrained Markov decision process. Through collaboration between the upper-tier evolutionary network and the lower-tier restoration network, the framework improves both the safety and the efficiency of decision-making. Second, by incorporating supplementary Lyapunov safety cost networks, a strategy optimization mechanism with multiple safety cost constraints is devised, and the Lagrange multiplier method is employed to find the optimal strategy. Finally, the stability of the autonomous decision-making system is analyzed using the principle of uniformly ultimate boundedness. This analysis shows that the action trajectories of multiple robots can be returned to a safe space within finite time from any unsafe state, thereby theoretically substantiating the efficacy of the safety constraints embedded in the proposed strategy.
After training and evaluation in multiple standardized scenarios, the results indicate that the UBSRL strategy effectively keeps the safety indicators below the threshold and markedly improves the stability and task completion rate of the motion strategy.
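
The abstract names two mechanisms without giving their update rules: an event trigger that hands control from the upper-tier network to the lower-tier restoration network, and a Lagrange multiplier treatment of multiple safety cost constraints. The following is only a minimal sketch of how such mechanisms are commonly realized in constrained reinforcement learning, not the authors' implementation. The class `LagrangeMultipliers`, the function `select_policy`, the step size, and the threshold values are all hypothetical names and assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class LagrangeMultipliers:
    """One dual variable per safety cost constraint (hypothetical helper)."""
    thresholds: Sequence[float]   # cost limits d_i (assumed values)
    lr: float = 0.05              # dual ascent step size (assumed)
    values: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start every multiplier at zero unless values were supplied.
        if not self.values:
            self.values = [0.0] * len(self.thresholds)

    def update(self, expected_costs: Sequence[float]) -> List[float]:
        """Projected dual ascent: lam_i <- max(0, lam_i + lr * (J_ci - d_i))."""
        self.values = [
            max(0.0, lam + self.lr * (cost - limit))
            for lam, cost, limit in zip(self.values, expected_costs, self.thresholds)
        ]
        return self.values

    def penalized_objective(self, expected_return: float,
                            expected_costs: Sequence[float]) -> float:
        """Lagrangian: return minus the weighted constraint violations."""
        penalty = sum(
            lam * (cost - limit)
            for lam, cost, limit in zip(self.values, expected_costs, self.thresholds)
        )
        return expected_return - penalty


def select_policy(current_costs: Sequence[float],
                  thresholds: Sequence[float]) -> str:
    """Assumed event trigger: use the lower-tier (restoration) policy
    whenever any safety cost exceeds its threshold, else the upper tier."""
    violated = any(cost > limit for cost, limit in zip(current_costs, thresholds))
    return "lower" if violated else "upper"


if __name__ == "__main__":
    duals = LagrangeMultipliers(thresholds=[1.0, 2.0], lr=0.5)
    print(duals.update([2.0, 1.0]))
    print(duals.penalized_objective(10.0, [2.0, 1.0]))
    print(select_policy([2.0, 1.0], [1.0, 2.0]))
```

In this sketch a multiplier grows only while its constraint is violated and is clipped at zero otherwise, so the penalty on the return stays inactive for constraints that are already satisfied.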
