This paper develops a novel temporal difference Q (TD-Q) learning approach to address the robust control problem for discrete-time Markov jump systems (MJSs) with entirely unknown dynamics and transition probabilities (TPs). The model-free TD-Q learning method is comprehensive in that it includes two special cases: Q learning for MJSs with unknown dynamics, and TD learning for MJSs with undetermined TPs. We propose a ternary policy iteration framework that iteratively refines the control policies through a loop of alternating updates. This loop consists of three processes: first, aligning the TD value functions with the current policies;
second, enhancing the Q-function matrix kernels (QFMKs) with these TD value functions;
and third, generating greedy policies based on the enhanced QFMKs. We demonstrate that, given a sufficient number of episodes, the TD value functions, QFMKs, and control policies in this iterative loop converge to their optimal counterparts. To illustrate the efficiency of the developed approach, we present a numerical example that highlights its substantial benefits through a thorough comparison with existing learning-based control methods for MJSs. Moreover, a structured pest population dynamics model is used to validate the practical applicability of the proposed approach.
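To make the structure of the ternary loop concrete, the following is a minimal, hypothetical Python sketch of the three alternating updates (policy evaluation, QFMK construction, greedy improvement) for a toy two-mode MJS under a standard quadratic cost. Unlike the paper's model-free TD-Q learning, this sketch uses known system matrices and TPs purely to display the loop structure; all system data, weights, and names (`A`, `B`, `Pi`, `Qw`, `Rw`, `K`) are illustrative assumptions rather than the paper's formulation.

```python
# Hypothetical sketch of the ternary iteration for a toy two-mode MJS:
# x_{k+1} = A[i] x_k + B[i] u_k, mode i governed by TP matrix Pi,
# stage cost x' Qw[i] x + u' Rw[i] u, policy u = -K[i] x per mode.
import numpy as np

A = [np.array([[0.9, 0.3], [0.0, 0.7]]), np.array([[0.8, 0.2], [0.1, 0.6]])]
B = [np.array([[0.0], [1.0]]), np.array([[0.5], [1.0]])]
Pi = np.array([[0.7, 0.3], [0.4, 0.6]])   # assumed mode transition probabilities
Qw = [np.eye(2), 2 * np.eye(2)]           # assumed state weights per mode
Rw = [np.eye(1), np.eye(1)]               # assumed input weights per mode
N, n, m = 2, 2, 1

K = [np.zeros((m, n)) for _ in range(N)]  # initial mean-square stabilizing gains

for episode in range(30):
    # Step 1: policy evaluation -- align the per-mode value kernels P[i]
    # with the current gains via the coupled Lyapunov recursion (the TD
    # fixed point for this policy, here computed from the model).
    P = [np.zeros((n, n)) for _ in range(N)]
    for _ in range(300):
        EP = [sum(Pi[i, j] * P[j] for j in range(N)) for i in range(N)]
        P = [Qw[i] + K[i].T @ Rw[i] @ K[i]
             + (A[i] - B[i] @ K[i]).T @ EP[i] @ (A[i] - B[i] @ K[i])
             for i in range(N)]

    # Step 2: build the Q-function matrix kernels H[i] from the evaluated P[i].
    EP = [sum(Pi[i, j] * P[j] for j in range(N)) for i in range(N)]
    H = [np.block([[Qw[i] + A[i].T @ EP[i] @ A[i], A[i].T @ EP[i] @ B[i]],
                   [B[i].T @ EP[i] @ A[i], Rw[i] + B[i].T @ EP[i] @ B[i]]])
         for i in range(N)]

    # Step 3: greedy improvement -- read the new gains off the QFMK blocks.
    K = [np.linalg.solve(H[i][n:, n:], H[i][n:, :n]) for i in range(N)]

for i in range(N):
    print(f"K[{i}] =\n{K[i]}")
```

In the model-free TD-Q setting described above, the value kernels and QFMKs would instead be estimated from sampled trajectories without access to the system matrices or TPs; the sketch only mirrors how the three updates alternate within each episode.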