On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality

 0 Người đánh giá. Xếp hạng trung bình 0

Tác giả: Holger Boche, Haris Ceribasic, Ezra Tampubolon

Ngôn ngữ: eng

Ký hiệu phân loại: 003.54 Information theory

Thông tin xuất bản: 2020

Mô tả vật lý:

Bộ sưu tập: Metadata

ID: 165429

Comment: PreprintIn this work, we study the system of interacting non-cooperative two Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which generally does not occur in an environment of general independent learners. The resulting post-learning policies are almost optimal in the underlying game sense, i.e., they form a Nash equilibrium. Furthermore, we propose in this work a Q-learning algorithm, requiring predictive observation of two subsequent opponent's actions, yielding an optimal strategy given that the latter applies a stationary strategy, and discuss the existence of the Nash equilibrium in the underlying information asymmetrical game.
Tạo bộ sưu tập với mã QR

THƯ VIỆN - TRƯỜNG ĐẠI HỌC CÔNG NGHỆ TP.HCM

ĐT: (028) 36225755 | Email: tt.thuvien@hutech.edu.vn

Copyright @2024 THƯ VIỆN HUTECH