Rethinking exploration-exploitation trade-off in reinforcement learning via cognitive consistency.

Authors: Lin Li, Jiye Liang, Da Wang, Xin Wang, Wei Wei

Language: eng

Classification number: 133.592 Types or schools of astrology originating in or associated with a

Publication information: United States : Neural networks : the official journal of the International Neural Network Society, 2025

Physical description:

Collection: NCBI

ID: 718420

The exploration-exploitation dilemma is one of the fundamental challenges in deep reinforcement learning (RL). Agents must strike a trade-off between making decisions based on their current beliefs and gathering more information. Prior work mostly favors devising sophisticated exploration methods to ensure accurate target Q-values or to learn associations between rewards and actions, an approach that may not be sample-efficient enough. In this paper, we propose to rethink the trade-off between exploration and exploitation from the perspective of cognitive consistency: humans tend to think and behave in line with their existing knowledge structures (maintaining cognitive consistency), which yields satisfactory results in a short time. We argue that maintaining consistency, specifically through pessimistic exploration, within the context of optimal policy-oriented cognition can improve efficiency without compromising performance. To this end, we propose a Cognitive Consistency (CoCo) framework. CoCo first leverages a self-imitating distribution correction approach to pursue cognition oriented toward the optimal policy. It then conservatively implements pessimistic exploration by deriving novel inconsistency-minimization objectives inspired by label distribution learning. We validate our framework across various standard off-policy RL tasks and show that maintaining cognitive consistency improves both sample efficiency and performance. Code is available at https://github.com/DkING-lv6/CoCo.
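
The abstract does not state the form of the inconsistency-minimization objectives. As a rough illustration only, the sketch below uses a loss that is common in label distribution learning, the KL divergence between two discrete distributions, and applies it to action distributions obtained by a softmax over Q-values. The function names, the softmax step, and the temperature parameter are all assumptions made for this example; none of it is taken from the CoCo paper or its repository.

```python
import math

def softmax(values, temperature=1.0):
    """Turn a list of Q-values into a probability distribution over actions."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def inconsistency(q_reference, q_current, temperature=1.0):
    """Hypothetical inconsistency score (an assumption, not CoCo's objective):
    how far the action distribution implied by q_current drifts from the one
    implied by q_reference."""
    p = softmax(q_reference, temperature)
    q = softmax(q_current, temperature)
    return kl_divergence(p, q)

# Identical Q-values imply identical action preferences.
same = inconsistency([1.0, 2.0, 0.5], [1.0, 2.0, 0.5])
# Reordered Q-values imply different preferences, so the score grows.
different = inconsistency([1.0, 2.0, 0.5], [2.0, 0.5, 1.0])
```

In this toy setting, minimizing such a score would keep a new estimate close to existing preferences, which is one loose reading of "maintaining cognitive consistency"; the actual objectives are defined in the paper and the linked code.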