Selective Reviews of Bandit Problems in AI via a Statistical View


Authors: Haoyu Wei, Huiming Zhang, Pengjie Zhou

Language: English

Classification: 001.434 Experimental method

Publication info: 2024

Physical description:

Collection: Metadata

ID: 205172

Comment: 52 pages, 5 figures

Abstract: Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interaction with their environment. A key subset comprises the stochastic multi-armed bandit (MAB) and stochastic continuum-armed bandit (SCAB) problems, which model sequential decision-making under uncertainty. This review outlines the foundational models and assumptions of bandit problems, explores non-asymptotic theoretical tools such as concentration inequalities and minimax regret bounds, and compares frequentist and Bayesian algorithms for managing the exploration-exploitation trade-off. Additionally, we examine K-armed contextual bandits and SCAB, focusing on their methodologies and regret analyses, and study the connections between SCAB problems and functional data analysis. Finally, we highlight recent advances and ongoing challenges in the field.
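To make the exploration-exploitation trade-off mentioned in the abstract concrete, here is a minimal sketch of one classic frequentist MAB algorithm, UCB1, run on a Bernoulli K-armed bandit. The arm means, horizon, and seed below are illustrative choices, not values from the reviewed paper:

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit with the given true arm means.

    Returns (total reward collected, pull counts per arm).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialize
        else:
            # UCB index: empirical mean plus an exploration bonus that
            # shrinks as an arm accumulates pulls
            arm = max(range(k),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

total, counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

Over a long horizon the pull counts concentrate on the best arm (here the one with mean 0.8), which is exactly the behavior the non-asymptotic regret bounds surveyed in the review quantify.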

LIBRARY - HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY (HUTECH)
