MAD Chairs: A new tool to evaluate AI

Authors: Christopher M Homan, Chris Santos-Lang

Language: eng

Classification: 749.3 Specific kinds of furniture

Publication information: 2025

Physical description:

Collection: Metadata

ID: 226780

Comment: 16 pages, 3 figures, accepted at https://coin-workshop.github.io/coine-2025-detroit/

This paper presents a new contribution to the problem of AI evaluation. Much as one might evaluate a machine in terms of its performance at chess, this approach involves evaluating a machine in terms of its performance at a game called "MAD Chairs." At the time of writing, evaluation with this game exposed opportunities to improve Claude, Gemini, ChatGPT, Qwen and DeepSeek. Furthermore, this paper sets a stage for future innovation in game theory and AI safety by providing an example of success with non-standard approaches to each: studying a game beyond the scope of previous game theoretic tools and mitigating a serious AI safety risk in a way that requires neither determination of values nor their enforcement.