Multi-modal emotion recognition in conversation based on prompt learning with text-audio fusion features.

0 Người đánh giá. Xếp hạng trung bình 0

Tác giả: Pengfei Li, Yuezhou Wu, Siling Zhang

Ngôn ngữ: eng

Ký hiệu phân loại:

Thông tin xuất bản: England : Scientific reports , 2025

Mô tả vật lý:

Bộ sưu tập: NCBI

ID: 712165

Thêm vào giỏ Liên kết toàn văn

With the widespread adoption of interactive machine applications, Emotion Recognition in Conversations (ERC) technology has garnered increasing attention. Although existing methods have improved recognition accuracy by integrating structured data, language barriers and the scarcity of non-English resources limit their cross-lingual applications. In light of this, the MERC-PLTAF method proposed in this paper innovatively focuses on multimodal emotion recognition in conversations, aiming to overcome the limitations of single modality and language barriers through refined feature extraction and a sophisticated cross-fusion strategy. We conducted extensive validation on multiple English and Chinese datasets, and the experimental results demonstrate that this method not only significantly improves emotion recognition accuracy but also exhibits exceptional performance on the Chinese M3ED dataset, paving a new path for cross-lingual emotion recognition. This research not only advances the boundaries of emotion recognition technology but also lays a solid theoretical foundation and practical framework for creating more intelligent and human-centric interactive experiences.

Tạo bộ sưu tập với mã QR