Active Supervised Cross-Modal Retrieval.


Authors: Fan Qi, Shengsheng Qian, Changsheng Xu, Yang Yang, Huaiwen Zhang

Language: eng


Publication: United States : IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025


Collection: NCBI

ID: 703716

Supervised Cross-Modal Retrieval (SCMR) achieves strong performance thanks to the supervision provided by substantial label annotations of multi-modal data. However, the need for large annotated multi-modal datasets restricts the use of supervised cross-modal retrieval in many practical scenarios. Active Learning (AL) has been proposed to reduce labeling costs while improving performance in various label-dependent tasks: the most informative unlabeled samples are selected for labeling and training. Directly applying existing AL methods to supervised cross-modal retrieval is suboptimal, since they focus only on the uncertainty within each modality and ignore the inter-modality relationships within text-image pairs. Furthermore, existing methods consider only the informativeness of data during sample selection, leading to a biased, homogenized set in which the selected samples often carry nearly identical semantics and are densely distributed in one region of the feature space. Persistent training on such biased selections can disturb multi-modal representation learning and substantially degrade the retrieval performance of SCMR. In this work, we propose an Active Supervised Cross-Modal Retrieval (ASCMR) framework that effectively identifies informative multi-modal samples and generates unbiased sample selections. In particular, we propose a probabilistic multi-modal informativeness estimation that captures both the intra-modality and inter-modality uncertainty of multi-modal pairs within a unified representation. To ensure unbiased sample selection, we introduce a density-aware budget allocation strategy that constrains the active-learning objective of maximizing the informativeness of the selection with a novel semantic density regularization term.
The proposed method is evaluated on three widely used benchmark datasets, MS-COCO, NUS-WIDE, and MIRFlickr, demonstrating its effectiveness in significantly reducing annotation cost while outperforming other active-learning baselines. We achieve over 95% of the fully supervised model's performance using only 6%, 3%, and 4% actively selected samples on MS-COCO, NUS-WIDE, and MIRFlickr, respectively. Our source code is available at https://github.com/openimmc/ascmr.
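To make the idea concrete, the density-constrained selection described in the abstract can be sketched as a greedy trade-off between informativeness and semantic density. This is a minimal illustration only: the function name `select_batch`, the weight `lam`, and the use of maximum cosine similarity to already-selected samples as a density proxy are assumptions for exposition, not the paper's actual algorithm.

```python
import numpy as np

def select_batch(informativeness, features, budget, lam=1.0):
    """Greedily pick `budget` samples, trading off informativeness against
    semantic density (similarity to samples already selected).
    Illustrative sketch; not the ASCMR paper's algorithm."""
    # Unit-normalize feature rows so dot products are cosine similarities.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    selected = []
    remaining = set(range(len(informativeness)))
    for _ in range(budget):
        best_i, best_score = None, -np.inf
        for i in remaining:
            # Density proxy: highest similarity to anything already chosen.
            density = max((float(feats[i] @ feats[j]) for j in selected),
                          default=0.0)
            score = informativeness[i] - lam * density
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
        remaining.remove(best_i)
    return selected
```

With informativeness scores [0.9, 0.8, 0.1] and features in which samples 0 and 1 are near-duplicates, the second pick skips the redundant sample 1 in favor of the semantically distinct sample 2, illustrating how a density term diversifies the selected batch.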
