Effectively recognizing different regions of interest with attention mechanisms plays an important role in the remote sensing image captioning task. However, these attention-driven models implicitly assume that the information from the focused region is correct, which is too restrictive. Furthermore, visual feature extractors tend to fail when the correlations between objects are weak. To address these issues, we propose a feature refinement and rethinking attention framework. Specifically, we first construct a feature refinement module that lets grid-level features interact through a refinement gate, which weakens irrelevant visual features in remote sensing images. Moreover, instead of inferring each word from a single attentive vector, we develop a rethinking attention mechanism with a rethinking LSTM layer that spontaneously re-attends to different regions whenever the rethinking confidence calls for it, so that more than one region contributes to predicting each word. In addition, a confidence rectification strategy is adopted to guide the rethinking attention toward learning strongly discriminative contextual representations. We validate the proposed framework on four datasets (i.e., NWPU-Captions, RSICD, UCM-Captions, and Sydney-Captions). Extensive experiments show that our approach delivers superior performance and achieves significant improvements on the NWPU-Captions dataset.
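
To make the refinement gate concrete, below is a minimal PyTorch sketch of one plausible gated interaction over grid-level features. The module name, the mean-pooled global context, and the sigmoid gating form are illustrative assumptions on our part; the abstract does not specify the exact formulation used in the paper.

```python
# Hypothetical sketch of a gated feature-refinement step; the paper's exact
# formulation is not given in the abstract. Grid-level features interact
# with a global context through a sigmoid gate that suppresses irrelevant
# responses.
import torch
import torch.nn as nn

class RefinementGate(nn.Module):
    """Weakens irrelevant grid-level visual features via a learned gate."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)  # gate from local + global context
        self.proj = nn.Linear(dim, dim)      # projection of refined features

    def forward(self, grid: torch.Tensor) -> torch.Tensor:
        # grid: (batch, num_regions, dim) grid-level visual features
        context = grid.mean(dim=1, keepdim=True).expand_as(grid)  # global context
        g = torch.sigmoid(self.gate(torch.cat([grid, context], dim=-1)))
        return g * self.proj(grid)           # gated (refined) features

# Usage: refine a 7x7 grid of 512-dimensional features
feats = torch.randn(2, 49, 512)
refined = RefinementGate(512)(feats)         # shape: (2, 49, 512)
```

In this sketch, the gate takes values in (0, 1) per feature channel, so regions with low relevance to the global scene context are attenuated rather than hard-masked; the actual interaction in the proposed module may differ.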