Exploring Fine-Grained Visual-Text Feature Alignment With Prompt Tuning for Domain-Adaptive Object Detection.


Authors: Jinhai Liu, Zhitao Wen, Huaguang Zhang, Fengyuan Zuo

Language: eng

Classification number: 363.1063 Public safety programs

Publication information: United States: IEEE Transactions on Cybernetics, 2025

Physical description:

Collection: NCBI

ID: 747577

Domain-adaptive object detection (DAOD) aims to generalize detectors trained in labeled source domains to unlabeled target domains by mitigating domain bias. Recent studies have confirmed that pretrained vision-language models (VLMs) are promising tools to enhance the generalizability of detectors. However, there exist paradigm discrepancies between single-domain detection in most existing works and DAOD tasks, which may hinder the fine-grained alignment of cross-domain visual-text features. In addition, some preliminary solutions to these discrepancies may potentially neglect relational reasoning in prompts and cross-modal information interactions, which are crucial for fine-grained alignment. To this end, this article explores fine-grained visual-text feature alignment in DAOD with prompt tuning and organizes a novel framework termed FGPro that contains three elaborated levels. First, at the prompt level, a learnable domain-adaptive prompt is organized and a prompt relation encoder is constructed to infer intertoken semantic relations in the prompt. At the model level, a bidirectional cross-modal attention is structured to fully interact visual and textual fine-grained information. In addition, we customize a prompt-guided cross-domain regularization strategy to inject domain-invariant and domain-specific information into prompts in a disentangled manner. The three designs effectively align the fine-grained visual-text features of the source-target domain to facilitate the capture of domain-aware information. Experiments on four cross-domain scenarios show that FGPro exhibits notable performance improvements over existing work (Cross-weather: +1.0% AP50; Simulation-to-real: +1.2% AP50; Cross-camera: +1.3% AP50; Industry: +2.8% AP50), validating the effectiveness of its fine-grained alignment.
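The abstract names a bidirectional cross-modal attention but does not give its formulation. As a rough illustration of the general idea only, the sketch below lets visual tokens attend over prompt tokens and prompt tokens attend over visual tokens, using plain scaled dot-product attention with residual connections. Everything here is an assumption rather than FGPro's actual design: the function names, the token counts, the feature dimension, and the absence of learned projection matrices are all hypothetical choices made to keep the example self-contained.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Each query row attends over all context rows (scaled dot-product).

    Hypothetical simplification: no learned query/key/value projections.
    """
    dim = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context
        ]
        w = softmax(scores)
        mixed = [
            sum(w[j] * context[j][k] for j in range(len(context)))
            for k in range(dim)
        ]
        outputs.append(mixed)
        weights.append(w)
    return outputs, weights


def add(a, b):
    """Element-wise sum of two equally shaped matrices (residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_cross_modal_attention(visual, text):
    """Update both modalities: visual attends to text, and text attends to visual."""
    from_text, w_vis = cross_attention(visual, text)    # visual queries, text context
    from_visual, w_txt = cross_attention(text, visual)  # text queries, visual context
    return add(visual, from_text), add(text, from_visual), w_vis, w_txt


# Assumed toy sizes: 6 visual tokens, 4 prompt tokens, feature dimension 8.
random.seed(0)
visual = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]
text = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]

v_out, t_out, w_vis, w_txt = bidirectional_cross_modal_attention(visual, text)
print(len(v_out), len(v_out[0]), len(t_out), len(t_out[0]))
```

Both outputs keep the shape of their inputs, so a block like this could in principle be stacked or placed between a visual backbone and a text encoder. Whether FGPro does so is not stated in the abstract.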
