Task-to-Instance Prompt Learning for Vision-Language Models at Test Time.

Authors: Jiawang Bai, Xin Li, Zhihe Lu, Xinchao Wang, Zeyu Xiao

Language: eng

Classification number: 794.147 King

Publication information: United States : IEEE transactions on image processing : a publication of the IEEE Signal Processing Society, 2025

Physical description:

Collection: NCBI

ID: 707155

Prompt learning has recently been introduced into the adaptation of pre-trained vision-language models (VLMs), tuning a set of trainable tokens to replace hand-crafted text templates. Despite the encouraging results achieved, existing methods largely rely on extra annotated data for training. In this paper, we investigate a more realistic scenario in which only unlabeled test data is available. Existing test-time prompt learning methods often learn a separate prompt for each test sample. However, relying solely on a single sample heavily limits the performance of the learned prompts, as it neglects the task-level knowledge that can be gained from multiple samples. To that end, we propose a novel test-time prompt learning method for VLMs, called Task-to-Instance PromPt LEarning (TIPPLE), which adopts a two-stage training strategy to leverage both task- and instance-level knowledge. Specifically, we reformulate the effective online pseudo-labeling paradigm, along with two tailored components, an auxiliary text classification task and a diversity regularization term, to serve task-oriented prompt learning. The learned task-level prompt is then combined with a tunable residual for each test sample to integrate instance-level knowledge. We demonstrate the superior performance of TIPPLE on 15 downstream datasets; for example, with a ViT-B/16 visual backbone it achieves an average improvement of 1.87% over the state-of-the-art method. Our code is open-sourced at https://github.com/zhiheLu/TIPPLE.
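
The abstract names the ingredients of the two stages but gives no formulas, so the following is only a minimal sketch of how such pieces are commonly built, not the TIPPLE implementation. Everything in it is an assumption made for illustration: the function names, the confidence threshold of 0.7, the use of negative entropy of the batch-mean prediction as the diversity term, and the additive combination of task prompt and residual. The auxiliary text classification task is omitted.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_labels(batch_logits, threshold=0.7):
    """Keep (sample index, predicted class) for confident predictions only.

    The threshold is a hypothetical hyperparameter, not taken from the paper.
    """
    kept = []
    for i, logits in enumerate(batch_logits):
        probs = softmax(logits)
        conf = max(probs)
        if conf >= threshold:
            kept.append((i, probs.index(conf)))
    return kept

def diversity_penalty(batch_logits):
    """Negative entropy of the batch-mean prediction (assumed form).

    Minimizing it pushes predictions to spread across classes
    instead of collapsing onto a single one.
    """
    probs = [softmax(logits) for logits in batch_logits]
    n, k = len(probs), len(probs[0])
    mean = [sum(p[c] for p in probs) / n for c in range(k)]
    return sum(p * math.log(p) for p in mean if p > 0)

def instance_prompt(task_prompt, residual):
    """Stage 2 (assumed additive): shared task prompt plus per-sample residual."""
    return [t + r for t, r in zip(task_prompt, residual)]
```

In a real setting the prompt and residual would be token-embedding tensors optimized by gradient descent through the VLM's text encoder; plain lists are used here only to keep the sketch dependency-free.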