Given the visual-semantic hierarchy between images and texts, hyperbolic embeddings have been employed for visual-semantic representation learning, exploiting the natural ability of hyperbolic space to model hierarchies; this approach has shown notable advantages in zero-shot learning tasks. However, unlike general image-text alignment tasks, textual data in the medical domain often comprises complex sentences describing various conditions or diseases, which makes it difficult for vision-language models to comprehend free-text medical reports. We therefore propose a novel pretraining method for medical image-text data in hyperbolic space. Our method represents each radiology report as a set of structured triplets and converts these triplets into sentences through prompt engineering. Because diseases and symptoms typically manifest in local image regions, we introduce a global + local image feature extraction module. Leveraging the hierarchy-modeling strengths of hyperbolic space, we employ an entailment loss to capture the partial order relationship between images and texts. Experimental results show that our method generalizes better and outperforms baseline methods across various zero-shot tasks and datasets.
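For intuition, the triplet-to-sentence step can be sketched as a simple prompt template. The triplet schema (entity, location, presence) and the template below are illustrative assumptions, not the actual format used in this work:

```python
# Illustrative sketch only: the triplet schema and the prompt template
# are assumptions for exposition, not this paper's actual format.
from typing import NamedTuple

class ReportTriplet(NamedTuple):
    entity: str    # e.g., "pleural effusion"
    location: str  # e.g., "left lower lobe"
    presence: str  # "present" or "absent"

def triplet_to_sentence(t: ReportTriplet) -> str:
    """Convert one structured report triplet into a natural-language sentence."""
    if t.presence == "absent":
        return f"No {t.entity} is seen in the {t.location}."
    return f"There is {t.entity} in the {t.location}."

# Example: one structured finding becomes one training sentence.
print(triplet_to_sentence(ReportTriplet("pleural effusion", "left lower lobe", "present")))
# -> "There is pleural effusion in the left lower lobe."
```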
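Likewise, one standard way to realize an entailment objective in hyperbolic space is a cone-based loss; the notation below is illustrative and may differ from the exact formulation used here:

\[
\mathcal{L}_{\text{entail}}(u, v) = \max\bigl(0,\; \xi(u, v) - \omega(u)\bigr),
\]

where \(u\) is the text embedding, \(v\) the paired image embedding, \(\xi(u, v)\) the exterior angle of \(v\) measured at \(u\), and \(\omega(u)\) the half-aperture of the entailment cone rooted at \(u\). Minimizing this loss pulls the more specific image embedding inside the cone of the more general text embedding, encoding the text-entails-image partial order.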