Cross-Level Multi-Instance Distillation for Self-Supervised Fine-Grained Visual Categorization.


Authors: Qi Bi, Wei Ji, Gui-Song Xia, Jingjun Yi, Haolan Zhan

Language: eng

Classification: 133.594 Types or schools of astrology originating in or associated with a

Publication information: United States : IEEE Transactions on Image Processing : a publication of the IEEE Signal Processing Society, 2025

Physical description:

Collection: NCBI

ID: 745791

High-quality annotation of fine-grained visual categories demands great expert knowledge, which is taxing and time-consuming. Alternatively, learning fine-grained visual representations from enormous numbers of unlabeled images (e.g., species, brands) by self-supervised learning becomes a feasible solution. However, recent investigations find that existing self-supervised learning methods are less qualified to represent fine-grained categories. The bottleneck lies in that the pre-trained class-agnostic representation is built from every patch-wise embedding, while fine-grained categories are only determined by several key patches of an image. In this paper, we propose a Cross-level Multi-instance Distillation (CMD) framework to tackle this challenge. Our key idea is to consider the importance of each image patch in determining the fine-grained representation by multiple instance learning. To comprehensively learn the relation between informative patches and fine-grained semantics, multi-instance knowledge distillation is implemented both on the region/image crop pairs between the teacher and student networks and on the region-image crops inside the teacher/student network, which we term intra-level multi-instance distillation and inter-level multi-instance distillation, respectively. Extensive experiments on several commonly used datasets, including CUB-200-2011, Stanford Cars and FGVC Aircraft, demonstrate that the proposed method outperforms contemporary methods by up to 10.14% and existing state-of-the-art self-supervised learning approaches by up to 19.78% on both the top-1 accuracy and Rank-1 retrieval metrics. Source code is available at https://github.com/BiQiWHU/CMD.
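To make the idea concrete, the sketch below illustrates the general flavor of the approach: patch embeddings are aggregated with an attention-style multiple-instance pooling so that informative patches dominate the image representation, and the student is distilled against a teacher both at the same level (image-to-image) and across levels (patch-to-image). This is a minimal, hypothetical PyTorch sketch, not the authors' released implementation (see the repository linked above); the helpers `mil_pool`, `intra_level_loss`, and `inter_level_loss` and the cosine-matching losses are assumptions made for illustration only.

```python
# Illustrative sketch only: attention-style MIL pooling of patch embeddings and
# simple teacher/student distillation losses at and across levels. Not the
# official CMD implementation (https://github.com/BiQiWHU/CMD).
import torch
import torch.nn.functional as F


def mil_pool(patch_emb: torch.Tensor, scorer: torch.nn.Module) -> torch.Tensor:
    """Aggregate patch embeddings (B, N, D) into image embeddings (B, D),
    weighting each patch by a learned importance score."""
    weights = torch.softmax(scorer(patch_emb).squeeze(-1), dim=1)  # (B, N)
    return torch.einsum("bn,bnd->bd", weights, patch_emb)          # (B, D)


def intra_level_loss(student_img: torch.Tensor, teacher_img: torch.Tensor) -> torch.Tensor:
    """Same-level distillation: match normalized student and teacher
    image-level embeddings of the same crops (cosine distance)."""
    s = F.normalize(student_img, dim=-1)
    t = F.normalize(teacher_img, dim=-1)
    return (1.0 - (s * t).sum(dim=-1)).mean()


def inter_level_loss(student_patches: torch.Tensor, teacher_img: torch.Tensor) -> torch.Tensor:
    """Cross-level distillation: align the student's averaged patch (region-level)
    embedding with the teacher's image-level embedding."""
    s = F.normalize(student_patches.mean(dim=1), dim=-1)
    t = F.normalize(teacher_img, dim=-1)
    return (1.0 - (s * t).sum(dim=-1)).mean()


if __name__ == "__main__":
    B, N, D = 4, 49, 256                    # batch size, patches per crop, embedding dim
    scorer = torch.nn.Linear(D, 1)          # per-patch importance scorer (hypothetical)
    student_patches = torch.randn(B, N, D)  # stand-ins for backbone patch embeddings
    teacher_patches = torch.randn(B, N, D)  # the teacher is typically an EMA copy

    student_img = mil_pool(student_patches, scorer)
    with torch.no_grad():                   # no gradients flow through the teacher
        teacher_img = mil_pool(teacher_patches, scorer)

    loss = intra_level_loss(student_img, teacher_img) + inter_level_loss(student_patches, teacher_img)
    print(float(loss))
```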
