Enhancing Food Image Recognition by Multi-Level Fusion and the Attention Mechanism.

Authors: Zengzheng Chen, Jianxin Wang, Yeru Wang

Language: eng

Classification: 070.48346 Journalism

Publication information: Switzerland : Foods (Basel, Switzerland), 2025

Physical description:

Collection: NCBI

ID: 76451

Food image recognition, a pivotal research area in computer vision, has become indispensable across diverse domains, including dietary nutrition monitoring, intelligent restaurant services, and quality control in the food industry. However, recognizing food images falls within the domain of Fine-Grained Visual Classification (FGVC), which presents challenges such as inter-class similarity, intra-class variability, and the complexity of capturing intricate local features. Prior work on fine-grained visual classification has focused primarily on the deep-layer information of deep convolutional neural networks, often neglecting shallow, detail-rich features. Taking these factors into account, we propose a Multi-level Attention Feature Fusion Network (MAF-Net). Specifically, we use feature maps generated at different stages of the Convolutional Neural Network (CNN) backbone as inputs. We apply a self-attention mechanism to identify local features on these feature maps and then stack them together. The feature vectors obtained through the attention mechanism are then integrated with the original input to enhance data augmentation. Simultaneously, to capture as many local features as possible, we encourage multi-scale features to concentrate on distinct local regions at each stage by maximizing the Kullback-Leibler divergence (KL-divergence) between the different stages. Additionally, we present a novel approach called subclass center loss (SCloss) to implement label smoothing, minimize intra-class feature distribution differences, and enhance the model's generalization capability. Experiments conducted on three food image datasets, ETH Food-101, Vireo Food-172, and UEC Food-100, demonstrated the superiority of the proposed model, which achieved Top-1 accuracies of 90.22%, 89.86%, and 90.61% on ETH Food-101, Vireo Food-172, and UEC Food-100, respectively. Notably, our method not only outperformed other methods in Top-5 accuracy on Vireo Food-172 but also achieved the highest Top-1 accuracy on UEC Food-100.
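
A minimal PyTorch sketch of the multi-level attention fusion idea described in the abstract is shown below. The ResNet-50 backbone, the shared embedding width, the multi-head self-attention, and the concatenation head are illustrative assumptions, not the authors' exact MAF-Net implementation:

```python
# Hedged sketch: per-stage features + self-attention + residual fusion.
# Assumes a ResNet-50 backbone; all dimensions are illustrative.
import torch
import torch.nn as nn
import torchvision.models as models

class MultiLevelAttentionFusion(nn.Module):
    def __init__(self, num_classes: int, embed_dim: int = 256, num_heads: int = 4):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Stem plus the four residual stages, exposed so that shallow
        # and deep feature maps can both be tapped.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.stages = nn.ModuleList([backbone.layer1, backbone.layer2,
                                     backbone.layer3, backbone.layer4])
        stage_dims = [256, 512, 1024, 2048]  # ResNet-50 channel widths
        self.projs = nn.ModuleList(
            [nn.Conv2d(c, embed_dim, kernel_size=1) for c in stage_dims])
        self.attns = nn.ModuleList(
            [nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
             for _ in stage_dims])
        self.classifier = nn.Linear(embed_dim * len(stage_dims), num_classes)

    def forward(self, x):
        x = self.stem(x)
        stage_descriptors = []
        for stage, proj, attn in zip(self.stages, self.projs, self.attns):
            x = stage(x)
            f = proj(x)                             # (B, D, H, W)
            tokens = f.flatten(2).transpose(1, 2)   # (B, H*W, D) spatial tokens
            attended, _ = attn(tokens, tokens, tokens)  # self-attention over locations
            fused = attended + tokens               # integrate attention output with input
            stage_descriptors.append(fused.mean(dim=1))  # (B, D) per stage
        logits = self.classifier(torch.cat(stage_descriptors, dim=1))
        return logits, stage_descriptors
```

Early stages carry many spatial tokens (e.g., 56 x 56 for a 224 x 224 input), so a practical implementation would likely downsample feature maps before applying attention.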
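
The stage-diversity objective can be sketched as a negative pairwise KL-divergence term: each stage's descriptor is normalized into a distribution, and maximizing the divergence between stages pushes them toward distinct local regions. The softmax normalization and the all-pairs scheme here are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def stage_diversity_loss(stage_descriptors):
    """Negative mean pairwise KL divergence between per-stage distributions.

    stage_descriptors: list of (B, D) tensors, one per backbone stage.
    Minimizing this term maximizes the KL divergence, encouraging each
    stage to attend to different local regions.
    """
    log_probs = [F.log_softmax(f, dim=1) for f in stage_descriptors]
    kl_sum, pairs = 0.0, 0
    for i in range(len(log_probs)):
        for j in range(len(log_probs)):
            if i == j:
                continue
            # F.kl_div(log_q, p) computes KL(p || q); here KL(P_i || P_j).
            kl_sum = kl_sum + F.kl_div(log_probs[j], log_probs[i].exp(),
                                       reduction="batchmean")
            pairs += 1
    return -kl_sum / pairs  # negate: lower loss means larger divergence
```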
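
The subclass center loss can be approximated as a center loss with several learnable sub-centers per class, pulling each feature toward its nearest sub-center to tighten intra-class distributions. The number of sub-centers and the nearest-center assignment are hypothetical choices; the paper's exact SCloss formulation and its coupling with label smoothing may differ:

```python
import torch
import torch.nn as nn

class SubclassCenterLoss(nn.Module):
    """Pull each feature toward the nearest of K learnable sub-centers
    of its ground-truth class (illustrative stand-in for SCloss)."""
    def __init__(self, num_classes: int, feat_dim: int, sub_centers: int = 3):
        super().__init__()
        self.centers = nn.Parameter(
            torch.randn(num_classes, sub_centers, feat_dim))

    def forward(self, features, labels):
        # features: (B, D); labels: (B,) class indices
        class_centers = self.centers[labels]  # (B, K, D)
        sq_dist = (features.unsqueeze(1) - class_centers).pow(2).sum(-1)  # (B, K)
        return sq_dist.min(dim=1).values.mean()  # nearest sub-center distance
```

In training, such a term would typically be weighted and added to the classification and diversity objectives, e.g. L = L_CE + lambda1 * L_div + lambda2 * L_SC.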