This study addresses the diagnostic challenge of distinguishing gastric polyps from protrusions, a task that calls for accurate and cost-effective diagnostic strategies, and explores the use of Convolutional Neural Networks (CNNs) to improve diagnostic accuracy. It introduces MultiAttentiveScopeNet, a deep learning model that combines a multi-layer feature ensemble with attention mechanisms to improve the accuracy of gastroscopy image analysis. A weakly supervised labeling strategy was used to construct a large multi-class gastroscopy image dataset for training and validation. MultiAttentiveScopeNet improves both prediction accuracy and interpretability: the integrated attention mechanism identifies the critical regions of an image to support clinical decisions, and the multi-layer feature ensemble enables robust analysis of complex gastroscopy images. In comparative testing against human experts, the model achieved an accuracy of 0.9308, micro- and macro-averaged precision of 0.9312 and 0.9325, micro- and macro-averaged recall of 0.9283 and 0.9308, and micro- and macro-averaged AUC of 0.9847 and 0.9853, respectively. These results highlight its potential as an effective tool in primary healthcare settings. Overall, the study offers a comprehensive approach to differentiating gastric polyps from protrusions and demonstrates the potential of deep learning for gastroscopy image analysis. The constructed dataset supports continued model optimization and validation, and the model shows promise for improving diagnostic outcomes in primary care.
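The micro- and macro-averaged metrics reported above differ only in how per-class counts are combined. The following is a minimal sketch of that distinction, not the authors' evaluation code: the function name `micro_macro_scores`, the class names, and the toy labels are all hypothetical and chosen purely for illustration. AUC is omitted because it requires per-class probability scores rather than hard predictions.

```python
from collections import Counter


def micro_macro_scores(y_true, y_pred):
    """Micro/macro precision and recall for single-label multi-class predictions.

    Hypothetical helper for illustration only; not taken from the study.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[t] += 1  # the true class gains a false negative

    def ratio(num, den):
        return num / den if den else 0.0

    # Macro: compute the metric per class, then take the unweighted mean,
    # so every class counts equally regardless of how many images it has.
    macro_p = sum(ratio(tp[c], tp[c] + fp[c]) for c in labels) / len(labels)
    macro_r = sum(ratio(tp[c], tp[c] + fn[c]) for c in labels) / len(labels)

    # Micro: pool the counts over all classes, then compute the metric once,
    # so every image counts equally and frequent classes dominate.
    tp_sum, fp_sum, fn_sum = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = ratio(tp_sum, tp_sum + fp_sum)
    micro_r = ratio(tp_sum, tp_sum + fn_sum)

    return {
        "micro_precision": micro_p,
        "macro_precision": macro_p,
        "micro_recall": micro_r,
        "macro_recall": macro_r,
    }


# Toy labels (made up, not data from the study).
y_true = ["polyp"] * 4 + ["protrusion"] * 2 + ["normal"] * 2
y_pred = ["polyp", "polyp", "polyp", "protrusion",
          "protrusion", "normal", "normal", "normal"]

scores = micro_macro_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

On an imbalanced dataset the two averages can diverge noticeably, which is one reason to report both, as the abstract does.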