Multiple imaging modalities, together with specific proteins in the cerebrospinal fluid, provide complementary views of neurodegenerative disorders and have been widely used for computer-aided diagnosis of Alzheimer's disease (AD). Given the proven effectiveness of contrastive learning in aligning multi-modal representations, in this paper we investigate contrastive learning strategies for learning better cross-modal representations that integrate complementary multi-modal information. To further improve AD diagnosis, we construct a unified hybrid network that integrates feature learning and classifier learning into an end-to-end framework. Specifically, we propose a weighted multi-modal contrastive learning based on hybrid network (WMCL-HN) method. First, an adaptive weighting strategy is applied to multi-modal contrastive learning to dynamically regulate the degree of information exchange across modalities: it assigns higher weights to more important modality pairs, so that the most informative cross-modal relationships are captured. Second, the hybrid network employs a curriculum learning strategy that gradually shifts training from feature learning to classifier learning, ensuring that the learned features are tailored to the diagnostic task. Experimental results on the ADNI dataset demonstrate the effectiveness of the proposed WMCL-HN on AD-related diagnosis tasks. The source code is available at https://github.com/pcehnago/WMCL-HN.
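As a minimal sketch of the two ingredients summarized above, the following PyTorch code illustrates (i) a pairwise contrastive loss with learnable weights over modality pairs and (ii) a simple linear curriculum that shifts the objective from contrastive feature learning to classifier learning. The class and function names (`WeightedMultiModalContrastiveLoss`, `curriculum_weight`), the symmetric InfoNCE formulation, and the linear schedule are our assumptions for illustration, not the paper's exact method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightedMultiModalContrastiveLoss(nn.Module):
    """Hypothetical sketch: pairwise contrastive loss with adaptive pair weights.

    Embeddings from `num_modalities` encoders (e.g. MRI, PET, CSF) are aligned
    pairwise; a softmax over learnable logits assigns higher weight to more
    informative modality pairs, so their relationships dominate the loss.
    """

    def __init__(self, num_modalities: int = 3, temperature: float = 0.1):
        super().__init__()
        num_pairs = num_modalities * (num_modalities - 1) // 2
        # Learnable logits -> adaptive, normalized weights per modality pair.
        self.pair_logits = nn.Parameter(torch.zeros(num_pairs))
        self.temperature = temperature

    def info_nce(self, za: torch.Tensor, zb: torch.Tensor) -> torch.Tensor:
        # Symmetric InfoNCE between two L2-normalized modality embeddings;
        # matched subjects along the batch diagonal are the positives.
        za, zb = F.normalize(za, dim=1), F.normalize(zb, dim=1)
        logits = za @ zb.t() / self.temperature          # (B, B) similarities
        targets = torch.arange(za.size(0), device=za.device)
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    def forward(self, embeddings: list[torch.Tensor]) -> torch.Tensor:
        weights = torch.softmax(self.pair_logits, dim=0)  # weights sum to 1
        loss, k = embeddings[0].new_zeros(()), 0
        for i in range(len(embeddings)):
            for j in range(i + 1, len(embeddings)):
                loss = loss + weights[k] * self.info_nce(embeddings[i], embeddings[j])
                k += 1
        return loss


def curriculum_weight(epoch: int, total_epochs: int) -> float:
    """One plausible schedule: linearly shift emphasis from contrastive
    feature learning to classifier learning over the course of training."""
    t = epoch / max(total_epochs - 1, 1)
    return 1.0 - t  # weight on the contrastive term


# Usage sketch: blend the two objectives per epoch.
#   w = curriculum_weight(epoch, total_epochs)
#   total_loss = w * contrastive_loss(embeddings) + (1.0 - w) * ce_loss
```

Under this sketch, early epochs emphasize cross-modal alignment while later epochs emphasize the diagnostic classifier, matching the abstract's description of a gradual transition from feature learning to classifier learning.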