BACKGROUND AND OBJECTIVE: Integrating multimodal data, such as pathology images and genomics, is crucial for understanding cancer heterogeneity and the complexity of personalized treatment, and for improving survival prediction. However, most current prognostic methods are confined to a single modality, either histopathology or genomics, which inevitably limits their potential for accurate prediction of patient outcomes. Despite advances in the joint analysis of pathology and genomic data, existing approaches inadequately capture the intricate relationships between modalities. METHODS: This paper introduces CPathomic, a method for survival prediction from multimodal data. It uses whole-slide pathology images to guide the learning of local pathological features and mitigates the substantial distribution gap between modalities through a cross-modal representational contrastive learning module. Interactive learning between the modalities is then enabled by cross-modal and gated attention modules. RESULTS: Extensive experiments on five public TCGA datasets demonstrate that the CPathomic framework effectively bridges the modality gap and consistently outperforms alternative multimodal survival prediction methods. CONCLUSION: The proposed CPathomic model demonstrates the potential of contrastive learning and cross-modal attention for representing and fusing multimodal data, improving the performance of patient survival prediction.
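
To make the two mechanisms named in METHODS concrete, the sketch below illustrates, in PyTorch, a generic symmetric cross-modal contrastive loss and a gated cross-modal attention fusion. This is a minimal illustration of the general techniques the abstract names, not the authors' CPathomic implementation; the module and function names (cross_modal_contrastive_loss, GatedAttentionFusion), embedding dimensions, and hyperparameters (temperature, number of heads) are all assumptions.

```python
# Minimal sketch, assuming paired per-patient pathology and genomic
# embeddings of equal dimension. Not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def cross_modal_contrastive_loss(path_emb, gene_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss: matched (pathology, genomics) pairs
    are positives; all other pairs in the batch serve as negatives."""
    p = F.normalize(path_emb, dim=-1)
    g = F.normalize(gene_emb, dim=-1)
    logits = p @ g.t() / temperature                   # (B, B) similarities
    targets = torch.arange(p.size(0), device=p.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


class GatedAttentionFusion(nn.Module):
    """Cross-modal attention followed by a sigmoid gate that weights how
    much attended pathology information enters the fused representation."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, path_emb, gene_emb):
        # Genomic embeddings query the pathology embeddings (keys/values).
        attended, _ = self.attn(gene_emb.unsqueeze(1),
                                path_emb.unsqueeze(1),
                                path_emb.unsqueeze(1))
        attended = attended.squeeze(1)
        # Gate computed from both streams controls the fusion mixture.
        g = self.gate(torch.cat([attended, gene_emb], dim=-1))
        return g * attended + (1 - g) * gene_emb


# Usage with toy data: 8 patients, hypothetical 256-d embeddings.
path_emb = torch.randn(8, 256)
gene_emb = torch.randn(8, 256)
loss = cross_modal_contrastive_loss(path_emb, gene_emb)
fused = GatedAttentionFusion(256)(path_emb, gene_emb)   # (8, 256)
```

In this sketch the contrastive term aligns the two embedding spaces before fusion, while the gate lets the network fall back on the genomic representation when the attended pathology signal is uninformative; how CPathomic itself parameterizes these modules is described in the paper's methods section.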