BACKGROUND: This study aims to develop a novel predictive model for determining human papillomavirus (HPV) status in oropharyngeal cancer from computed tomography (CT) images. Current image-based HPV prediction methods are hindered by high computational demands or suboptimal performance. METHODS: To address these issues, we propose a methodology that employs a Siamese neural network architecture integrating multi-modality off-the-shelf features (handcrafted features and 3D deep features) to enhance the representation of information. We assessed the incremental benefit of combining 3D deep features from various networks and introduced manufacturer normalization. Our method was also designed for computational efficiency, utilizing transfer learning and allowing model execution on a single-CPU platform. A substantial dataset comprising 1453 valid samples was used for internal validation, and a separate independent dataset was used for external validation. RESULTS: Our proposed model achieved superior performance compared to other methods, with an average area under the receiver operating characteristic curve (AUC) of 0.791 [95% confidence interval (CI), 0.781-0.809], an average recall of 0.827 [95% CI, 0.798-0.858], and an average accuracy of 0.741 [95% CI, 0.730-0.752], indicating promise for clinical application. In the external validation, the proposed method attained an AUC of 0.581 [95% CI, 0.560-0.603], whereas the same network architecture using only deep features achieved an AUC of 0.700 [95% CI, 0.682-0.717]. An ablation study confirmed the effectiveness of incorporating manufacturer normalization and the synergistic effect of combining different feature sets. CONCLUSION: Overall, our proposed model not only outperforms existing counterparts for HPV status prediction but can also run on a single-CPU platform, which reduces resource requirements and enhances clinical usability.
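
The abstract does not specify how manufacturer normalization is computed. As an illustration only, one plausible reading is to standardize each feature separately within each scanner-manufacturer group, so that vendor-specific offsets and scales are removed before the features reach the network. The sketch below assumes that per-group z-scoring; the function name `normalize_by_manufacturer`, the `eps` parameter, and the toy data are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_manufacturer(features, manufacturers, eps=1e-8):
    """Z-score each feature column within each manufacturer group.

    Assumption: "manufacturer normalization" is read here as per-group
    standardization; the study may use a different procedure.

    features: list of equal-length lists of floats (one row per scan).
    manufacturers: list of manufacturer labels, one per row.
    """
    # Group row indices by manufacturer label.
    groups = defaultdict(list)
    for idx, label in enumerate(manufacturers):
        groups[label].append(idx)

    # Work on a copy so the caller's data is left untouched.
    normalized = [list(row) for row in features]
    for indices in groups.values():
        n_cols = len(features[indices[0]])
        for col in range(n_cols):
            values = [features[i][col] for i in indices]
            mu = mean(values)
            sigma = pstdev(values)  # population standard deviation
            for i in indices:
                # eps guards against division by zero for constant columns.
                normalized[i][col] = (features[i][col] - mu) / (sigma + eps)
    return normalized


# Toy data: two hypothetical manufacturers with different intensity offsets.
feats = [[10.0, 1.0], [12.0, 3.0], [110.0, 5.0], [114.0, 9.0]]
vendors = ["A", "A", "B", "B"]
normed = normalize_by_manufacturer(feats, vendors)
```

After this step, features from both hypothetical vendors lie on a comparable scale despite their different raw offsets. In practice the group statistics would presumably be estimated on training data only and then reused for validation samples.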