This paper presents a novel method for fusing infrared and visible images based on a residual attention mechanism. The goal is to produce a fused image that combines the thermal radiation information of infrared images with the detailed texture and background information of visible images. To this end, we propose a multi-level feature extraction and fusion framework that encodes both shallow and deep image features. Within a residual cross-attention module, deep features serve as queries while shallow features serve as keys and values, allowing the network to selectively attend to and integrate relevant information across feature levels. In addition, we introduce a dynamic feature preservation loss that guides the fusion process to retain critical details from both source images. Experimental results show that the proposed method outperforms existing fusion techniques across a range of quantitative metrics and delivers superior visual quality.
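The residual cross-attention step described above can be illustrated with a minimal NumPy sketch. The token shapes, projection matrices, and scaling are illustrative assumptions for a generic scaled dot-product cross-attention with a residual connection; the abstract does not specify the paper's actual layer dimensions or projections.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def residual_cross_attention(deep, shallow, w_q, w_k, w_v):
    """Sketch of residual cross-attention (shapes are hypothetical).

    deep:    (n, d) deep features, used as queries.
    shallow: (m, d) shallow features, used as keys and values.
    w_q, w_k, w_v: (d, d) illustrative projection matrices.
    """
    q = deep @ w_q
    k = shallow @ w_k
    v = shallow @ w_v
    d = q.shape[-1]
    # (n, m) weights: how much each deep query attends to each shallow token.
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)
    # Residual connection: attended shallow detail is added back onto the
    # deep features rather than replacing them.
    return deep + attn @ v

# Toy usage with random feature maps flattened to token sequences.
rng = np.random.default_rng(0)
deep = rng.standard_normal((4, 8))      # 4 deep tokens, 8 channels
shallow = rng.standard_normal((16, 8))  # 16 shallow tokens, 8 channels
w_q, w_k, w_v = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
fused = residual_cross_attention(deep, shallow, w_q, w_k, w_v)
print(fused.shape)
```

Because the attention output is added to the deep features, the module can fall back to the deep representation when the shallow features carry little relevant detail, which is the usual motivation for the residual form.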