Recently, deep learning models have demonstrated impressive performance in Automatic Joint Lesion Detection (AJLD), yet balancing accuracy and efficiency remains a significant challenge. This paper focuses on achieving end-to-end lesion detection while improving accuracy to meet clinical requirements. To enhance the overall performance of AJLD, we propose two novel modules: Local Attention Feature Fusion (LAFF) and Gaussian Positional Encoding (GPE). Integrating these modules into YOLO yields an improved model that enhances local feature interaction, named YOLO