PURPOSE: Depth estimation is a powerful tool for navigation in laparoscopic surgery. Previous methods use predicted depth maps and the relative camera poses to perform self-supervised depth estimation. However, the smooth surfaces of organs with textureless regions and the laparoscope's complex rotations make depth and pose estimation difficult in laparoscopic scenes. We therefore propose a novel and effective self-supervised monocular depth estimation method with self-attention-guided pose estimation and a joint depth-pose loss function for laparoscopic images.

METHODS: We extract feature maps and compute the minimum re-projection error on them as a feature-metric loss, establishing constraints on representations that are more meaningful than raw pixel intensities. Moreover, we introduce a self-attention block into the pose estimation network to predict the rotations and translations of the relative poses. In addition, we minimize the difference between predicted relative poses as a pose loss. We combine all of these losses into a joint depth-pose loss.

RESULTS: The proposed method is extensively evaluated on the SCARED and Hamlyn datasets. Quantitative results show that the proposed method achieves improvements of about 18.07%.

CONCLUSION: This study considers the characteristics of laparoscopic datasets and presents a simple yet effective self-supervised monocular depth estimation method. We propose a joint depth-pose loss function based on the extracted features for depth estimation on laparoscopic images, with pose estimation guided by a self-attention block. The experimental results show that all of the proposed components contribute to the method's performance. Furthermore, the proposed method strikes a good balance between computational efficiency and performance.
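To make the joint depth-pose loss concrete, the following is a minimal PyTorch-style sketch rather than the paper's implementation. It assumes a Monodepth2-style per-pixel minimum re-projection error for both the photometric and feature-metric terms, a pose-consistency term that compares the composed forward and backward relative poses with the identity transform, and illustrative loss weights (w_photo, w_feat, w_pose) that are placeholders, not the paper's values. All function and argument names are hypothetical.

```python
import torch


def min_reprojection_loss(target, warped_list):
    # Per-pixel minimum L1 re-projection error over all warped source views
    # (Monodepth2-style minimum re-projection; usable on images or feature maps).
    errors = [(target - warped).abs().mean(dim=1, keepdim=True)
              for warped in warped_list]
    return torch.cat(errors, dim=1).min(dim=1).values.mean()


def pose_consistency_loss(T_fwd, T_bwd):
    # Hypothetical form of the pose loss: penalise the deviation of the
    # composed forward/backward relative poses (B x 4 x 4) from the identity.
    eye = torch.eye(4, device=T_fwd.device).expand_as(T_fwd)
    return (T_fwd @ T_bwd - eye).abs().mean()


def joint_depth_pose_loss(img_t, warped_imgs, feat_t, warped_feats,
                          T_fwd, T_bwd,
                          w_photo=1.0, w_feat=0.1, w_pose=0.01):
    # Weighted sum of photometric, feature-metric and pose terms;
    # the weights here are illustrative placeholders.
    l_photo = min_reprojection_loss(img_t, warped_imgs)
    l_feat = min_reprojection_loss(feat_t, warped_feats)
    l_pose = pose_consistency_loss(T_fwd, T_bwd)
    return w_photo * l_photo + w_feat * l_feat + w_pose * l_pose
```

In this reading, the same minimum re-projection operator is applied to the RGB frames and to the extracted feature maps, so the feature-metric term inherits the occlusion robustness of the per-pixel minimum while providing a smoother, more informative signal in textureless regions.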