Augmented reality (AR) is a key technology that will facilitate a major paradigm shift in the way users interact with data, and it has only recently been recognized as a viable solution to many critical needs. In practical terms, AR can be used to visualize data from hundreds of sensors simultaneously, overlaying relevant and actionable information on the user's environment through a headset. Semantic 3D reconstruction unlocks the promise of AR technology by making far more semantic information available. Although several post-processing methods are currently available to extract semantic information from reconstructed 3D models, the results obtained have been uncertain and even incorrect. Thus, it is necessary to explore or develop a novel 3D reconstruction approach that automatically recovers the 3D geometric model and obtains semantic information simultaneously. The rapid advent of deep learning has brought new opportunities to the field of semantic 3D reconstruction from photo collections. Deep learning-based methods can not only extract semantic information but also enhance fundamental techniques in semantic 3D reconstruction, including feature matching or tracking, stereo matching, camera pose estimation, and multi-view stereo. Moreover, deep learning techniques can be used to extract priors from photo collections, and these priors can in turn improve the quality of 3D reconstruction.