MOTIVATION: Spatial transcriptomic (ST) technologies, such as the GeoMx Digital Spatial Profiler, are increasingly used to investigate the roles of diverse tumor microenvironment components, particularly in relation to cancer progression, treatment response, and therapeutic resistance. However, in many ST studies, the spatial information obtained from immunofluorescence imaging is used primarily to identify regions of interest (ROIs) rather than as an integral part of downstream transcriptomic data analysis and interpretation. RESULTS: We developed ROICellTrack, a deep learning-based framework that better integrates cellular imaging with spatial transcriptomic profiling. By analyzing 56 ROIs from urothelial carcinoma of the bladder and upper tract urothelial carcinoma, ROICellTrack identified distinct cancer-immune cell mixtures, characterized by specific transcriptomic and morphological signatures and receptor-ligand interactions linked to tumor content and immune infiltration. Our findings demonstrate the value of integrating imaging with transcriptomics when analyzing spatial omics data, improving our understanding of tumor heterogeneity and its relevance to personalized and targeted therapies. AVAILABILITY AND IMPLEMENTATION: ROICellTrack is publicly available at https://github.com/wanglab1/ROICellTrack.