PURPOSE: To develop a deep learning method that exploits active learning and source-free domain adaptation for gross tumor volume delineation in nasopharyngeal carcinoma (NPC), addressing the variability and inaccuracy that arise when segmentation models are deployed in multicenter and multirater settings. METHODS AND MATERIALS: One thousand fifty-seven magnetic resonance imaging scans of patients with NPC from 5 hospitals were retrospectively collected and annotated, with consensus, by experts from the same medical group for multicenter adaptation evaluation. One data set was used for model development (source domain), and the remaining 4 were used for adaptation testing (target domains). In addition, a separate set of 170 patients with NPC, with annotations delineated by 4 independent experts, was created for multirater adaptation evaluation. We evaluated the pretrained model's ability to adapt to the 4 multicenter and 4 multirater target domains. Dice similarity coefficient (DSC), 95% Hausdorff distance (HD95), and other metrics were used for quantitative evaluation. RESULTS: In the adaptation of dataset5 to the other data sets, our source-free active learning adaptation method required only a limited number of labeled target samples (20%) to achieve a median DSC ranging from 0.70 to 0.86 and a median HD95 ranging from 3.16 to 7.21 mm for the 4 target centers, and 0.78 to 0.85 and 3.64 to 6.00 mm for the 4 multirater data sets. For DSC, our results for 3 of 4 multicenter data sets and all multirater data sets showed no statistically significant difference from the fully supervised U-Net model (P values > 0.05) and significantly surpassed the comparison models for 3 multicenter data sets and all multirater data sets (P values < 0.05). Clinical assessment showed that the delineations generated by our method can be used in both multicenter and multirater scenarios after minor refinement (revision ratio < 10% and median time < 2 minutes). CONCLUSIONS: The proposed method effectively minimizes domain gaps and, with limited labeled training samples, delivers encouraging performance compared with fully supervised learning models, offering a promising and practical solution for accurate and generalizable gross tumor volume segmentation in NPC.
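
For readers unfamiliar with the headline metric, the following is a minimal sketch of how a DSC and its median across cases could be computed. It is illustrative only and is not the evaluation code used in the study: the function names `dice_coefficient` and `median_dice`, the flat-list mask representation, the empty-mask convention, and the toy masks are all assumptions introduced here. HD95 is omitted because its definition depends on surface extraction and voxel spacing choices that the abstract does not specify.

```python
from statistics import median


def dice_coefficient(pred, truth):
    """Return DSC = 2|A ∩ B| / (|A| + |B|) for two flat binary masks.

    `pred` and `truth` are equal-length sequences of 0/1 voxel labels
    (a hypothetical representation chosen for this sketch).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labeled as tumor in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Sum of the two foreground sizes.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


def median_dice(cases):
    """Median DSC over an iterable of (prediction, ground truth) pairs."""
    return median(dice_coefficient(p, t) for p, t in cases)


# Toy masks, not study data: identical, partially overlapping, and disjoint.
toy_cases = [
    ([1, 1, 0, 0], [1, 1, 0, 0]),
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 0, 0, 0], [0, 1, 0, 0]),
]

if __name__ == "__main__":
    print(f"median DSC over toy cases: {median_dice(toy_cases):.3f}")
```

In practice, masks would be 3-dimensional arrays and the median would be taken per target domain, but the per-case formula is the same.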