INTRODUCTION: Effective management of pediatric myopia, with treatments such as corrective lenses and low-dose atropine, requires accurate clinical decisions. However, the complexity of pediatric refractive data, including variation in visual acuity, axial length, and patient-specific factors, makes it challenging to determine the optimal treatment. This study evaluates the performance of three large language models in analyzing such data. METHODS: A dataset of 100 pediatric refractive records, including parameters such as visual acuity and axial length, was analyzed separately with ChatGPT-3.5, ChatGPT-4o, and Wenxin Yiyan. Each model was asked to determine whether intervention was needed and then to recommend a treatment (eyeglasses, orthokeratology lenses, or low-dose atropine). The recommendations were compared with the consensus of professional optometrists, rated on a 1-5 Global Quality Score (GQS) scale, and evaluated for clinical safety using a three-tier accuracy assessment. RESULTS: ChatGPT-4o outperformed both ChatGPT-3.5 and Wenxin Yiyan in determining whether intervention was needed, with an accuracy of 90%, significantly higher than that of Wenxin Yiyan (p <
0.05). It also achieved the highest GQS of 4.4 ± 0.55, surpassing the other models (p <
0.001), with 85% of its responses rated as "good", ahead of ChatGPT-3.5 (82%) and Wenxin Yiyan (74%). ChatGPT-4o made only 8 errors in recommending interventions, fewer than ChatGPT-3.5 (12) and Wenxin Yiyan (15). It also handled incomplete or abnormal data better, maintaining higher quality scores. CONCLUSION: ChatGPT-4o showed the highest accuracy and clinical safety of the three models, making it a promising decision-support tool in pediatric ophthalmology, although expert oversight remains necessary.