BACKGROUND: To evaluate the performance of two AI systems, ChatGPT 4.0 and Algor, in generating concept maps from validated otolaryngology clinical practice guidelines.

METHODS: Concept maps were generated by ChatGPT 4.0 and Algor from four American Academy of Otolaryngology-Head and Neck Surgery Foundation (AAO-HNSF) clinical practice guidelines. Eight otolaryngology specialists evaluated the generated concept maps using the AI-Map questionnaire, which covers concept identification, relationship establishment, hierarchical structure representation, and visual presentation. Chi-square tests and Kendall's tau coefficient were used for statistical analysis.

RESULTS: Although neither system was consistently superior across all guidelines, each demonstrated distinct strengths. ChatGPT excelled at representing cross-connections between concepts and at layout optimization, particularly for the Rhinoplasty guideline (χ² = 6.000, p = 0.050 for cross-connections). Algor was stronger at capturing main themes and distinguishing general/abstract concepts, especially in the BPPV and Tympanostomy Tubes guidelines (χ² = 8.000, p = 0.046 for main themes in BPPV). For the H&N Masses guideline, statistically significant differences were found in representing dynamic nature (favouring the ChatGPT-generated map, χ² = 7.571, p = 0.023) and in overall value and usefulness (favouring the Algor-generated map, χ² = 7.905, p = 0.019).

CONCLUSION: Both AI systems showed potential for automating concept map creation from otolaryngology guidelines, with performance varying across medical topics and evaluation criteria. Further research is required to optimize AI systems for medical education and knowledge representation, given both their promise and their current limitations.