PURPOSE: Treatment deintensification for human papillomavirus-positive (HPV+) oropharyngeal cancer (OPC) has been a focus of experts worldwide. In situ hybridization is optimal to identify HPV+ OPC, but immunohistochemistry for its surrogate p16INK4a (p16) is the standard of care given its availability and sensitivity. HPV testing is not required for clinical management, so treatments are often administered on the basis of p16 status alone. However, the prognosis of p16/HPV discordant tumors is uncertain.

MATERIALS AND METHODS: This cohort study included 727 consecutive patients with OPC who received curative radiation therapy at an academic cancer center and had digitized, unstructured pathology reports. Natural language processing (NLP) methods were used to classify biomarker status, and the results were compared against manually derived classification. Patients were excluded if either p16 or HPV testing was not performed or was equivocal. Primary end points were progression-free survival (PFS), cancer-specific survival (CSS), and overall survival.

RESULTS: NLP classified p16 and HPV status from a majority (91%) of reports. Accuracy, positive predictive value, sensitivity, and

CONCLUSION: NLP classified head and neck cancer pathology reports in high concordance with gold-standard categorization, but a conspicuous portion of reports could not be interpreted. p16/HPV discordant OPC constitutes a noteworthy minority of patients. The inferior prognosis of p16+/HPV- tumors suggests that p16 alone is insufficient for prognostication, especially when considering treatment de-escalation.
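
The comparison of NLP-derived biomarker labels against manual classification, summarized by accuracy, positive predictive value, and sensitivity, can be illustrated with a minimal Python sketch. This is not the study's pipeline. The function name `binary_metrics` and the toy labels are hypothetical, and the sketch assumes each report has already been reduced to a binary positive/negative call for a single biomarker (for example, p16).

```python
from typing import Dict, Optional, Sequence


def binary_metrics(
    predicted: Sequence[bool], reference: Sequence[bool]
) -> Dict[str, Optional[float]]:
    """Compare predicted labels (e.g. NLP output) with reference labels
    (e.g. manual chart review) for one binary biomarker.

    Returns accuracy, positive predictive value (PPV), and sensitivity.
    A metric is None when its denominator is zero.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    # Tally the four cells of the 2x2 confusion matrix.
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    tn = sum(1 for p, r in zip(predicted, reference) if not p and not r)

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "ppv": ratio(tp, tp + fp),          # TP / (TP + FP)
        "sensitivity": ratio(tp, tp + fn),  # TP / (TP + FN)
    }


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    nlp_labels = [True, True, False, True, False, False]
    manual_labels = [True, False, False, True, True, False]
    print(binary_metrics(nlp_labels, manual_labels))
```

Reports that the NLP step could not interpret would need to be excluded or counted separately before calling such a function, since they have no predicted label.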