Differential diagnosis (DDx) is central to clinical practice because it helps healthcare providers systematically distinguish between conditions that share similar symptoms. This study evaluates how lab test results influence the accuracy of DDx generated by large language models (LLMs). Clinical vignettes were created from 50 randomly selected case reports in PMC-Patients, incorporating demographics, symptoms, and lab data. Five LLMs (GPT-4, GPT-3.5, Llama-2-70b, Claude-2, and Mixtral-8x7B) were prompted to generate Top 10, Top 5, and Top 1 DDx with and without lab data. Incorporating lab data improved accuracy by up to 30% across models. GPT-4 achieved the highest performance, with a Top 1 accuracy of 55% (0.41-0.69) and a lenient accuracy of 79% (0.68-0.90). Statistically significant improvements (Holm-adjusted p values < 0.05) were observed, with GPT-4 and Mixtral showing the strongest results. Lab tests, including liver function tests, metabolic/toxicology panels, and serology, were generally interpreted correctly by the LLMs for DDx.
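
The abstract reports Holm-adjusted p values without describing the computation. As a minimal sketch of the standard Holm step-down adjustment, the Python below may help readers unfamiliar with it. The function name `holm_adjust` and the input p values are hypothetical illustrations, not data or code from the study.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p values, in the original input order."""
    m = len(p_values)
    # Indices of the raw p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p value (k = rank + 1) is multiplied by m - k + 1
        # and capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease as the raw p values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p values for four comparisons (not taken from the study).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = holm_adjust(raw)
significant = [p < 0.05 for p in adjusted]
```

A comparison would be called significant at the 0.05 level when its adjusted value falls below 0.05, which is the criterion the abstract states.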