Within the evolving landscape of artificial intelligence (AI), large language models (LLMs) have the potential to enhance the efficiency, breadth, and validity of the collection, processing, and analysis of text as data in evaluation practice. However, LLMs do not always generate aligned, authoritative, or accurate responses, so their outputs must be validated before use in our work. Moreover, the importance of analytical rigor in our practice, combined with our institutions' ability to affect the lives of people around the world, demands a thoughtful approach to integrating such tools. How can we realize the potential of LLMs while maintaining rigor? This guidance note aims to answer that question by demonstrating good practices for experimenting with LLMs, based on a use case that occurs frequently in our evaluations: structured literature review (SLR). This use case serves as a concrete example of how LLMs can be integrated into evaluation workflows. This publication is jointly produced by the Independent Evaluation Group (IEG) of the World Bank (WB) and the Independent Office of Evaluation (IOE) of the International Fund for Agricultural Development (IFAD).