Predicting the power and loads of wind turbines in waked inflow conditions remains a major modelling challenge. It requires accurate modelling of the atmospheric flow conditions, the wakes of upstream turbines and the response of the turbine of interest. Rigorous validations of model frameworks against measurements of utility-scale wind turbines in such scenarios remain limited to date. In this study, six models of different fidelity are compared against measurements from the DanAero experiment. The two benchmark cases feature a full-wake and a partial-wake scenario, respectively. The simulations are compared against local pressure forces and inflow velocities measured on several blade sections of the downstream turbine, as well as met mast measurements and standard SCADA data. Regardless of the model fidelity, reasonable agreement is found in terms of the wake characteristics and turbine response. For instance, the azimuthal variation of the mean aerodynamic forces acting on the blade is captured with a mean relative error of 15-20%. Additionally, while various model-specific deficiencies are identified, the study highlights the need for further full-scale measurement campaigns with even more extensive instrumentation. Furthermore, it is concluded that validations should not be limited to integrated and/or time-averaged quantities that conceal characteristic spatial or temporal variations.