The Gearbox Reliability Collaborative (GRC) has conducted extensive field and dynamometer test campaigns on two heavily instrumented wind turbine gearboxes. In this paper, data from the planetary stage is used to evaluate the accuracy and computation time of numerical models of the gearbox. First, planet-bearing load and motion data is analyzed to characterize planetary stage behavior in different environments and to derive requirements for gearbox models and life calculations. Second, a set of models representing different levels of fidelity is constructed. Simulations of the test conditions are compared to the test data, and the computational costs of the models are compared. The test data suggests that planet-bearing life calculations should be made separately for each bearing on a row because of unequal load distribution. It also shows that tilting of the gear axes is related to planet load share. The modeling study concluded that fully flexible models were needed to predict planet-bearing loading in some cases, although less complex models were able to achieve good correlation in the field-loading case. Significant differences in planet load share were found in simulation; these differences depended on the scope of the model and the bearing stiffness model used.