Supervised machine learning has been used to detect fine-scale animal behaviour from accelerometer data, but a standardised protocol for implementing this workflow is currently lacking. As the application of machine learning to ecological problems expands, it is essential to establish technical protocols and validation standards that align with those in other 'big data' fields. Overfitting is a prevalent and often misunderstood challenge in machine learning. Overfit models adapt too closely to the training data, memorising specific instances rather than learning the underlying signal. Such models can appear to perform well on the training set, yet they are unlikely to generalise to new data. Overfitting can be detected through rigorous validation using independent test sets. Our systematic review of 119 studies using accelerometer-based supervised machine learning to classify animal behaviour reveals that 79% (94 papers) did not validate their models rigorously enough to robustly identify potential overfitting. Although this does not necessarily imply that these models are overfit, the absence of independent test sets limits the interpretability of their results. To address these challenges, we provide a theoretical overview of overfitting in the context of animal accelerometry and propose guidelines for optimal validation techniques. Our aim is to equip ecologists with the tools necessary to adapt general machine learning validation theory to the specific requirements of biologging, facilitating reliable overfitting detection and advancing the field.
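The detection principle described above — comparing performance on the training set against performance on an independent test set — can be illustrated with a minimal sketch. The data here are a hypothetical stand-in for accelerometer-derived features (random numbers, not real biologging recordings), and the classifier is a 1-nearest-neighbour model, chosen because it memorises the training set by construction; it is not any model from the reviewed studies.

```python
# Minimal sketch: an independent test set exposes overfitting.
# Assumptions: synthetic features and labels (no real accelerometer data),
# a deliberately memorising 1-nearest-neighbour classifier.
import numpy as np

rng = np.random.default_rng(42)

# 300 windows x 10 features; labels drawn independently of the features,
# so there is no underlying signal to learn -- any apparent skill on the
# training set is pure memorisation.
X = rng.normal(size=(300, 10))
y = rng.integers(0, 3, size=300)

# Independent split: the held-out test set plays no role in fitting.
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

def predict_1nn(X_ref, y_ref, X_query):
    """Assign each query the label of its nearest reference point."""
    d = np.linalg.norm(X_query[:, None, :] - X_ref[None, :, :], axis=2)
    return y_ref[d.argmin(axis=1)]

train_acc = np.mean(predict_1nn(X_train, y_train, X_train) == y_train)
test_acc = np.mean(predict_1nn(X_train, y_train, X_test) == y_test)

print(f"train accuracy: {train_acc:.2f}")  # 1.00 -- perfect memorisation
print(f"test accuracy:  {test_acc:.2f}")   # near chance for 3 classes
```

The large gap between the two accuracies is the diagnostic: high training performance paired with near-chance test performance indicates the model has memorised rather than generalised, which is exactly the pattern an independent test set makes visible.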