Online participant recruitment is a cornerstone of modern psychology research. While online testing offers clear benefits for studying individual differences in cognitive abilities, test performance can vary between lab-based and web-based settings. Here we assess the stability of normative test scores across popular online recruitment platforms and in-person testing for three standard measures of face identity processing ability: the GFMT2, CFMT+, and MFMT. Participants recruited via Amazon Mechanical Turk (MTurk) scored approximately 10 percentage points lower on all tests than those recruited through Prolific and than university students tested in the lab. Applying stricter exclusion criteria based on attention checks produced notably higher exclusion rates for the MTurk group (~62%) than for the Prolific group (~22%), yet even after these exclusions, some test scores remained lower for MTurk participants. Given that the GFMT2 subtests were developed using MTurk participants, we provide updated normative scores for all subtests (GFMT2-Short, GFMT2-Low, GFMT2-High) and further recommendations for their use. We also confirm the robust psychometric properties of the GFMT2-Short and GFMT2-High, demonstrating strong test-retest reliability, convergent validity with other established tests, and high diagnostic value in identifying super-recognisers. The GFMT2 subtests are freely available for use in both online and in-person research via www.gfmt2.org.