Propensity score adjustment addresses confounding by balancing covariates across treatment groups through matching, stratification, or weighting. Diagnostics then assess whether the adjustment succeeded. For example, if the standardized mean difference (SMD) for a relevant covariate exceeds a threshold such as 0.1, the covariate is considered imbalanced and the study may be deemed invalid. Unfortunately, for studies with small or moderate numbers of subjects, the probability of falsely rejecting the validity of a study because of chance imbalance, that is, of asserting imbalance via an SMD cutoff when no underlying imbalance exists, can be grossly larger than a nominal level such as 0.05. In this paper, we illustrate that chance imbalance is operative in real-world settings even at moderate sample sizes of 2000. We also identify a previously unrecognized challenge: as meta-analysis increases the precision of an effect estimate, the diagnostics must undergo meta-analysis as well, for a corresponding increase in precision. We propose an alternative diagnostic that checks whether the standardized mean difference statistically significantly exceeds the threshold. Through simulation and real-world data, we find that this diagnostic achieves a better trade-off between type 1 error rate and power than either the standard nominal-threshold test or forgoing the diagnostic altogether, for sample sizes from 250 to 4000 and for 20 to 100,000 covariates. We confirm that in network studies, meta-analysis of effect estimates must be accompanied by meta-analysis of the diagnostics, or else systematic confounding may overwhelm the estimated effect. Our procedure supports the review of large numbers of covariates, enabling more rigorous diagnostics.
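
To make the proposed diagnostic concrete, the following is a minimal Python sketch of one way a significance test of the SMD against a threshold could be implemented. The function name, the Hedges-Olkin large-sample standard-error approximation, and the one-sided z-test are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy import stats

def smd_exceeds_threshold(x_treated, x_control, threshold=0.1, alpha=0.05):
    """One-sided z-test of H0: |SMD| <= threshold vs H1: |SMD| > threshold.

    Illustrative sketch of a significance-based balance diagnostic; the
    standard error uses the standard large-sample approximation for a
    two-sample standardized mean difference.
    """
    n1, n0 = len(x_treated), len(x_control)
    m1, m0 = np.mean(x_treated), np.mean(x_control)
    v1, v0 = np.var(x_treated, ddof=1), np.var(x_control, ddof=1)
    pooled_sd = np.sqrt((v1 + v0) / 2.0)  # common pooled-SD convention for SMD
    smd = (m1 - m0) / pooled_sd
    # Approximate SE of the SMD for two independent samples
    se = np.sqrt((n1 + n0) / (n1 * n0) + smd**2 / (2.0 * (n1 + n0)))
    z = (abs(smd) - threshold) / se
    p_value = stats.norm.sf(z)  # P(Z > z), one-sided
    return smd, p_value, p_value < alpha
```

Under this sketch, a covariate is flagged as imbalanced only when its SMD significantly exceeds the threshold, rather than whenever the point estimate crosses 0.1; when screening many covariates at once, some multiplicity handling would presumably also be needed.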