OBJECTIVE: To quantify and improve the performance of standard rheumatoid arthritis (RA) algorithms in a biobank setting. METHODS: This retrospective cohort study within the Mayo Clinic (MC) Biobank and the MC Tapestry Study identified potential RA cases as of July 18, 2022, by the presence of at least two RA diagnosis codes OR positive anti-cyclic citrullinated peptide (anti-CCP) antibodies plus a disease-modifying antirheumatic drug (DMARD) prescription. Rheumatologists manually verified all potential RA cases using RA criteria and/or a rheumatologist's diagnosis plus DMARD use. All other biobank participants served as non-RA controls. We defined seropositivity as rheumatoid factor and/or anti-CCP positivity. We assessed rules-based and Electronic Medical Records and Genomics (eMERGE) RA algorithms using positive predictive value (PPV). Finally, we developed a novel RA algorithm using a LASSO-based machine learning approach with five-fold cross-validation. RESULTS: We identified 1,316 confirmed RA cases (968 MC Biobank, 348 Tapestry; 70 % seropositive) and 82,123 non-RA controls (mean age 65 years, 61 % female). The PPV was 43 % for 3 RA codes, 54 % for codes plus DMARD, and 85 % for codes plus DMARD plus seropositivity. The PPV of the eMERGE algorithm was 77 %. Self-reported RA (PPV 10 %), which was available in the MC Biobank, only minimally improved algorithm performance (PPV from 83 % to 85 %), whereas family history of RA (PPV 3 %) worsened performance. At 90 % PPV, the novel RA algorithm, which incorporated key variables such as anti-CCP and DMARD use, increased sensitivity by 4–11 % compared with eMERGE. CONCLUSION: Rules-based and eMERGE RA algorithms performed worse in the biobank setting than in administrative settings. Our novel RA algorithm outperformed these standard algorithms.
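The two metrics used above can be made concrete with a short sketch. PPV is TP / (TP + FP) among participants an algorithm flags, and sensitivity is TP / (TP + FN) among verified cases. "Sensitivity at 90 % PPV" for a scored model can be read as the highest sensitivity over score thresholds whose PPV stays at or above 0.90. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and toy data are hypothetical, and the scores stand in for the output of a model such as a LASSO-penalized classifier, whose fitting and five-fold cross-validation are not shown.

```python
from typing import Optional, Sequence, Tuple


def ppv_and_sensitivity(flagged: Sequence[bool],
                        is_case: Sequence[bool]) -> Tuple[float, float]:
    """Return (PPV, sensitivity) of an algorithm against verified case status.

    PPV = TP / (TP + FP); sensitivity = TP / (TP + FN).
    Undefined ratios (zero denominator) are returned as NaN.
    """
    tp = sum(f and c for f, c in zip(flagged, is_case))
    fp = sum(f and not c for f, c in zip(flagged, is_case))
    fn = sum((not f) and c for f, c in zip(flagged, is_case))
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sens = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sens


def sensitivity_at_ppv(scores: Sequence[float],
                       is_case: Sequence[bool],
                       target_ppv: float = 0.90) -> Tuple[Optional[float], float]:
    """Hypothetical helper: scan every distinct score as a threshold and return
    (threshold, sensitivity) with the highest sensitivity whose PPV >= target_ppv.
    Returns (None, 0.0) if no threshold reaches the target PPV."""
    best: Tuple[Optional[float], float] = (None, 0.0)
    for t in sorted(set(scores), reverse=True):
        flagged = [s >= t for s in scores]
        ppv, sens = ppv_and_sensitivity(flagged, is_case)
        # NaN PPV compares False, so empty selections are skipped.
        if ppv >= target_ppv and sens > best[1]:
            best = (t, sens)
    return best


if __name__ == "__main__":
    # Toy data only; not study results.
    print(ppv_and_sensitivity([True, True, True, False, False],
                              [True, True, False, True, False]))
    print(sensitivity_at_ppv([0.95, 0.9, 0.8, 0.7, 0.4, 0.2],
                             [True, True, True, False, True, False]))
```

On the toy scores, thresholds down to 0.8 keep PPV at 1.0, while lowering the threshold further admits false positives and drops PPV below 0.90, so the sketch reports a sensitivity of 0.75 at threshold 0.8. Comparing two algorithms "at 90 % PPV" then amounts to comparing these sensitivities.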