Heterogeneities such as point defects, inherent to material systems, can profoundly influence material functionalities critical for numerous energy applications. This influence in principle can be identified and quantified through development of large defect data sets which we call the defect genome, employing high-throughput <
em>
ab initio<
/em>
calculations. However, high-throughput screening of material models with point defects dramatically increases the computational complexity and chemical search space, creating major impediments toward developing a defect genome. In this paper, we overcome these impediments by employing computationally tractable <
em>
ab initio<
/em>
models driven by highly scalable workflows, to study formation and interaction of various point defects (e.g., O vacancies, H interstitials, and Y substitutional dopant), in over 80 cubic perovskites, for potential proton-conducting ceramic fuel cell (PCFC) applications. The resulting defect data sets identify several promising perovskite compounds that can exhibit high proton conductivity. Furthermore, the data sets also enable us to identify and explain, insightful and novel correlations among defect energies, material identities, and defect-induced local structural distortions. Finally, such defect data sets and resultant correlations are necessary to build statistical machine learning models, which are required to accelerate discovery of new materials.