Particulate matter emissions negatively affect public health and global climate, yet newer fuel-efficient gasoline direct injection engines tend to produce more soot than their port-fuel injection counterparts. Fortunately, the search for sustainable biomass-based fuel blendstocks provides an opportunity to develop fuels that suppress soot formation in more efficient engine designs. However, as emissions tests are experimentally cumbersome and the search space for potential bioblendstocks is vast, new techniques are needed to estimate the sooting tendency of a diverse range of compounds. In this study, we develop a quantitative structure-activity relationship (QSAR) model of sooting tendency based on the experimental yield sooting index (YSI), which ranks molecules on a scale from n-hexane, 0, to benzene, 100. The model includes a rigorously defined applicability domain, and the predictive performance is checked using both internal and external validation. Model predictions for compounds in the external test set had a median absolute error of ~3 YSI units. An investigation of compounds that are poorly predicted by the model lends new insight into the complex mechanisms governing soot formation. Predictive models of soot formation can therefore be expected to play an increasingly important role in the screening and development of next-generation biofuels.