Particulate matter (PM) is a critical component of overall pollutant exposure, but monitoring at the individual level remains impractical for large cohorts. This study aimed to identify PM sources in a highly polluted area of Taiwan and to develop generalizable predictive models. We collected daily average PM data recorded between 2018 and 2020 from Environmental Protection Administration (EPA) air quality monitoring stations, AirBox sensors, and EPA micro-stations in this area. Predictors were derived from multiple datasets, including EPA environmental resources, meteorological data, land use, road traffic facilities, social information, geospatial data, and landmark databases. Employing ensemble techniques, such as land-use regression (LUR), inverse distance weighting, and three machine learning algorithms (support vector machine, random forest, and multilayer perceptron), we predicted PM