Large language models (LLMs) hold promise as a fast and accurate modeling paradigm for materials evaluation, analysis, and design. Their vast number of trainable parameters necessitates a wealth of data to achieve accuracy and mitigate overfitting. However, experimental measurements are often limited and costly to obtain in the quantities needed for fine-tuning. To address this, we present a physics-based training pipeline that tackles the pathology of data scarcity. The core enabler is a physics-based modeling framework that generates abundant synthetic data to align the LLM to a physically consistent initial state before fine-tuning. Our framework features a two-phase training strategy: supervised pretraining on the abundant but less accurate synthetic data, followed by fine-tuning of the phase-1 model on the limited experimental data. We empirically demonstrate that supervised pretraining is vital to obtaining accurate fine-tuned LLMs, through the lens of learning polymer flammability metrics for which cone calorimeter data are sparse.
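The two-phase strategy can be illustrated with a minimal sketch. The toy linear model, the `physics_model` and `experiment` functions, and all hyperparameters below are illustrative assumptions standing in for the paper's LLM, physics-based data generator, and cone calorimeter measurements; only the training structure (phase 1: pretrain on plentiful synthetic data; phase 2: fine-tune on a few experimental points) mirrors the text.

```python
import random

random.seed(0)

def physics_model(x):
    # Stand-in for the physics-based framework: cheap, abundant,
    # but slightly biased relative to the true relationship.
    return 2.0 * x + 0.3

def experiment(x):
    # Stand-in for scarce, accurate experimental measurements.
    return 2.1 * x + 0.1

def sgd(data, w, b, lr, epochs):
    # Plain per-sample gradient descent on squared error for a
    # one-parameter linear model y = w*x + b.
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

# Phase 1: supervised pretraining on abundant synthetic data,
# aligning the model to a physically consistent initial state.
synthetic = [(x / 100, physics_model(x / 100)) for x in range(1000)]
w, b = sgd(synthetic, w=0.0, b=0.0, lr=0.005, epochs=3)

# Phase 2: fine-tuning the phase-1 model on a handful of
# "experimental" points, starting from the pretrained (w, b).
experimental = [(x, experiment(x)) for x in (0.5, 5.0, 9.0)]
w, b = sgd(experimental, w, b, lr=0.005, epochs=200)
```

Running both phases moves the model from the biased physics fit toward the experimental relationship; skipping phase 1 would leave three points to determine the model from a random initial state, which is the data-scarcity failure mode the pipeline is designed to avoid.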