3D markerless tracking of speech movements with submillimeter accuracy.


Authors: Arielle Borovsky, Kwang S Kim, James Liu, Austin Lovell, Raymond A Yeh

Language: English

Classification: 653.4 Handwritten systems

Publication: United States : bioRxiv : the preprint server for biology, 2025

Physical description:

Collection: NCBI

ID: 674048

UNLABELLED: Speech movements are highly complex and require precise control of both the spatial positioning and the timing of the oral articulators to support intelligible communication. These properties also make speech movements challenging to measure, often requiring extensive physical sensors placed around the mouth and face that are not easily tolerated by certain populations, such as young children. Recent progress in machine-learning-based markerless facial landmark tracking has demonstrated the potential to track the lips without physical sensors, but whether such technology can provide submillimeter precision and accuracy in 3D remains unknown. Moreover, it is also unclear whether such technology can be applied to track speech movements in young children. Here, we developed a novel approach that integrates SPIGA (Shape Preserving facial landmarks with Graph Attention networks), a facial landmark detector, with CoTracker, a transformer-based neural network model that jointly tracks dense points across a video sequence. We then examined and validated this approach by assessing its tracking precision and accuracy. The findings revealed that the integrated SPIGA-CoTracker approach was more precise (≈ 0.15 mm standard deviation) than SPIGA alone (≈ 0.35 mm). In addition, its 3D tracking performance was comparable to electromagnetic articulography (≈ 0.29 mm RMSE against simultaneously recorded articulograph data). Importantly, the approach performed similarly well across adults and young children (

AUTHOR SUMMARY: In this work, we examined whether machine-learning-based markerless tracking is feasible for tracking 3D lip movements in adults and young children. We developed a novel approach that integrates a landmark detection model (SPIGA) with a tracker model (CoTracker). Our combined CoTracker-based approach demonstrated the submillimeter precision and accuracy desired for speech kinematic recording. In addition, our approach does not involve training and validation for each population (
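The abstract reports two validation quantities: precision as the standard deviation of a tracked landmark's position, and accuracy as the RMSE against simultaneously recorded articulograph data. As a minimal illustrative sketch (not the authors' code; the function names and the synthetic jitter data are our own assumptions), these two metrics could be computed as:

```python
import numpy as np

def tracking_precision(points_mm):
    """Precision: mean per-axis standard deviation of a nominally
    static landmark's tracked 3D position, in mm (lower = more precise)."""
    return float(np.mean(np.std(points_mm, axis=0)))

def tracking_accuracy_rmse(tracked_mm, reference_mm):
    """Accuracy: RMSE of the 3D Euclidean error between markerless
    tracks and a simultaneously recorded reference (e.g. articulography)."""
    err = np.linalg.norm(tracked_mm - reference_mm, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

# Hypothetical example: a static landmark jittering around (10, 20, 30) mm
# with 0.15 mm Gaussian noise, mimicking the reported precision figure.
rng = np.random.default_rng(0)
static = np.array([10.0, 20.0, 30.0]) + rng.normal(0.0, 0.15, size=(500, 3))
print(f"precision: {tracking_precision(static):.2f} mm")
```

The same RMSE function would be applied to landmark trajectories time-aligned with the articulograph recording; the alignment and calibration steps are not shown here.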