Two-stage data augmentation for improved ASR performance for dysarthric speech.

Authors: Chitralekha Bhat, Helmer Strik

Language: eng

Classification number: 025.393 *Recataloging

Publication information: United States: Computers in Biology and Medicine, 2025

Collection: NCBI

ID: 708009

Machine learning (ML) and deep neural networks (DNNs) have greatly advanced automatic speech recognition (ASR). However, accurate ASR for dysarthric speech remains a serious challenge. One obstacle is the dearth of usable data, which hampers the application of ML and DNN techniques to dysarthric speech recognition. In the current research, we address this challenge with a novel two-stage data augmentation scheme that combines static and dynamic data augmentation techniques and is designed around the characteristics of dysarthric speech. In stage one (static augmentation), we explore speaker-independent ASR using healthy speech modified by various perturbations, devoicing of consonants, and voice conversion. In stage two (dynamic augmentation), we apply a modified SpecAugment algorithm tailored to the characteristics of dysarthric speech, termed Dysarthric SpecAugment. The resulting acoustic model is used to pre-train a speaker-dependent ASR using dysarthric speech. The objective of this work is to improve ASR performance for dysarthric speech using the two-stage data augmentation scheme. An end-to-end ASR with a Transformer acoustic model is used to evaluate the scheme on speech from the UA dysarthric speech corpus. For the speaker-dependent system, we achieve a final word error rate (WER) of 25.9%, an absolute improvement of 10.7% and a relative improvement of 29.2% over a baseline with no augmentation.
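The reported figures can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: `word_error_rate` and `improvements` are hypothetical helper names, the WER function is the standard word-level edit-distance definition, and the baseline WER of 36.6% is not stated in the abstract but inferred here as the final WER plus the absolute improvement.

```python
from typing import Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def improvements(baseline_wer: float, final_wer: float) -> Tuple[float, float]:
    """Return (absolute, relative) WER improvement over the baseline."""
    absolute = baseline_wer - final_wer
    relative = absolute / baseline_wer
    return absolute, relative


# Figures from the abstract; the baseline is an inferred value (assumption).
final_wer = 25.9
absolute_gain = 10.7
baseline_wer = final_wer + absolute_gain

absolute, relative = improvements(baseline_wer, final_wer)
print(f"baseline {baseline_wer:.1f}%, absolute {absolute:.1f} points, "
      f"relative {100 * relative:.1f}%")
```

Under that assumption the three reported numbers are mutually consistent: a drop of 10.7 points from an implied 36.6% baseline is a relative reduction of about 29.2%.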