Vocal Tract Length Warped Features for Spoken Keyword Spotting

Bibliographic Details
Title: Vocal Tract Length Warped Features for Spoken Keyword Spotting
Authors: Sarkar, Achintya kr., Dwivedi, Priyanka, Tan, Zheng-Hua
Publication Year: 2025
Collection: Computer Science
Subject Terms: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
More Details: In this paper, we propose several methods that incorporate vocal tract length (VTL) warped features for spoken keyword spotting (KWS). The first method, VTL-independent KWS, involves training a single deep neural network (DNN) that utilizes VTL features with various warping factors. During training, a specific VTL feature is randomly selected per epoch, allowing the exploration of VTL variations. During testing, the VTL features with different warping factors of a test utterance are scored against the DNN and combined with equal weight. The second method scores the conventional features of a test utterance (without VTL warping) against the DNN. The third method, VTL-concatenation KWS, concatenates VTL warped features to form high-dimensional features for KWS. Evaluations carried out on the English Google Command dataset demonstrate that the proposed methods improve the accuracy of KWS.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2501.03523
Accession Number: edsarx.2501.03523
Database: arXiv
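
The three operations named in the abstract (random per-epoch selection of a warping factor, equal-weight score combination at test time, and feature concatenation) can be illustrated with a minimal sketch. This is not the authors' implementation: the warping factor values, the function names, and the assumption that warped features and per-warp DNN scores are already available as NumPy arrays are all hypothetical.

```python
import numpy as np

# Hypothetical warping factors; the abstract does not state the values used.
WARP_FACTORS = (0.9, 1.0, 1.1)


def pick_warp_for_epoch(rng, factors=WARP_FACTORS):
    """Training: choose one warping factor at random for the coming epoch."""
    return factors[rng.integers(len(factors))]


def fuse_scores(scores_per_warp):
    """Testing: combine per-warp keyword scores with equal weight (a plain mean)."""
    stacked = np.stack(list(scores_per_warp.values()), axis=0)
    return stacked.mean(axis=0)


def concat_features(feats_per_warp, factors=WARP_FACTORS):
    """VTL-concatenation: join warped features along the feature axis."""
    return np.concatenate([feats_per_warp[a] for a in factors], axis=-1)
```

Under these assumptions, scoring only the entry for factor 1.0 (no warping) would correspond to the second method, while `concat_features` turns, for example, three arrays of shape (frames, 4) into one array of shape (frames, 12).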