Title: |
Vocal Tract Length Warped Features for Spoken Keyword Spotting |
Authors: |
Sarkar, Achintya kr., Dwivedi, Priyanka, Tan, Zheng-Hua |
Publication Year: |
2025 |
Collection: |
Computer Science |
Subject Terms: |
Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing |
More Details: |
In this paper, we propose several methods that incorporate vocal tract length (VTL) warped features for spoken keyword spotting (KWS). The first method, VTL-independent KWS, trains a single deep neural network (DNN) on VTL features with various warping factors. During training, a specific VTL feature is randomly selected per epoch, allowing the exploration of VTL variations. During testing, the VTL features of a test utterance at the different warping factors are scored against the DNN and the scores are combined with equal weight. The second method scores the conventional features of a test utterance (without VTL warping) against the DNN. The third method, VTL-concatenation KWS, concatenates VTL warped features to form high-dimensional features for KWS. Evaluations carried out on the English Google Command dataset demonstrate that the proposed methods improve the accuracy of KWS. |
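The three operations named in the abstract (per-epoch random selection of a warping factor, equal-weight score combination at test time, and feature concatenation) can be illustrated with a minimal sketch. This is not the authors' implementation: the warping factor values, the function names, and the representation of features as nested lists are all assumptions made for illustration, and feature extraction and the DNN itself are omitted.

```python
import random

# Hypothetical warping factors; the abstract does not state the values used.
WARP_FACTORS = [0.9, 1.0, 1.1]


def pick_warp_factor(rng):
    """Training: choose one warping factor at random for the current epoch."""
    return rng.choice(WARP_FACTORS)


def fuse_scores(scores_per_factor):
    """Testing: equal-weight average of per-class scores across warping factors.

    scores_per_factor holds one list of per-class scores for each warping
    factor (assumed here to be the DNN outputs for the same utterance).
    """
    n = len(scores_per_factor)
    num_classes = len(scores_per_factor[0])
    return [sum(s[c] for s in scores_per_factor) / n for c in range(num_classes)]


def concat_features(frames_per_factor):
    """VTL-concatenation: join each frame's feature vectors across factors.

    frames_per_factor holds, for each warping factor, a list of frames,
    where each frame is a list of feature values.
    """
    return [sum(frame_group, []) for frame_group in zip(*frames_per_factor)]
```

For example, fusing the scores `[0.2, 0.8]` and `[0.4, 0.6]` from two warping factors gives roughly `[0.3, 0.7]`, and concatenating two 2-dimensional feature streams yields 4-dimensional frames.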
Document Type: |
Working Paper |
Access URL: |
http://arxiv.org/abs/2501.03523 |
Accession Number: |
edsarx.2501.03523 |
Database: |
arXiv |