VANPY: Voice Analysis Framework

Bibliographic Details
Title:	VANPY: Voice Analysis Framework
Authors:	Koushnir, Gregory; Fire, Michael; Alpert, Galit Fuhrmann; Kagan, Dima
Publication Year:	2025
Collection:	Computer Science
Subject Terms:	Computer Science - Sound, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract:	Voice data is increasingly used in modern digital communications, yet comprehensive tools for automated voice analysis and characterization are still lacking. To this end, we developed the VANPY (Voice Analysis in Python) framework for automated pre-processing, feature extraction, and classification of voice data. VANPY is an open-source, end-to-end framework developed for speaker characterization from voice data. The framework is designed with extensibility in mind, allowing easy integration of new components and adaptation to various voice analysis applications. It currently incorporates over fifteen voice analysis components, including music/speech separation, voice activity detection, speaker embedding, vocal feature extraction, and various classification models. Four of VANPY's components were developed in-house and integrated into the framework to extend its speaker characterization capabilities: gender classification, emotion classification, age regression, and height regression. The models demonstrate robust performance across various datasets, although they do not surpass state-of-the-art performance. As a proof of concept, we demonstrate the framework's ability to extract speaker characteristics in a use case analyzing character voices from the movie "Pulp Fiction." The results illustrate the framework's capability to extract multiple speaker characteristics, including gender, age, height, emotion type, and emotion intensity measured across three dimensions: arousal, dominance, and valence.
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2502.17579
Accession Number:	edsarx.2502.17579
Database:	arXiv

Publication Date:	17 February 2025