VANPY: Voice Analysis Framework

Bibliographic Details
Title: VANPY: Voice Analysis Framework
Authors: Koushnir, Gregory; Fire, Michael; Alpert, Galit Fuhrmann; Kagan, Dima
Publication Year: 2025
Collection: Computer Science
Subject Terms: Computer Science - Sound, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
More Details: Voice data is increasingly being used in modern digital communications, yet there is still a lack of comprehensive tools for automated voice analysis and characterization. To this end, we developed the VANPY (Voice Analysis in Python) framework for automated pre-processing, feature extraction, and classification of voice data. VANPY is a comprehensive, open-source, end-to-end framework developed for speaker characterization from voice data. The framework is designed with extensibility in mind, allowing easy integration of new components and adaptation to various voice analysis applications. It currently incorporates over fifteen voice analysis components, including music/speech separation, voice activity detection, speaker embedding, vocal feature extraction, and various classification models. Four of VANPY's components were developed in-house and integrated into the framework to extend its speaker characterization capabilities: gender classification, emotion classification, age regression, and height regression. These models perform robustly across various datasets, although they do not surpass state-of-the-art performance. As a proof of concept, we demonstrate the framework's ability to extract speaker characteristics in a use case that analyzes character voices from the movie "Pulp Fiction." The results illustrate the framework's capability to extract multiple speaker characteristics, including gender, age, height, emotion type, and emotion intensity measured across three dimensions: arousal, dominance, and valence.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2502.17579
Accession Number: edsarx.2502.17579
Database: arXiv
Publication Date: 17 February 2025