Comparative analysis of feature selection techniques for COVID-19 dataset.

Bibliographic Details
Title: Comparative analysis of feature selection techniques for COVID-19 dataset.
Authors: Mohtasham, Farideh (1), f-mohtasham@sbmu.ac.ir; Pourhoseingholi, MohamadAmin (2); Hashemi Nazari, Seyed Saeed (3); Kavousi, Kaveh (4), kkavousi@ut.ac.ir; Zali, Mohammad Reza (1)
Source: Scientific Reports. 8/15/2024, Vol. 14 Issue 1, p1-20. 20p.
Subject Terms: *FEATURE selection, *RANDOM forest algorithms, *EARLY diagnosis, *OXYGEN saturation, *KIDNEY physiology, *MACHINE learning
Geographic Terms: IRAN
Abstract: In the context of early disease detection, machine learning (ML) has emerged as a vital tool. Feature selection (FS) algorithms play a crucial role in ensuring the accuracy of predictive models by identifying the most influential variables. This study, focusing on a retrospective cohort of 4778 COVID-19 patients from Iran, explores the performance of various FS methods, including filter, embedded, and hybrid approaches, in predicting mortality outcomes. The researchers leveraged 115 routine clinical, laboratory, and demographic features and employed 13 ML models to assess the effectiveness of these FS methods based on classification accuracy, predictive accuracy, and statistical tests. The results indicate that a Hybrid Boruta-VI model combined with the Random Forest algorithm demonstrated superior performance, achieving an accuracy of 0.89, an F1 score of 0.76, and an AUC value of 0.95 on test data. Key variables identified as important predictors of adverse outcomes include age, oxygen saturation levels, albumin levels, neutrophil counts, platelet levels, and markers of kidney function. These findings highlight the potential of advanced FS techniques and ML models in enhancing early disease detection and informing clinical decision-making. [ABSTRACT FROM AUTHOR]
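Note on the reported metrics: the abstract gives accuracy, F1 score, and AUC for the best model. The sketch below shows how these three quantities are conventionally computed from true labels and predicted scores. It is not the authors' code. The labels, scores, and the 0.5 decision threshold are hypothetical toy values chosen for illustration, and the function names are assumptions of this sketch.

```python
# Minimal sketch of the three metrics named in the abstract, standard library only.
# All data below is hypothetical toy data, not taken from the study.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = died, 0 = survived; scores are predicted risk.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
threshold = 0.5  # assumed cut-off; the abstract does not state one
y_pred = [1 if s >= threshold else 0 for s in scores]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy and F1 depend on the chosen threshold, while AUC is computed from the ranking of the scores alone, which is why a model can report a high AUC alongside a lower F1, as in the figures quoted in the abstract.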
Copyright of Scientific Reports is the property of Springer Nature and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Academic Search Complete
Header DbId: a9h
DbLabel: Academic Search Complete
An: 179040649
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1038/s41598-024-69209-6
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 20
        StartPage: 1
    Subjects:
      – SubjectFull: IRAN
        Type: general
      – SubjectFull: FEATURE selection
        Type: general
      – SubjectFull: RANDOM forest algorithms
        Type: general
      – SubjectFull: EARLY diagnosis
        Type: general
      – SubjectFull: OXYGEN saturation
        Type: general
      – SubjectFull: KIDNEY physiology
        Type: general
      – SubjectFull: MACHINE learning
        Type: general
    Titles:
      – TitleFull: Comparative analysis of feature selection techniques for COVID-19 dataset.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Mohtasham, Farideh
      – PersonEntity:
          Name:
            NameFull: Pourhoseingholi, MohamadAmin
      – PersonEntity:
          Name:
            NameFull: Hashemi Nazari, Seyed Saeed
      – PersonEntity:
          Name:
            NameFull: Kavousi, Kaveh
      – PersonEntity:
          Name:
            NameFull: Zali, Mohammad Reza
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 15
              M: 08
              Text: 8/15/2024
              Type: published
              Y: 2024
          Identifiers:
            – Type: issn-print
              Value: 20452322
          Numbering:
            – Type: volume
              Value: 14
            – Type: issue
              Value: 1
          Titles:
            – TitleFull: Scientific Reports
              Type: main