GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI

Bibliographic Details
Title: GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
Authors: Li, Tianbin; Su, Yanzhou; Li, Wei; Fu, Bin; Chen, Zhe; Huang, Ziyan; Wang, Guoan; Ma, Chenglong; Chen, Ying; Hu, Ming; Li, Yanjun; Chen, Pengcheng; Hu, Xiaowei; Deng, Zhongying; Ji, Yuanfeng; Ye, Jin; Qiao, Yu; He, Junjun
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computer Vision and Pattern Recognition
More Details: Despite significant advancements in general AI, its effectiveness in the medical domain is limited by the lack of specialized medical knowledge. To address this, we formulate GMAI-VL-5.5M, a multimodal medical dataset created by converting hundreds of specialized medical datasets with various annotations into high-quality image-text pairs. This dataset offers comprehensive task coverage, diverse modalities, and rich image-text data. Building upon this dataset, we develop GMAI-VL, a general medical vision-language model, with a three-stage training strategy that enhances the integration of visual and textual information. This approach significantly improves the model's ability to process multimodal data, supporting accurate diagnoses and clinical decision-making. Experiments show that GMAI-VL achieves state-of-the-art performance across various multimodal medical tasks, including visual question answering and medical image diagnosis.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2411.14522
Accession Number: edsarx.2411.14522
Database: arXiv
Publication Date: 2024-11-21