GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI

Bibliographic Details
Title:	GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
Authors:	Li, Tianbin; Su, Yanzhou; Li, Wei; Fu, Bin; Chen, Zhe; Huang, Ziyan; Wang, Guoan; Ma, Chenglong; Chen, Ying; Hu, Ming; Li, Yanjun; Chen, Pengcheng; Hu, Xiaowei; Deng, Zhongying; Ji, Yuanfeng; Ye, Jin; Qiao, Yu; He, Junjun
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Computer Science - Computer Vision and Pattern Recognition
Abstract:	Despite significant advancements in general AI, its effectiveness in the medical domain is limited by the lack of specialized medical knowledge. To address this, we formulate GMAI-VL-5.5M, a multimodal medical dataset created by converting hundreds of specialized medical datasets with various annotations into high-quality image-text pairs. This dataset offers comprehensive task coverage, diverse modalities, and rich image-text data. Building upon this dataset, we develop GMAI-VL, a general medical vision-language model, with a three-stage training strategy that enhances the integration of visual and textual information. This approach significantly improves the model's ability to process multimodal data, supporting accurate diagnoses and clinical decision-making. Experiments show that GMAI-VL achieves state-of-the-art performance across various multimodal medical tasks, including visual question answering and medical image diagnosis.
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2411.14522
Accession Number:	edsarx.2411.14522
Database:	arXiv
Publication Date:	21 November 2024