Read Like a Radiologist: Efficient Vision-Language Model for 3D Medical Imaging Interpretation

Bibliographic Details
Title:	Read Like a Radiologist: Efficient Vision-Language Model for 3D Medical Imaging Interpretation
Authors:	Lee, Changsun, Park, Sangjoon, Shin, Cheong-Il, Choi, Woo Hee, Park, Hyun Jeong, Lee, Jeong Eun, Ye, Jong Chul
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
More Details:	Recent medical vision-language models (VLMs) have shown promise in 2D medical image interpretation. However, extending them to 3D medical imaging has been challenging due to computational complexity and data scarcity. Although a few recent VLMs specialized for 3D medical imaging have emerged, all are limited to learning the volumetric representation of a 3D medical image as a set of sub-volumetric features. This process introduces overly correlated representations along the z-axis that neglect slice-specific clinical details, particularly for 3D medical images in which adjacent slices have low redundancy. To address this limitation, we introduce MS-VLM, which mimics radiologists' workflow in 3D medical image interpretation. Specifically, radiologists analyze 3D medical images by examining individual slices sequentially and synthesizing information across slices and views. Likewise, MS-VLM leverages self-supervised 2D transformer encoders to learn a volumetric representation that captures inter-slice dependencies from a sequence of slice-specific features. Unbound by sub-volumetric patchification, MS-VLM can obtain useful volumetric representations from 3D medical images of any slice length and from multiple images acquired in different planes and phases. We evaluate MS-VLM on the publicly available chest CT dataset CT-RATE and on an in-house rectal MRI dataset. In both scenarios, MS-VLM surpasses existing methods in radiology report generation, producing more coherent and clinically relevant reports. These findings highlight the potential of MS-VLM to advance 3D medical image interpretation and improve the robustness of medical VLMs.
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2412.13558
Accession Number:	edsarx.2412.13558
Database:	arXiv

Publication Date:	18 December 2024
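
The slice-sequence idea described in the abstract (encode each slice separately, then let a transformer relate the slices to one another) can be illustrated with a toy Python sketch. It is not the authors' MS-VLM implementation. All function names are hypothetical. `encode_slice` stands in for the self-supervised 2D transformer encoder by returning simple intensity statistics, and `self_attention` is a single head with no learned weights. What the sketch does show is why dropping sub-volumetric patchification removes the constraint on slice count: each slice becomes one token, so volumes with 5 or 12 slices, or several series acquired in different planes or phases, simply yield token sequences of different lengths.

```python
import math
from typing import List

Slice = List[List[float]]  # one 2D slice: rows of pixel intensities
Volume = List[Slice]       # a 3D image: a stack of slices along z
Vector = List[float]


def encode_slice(image: Slice) -> Vector:
    """Hypothetical stand-in for a self-supervised 2D encoder.

    A real model would run a pretrained 2D vision transformer here; this
    toy version summarizes the slice by mean, max, min and std.
    """
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, max(pixels), min(pixels), math.sqrt(var)]


def add_slice_position(tokens: List[Vector]) -> List[Vector]:
    """Add a sinusoidal encoding of each slice index so z-order is kept."""
    dim = len(tokens[0])
    out = []
    for z, tok in enumerate(tokens):
        pos = [math.sin(z / 10000 ** (i / dim)) if i % 2 == 0
               else math.cos(z / 10000 ** ((i - 1) / dim))
               for i in range(dim)]
        out.append([t + p for t, p in zip(tok, pos)])
    return out


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with identity projections.

    Each output token is a weighted average of all slice tokens, which is
    how an inter-slice transformer lets every slice see every other one.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    mixed = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        mixed.append([sum(w * v[i] for w, v in zip(weights, tokens))
                      for i in range(dim)])
    return mixed


def encode_volume(volume: Volume) -> List[Vector]:
    """Slice-wise encoding followed by inter-slice attention.

    No sub-volume patchification: one token per slice, so any z-length works.
    """
    tokens = [encode_slice(s) for s in volume]
    return self_attention(add_slice_position(tokens))


def encode_study(volumes: List[Volume]) -> List[Vector]:
    """Jointly encode several series, e.g. different planes or phases.

    Assumption: tokens from all series are simply concatenated; a real
    model would likely also add a per-series embedding (omitted here).
    """
    tokens: List[Vector] = []
    for volume in volumes:
        tokens.extend(add_slice_position([encode_slice(s) for s in volume]))
    return self_attention(tokens)
```

For example, `encode_study([vol_a, vol_b])` on two hypothetical volumes with 5 and 12 slices returns 17 tokens, one per slice. A real model would presumably also project these tokens into the language model's input space before report generation; that step is outside the scope of this sketch.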