Read Like a Radiologist: Efficient Vision-Language Model for 3D Medical Imaging Interpretation

Bibliographic Details
Title: Read Like a Radiologist: Efficient Vision-Language Model for 3D Medical Imaging Interpretation
Authors: Lee, Changsun; Park, Sangjoon; Shin, Cheong-Il; Choi, Woo Hee; Park, Hyun Jeong; Lee, Jeong Eun; Ye, Jong Chul
Publication Year: 2024
Collection: Computer Science
Subject Terms: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
More Details: Recent medical vision-language models (VLMs) have shown promise in 2D medical image interpretation. However, extending them to 3D medical imaging has been challenging due to computational complexity and data scarcity. Although a few recent VLMs specialized for 3D medical imaging have emerged, all are limited to learning a volumetric representation of a 3D medical image as a set of sub-volumetric features. Such a process introduces overly correlated representations along the z-axis that neglect slice-specific clinical details, particularly for 3D medical images where adjacent slices have low redundancy. To address this limitation, we introduce MS-VLM, which mimics radiologists' workflow in 3D medical image interpretation. Specifically, radiologists analyze 3D medical images by examining individual slices sequentially and synthesizing information across slices and views. Likewise, MS-VLM leverages self-supervised 2D transformer encoders to learn a volumetric representation that captures inter-slice dependencies from a sequence of slice-specific features. Unbound by sub-volumetric patchification, MS-VLM is capable of obtaining useful volumetric representations from 3D medical images with any slice length and from multiple images acquired from different planes and phases. We evaluate MS-VLM on the publicly available chest CT dataset CT-RATE and an in-house rectal MRI dataset. In both scenarios, MS-VLM surpasses existing methods in radiology report generation, producing more coherent and clinically relevant reports. These findings highlight the potential of MS-VLM to advance 3D medical image interpretation and improve the robustness of medical VLMs.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2412.13558
Accession Number: edsarx.2412.13558
Database: arXiv
Header DbId: edsarx
DbLabel: arXiv
An: edsarx.2412.13558
AccessLevel: 3
PubType: Report
PubTypeId: report
Items – Name: Title
  Label: Title
  Group: Ti
  Data: Read Like a Radiologist: Efficient Vision-Language Model for 3D Medical Imaging Interpretation
– Name: Author
  Label: Authors
  Group: Au
  Data: Lee, Changsun; Park, Sangjoon; Shin, Cheong-Il; Choi, Woo Hee; Park, Hyun Jeong; Lee, Jeong Eun; Ye, Jong Chul
– Name: DatePubCY
  Label: Publication Year
  Group: Date
  Data: 2024
– Name: Subset
  Label: Collection
  Group: HoldingsInfo
  Data: Computer Science
– Name: Subject
  Label: Subject Terms
  Group: Su
  Data: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
– Name: Abstract
  Label: Description
  Group: Ab
  Data: Recent medical vision-language models (VLMs) have shown promise in 2D medical image interpretation. However, extending them to 3D medical imaging has been challenging due to computational complexity and data scarcity. Although a few recent VLMs specialized for 3D medical imaging have emerged, all are limited to learning a volumetric representation of a 3D medical image as a set of sub-volumetric features. Such a process introduces overly correlated representations along the z-axis that neglect slice-specific clinical details, particularly for 3D medical images where adjacent slices have low redundancy. To address this limitation, we introduce MS-VLM, which mimics radiologists' workflow in 3D medical image interpretation. Specifically, radiologists analyze 3D medical images by examining individual slices sequentially and synthesizing information across slices and views. Likewise, MS-VLM leverages self-supervised 2D transformer encoders to learn a volumetric representation that captures inter-slice dependencies from a sequence of slice-specific features. Unbound by sub-volumetric patchification, MS-VLM is capable of obtaining useful volumetric representations from 3D medical images with any slice length and from multiple images acquired from different planes and phases. We evaluate MS-VLM on the publicly available chest CT dataset CT-RATE and an in-house rectal MRI dataset. In both scenarios, MS-VLM surpasses existing methods in radiology report generation, producing more coherent and clinically relevant reports. These findings highlight the potential of MS-VLM to advance 3D medical image interpretation and improve the robustness of medical VLMs.
– Name: TypeDocument
  Label: Document Type
  Group: TypDoc
  Data: Working Paper
– Name: URL
  Label: Access URL
  Group: URL
  Data: http://arxiv.org/abs/2412.13558
– Name: AN
  Label: Accession Number
  Group: ID
  Data: edsarx.2412.13558
RecordInfo BibRecord:
  BibEntity:
    Subjects:
      – SubjectFull: Electrical Engineering and Systems Science - Image and Video Processing
        Type: general
      – SubjectFull: Computer Science - Computation and Language
        Type: general
      – SubjectFull: Computer Science - Computer Vision and Pattern Recognition
        Type: general
      – SubjectFull: Computer Science - Machine Learning
        Type: general
    Titles:
      – TitleFull: Read Like a Radiologist: Efficient Vision-Language Model for 3D Medical Imaging Interpretation
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Lee, Changsun
      – PersonEntity:
          Name:
            NameFull: Park, Sangjoon
      – PersonEntity:
          Name:
            NameFull: Shin, Cheong-Il
      – PersonEntity:
          Name:
            NameFull: Choi, Woo Hee
      – PersonEntity:
          Name:
            NameFull: Park, Hyun Jeong
      – PersonEntity:
          Name:
            NameFull: Lee, Jeong Eun
      – PersonEntity:
          Name:
            NameFull: Ye, Jong Chul
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 18
              M: 12
              Type: published
              Y: 2024