MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records

Bibliographic Details
Title: MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Authors: Fleming, Scott L., Lozano, Alejandro, Haberkorn, William J., Jindal, Jenelle A., Reis, Eduardo P., Thapa, Rahul, Blankemeier, Louis, Genkins, Julian Z., Steinberg, Ethan, Nayak, Ashwin, Patel, Birju S., Chiang, Chia-Chun, Callahan, Alison, Huo, Zepeng, Gatidis, Sergios, Adams, Scott J., Fayanju, Oluseyi, Shah, Shreya J., Savage, Thomas, Goh, Ethan, Chaudhari, Akshay S., Aghaeepour, Nima, Sharp, Christopher, Pfeffer, Michael A., Liang, Percy, Chen, Jonathan H., Morse, Keith E., Brunskill, Emma P., Fries, Jason A., Shah, Nigam H.
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
More Details: The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture the complexity of information needs and documentation burdens experienced by clinicians. To address these challenges, we introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data. MedAlign is curated by 15 clinicians (7 specialities), includes clinician-written reference responses for 303 instructions, and provides 276 longitudinal EHRs for grounding instruction-response pairs. We used MedAlign to evaluate 6 general domain LLMs, having clinicians rank the accuracy and quality of each LLM response. We found high error rates, ranging from 35% (GPT-4) to 68% (MPT-7B-Instruct), and an 8.3% drop in accuracy moving from 32k to 2k context lengths for GPT-4. Finally, we report correlations between clinician rankings and automated natural language generation metrics as a way to rank LLMs without human review. We make MedAlign available under a research data use agreement to enable LLM evaluations on tasks aligned with clinician needs and preferences.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2308.14089
Accession Number: edsarx.2308.14089
Database: arXiv
FullText Text:
  Availability: 0
CustomLinks:
  – Url: http://arxiv.org/abs/2308.14089
    Name: EDS - Arxiv
    Category: fullText
    Text: View this record from Arxiv
  – Url: https://resolver.ebsco.com/c/xy5jbn/result?sid=EBSCO:edsarx&genre=article&issn=&ISBN=&volume=&issue=&date=20230827&spage=&pages=&title=MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records&atitle=MedAlign%3A%20A%20Clinician-Generated%20Dataset%20for%20Instruction%20Following%20with%20Electronic%20Medical%20Records&aulast=Fleming%2C%20Scott%20L.&id=DOI:
    Name: Full Text Finder (for New FTF UI) (s8985755)
    Category: fullText
    Text: Find It @ SCU Libraries
Header DbId: edsarx
DbLabel: arXiv
An: edsarx.2308.14089
RelevancyScore: 1065
AccessLevel: 3
PubType: Report
PubTypeId: report
PreciseRelevancyScore: 1065.24621582031
IllustrationInfo
Items – Name: Title
  Label: Title
  Group: Ti
  Data: MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
– Name: Author
  Label: Authors
  Group: Au
  Data: Fleming, Scott L.; Lozano, Alejandro; Haberkorn, William J.; Jindal, Jenelle A.; Reis, Eduardo P.; Thapa, Rahul; Blankemeier, Louis; Genkins, Julian Z.; Steinberg, Ethan; Nayak, Ashwin; Patel, Birju S.; Chiang, Chia-Chun; Callahan, Alison; Huo, Zepeng; Gatidis, Sergios; Adams, Scott J.; Fayanju, Oluseyi; Shah, Shreya J.; Savage, Thomas; Goh, Ethan; Chaudhari, Akshay S.; Aghaeepour, Nima; Sharp, Christopher; Pfeffer, Michael A.; Liang, Percy; Chen, Jonathan H.; Morse, Keith E.; Brunskill, Emma P.; Fries, Jason A.; Shah, Nigam H.
– Name: DatePubCY
  Label: Publication Year
  Group: Date
  Data: 2023
– Name: Subset
  Label: Collection
  Group: HoldingsInfo
  Data: Computer Science
– Name: Subject
  Label: Subject Terms
  Group: Su
  Data: Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computer Science - Machine Learning
– Name: Abstract
  Label: Description
  Group: Ab
  Data: The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture the complexity of information needs and documentation burdens experienced by clinicians. To address these challenges, we introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data. MedAlign is curated by 15 clinicians (7 specialities), includes clinician-written reference responses for 303 instructions, and provides 276 longitudinal EHRs for grounding instruction-response pairs. We used MedAlign to evaluate 6 general domain LLMs, having clinicians rank the accuracy and quality of each LLM response. We found high error rates, ranging from 35% (GPT-4) to 68% (MPT-7B-Instruct), and an 8.3% drop in accuracy moving from 32k to 2k context lengths for GPT-4. Finally, we report correlations between clinician rankings and automated natural language generation metrics as a way to rank LLMs without human review. We make MedAlign available under a research data use agreement to enable LLM evaluations on tasks aligned with clinician needs and preferences.
– Name: TypeDocument
  Label: Document Type
  Group: TypDoc
  Data: Working Paper
– Name: URL
  Label: Access URL
  Group: URL
  Data: http://arxiv.org/abs/2308.14089
– Name: AN
  Label: Accession Number
  Group: ID
  Data: edsarx.2308.14089
PLink https://login.libproxy.scu.edu/login?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&scope=site&db=edsarx&AN=edsarx.2308.14089
RecordInfo BibRecord:
  BibEntity:
    Subjects:
      – SubjectFull: Computer Science - Computation and Language
        Type: general
      – SubjectFull: Computer Science - Artificial Intelligence
        Type: general
      – SubjectFull: Computer Science - Machine Learning
        Type: general
    Titles:
      – TitleFull: MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Fleming, Scott L.
      – PersonEntity:
          Name:
            NameFull: Lozano, Alejandro
      – PersonEntity:
          Name:
            NameFull: Haberkorn, William J.
      – PersonEntity:
          Name:
            NameFull: Jindal, Jenelle A.
      – PersonEntity:
          Name:
            NameFull: Reis, Eduardo P.
      – PersonEntity:
          Name:
            NameFull: Thapa, Rahul
      – PersonEntity:
          Name:
            NameFull: Blankemeier, Louis
      – PersonEntity:
          Name:
            NameFull: Genkins, Julian Z.
      – PersonEntity:
          Name:
            NameFull: Steinberg, Ethan
      – PersonEntity:
          Name:
            NameFull: Nayak, Ashwin
      – PersonEntity:
          Name:
            NameFull: Patel, Birju S.
      – PersonEntity:
          Name:
            NameFull: Chiang, Chia-Chun
      – PersonEntity:
          Name:
            NameFull: Callahan, Alison
      – PersonEntity:
          Name:
            NameFull: Huo, Zepeng
      – PersonEntity:
          Name:
            NameFull: Gatidis, Sergios
      – PersonEntity:
          Name:
            NameFull: Adams, Scott J.
      – PersonEntity:
          Name:
            NameFull: Fayanju, Oluseyi
      – PersonEntity:
          Name:
            NameFull: Shah, Shreya J.
      – PersonEntity:
          Name:
            NameFull: Savage, Thomas
      – PersonEntity:
          Name:
            NameFull: Goh, Ethan
      – PersonEntity:
          Name:
            NameFull: Chaudhari, Akshay S.
      – PersonEntity:
          Name:
            NameFull: Aghaeepour, Nima
      – PersonEntity:
          Name:
            NameFull: Sharp, Christopher
      – PersonEntity:
          Name:
            NameFull: Pfeffer, Michael A.
      – PersonEntity:
          Name:
            NameFull: Liang, Percy
      – PersonEntity:
          Name:
            NameFull: Chen, Jonathan H.
      – PersonEntity:
          Name:
            NameFull: Morse, Keith E.
      – PersonEntity:
          Name:
            NameFull: Brunskill, Emma P.
      – PersonEntity:
          Name:
            NameFull: Fries, Jason A.
      – PersonEntity:
          Name:
            NameFull: Shah, Nigam H.
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 27
              M: 08
              Type: published
              Y: 2023
ResultId 1