MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Title: | MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records |
---|---|
Authors: | Fleming, Scott L., Lozano, Alejandro, Haberkorn, William J., Jindal, Jenelle A., Reis, Eduardo P., Thapa, Rahul, Blankemeier, Louis, Genkins, Julian Z., Steinberg, Ethan, Nayak, Ashwin, Patel, Birju S., Chiang, Chia-Chun, Callahan, Alison, Huo, Zepeng, Gatidis, Sergios, Adams, Scott J., Fayanju, Oluseyi, Shah, Shreya J., Savage, Thomas, Goh, Ethan, Chaudhari, Akshay S., Aghaeepour, Nima, Sharp, Christopher, Pfeffer, Michael A., Liang, Percy, Chen, Jonathan H., Morse, Keith E., Brunskill, Emma P., Fries, Jason A., Shah, Nigam H. |
Publication Year: | 2023 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning |
More Details: | The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture the complexity of information needs and documentation burdens experienced by clinicians. To address these challenges, we introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data. MedAlign is curated by 15 clinicians (7 specialities), includes clinician-written reference responses for 303 instructions, and provides 276 longitudinal EHRs for grounding instruction-response pairs. We used MedAlign to evaluate 6 general domain LLMs, having clinicians rank the accuracy and quality of each LLM response. We found high error rates, ranging from 35% (GPT-4) to 68% (MPT-7B-Instruct), and an 8.3% drop in accuracy moving from 32k to 2k context lengths for GPT-4. Finally, we report correlations between clinician rankings and automated natural language generation metrics as a way to rank LLMs without human review. We make MedAlign available under a research data use agreement to enable LLM evaluations on tasks aligned with clinician needs and preferences. |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2308.14089 |
Accession Number: | edsarx.2308.14089 |
Database: | arXiv |
Publication Date: | 27 August 2023 |